[Performance] [email protected] - GEMM performance with a column major B matrix is worse than with a row major B matrix. #2354

@chengjunlu

Description

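The performance gap was found in #2347.

We need to investigate the root cause of the performance drop in the column major B matrix case (`A x B.T`).
It is roughly 1.5x slower than the row major B matrix case. In the log below, the Triton kernel takes about 0.635 ms for `A x B.T`, against about 0.445 ms in the preceding run.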

(I): Detected 7680 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 0 spills
✅ Triton and Torch match
Time for torch: 0.31633758544921875 ms
Time for triton: 0.44517597556114197 ms
Compute A x B.T
OpenCL API not available for this operation
OpenCL API not available for this operation
OpenCL API not available for this operation
OpenCL API not available for this operation
(I): Detected 7680 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 0 spills
✅ Triton and Torch match
Time for torch: 0.3375360071659088 ms
Time for triton: 0.6348815560340881 ms
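
For anyone picking this up, here is a minimal sketch of what the two layouts mean at the memory level, written in plain Python with no Triton or PyTorch dependency. The names `gemm_strided`, `b_row_major`, and `b_col_major` and the tiny matrix sizes are illustrative assumptions and do not come from the benchmark in #2347. The sketch only shows that the two cases differ in the strides used to address B: the row major case reads B with strides `(N, 1)`, the column major case with strides `(1, K)`. It also recomputes the slowdown from the two Triton timings in the log above, assuming the first (unlabelled) run is the row major case.

```python
def gemm_strided(a, b, M, N, K, b_strides):
    """Compute C = A x B on flat buffers.

    A is row major (M x K). B is a logical K x N matrix whose element
    (k, n) lives at offset k * stride_bk + n * stride_bn.
    """
    stride_bk, stride_bn = b_strides
    c = [0.0] * (M * N)
    for m in range(M):
        for n in range(N):
            acc = 0.0
            for k in range(K):
                acc += a[m * K + k] * b[k * stride_bk + n * stride_bn]
            c[m * N + n] = acc
    return c


# Small illustrative sizes (assumption, not the benchmark shapes).
M, N, K = 2, 3, 4
a = [float(i) for i in range(M * K)]
b_logical = [[float(k * N + n + 1) for n in range(N)] for k in range(K)]

# Row major B: consecutive n are adjacent in memory, strides (N, 1).
b_row_major = [b_logical[k][n] for k in range(K) for n in range(N)]
# Column major B: consecutive k are adjacent in memory, strides (1, K).
b_col_major = [b_logical[k][n] for n in range(N) for k in range(K)]

c_row = gemm_strided(a, b_row_major, M, N, K, (N, 1))
c_col = gemm_strided(a, b_col_major, M, N, K, (1, K))
assert c_row == c_col  # same math, only the memory layout of B differs

# Triton timings copied from the log above. Treating the first run as
# the row major case is an assumption; the log does not label it.
triton_row_major_ms = 0.44517597556114197
triton_col_major_ms = 0.6348815560340881
slowdown = triton_col_major_ms / triton_row_major_ms
print(f"column major / row major: {slowdown:.2f}x")
```

Since the arithmetic is identical in both cases, the gap is presumably tied to how B is loaded for each layout rather than to the computation itself, but that is exactly what still needs to be confirmed.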
