Add torch compliant grouped gemm API for CK FP8 rowwise #4486

Open · wants to merge 1 commit into main

Conversation

@cthi (Contributor) commented Jul 14, 2025

Summary:
For PyTorch integration we will need to support several additional cases, as well as a slightly different API. This is best observed through the torch test cases, e.g. [test_scaled_grouped_gemm_2d_3d](https://www.internalfb.com/code/fbsource/[fbdb0063f1c1ecca30f5eab8b5341643f680ed51]/fbcode/caffe2/test/test_matmul_cuda.py?lines=1793) and [test_scaled_grouped_gemm_3d_2d](https://www.internalfb.com/code/fbsource/[fbdb0063f1c1ecca30f5eab8b5341643f680ed51]/fbcode/caffe2/test/test_matmul_cuda.py?lines=1854).
- [Natalia's grouped gemm API doc](https://docs.google.com/document/d/1985La6wUUVH1AGBkNhaGKUXzx-9ybtbUp567-vYVOM4/edit?tab=t.0#heading=h.g8lzbjnyzzx9)

**In summary, we need these cases:**

| Input Type   | Notes |
| ------------ | ----- |
| 2D-3D        | same as fbgemm stacked for MoE |
| 3D-2D        | use case not clear yet |
| 2D-2D        | likely for the backward pass |
| 3D-3D (BMM)  | [could alternatively leverage the FBGEMM BMM kernel](https://www.internalfb.com/code/fbsource/fbcode/deeplearning/fbgemm/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_batched/) |

The PyTorch API uses offsets instead of sizes, so we update the code that sets the grouped GEMM kernel arguments to accept offsets as well, and add support for the cases above (see the sketch after this list):

  • For BMM we could alternatively leverage the AMD FP8 BMM kernel in FBGEMM, but handling it in the grouped kernel gives us some "free" support.
  • We don't add support for 2D-2D yet; that will come in a follow-up.
  • The heuristics have not yet been updated to properly account for the new cases. That will come in a follow-up with a re-tune for generic shapes, as opposed to Llama-specific ones.
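Not part of the PR; a minimal plain-PyTorch sketch of the offsets convention as described above, for the 2D-3D case. `grouped_mm_2d_3d_reference` is a hypothetical helper name, and the FP8 rowwise scales are omitted for brevity:

```python
import torch

def grouped_mm_2d_3d_reference(A, B, offs):
    """2D-3D grouped GEMM: A is (sum(M_i), K) stacked activations,
    B is (G, K, N) per-group weights, offs holds cumulative row offsets."""
    out = torch.empty(A.shape[0], B.shape[2], dtype=A.dtype, device=A.device)
    start = 0
    for g in range(B.shape[0]):
        end = int(offs[g])
        # Each group multiplies its row-slice of A with its own B[g].
        out[start:end] = A[start:end] @ B[g]
        start = end
    return out

# Converting between fbgemm-style sizes and torch-style offsets:
sizes = torch.tensor([128, 64, 256], dtype=torch.int32)
offs = torch.cumsum(sizes, dim=0)                         # sizes -> offsets: [128, 192, 448]
sizes_back = torch.diff(offs, prepend=offs.new_zeros(1))  # offsets -> sizes

A = torch.randn(int(offs[-1]), 32)
B = torch.randn(sizes.numel(), 32, 16)
out = grouped_mm_2d_3d_reference(A, B, offs)              # shape (448, 16)
```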

Differential Revision: D78119166

@facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D78119166

netlify bot commented Jul 14, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

| Name | Link |
| ---- | ---- |
| 🔨 Latest commit | 53208aa |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68755455eec197000889cce0 |
| 😎 Deploy Preview | https://deploy-preview-4486--pytorch-fbgemm-docs.netlify.app |