Fp qmv #2984

Merged
awni merged 10 commits into main from fp_qmv on Jan 27, 2026

awni commented Jan 11, 2026

  • Adds a basic qmv (quantized matrix-vector) kernel for fp quants on CUDA.
  • Adds a simple quantize-dequantize kernel for CUDA, Metal, and CPU.
  • Routes qqmv through quantize-dequantize followed by qmv on all backends.
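
For illustration only, the routing in the last bullet can be sketched in pure Python. This is not the kernel code from this PR: the 4-bit E2M1 value grid, the plain float per-group scales, the absmax scaling rule, and every function name below (quantize, dequantize, quantize_dequantize, qmv, qqmv) are assumptions chosen to keep the sketch small.

```python
# Hypothetical sketch: names, scale format, and rounding rule are assumptions.
# Magnitudes representable by a 4-bit E2M1 float (sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode(code):
    """Map a 4-bit code (bit 3 = sign, bits 0-2 = grid index) to a float."""
    mag = FP4_GRID[code & 0x7]
    return -mag if code & 0x8 else mag


def quantize(vec, group_size):
    """Quantize a vector group by group; returns (codes, scales)."""
    codes, scales = [], []
    for start in range(0, len(vec), group_size):
        group = vec[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Assumed rule: map the group's largest magnitude to the grid maximum.
        scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
        scales.append(scale)
        for v in group:
            target = abs(v) / scale
            idx = min(range(len(FP4_GRID)), key=lambda i: abs(FP4_GRID[i] - target))
            codes.append(idx | (0x8 if v < 0 else 0))
    return codes, scales


def dequantize(codes, scales, group_size):
    """Expand codes back to floats using the per-group scales."""
    return [decode(c) * scales[i // group_size] for i, c in enumerate(codes)]


def quantize_dequantize(vec, group_size):
    """Round a float vector onto the quantized grid but keep it as floats."""
    codes, scales = quantize(vec, group_size)
    return dequantize(codes, scales, group_size)


def qmv(w_rows, x, group_size):
    """y = W x with W stored row-wise as (codes, scales) and x in floats."""
    y = []
    for codes, scales in w_rows:
        acc = 0.0
        for i, c in enumerate(codes):
            acc += decode(c) * scales[i // group_size] * x[i]
        y.append(acc)
    return y


def qqmv(w_rows, x, group_size):
    """Emulate a quantized-activation matvec: round x first, then reuse qmv."""
    return qmv(w_rows, quantize_dequantize(x, group_size), group_size)
```

In this sketch qqmv needs no kernel of its own: the activation vector is rounded to the quantized grid and the result is passed to the same qmv path used for float activations.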

awni force-pushed the fp_qmv branch 7 times, most recently from 458262b to 19db566, on January 22, 2026 18:22

awni commented Jan 22, 2026

Moving out of draft.

There is a nice speedup over qqmm with cuBLAS for the qmv case. On a Spark:

quant   GB/s pre   GB/s post
nvfp4   164.163    232.178
mxfp8   178.034    221.105

awni marked this pull request as ready for review on January 22, 2026 18:28

awni commented Jan 22, 2026

I think we can optimize the fp_qmv a bit more, but it's a good start, so it's probably worth landing and hill-climbing from there.

awni requested reviews from angeloskath and zcbenz on January 22, 2026 18:51

angeloskath left a comment

It looks great and the perf already seems great... I guess the hill will be small :-)

awni merged commit 4912cc4 into main on Jan 27, 2026
16 checks passed
awni deleted the fp_qmv branch on January 27, 2026 14:33
awni pushed a commit to NripeshN/mlx that referenced this pull request Jan 27, 2026