Fp qmv by awni · Pull Request #2984 · ml-explore/mlx

awni · 2026-01-11T17:00:40Z

Adds a basic qmv kernel for fp quants for CUDA.
Adds a simple quantize-dequantize kernel for CUDA, Metal, and CPU.
Routes qqmv to quantize-dequantize + qmv for all backends.
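
For readers unfamiliar with the routing described above, here is a minimal NumPy sketch of the idea: round-trip the input through a block-scaled low-precision grid (quantize-dequantize), then do an ordinary matrix-vector product against the dequantized weights. This is not the code from this PR. The function names (`quantize_dequantize`, `qqmv_reference`), the group size of 16, and the use of a plain float32 per-group scale are all assumptions made for illustration; real nvfp4/mxfp formats also encode the scale in 8 bits.

```python
import numpy as np

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def quantize_dequantize(x, group_size=16):
    """Round-trip x through a block-scaled 4-bit float grid.

    Simplification (assumption): the per-group scale stays a float32
    instead of being stored in an 8-bit format.
    """
    x = np.asarray(x, dtype=np.float32)
    groups = x.reshape(-1, group_size)
    # Scale each group so its largest magnitude maps to the top of the grid.
    amax = np.abs(groups).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0).astype(np.float32)
    scaled = groups / scale
    # Snap each magnitude to the nearest grid value and restore the sign.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(axis=-1)
    return (np.sign(scaled) * E2M1_GRID[idx] * scale).reshape(x.shape)


def qqmv_reference(w, x, group_size=16):
    """Emulate a quantized-quantized matvec with round-trips plus a dense product."""
    w_dq = quantize_dequantize(w, group_size)  # stands in for pre-quantized weights
    x_dq = quantize_dequantize(x, group_size)  # quantize-dequantize step on the input
    return w_dq @ x_dq
```

A reference like this is mainly useful for checking a fused kernel's output numerically; it says nothing about how the actual kernels lay out or pack the data.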

awni · 2026-01-22T18:28:02Z

Moving out of draft.

There is a nice speedup over qqmm with cublas for the qmv case. On a Spark:

quant	GB/s pre	GB/s post
nvfp4	164.163	232.178
mxfp8	178.034	221.105
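
The relative gain implied by the table can be worked out directly. The snippet below only restates the numbers above and divides post by pre; the variable names are arbitrary.

```python
# Bandwidth figures (GB/s) copied from the table above: (pre, post).
results = {"nvfp4": (164.163, 232.178), "mxfp8": (178.034, 221.105)}

# Ratio of post-PR to pre-PR bandwidth for each quantization mode.
speedups = {name: post / pre for name, (pre, post) in results.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
```

That works out to roughly 1.41x for nvfp4 and 1.24x for mxfp8.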

awni · 2026-01-22T18:51:25Z

I think we can optimize the fp_qmv a bit more, but it's a good start, so it's probably worth landing and hill-climbing from there.

angeloskath

It looks great and the perf seems already great... I guess the hill will be small :-)

awni force-pushed the fp_qmv branch from 7f97c39 to f527572 on January 12, 2026 19:58

awni force-pushed the fp_qmv branch 7 times, most recently from 458262b to 19db566 on January 22, 2026 18:22

awni marked this pull request as ready for review January 22, 2026 18:28

awni requested review from angeloskath and zcbenz January 22, 2026 18:51

angeloskath approved these changes Jan 22, 2026

Awni Hannun and others added 7 commits January 26, 2026 07:20

add very basic fp qmv

407fd22

working for batched

de54e34

use uint32

c07f76c

route qqmv to qmv with quantize-dequantize kernel

1267e5a

cleanup

90ade47

fix older cuda

019afcf

cpu and metal

0e18bb0

awni force-pushed the fp_qmv branch from 4a9723a to 2d0692f on January 26, 2026 15:26

cleanup

3f79c1e

awni force-pushed the fp_qmv branch from 2d0692f to 3f79c1e on January 26, 2026 15:53

Some fixes

724f9f2

awni force-pushed the fp_qmv branch 2 times, most recently from 0fba562 to 66c3deb on January 26, 2026 21:53

zcbenz mentioned this pull request Jan 26, 2026

[CUDA] Add CUDA implementation for QuantizedMatmul #3066

Closed

awni force-pushed the fp_qmv branch from 66c3deb to 2cccd2b on January 27, 2026 01:00

fix

be9be24

awni force-pushed the fp_qmv branch from 2cccd2b to be9be24 on January 27, 2026 01:20

awni merged commit 4912cc4 into main Jan 27, 2026
16 checks passed

awni deleted the fp_qmv branch January 27, 2026 14:33

awni pushed a commit to NripeshN/mlx that referenced this pull request Jan 27, 2026

Fp qmv (ml-explore#2984)

0cb7965

BrewTestBot mentioned this pull request Jan 27, 2026

mlx 0.30.4 Homebrew/homebrew-core#264789

Closed

1 task

awni merged 10 commits into main from fp_qmv

2 participants
