[CUDA] Support array mask in SDPA by zcbenz · Pull Request #2822 · ml-explore/mlx

zcbenz · 2025-11-23T00:57:12Z

Note that cuDNN does not support boolean masks, so we have to convert boolean masks to additive masks with where(mask, full_like(mask, 0), full_like(mask, -inf)), which carries some performance penalty. (PyTorch does the same thing.)
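To illustrate the boolean-to-additive conversion described above, here is a minimal pure-Python sketch. It is not the MLX or cuDNN code path; the helper names (`bool_to_additive`, `masked_softmax_additive`) are hypothetical and plain nested lists stand in for arrays. It assumes the usual convention that `True` means "attend" and that every row keeps at least one position.

```python
import math

NEG_INF = float("-inf")

def bool_to_additive(mask):
    """Map True -> 0.0 (keep) and False -> -inf (mask out), elementwise."""
    return [[0.0 if keep else NEG_INF for keep in row] for row in mask]

def softmax(row):
    """Numerically stable softmax; assumes at least one finite entry."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def masked_softmax_additive(scores, additive_mask):
    """Add the mask to the attention scores, then apply softmax row by row."""
    return [
        softmax([s + a for s, a in zip(score_row, mask_row)])
        for score_row, mask_row in zip(scores, additive_mask)
    ]

# Hypothetical 2x3 score matrix and boolean mask.
scores = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
bool_mask = [[True, True, False], [True, False, False]]

additive = bool_to_additive(bool_mask)
weights = masked_softmax_additive(scores, additive)
```

Masked positions receive a weight of exactly zero because adding -inf to a score drives its exponential to zero, which is why an additive mask can stand in for a boolean one. The extra elementwise pass that builds the additive mask is the overhead mentioned in the comment.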

What cuDNN does support is setting a padding mask directly: we pass the sequence lengths and cuDNN applies the padding mask automatically, and this works together with the set_causal_mask flag. I don't know how much performance this approach would gain, but I think it is worth a try as future work.
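As a rough picture of what a padding mask derived from a sequence length means, and how it composes with a causal mask, the sketch below builds the equivalent boolean mask explicitly. This is an assumption-laden illustration, not the cuDNN API: `padding_causal_mask` is a hypothetical helper, it handles a single sequence with equal query and key lengths, and it only masks padded key positions.

```python
def padding_causal_mask(seq_len, max_len, causal=True):
    """Build a max_len x max_len boolean mask for one padded sequence.

    Entry [i][j] is True when query i may attend to key j:
    - padding: key j must be a real token, i.e. j < seq_len
    - causal:  key j must not lie in the future, i.e. j <= i
    """
    return [
        [j < seq_len and (not causal or j <= i) for j in range(max_len)]
        for i in range(max_len)
    ]

# Hypothetical example: 2 real tokens padded to length 4.
mask = padding_causal_mask(seq_len=2, max_len=4)
padding_only = padding_causal_mask(seq_len=2, max_len=4, causal=False)
```

Passing only the sequence lengths lets the library derive this mask internally, so no mask array has to be materialized or converted to an additive form.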

awni

Looks great!

As a general guideline, we should work towards avoiding dispatching differently based on the back-end, because it breaks the ability to export from one machine to another (e.g. something exported on CUDA would not work on Metal). The fast primitives are one place where we break this guideline a lot (e.g. CPU vs GPU). Just elaborating for future reference, as we may want to push the mask -> float conversion into the primitive.

[CUDA] Support array mask in SDPA

c7a5073

awni approved these changes Nov 25, 2025

zcbenz merged commit 704fd1a into ml-explore:main Nov 26, 2025
10 checks passed

zcbenz deleted the cudnn-sdpa-masks branch November 26, 2025 02:08

BrewTestBot mentioned this pull request Dec 18, 2025: mlx 0.30.1 Homebrew/homebrew-core#259125 (merged)
