Speed up scalars part 2 by awni · Pull Request #2669 · ml-explore/mlx

awni · 2025-10-13T15:06:25Z

Uses a seemingly better memory policy for scalars. Helps a lot on B200 and H100:

mlx_lm.benchmark --model mlx-community/Meta-Llama-3.1-8B-Instruct-bf16 -p 32 -g 128

Device	Pre tok/s	Post tok/s
B200	195.95	229.40
H100	142.38	162.67

Training Qwen3 0.6B:

Device	Pre tok/s	Post tok/s
B200	61944	63942
H100	40826	41698
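
For a sense of scale, the relative gains implied by the two tables above can be worked out with a short script. This is only a sketch: the dictionary layout and the `percent_gain` helper are illustrative names and are not part of MLX or mlx_lm. The numbers are copied from the tables as posted.

```python
# Throughput figures (tok/s) copied from the two tables above: (pre, post).
results = {
    "inference (Llama 3.1 8B bf16)": {"B200": (195.95, 229.40), "H100": (142.38, 162.67)},
    "training (Qwen3 0.6B)": {"B200": (61944, 63942), "H100": (40826, 41698)},
}


def percent_gain(pre, post):
    """Relative throughput change from pre to post, in percent (hypothetical helper)."""
    return (post - pre) / pre * 100.0


# Percent gain per workload and device.
gains = {
    workload: {device: percent_gain(pre, post) for device, (pre, post) in devices.items()}
    for workload, devices in results.items()
}

for workload, devices in gains.items():
    for device, gain in devices.items():
        print(f"{workload:32s} {device}: {gain:+.1f}%")
```

On these numbers the inference benchmark improves by roughly 17% on B200 and 14% on H100, while the training run improves by about 2 to 3%.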

angeloskath

Nice.

speed up scalars

5e66b95

awni requested review from angeloskath and zcbenz October 13, 2025 15:06

angeloskath approved these changes Oct 13, 2025

awni merged commit 25e2356 into ml-explore:main Oct 13, 2025
7 checks passed

faisalmemon pushed a commit to faisalmemon/mlx that referenced this pull request Oct 30, 2025

speed up scalars (ml-explore#2669)

d0896a0

awni deleted the speed_up_scalars_2 branch November 1, 2025 20:04

BrewTestBot mentioned this pull request Nov 20, 2025

mlx 0.30.0 Homebrew/homebrew-core#255173

Merged
