Speed up scalars part 2 #2669

Merged
awni merged 1 commit into ml-explore:main from awni:speed_up_scalars_2
Oct 13, 2025

Conversation

@awni (Member) commented Oct 13, 2025

Uses a seemingly better memory policy for scalars. Helps a lot on B200 and H100:

mlx_lm.benchmark --model mlx-community/Meta-Llama-3.1-8B-Instruct-bf16 -p 32 -g 128
| Device | Pre (tok/s) | Post (tok/s) |
| ------ | ----------- | ------------ |
| B200   | 195.95      | 229.40       |
| H100   | 142.38      | 162.67       |

Training Qwen3 0.6B:

| Device | Pre (tok/s) | Post (tok/s) |
| ------ | ----------- | ------------ |
| B200   | 61944       | 63942        |
| H100   | 40826       | 41698        |
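
To put the two tables on a common scale, here is a minimal sketch that converts the pre/post throughput into a relative speedup. The numbers are copied from the tables above; the names `RESULTS`, `speedup_percent`, and `summarize` are illustrative only and are not part of MLX or this PR.

```python
# Throughput figures (tok/s) copied from the two tables in this comment.
# Each entry is (pre, post). The dictionary layout is an assumption made
# for this sketch, not something produced by mlx_lm.benchmark.
RESULTS = {
    "generation": {"B200": (195.95, 229.40), "H100": (142.38, 162.67)},
    "training": {"B200": (61944, 63942), "H100": (40826, 41698)},
}


def speedup_percent(pre, post):
    """Relative throughput gain of `post` over `pre`, in percent."""
    return (post / pre - 1.0) * 100.0


def summarize(results):
    """Map workload -> device -> speedup in percent, rounded to 2 decimals."""
    return {
        workload: {
            device: round(speedup_percent(pre, post), 2)
            for device, (pre, post) in devices.items()
        }
        for workload, devices in results.items()
    }


if __name__ == "__main__":
    for workload, devices in summarize(RESULTS).items():
        for device, gain in devices.items():
            print(f"{workload:10s} {device}: +{gain:.2f}%")
```

By this measure, generation improves by roughly 17% on B200 and 14% on H100, while training improves by roughly 3% and 2% respectively.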

@awni requested review from @angeloskath and @zcbenz on October 13, 2025 15:06

@angeloskath (Member) left a comment

Nice.

@awni merged commit 25e2356 into ml-explore:main on Oct 13, 2025 (7 checks passed)
faisalmemon pushed a commit to faisalmemon/mlx that referenced this pull request on Oct 30, 2025
@awni deleted the speed_up_scalars_2 branch on November 1, 2025 20:04