Unsloth Dynamic 2.0 Per-Tensor Quantization Recipe for MLX #1062
Brooooooklyn
started this conversation in
Ideas
I've implemented Unsloth's Dynamic 2.0 per-tensor quantization strategy in mlx-node, targeting Qwen3.5 hybrid models (full attention + GatedDeltaNet layers). Sharing here since this approach could benefit mlx-lm users as well.
What it does
Instead of uniform N-bit quantization, the recipe assigns each weight tensor a different bit-width based on Unsloth's KLD sensitivity research (150+ benchmarks across 121 configurations):
- gate_proj, up_proj
- down_proj
- q/k/v_proj, in_proj_*
- embed_tokens
- lm_head
- o_proj, out_proj

Combined with AWQ pre-scaling (4 groups exploiting norm->projection pairs), this achieves a ~3-bit average with significantly better quality than uniform Q3.
Key findings for Qwen3.5
- linear_attn.out_proj is the most sensitive tensor (KLD ~6.0) -- it must stay bf16
- o_proj has no preceding norm layer, so AWQ pre-scaling can't help -- bf16 is the only safe option
- keeping embed_tokens / lm_head at higher precision has negligible size impact but dramatically reduces output degradation

Models & Code & References
The imatrix data comes from Unsloth's open-source GGUF repos on Hugging Face, calibrated on conversational and coding data. I'd love to see testing results in mlx-lm/mlx-vlm.
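For anyone who wants to build calibration data rather than reuse Unsloth's GGUF imatrix files, the core statistic is the per-input-channel mean squared activation of each linear layer, which is what AWQ-style scale search weights by. A minimal collector sketch (`ImatrixCollector` is a hypothetical name, not Unsloth's tooling):

```python
import numpy as np

class ImatrixCollector:
    """Accumulate per-input-channel mean squared activations for one
    linear layer across a calibration corpus. Channels with large values
    are the ones whose quantization error hurts most."""

    def __init__(self, in_features: int):
        self.sum_sq = np.zeros(in_features, dtype=np.float64)
        self.count = 0

    def update(self, x: np.ndarray) -> None:
        # x: (tokens, in_features) activations feeding this layer
        self.sum_sq += (x.astype(np.float64) ** 2).sum(axis=0)
        self.count += x.shape[0]

    def importance(self) -> np.ndarray:
        # Mean squared activation per input channel
        return self.sum_sq / max(self.count, 1)
```

One collector per linear layer, fed by forward hooks over the calibration set, yields the per-channel importance used to pick AWQ scales.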