Feature request
Currently, when I try to run DiffusionGemma in NVFP4 through transformers, I see this warning:
[transformers] Unknown quantization type, got modelopt - supported types are: ['awq', ..., 'gemma']
After installing nvidia-modelopt[hf], transformers is downgraded:
- nvidia-modelopt[hf] installed nvidia-modelopt 0.46.0.dev70+g93dd08f42
- Its hf extra requires transformers<5.10,>=4.56
- Pip resolved that to transformers 5.9.0
- With transformers 5.9.0, this fails immediately:
ImportError: cannot import name 'DiffusionGemmaForBlockDiffusion' from 'transformers'
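To make the version conflict concrete, here is a minimal sketch that checks a transformers version against the range quoted above (`transformers<5.10,>=4.56`). The helper names are made up for illustration, and the parsing is a simplification that only handles plain numeric releases; it is not how pip resolves versions.

```python
def parse_release(version):
    # Simplification: only handles plain numeric releases such as "5.9.0".
    return tuple(int(part) for part in version.split("."))


# Bounds copied from the hf extra reported above: transformers<5.10,>=4.56
LOWER_INCLUSIVE = parse_release("4.56")
UPPER_EXCLUSIVE = parse_release("5.10")


def allowed_by_hf_extra(version):
    """Return True if `version` falls inside the range required by the hf extra."""
    release = parse_release(version)
    return LOWER_INCLUSIVE <= release < UPPER_EXCLUSIVE


for candidate in ("5.9.0", "5.12.1"):
    # 5.9.0 is inside the range; 5.12.1 is above the upper bound.
    print(candidate, allowed_by_hf_extra(candidate))
```

So the only versions the extra accepts are ones that, at least at 5.9.0, lack DiffusionGemmaForBlockDiffusion, while the version I need (5.12.1) is rejected by the extra.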
When I use the latest transformers==5.12.1 and nvidia-modelopt from Git, I get the following errors:
[transformers] This checkpoint seem corrupted. The tied weights mapping for this model specifies to tie model.decoder.layers.9.experts.down_proj to model.encoder.language_model.layers.9.experts.down_proj, but both are absent from the checkpoint, and we could not find another related tied weight for those keys
[transformers] DiffusionGemmaForBlockDiffusion LOAD REPORT from: nvidia/diffusiongemma-26B-A4B-it-NVFP4
Key | Status |
-------------------------------------------------------------------------+------------+-
model.decoder.layers.{0...29}.experts.{0...127}.down_proj.weight | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.gate_proj.weight_scale_2 | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.up_proj.weight_scale_2 | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.up_proj.input_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.up_proj.weight | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.up_proj.weight_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.gate_proj.input_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.gate_proj.weight_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.down_proj.weight_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.gate_proj.weight | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.down_proj.input_scale | UNEXPECTED |
model.decoder.layers.{0...29}.experts.{0...127}.down_proj.weight_scale_2 | UNEXPECTED |
model.decoder.layers.{0...29}.experts.down_proj | MISSING |
model.encoder.language_model.layers.{0...29}.experts.gate_up_proj | MISSING |
model.encoder.language_model.layers.{0...29}.experts.down_proj | MISSING |
model.decoder.layers.{0...29}.experts.gate_up_proj | MISSING |
Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING: those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
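The mismatch in the report can be illustrated with a small sketch: the checkpoint stores one entry per expert and per projection, while the model expects fused `gate_up_proj` / `down_proj` parameters per layer. The regex and the `expected_fused_key` helper below are my own illustration based only on the key names in the report, not code from transformers or ModelOpt.

```python
import re

# Matches per-expert checkpoint keys such as
# model.decoder.layers.9.experts.17.up_proj.weight_scale_2
EXPERT_KEY = re.compile(
    r"^(?P<prefix>.+\.layers\.\d+\.experts)\.(?P<expert>\d+)\."
    r"(?P<proj>gate_proj|up_proj|down_proj)\.(?P<suffix>.+)$"
)

# Which fused parameter each per-expert projection would belong to.
FUSED_NAME = {
    "gate_proj": "gate_up_proj",
    "up_proj": "gate_up_proj",
    "down_proj": "down_proj",
}


def expected_fused_key(checkpoint_key):
    """Return the fused parameter name the model reports as MISSING, or None."""
    match = EXPERT_KEY.match(checkpoint_key)
    if match is None:
        return None  # not a per-expert key (for example, already fused)
    return f"{match['prefix']}.{FUSED_NAME[match['proj']]}"


print(expected_fused_key("model.decoder.layers.9.experts.17.up_proj.weight_scale_2"))
```

Every UNEXPECTED key in the report maps onto one of the MISSING fused names this way, which suggests the weights are present but in a layout the loader does not understand.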
Motivation
The suggested path for DiffusionGemma in NVFP4 is vLLM. However, for a single-user setup or custom scripts, using the transformers library is much more reasonable.
Your contribution
I am not able to implement this myself, other than by letting a model vibe-code it. GPT-5.5 suggested the following fix:
- Register a modelopt quantizer
  - Add quantizer_modelopt.py and map quant_method: "modelopt" in the auto-quantizer registry.
- Implement NVFP4 linear modules
  - Replace relevant nn.Linear layers with ModelOpt-compatible NVFP4 linear layers.
  - Keep packed 4-bit weights and both scaling tensors on GPU; never expand them to BF16/FP16.
  - Dispatch forward passes to NVIDIA’s ModelOpt/CUTLASS kernels.
- Map DiffusionGemma’s MoE checkpoint layout
  - Load checkpoint keys such as per-expert gate_proj, up_proj, and down_proj, including their weight_scale, weight_scale_2, and input_scale.
  - Adapt the DiffusionGemma MoE implementation so it accepts its per-expert, quantized layout instead of expecting fused BF16 gate_up_proj / down_proj tensors.
- Keep quantization metadata through loading
  - Ensure from_pretrained() does not emit “unknown quantization type … skipping quantization.”
  - Prevent the current false-success path where missing BF16 MoE tensors are randomly initialized.
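As a rough illustration of the first and last points, the sketch below shows a toy registry that maps a quant_method string to a quantizer class, plus a guard that fails loudly instead of randomly initializing missing weights. All names here (`QUANTIZER_REGISTRY`, `register_quantizer`, `ModelOptQuantizerSketch`, `resolve_quantizer`, `assert_no_missing_weights`) are hypothetical; this is not the actual transformers registry or quantizer interface, only the behavior I would expect from it.

```python
# Illustrative only: these names are hypothetical, not the transformers API.
QUANTIZER_REGISTRY = {}


def register_quantizer(method):
    """Class decorator that registers a quantizer under a quant_method string."""
    def decorator(cls):
        QUANTIZER_REGISTRY[method] = cls
        return cls
    return decorator


@register_quantizer("modelopt")
class ModelOptQuantizerSketch:
    """Placeholder for a quantizer that would set up NVFP4 modules."""

    def __init__(self, quant_config):
        self.quant_config = quant_config


def resolve_quantizer(quant_config):
    """Look up the quantizer; raise instead of silently skipping quantization."""
    method = quant_config.get("quant_method")
    if method not in QUANTIZER_REGISTRY:
        supported = sorted(QUANTIZER_REGISTRY)
        raise ValueError(
            f"Unknown quantization type {method!r}; supported types: {supported}"
        )
    return QUANTIZER_REGISTRY[method](quant_config)


def assert_no_missing_weights(missing_keys):
    """Fail fast rather than leaving missing tensors randomly initialized."""
    if missing_keys:
        preview = ", ".join(sorted(missing_keys)[:3])
        raise RuntimeError(
            f"{len(missing_keys)} quantized weights were not loaded, e.g. {preview}"
        )


quantizer = resolve_quantizer({"quant_method": "modelopt"})
print(type(quantizer).__name__)
```

The point of the guard is that a quantized checkpoint whose expert weights end up in MISSING should be a hard error, not a load that appears to succeed.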