🚀 The feature, motivation and pitch
DFlash was recently introduced in a blog post as a potentially better method for speculative decoding than Eagle-3. Speculators can now train DFlash models, but vLLM cannot yet load Speculators-produced DFlash models directly; they must be converted first. Similar algorithms such as Eagle-3 already have Speculators support, so users can serve a speculator model as simply as:
vllm serve RedHatAI/Qwen3-235B-A22B-Instruct-2507-speculator.eagle3
We would like the same for DFlash models.
Alternatives
Currently, a DFlash speculator model must be converted to the format vLLM expects. This conversion is manual and may discourage users from trying out our DFlash training support.
Additional context
shanjiaz/dflash-qwen3-8b: This is a manually converted model that serves correctly on the working branch of dflash support.
shanjiaz/qwen3-8b-speculator-format: This is produced by speculators training code and has the expected speculators format.
Note: The second model was trained on a much smaller dataset, so it will not perform as well. A fully validated model will be provided soon; these models are for reference only.
It would be great to add tests in the same format as the existing Eagle-3 Speculators tests.