
[Feature]: dflash speculator model support #38240

@shanjiaz


🚀 The feature, motivation and pitch

DFlash was recently introduced in a blog post as a potentially better speculative decoding method than Eagle-3. Speculators can now train a dflash model, but vLLM cannot yet load speculators-produced dflash models directly; they have to be converted first. A similar algorithm, Eagle3, already has speculators support, so users can serve a speculator model with a single command:

vllm serve RedHatAI/Qwen3-235B-A22B-Instruct-2507-speculator.eagle3

We would like the same for dflash models as well.

Alternatives

We currently have to convert a dflash speculator model to the format vllm expects. The process is manual and might discourage users from trying out our dflash training support.
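To make the request concrete, here is a minimal sketch of the kind of dispatch that would remove the manual conversion step: inspect the checkpoint's config and pick the speculator algorithm from it. Everything here is illustrative. The function name, the `SUPPORTED_ALGORITHMS` set, and the `speculators_model_type` config key are assumptions for the sake of the example, not a description of how vLLM or speculators actually implement loading.

```python
from typing import Any, Optional

# Assumption: the set of algorithms a loader would accept.
# "dflash" is the addition this issue asks for.
SUPPORTED_ALGORITHMS = {"eagle3", "dflash"}


def detect_speculators_algorithm(config: dict[str, Any]) -> Optional[str]:
    """Return the speculator algorithm named in a config dict, or None.

    Assumes (hypothetically) that a speculators-format config carries a
    "speculators_model_type" key. A config without that key is treated
    as not being in speculators format.
    """
    algorithm = config.get("speculators_model_type")
    if algorithm is None:
        # Not a speculators checkpoint; the caller falls back to the
        # regular loading path.
        return None
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported speculators algorithm: {algorithm!r}")
    return algorithm
```

With a check like this, a dflash checkpoint in speculators format would be routed to the dflash loader the same way an eagle3 one is routed today, and unknown algorithms would fail loudly instead of being loaded incorrectly.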

Additional context

shanjiaz/dflash-qwen3-8b: This is a manually converted model that serves correctly on the working branch of dflash support.
shanjiaz/qwen3-8b-speculator-format: This is produced by speculators training code and has the expected speculators format.

Note: The second model was trained on a much smaller dataset, so it will not be as good. We will provide a fully validated model soon. These models are for reference only.

It would be great to add tests in a format similar to the existing eagle3 speculators tests.

