Eagle3: add norm_before_fc for gpt-oss draft models by shubhra · Pull Request #337 · vllm-project/speculators

shubhra · 2026-03-09T19:50:57Z

When training Eagle3 draft models for gpt-oss, we observed exploding hidden states in the draft path. This PR adds an optional RMSNorm before the fc (on the concatenated 3× aux hidden states) to stabilize training. The behavior is gated by norm_before_fc so only gpt-oss (or models that need it) use it; others are unchanged.

Description

The Eagle3 fusion path concatenates three aux hidden states and projects via fc. gpt-oss exhibits exploding states in this path; we add an optional RMSNorm (input_norm) before the fc to stabilize. The norm runs at train time (speculators) and inference (vLLM), so there is no train–serve mismatch.

Config: norm_before_fc: bool = False on Eagle3SpeculatorConfig (same style as norm_before_residual). When True, the draft model uses the pre-fc norm.
Core: Create/apply input_norm only when config.norm_before_fc; otherwise fc gets the raw concat as before.
Loading: "input_norm.weight" in _keys_to_ignore_on_load_missing so old checkpoints without the norm still load.
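
As a rough illustration of the gating described above, here is a minimal, dependency-free sketch of the fusion path. It is not the code from this PR: `DraftConfigSketch`, `rms_norm`, `fc`, and `fuse` are hypothetical names, plain Python lists stand in for tensors, and the `eps` value is an assumption. Only the `norm_before_fc` flag and the order of operations (concatenate, optionally normalize, project) come from the description.

```python
import math
from dataclasses import dataclass


@dataclass
class DraftConfigSketch:
    """Hypothetical stand-in for Eagle3SpeculatorConfig (only the new flag)."""
    norm_before_fc: bool = False  # default off, so other models are unchanged


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one vector: x / sqrt(mean(x^2) + eps), scaled by weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def fc(x, rows):
    """Plain linear projection: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


def fuse(aux_states, fc_rows, config, norm_weight=None):
    """Concatenate the aux hidden states, optionally normalize, then project."""
    concat = [v for state in aux_states for v in state]
    if config.norm_before_fc:
        # Pre-fc norm (called input_norm in the PR description).
        concat = rms_norm(concat, norm_weight)
    return fc(concat, fc_rows)


hidden = 2
aux = [[100.0, -200.0], [300.0, 50.0], [-400.0, 250.0]]  # deliberately large
fc_rows = [[0.1] * (3 * hidden) for _ in range(hidden)]
norm_weight = [1.0] * (3 * hidden)

# Flag off: fc sees the raw concat, so its output grows with the input scale.
raw = fuse(aux, fc_rows, DraftConfigSketch(norm_before_fc=False))

# Flag on: fc sees a unit-RMS input, so rescaling the aux states by 10x
# leaves the projected output (almost) unchanged.
normed = fuse(aux, fc_rows, DraftConfigSketch(norm_before_fc=True), norm_weight)
scaled = fuse([[10 * v for v in s] for s in aux], fc_rows,
              DraftConfigSketch(norm_before_fc=True), norm_weight)
```

With the flag on, the input to `fc` no longer depends on the magnitude of the aux hidden states, which is the property the PR relies on for gpt-oss.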

Tests

Training with norm_before_fc=True uses input_norm; with False (default) behavior matches pre-PR.

Related: Inference support in vLLM: vllm-project/vllm#36545

github-actions · 2026-03-09T19:53:53Z

📦 Build Artifacts Available
The build artifacts (`.whl` and `.tar.gz`) have been successfully generated and are available for download: https://github.com/vllm-project/speculators/actions/runs/22872426973/artifacts/5837286616.
They will be retained for up to 30 days.
Commit: fd27b1b

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>

shanjiaz

Thanks for adding the fix!

Follow-up to #337. Expose --norm-before-fc in train.py, add norm_before_fc to TrainArgs in gen_and_train.py, and set norm_before_fc=True in the gpt-oss example. Made-with: Cursor Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>

…#347)

Follow-up to [#337](#337), which added config + core support for `norm_before_fc` but did not expose it in the training CLI or the combined pipeline. Without this, users running `train.py` or `gen_and_train.py` had no way to enable the pre-FC norm, so gpt-oss models could not be trained correctly via the standard scripts.

## Changes

- **train.py:** Add `--norm-before-fc` flag so the training script can pass `norm_before_fc=True` into the Eagle3 config.
- **gen_and_train.py:** Add `norm_before_fc` to `TrainArgs` so the combined pipeline forwards the flag to `train.py`.
- **gpt_oss_20b_ultrachat_5k.py:** Set `norm_before_fc=True` in the gpt-oss example so it trains with the stabilizing norm out of the box.

With these changes, gpt-oss models train correctly (`--norm-before-fc`), and all other models continue to train as before (flag defaults to off).

## Tests

- `train.py --norm-before-fc` creates the draft model with `input_norm`; omitting the flag matches pre-#337 behavior.
- gpt-oss example runs end-to-end with the norm enabled.

## Related

- Core + config: [#337](#337)
- Inference support in vLLM: [vllm-project/vllm#36545](vllm-project/vllm#36545)

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
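
For readers unfamiliar with how a boolean switch like `--norm-before-fc` is typically exposed, the sketch below shows one common way to do it with the standard `argparse` module. This is an assumption about the general shape, not the actual contents of `train.py`; `build_parser` and `config_kwargs` are hypothetical helper names.

```python
import argparse


def build_parser():
    """Build a tiny parser with only the flag discussed in this follow-up."""
    parser = argparse.ArgumentParser(description="Sketch of a training CLI")
    parser.add_argument(
        "--norm-before-fc",
        action="store_true",  # absent -> False, present -> True
        help="Apply RMSNorm to the concatenated aux hidden states before fc.",
    )
    return parser


def config_kwargs(argv):
    """Translate CLI arguments into keyword arguments for a draft-model config."""
    args = build_parser().parse_args(argv)
    # argparse maps "--norm-before-fc" to the attribute "norm_before_fc".
    return {"norm_before_fc": args.norm_before_fc}


default_kwargs = config_kwargs([])
gpt_oss_kwargs = config_kwargs(["--norm-before-fc"])
```

Using `store_true` keeps the default off, which matches the stated goal that models other than gpt-oss train exactly as before unless the flag is passed.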

shubhra force-pushed the gpt_oss_norm_fix branch from 582b6d8 to ed449e9 Compare March 9, 2026 19:51

shubhra requested review from fynnsu and shanjiaz March 9, 2026 19:52

shubhra force-pushed the gpt_oss_norm_fix branch from 15a9fe7 to c69cbd7 Compare March 9, 2026 20:00

shubhra added 3 commits March 9, 2026 20:05

Eagle3: add norm_before_fc for gpt-oss draft models

1e32c0a

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>

Fix long comment

62ca2e7

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>

Add noqa: C901 for forward to satisfy ruff complexity check

fd27b1b

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>

shubhra force-pushed the gpt_oss_norm_fix branch from e95d297 to fd27b1b Compare March 9, 2026 20:05

shubhra mentioned this pull request Mar 9, 2026

[Speculative Decoding] Add norm_before_fc for gpt-oss draft models vllm-project/vllm#36545

Merged

rahul-tuli approved these changes Mar 10, 2026

shanjiaz approved these changes Mar 10, 2026

shanjiaz merged commit ac6db62 into main Mar 10, 2026
12 checks passed

shanjiaz deleted the gpt_oss_norm_fix branch March 10, 2026 13:30

shubhra mentioned this pull request Mar 16, 2026

Eagle3: wire norm_before_fc into training scripts and gpt-oss example #347

Merged
