Eagle3: add norm_before_fc for gpt-oss draft models #337
Merged
Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
rahul-tuli
approved these changes
Mar 10, 2026
shanjiaz
approved these changes
Mar 10, 2026
shanjiaz
left a comment
Collaborator
Thanks for adding the fix!
shubhra
added a commit
that referenced
this pull request
Mar 16, 2026
Follow-up to #337. Expose --norm-before-fc in train.py, add norm_before_fc to TrainArgs in gen_and_train.py, and set norm_before_fc=True in the gpt-oss example. Made-with: Cursor
shubhra
added a commit
that referenced
this pull request
Mar 16, 2026
Follow-up to #337. Expose --norm-before-fc in train.py, add norm_before_fc to TrainArgs in gen_and_train.py, and set norm_before_fc=True in the gpt-oss example. Made-with: Cursor Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
shanjiaz
pushed a commit
that referenced
this pull request
Mar 17, 2026
…#347)

Follow-up to [#337](#337), which added config + core support for `norm_before_fc` but did not expose it in the training CLI or the combined pipeline. Without this, users running `train.py` or `gen_and_train.py` had no way to enable the pre-FC norm, so gpt-oss models could not be trained correctly via the standard scripts.

## Changes

- **train.py:** Add `--norm-before-fc` flag so the training script can pass `norm_before_fc=True` into the Eagle3 config.
- **gen_and_train.py:** Add `norm_before_fc` to `TrainArgs` so the combined pipeline forwards the flag to `train.py`.
- **gpt_oss_20b_ultrachat_5k.py:** Set `norm_before_fc=True` in the gpt-oss example so it trains with the stabilizing norm out of the box.

With these changes, gpt-oss models train correctly (`--norm-before-fc`), and all other models continue to train as before (flag defaults to off).

## Tests

- `train.py --norm-before-fc` creates the draft model with `input_norm`; omitting the flag matches pre-#337 behavior.
- gpt-oss example runs end-to-end with the norm enabled.

## Related

- Core + config: [#337](#337)
- Inference support in vLLM: [vllm-project/vllm#36545](vllm-project/vllm#36545)

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
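For readers who want to see how an off-by-default flag like `--norm-before-fc` typically reaches a config, here is a minimal sketch. It is not the actual `train.py` code: `DraftConfigSketch`, `build_parser`, and `config_from_argv` are hypothetical names, and only the flag name and its default-off behavior come from the commit message above.

```python
import argparse
from dataclasses import dataclass


@dataclass
class DraftConfigSketch:
    """Hypothetical stand-in for the Eagle3 config; only this one field matters here."""

    norm_before_fc: bool = False


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a training CLI")
    # store_true leaves the value False unless the flag is passed,
    # which matches "flag defaults to off" in the commit message.
    parser.add_argument(
        "--norm-before-fc",
        action="store_true",
        help="Apply an RMSNorm to the concatenated aux hidden states before fc.",
    )
    return parser


def config_from_argv(argv: list[str]) -> DraftConfigSketch:
    args = build_parser().parse_args(argv)
    # argparse exposes --norm-before-fc as the attribute norm_before_fc.
    return DraftConfigSketch(norm_before_fc=args.norm_before_fc)
```

Calling `config_from_argv([])` leaves the norm disabled, while `config_from_argv(["--norm-before-fc"])` enables it, so existing invocations keep their previous behavior.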
YzTongNiar
pushed a commit
to YzTongNiar/speculators
that referenced
this pull request
Apr 10, 2026
When training Eagle3 draft models for gpt-oss, we observed exploding hidden states in the draft path. This PR adds an optional RMSNorm before the fc (on the concatenated 3× aux hidden states) to stabilize training. The behavior is gated by `norm_before_fc` so only gpt-oss (or models that need it) use it; others are unchanged.

#### Description

The Eagle3 fusion path concatenates three aux hidden states and projects via fc. `gpt-oss` exhibits exploding states in this path; we add an optional RMSNorm (`input_norm`) before the fc to stabilize. The norm runs at train time (speculators) and inference (vLLM), so there is no train–serve mismatch.

- **Config:** `norm_before_fc: bool = False` on `Eagle3SpeculatorConfig` (same style as `norm_before_residual`). When True, the draft model uses the pre-fc norm.
- **Core:** Create/apply `input_norm` only when `config.norm_before_fc`; otherwise fc gets the raw concat as before.
- **Loading:** `"input_norm.weight"` in `_keys_to_ignore_on_load_missing` so old checkpoints without the norm still load.

#### Tests

- Training with `norm_before_fc=True` uses `input_norm`; with `False` (default) behavior matches pre-PR.

**Related:** Inference support in vLLM: vllm-project/vllm#36545

---------

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
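The "Loading" point in this commit message can be illustrated with a small, library-free sketch of how an ignore list suppresses missing-key reports. This is an assumption-laden illustration, not the loader itself: `reportable_missing_keys` is a hypothetical helper, and treating each entry as a regular expression matched with `re.search` is my understanding of how Hugging Face `transformers` handles `_keys_to_ignore_on_load_missing`, not something stated in this PR.

```python
import re

# The single entry named in the PR description. Matching it as a regular
# expression is an assumption of this sketch.
IGNORE_ON_LOAD_MISSING = ["input_norm.weight"]


def reportable_missing_keys(model_keys, checkpoint_keys, ignore_patterns):
    """Return model keys absent from the checkpoint, minus the ignored ones."""
    present = set(checkpoint_keys)
    missing = [key for key in model_keys if key not in present]
    return [
        key
        for key in missing
        if not any(re.search(pattern, key) for pattern in ignore_patterns)
    ]
```

With model keys `["fc.weight", "input_norm.weight"]` and an older checkpoint that only contains `"fc.weight"`, the helper reports nothing, whereas a genuinely missing `"fc.weight"` would still be reported.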
YzTongNiar
pushed a commit
to YzTongNiar/speculators
that referenced
this pull request
Apr 10, 2026
…vllm-project#347)

Follow-up to [vllm-project#337](vllm-project#337), which added config + core support for `norm_before_fc` but did not expose it in the training CLI or the combined pipeline. Without this, users running `train.py` or `gen_and_train.py` had no way to enable the pre-FC norm, so gpt-oss models could not be trained correctly via the standard scripts.

## Changes

- **train.py:** Add `--norm-before-fc` flag so the training script can pass `norm_before_fc=True` into the Eagle3 config.
- **gen_and_train.py:** Add `norm_before_fc` to `TrainArgs` so the combined pipeline forwards the flag to `train.py`.
- **gpt_oss_20b_ultrachat_5k.py:** Set `norm_before_fc=True` in the gpt-oss example so it trains with the stabilizing norm out of the box.

With these changes, gpt-oss models train correctly (`--norm-before-fc`), and all other models continue to train as before (flag defaults to off).

## Tests

- `train.py --norm-before-fc` creates the draft model with `input_norm`; omitting the flag matches pre-vllm-project#337 behavior.
- gpt-oss example runs end-to-end with the norm enabled.

## Related

- Core + config: [vllm-project#337](vllm-project#337)
- Inference support in vLLM: [vllm-project/vllm#36545](vllm-project/vllm#36545)

Signed-off-by: Shubhra Pandit <shubhra.pandit@gmail.com>
When training Eagle3 draft models for gpt-oss, we observed exploding hidden states in the draft path. This PR adds an optional RMSNorm before the fc (on the concatenated 3× aux hidden states) to stabilize training. The behavior is gated by `norm_before_fc` so only gpt-oss (or models that need it) use it; others are unchanged.

#### Description

The Eagle3 fusion path concatenates three aux hidden states and projects via fc. `gpt-oss` exhibits exploding states in this path; we add an optional RMSNorm (`input_norm`) before the fc to stabilize. The norm runs at train time (speculators) and inference (vLLM), so there is no train–serve mismatch.

- **Config:** `norm_before_fc: bool = False` on `Eagle3SpeculatorConfig` (same style as `norm_before_residual`). When True, the draft model uses the pre-fc norm.
- **Core:** Create/apply `input_norm` only when `config.norm_before_fc`; otherwise fc gets the raw concat as before.
- **Loading:** `"input_norm.weight"` in `_keys_to_ignore_on_load_missing` so old checkpoints without the norm still load.

#### Tests

- Training with `norm_before_fc=True` uses `input_norm`; with `False` (default) behavior matches pre-PR.

**Related:** Inference support in vLLM: vllm-project/vllm#36545
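
To make the gated fusion path concrete, here is a rough, dependency-free sketch of "concatenate three aux hidden states, optionally RMSNorm, then project via fc". It uses plain Python lists rather than tensors, and `rms_norm`, `linear`, and `fuse_aux_states` are hypothetical helper names, so treat it as an illustration of the idea under those assumptions and not as the code in this PR.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale x to roughly unit root-mean-square, then apply a per-element weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for w, v in zip(weight, x)]


def linear(x, rows):
    """Bias-free projection: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


def fuse_aux_states(aux_states, fc_rows, norm_weight=None, norm_before_fc=False):
    """Concatenate the aux hidden states and project them with fc.

    When norm_before_fc is True, the concatenated vector is normalized first;
    otherwise fc receives the raw concatenation.
    """
    concat = [v for state in aux_states for v in state]
    if norm_before_fc:
        concat = rms_norm(concat, norm_weight)
    return linear(concat, fc_rows)
```

In this sketch, multiplying every aux state by 1000 leaves the normalized output essentially unchanged, while the un-normalized output grows by the same factor of 1000. That scale invariance is the property the pre-fc norm relies on to keep large hidden states from propagating through fc.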