@meenchen meenchen commented Dec 12, 2025

What does this PR do?

Type of change: New example

Overview:

Add QAD Training example for Megatron-LM

  • File Structure

    • qad.sh / sbatch_qad.sh - Training and SLURM submission scripts
    • data_utils/ - Dataset download and preprocessing utilities
    • configs/ - Configuration templates for Qwen3-30B-A3B (MoE) and Qwen3-8B (Dense)
  • Key Features

    • One-button dataset generation (OpenScience + Nemotron-v2)
    • Config-based training scripts that keep all tunable knobs in a single config file

Usage

  1. Generate dataset
bash data_utils/generate_dataset.sh \
    --output-dir /path/to/datasets \
    --mlm-path /path/to/Megatron-LM \
    --tokenizer Qwen/Qwen3-30B-A3B-Instruct-2507
  2. Create a config based on templates
  3. Kick off training with Slurm:
sbatch sbatch_qad.sh --config configs/my-experiment.conf
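
The format of the `.conf` file is not shown in this description. As a minimal sketch, assuming the config is a shell-sourceable file of `KEY=value` assignments, a training script could load it and fill in defaults as below. Every variable name here (`MODEL_NAME`, `LEARNING_RATE`, `TRAIN_ITERS`, `GLOBAL_BATCH_SIZE`) and the `load_config` helper are hypothetical and are not taken from the templates in `configs/`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical config: these variable names are illustrative only and are not
# taken from the templates in configs/.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
MODEL_NAME=Qwen3-8B
LEARNING_RATE=1e-5
TRAIN_ITERS=1000
EOF

# Load a config file, then fill in defaults for anything it leaves unset.
load_config() {
    . "$1"
    # Fail early if a required knob is missing.
    : "${MODEL_NAME:?MODEL_NAME must be set in the config}"
    LEARNING_RATE="${LEARNING_RATE:-1e-5}"
    TRAIN_ITERS="${TRAIN_ITERS:-500}"
    GLOBAL_BATCH_SIZE="${GLOBAL_BATCH_SIZE:-64}"
}

load_config "$conf"
echo "model=$MODEL_NAME lr=$LEARNING_RATE iters=$TRAIN_ITERS gbs=$GLOBAL_BATCH_SIZE"
rm -f "$conf"
```

Keeping defaults in the loader rather than in each config means an experiment file only needs to list the knobs it changes.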

Testing

QAD with Qwen3-30B-A3B-Instruct-2507 NVFP4 (all layers quantized)

  • GPQA:
    BF16: 0.549
    NVFP4 (PTQ): 0.4949
    NVFP4 (QAD): 0.5202

  • LiveCodeBench:
    BF16: 0.3987
    NVFP4 (PTQ): 0.37
    NVFP4 (QAD): 0.3855

  • SciCode:
    BF16: 0.325
    NVFP4 (PTQ): 0.276
    NVFP4 (QAD): 0.3146

  • AIME:
    BF16: 0.6049
    NVFP4 (PTQ): 0.55
    NVFP4 (QAD): 0.5431
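
One way to read these numbers is as the share of the BF16-to-PTQ accuracy gap that QAD recovers, `(QAD - PTQ) / (BF16 - PTQ)`. The sketch below computes that ratio from the scores listed above; the metric and the `gap_recovered` helper are an illustration added here, not something reported in the PR.

```shell
#!/usr/bin/env bash

# Share of the BF16-to-PTQ accuracy gap that QAD recovers, in percent.
# recovered = (QAD - PTQ) / (BF16 - PTQ) * 100
gap_recovered() {
    awk -v bf16="$1" -v ptq="$2" -v qad="$3" \
        'BEGIN { printf "%.1f", (qad - ptq) / (bf16 - ptq) * 100 }'
}

# Scores copied from the results above, in the order: BF16, PTQ, QAD.
while read -r name bf16 ptq qad; do
    printf '%-14s %6s%%\n' "$name" "$(gap_recovered "$bf16" "$ptq" "$qad")"
done <<'EOF'
GPQA           0.549  0.4949 0.5202
LiveCodeBench  0.3987 0.37   0.3855
SciCode        0.325  0.276  0.3146
AIME           0.6049 0.55   0.5431
EOF
```

By this measure QAD recovers roughly half of the gap on GPQA and LiveCodeBench and most of it on SciCode, while the value is negative on AIME because the QAD score there is slightly below the PTQ score.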

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@meenchen meenchen requested a review from a team as a code owner December 12, 2025 20:53
@meenchen meenchen requested a review from cjluo-nv December 12, 2025 20:53
copy-pr-bot bot commented Dec 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@meenchen meenchen marked this pull request as draft December 12, 2025 20:54
@cjluo-nv
Collaborator

@meenchen could you update the PR title? Looks like there is a typo

@meenchen meenchen changed the title from "MLM QQD example" to "MLM QAD example" Dec 15, 2025
Signed-off-by: Wei-Ming Chen <[email protected]>
Signed-off-by: weimingc <[email protected]>
@meenchen meenchen force-pushed the weimingc/mlm_qad_exammple branch from 1eb34ca to 8515a03 Compare December 17, 2025 22:46
codecov bot commented Dec 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.73%. Comparing base (7a36ccc) to head (3f1e67f).
⚠️ Report is 69 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #682      +/-   ##
==========================================
+ Coverage   74.57%   74.73%   +0.16%     
==========================================
  Files         183      192       +9     
  Lines       18412    18870     +458     
==========================================
+ Hits        13730    14103     +373     
- Misses       4682     4767      +85     


@meenchen meenchen self-assigned this Dec 18, 2025
@meenchen meenchen marked this pull request as ready for review December 18, 2025 18:08
@meenchen meenchen changed the title from "MLM QAD example" to "[OMNIML-3017] MLM QAD example" Dec 18, 2025
@meenchen meenchen requested a review from mxinO December 18, 2025 18:15
Collaborator

@ChenhanYu ChenhanYu left a comment

Approved and provided some comments offline.
