[OMNIML-3017] MLM QAD example #682
base: main
Conversation
@meenchen could you update the PR title? Looks like there is a typo.
Signed-off-by: Wei-Ming Chen <[email protected]> Signed-off-by: weimingc <[email protected]>
1eb34ca to 8515a03
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff
            main      #682      +/-
Coverage    74.57%    74.73%    +0.16%
Files       183       192       +9
Lines       18412     18870     +458
Hits        13730     14103     +373
Misses      4682      4767      +85
ChenhanYu left a comment
Approved and provided some comments offline.
What does this PR do?
Type of change: New example
Overview:
Add a QAD (quantization-aware distillation) training example for Megatron-LM.
File Structure
Key Features
Usage
bash data_utils/generate_dataset.sh \
    --output-dir /path/to/datasets \
    --mlm-path /path/to/Megatron-LM \
    --tokenizer Qwen/Qwen3-30B-A3B-Instruct-2507
Testing
QAD with Qwen3-30B-A3B-Instruct-2507 NVFP4 (all layers quantized)
GPQA:
BF16: 0.549
NVFP4 (PTQ): 0.4949
NVFP4 (QAD): 0.5202
LiveCodeBench:
BF16: 0.3987
NVFP4 (PTQ): 0.37
NVFP4 (QAD): 0.3855
SciCode:
BF16: 0.325
NVFP4 (PTQ): 0.276
NVFP4 (QAD): 0.3146
AIME:
BF16: 0.6049
NVFP4 (PTQ): 0.55
NVFP4 (QAD): 0.5431
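To make the comparison easier to read, here is a minimal Python sketch that computes, for each benchmark, the QAD minus PTQ difference and the fraction of the BF16-to-PTQ gap that QAD closes. The scores are copied from the results in this description. The `gap_recovered` helper and the "gap recovered" metric are illustrative assumptions, not something this PR defines or reports.

```python
# Scores copied from the PR's testing results (Qwen3-30B-A3B-Instruct-2507, NVFP4).
scores = {
    "GPQA": {"bf16": 0.549, "ptq": 0.4949, "qad": 0.5202},
    "LiveCodeBench": {"bf16": 0.3987, "ptq": 0.37, "qad": 0.3855},
    "SciCode": {"bf16": 0.325, "ptq": 0.276, "qad": 0.3146},
    "AIME": {"bf16": 0.6049, "ptq": 0.55, "qad": 0.5431},
}


def gap_recovered(bf16: float, ptq: float, qad: float) -> float:
    """Fraction of the BF16-to-PTQ gap closed by QAD.

    Hypothetical helper for illustration; the value is negative when QAD
    scores below PTQ.
    """
    return (qad - ptq) / (bf16 - ptq)


for name, s in scores.items():
    delta = s["qad"] - s["ptq"]
    print(f"{name}: QAD - PTQ = {delta:+.4f}, gap recovered = {gap_recovered(**s):.1%}")
```

On these numbers QAD scores above PTQ on GPQA, LiveCodeBench, and SciCode, and slightly below PTQ on AIME (0.5431 vs. 0.55).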
Before your PR is "Ready for review"
Additional Information