Add support for Qwen3-Omni-30B-A3B-Thinking #677
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Signed-off-by: ajrasane <[email protected]>
examples/llm_ptq/hf_ptq.py (outdated)

```python
    "qwen3omni only supports one dataset for calibration, can extend this in the future"
)
assert processor is not None, "The processor must be set for qwen3omni model."
dataset_name = args.dataset[0] if args.dataset else "scienceqa"
```
do we still recommend scienceqa as the default calib dataset?
Changed this to cnn_dailymail
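For reference, a minimal sketch of the resulting default; `pick_calib_dataset` is a hypothetical helper for illustration, while the real code assigns inline as in the snippet above:

```python
from argparse import Namespace

# Hypothetical helper mirroring the change discussed above: fall back to
# cnn_dailymail instead of scienceqa when no calibration dataset is given.
def pick_calib_dataset(args: Namespace) -> str:
    return args.dataset[0] if args.dataset else "cnn_dailymail"

print(pick_calib_dataset(Namespace(dataset=None)))           # cnn_dailymail
print(pick_calib_dataset(Namespace(dataset=["scienceqa"])))  # scienceqa
```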
```python
    num_samples=args.calib_size[0],
)
elif model_type == "qwen3omni":
    assert len(args.calib_size) == 1, (
```
For this part, I think we may want to host it in a model-specific Python file/module, e.g. llm_ptq/models/qwen3omni.py.
@shengliangxu WDYT?
We do not need to do it for now; I'll come up with a full design doc and then we can convert the whole repo afterwards. Even if we separate things out now, we may still refactor these anyway.
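For illustration, a rough sketch of the kind of per-model split being floated; the module path comes from the comment above, while the entry-point name and body are assumptions, not code from this PR:

```python
# Hypothetical examples/llm_ptq/models/qwen3omni.py: keep the qwen3omni-specific
# calibration setup behind one entry point so hf_ptq.py only dispatches on model_type.
def get_calib_dataset_name(args, processor) -> str:
    assert len(args.calib_size) == 1, (
        "qwen3omni only supports one dataset for calibration, can extend this in the future"
    )
    assert processor is not None, "The processor must be set for qwen3omni model."
    return args.dataset[0] if args.dataset else "cnn_dailymail"
```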
examples/llm_ptq/hf_ptq.py (outdated)

```python
# if args.verbose:
#     mtq.print_quant_summary(full_model)

import contextlib
```
move to the top
Done
```python
torch.cuda.empty_cache()

free_mem_before, max_allocated_before = _get_free_gpu_mem()
is_enc_dec = model_type_is_enc_dec(model)
```
can we merge this into _model_requires_generate?
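A sketch of what that merge might look like; the first helper below stubs out `model_type_is_enc_dec`, and folding it into `_model_requires_generate` is the suggestion, not existing code:

```python
def model_type_is_enc_dec(model) -> bool:
    # Stand-in for the existing helper referenced in the snippet above.
    return getattr(getattr(model, "config", None), "is_encoder_decoder", False)

def _model_requires_generate(model) -> bool:
    # Hypothetical merged check: encoder-decoder models become one of the cases
    # that require model.generate() during calibration, so the separate
    # is_enc_dec flag at the call site can go away.
    return model_type_is_enc_dec(model)
```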
```python
self.tokenizer = tokenizer
# Handle invalid device values that can come from multi-GPU models with device_map="auto"
if device is None or str(device) in ("auto", "meta", "cpu"):
    device = "cuda"
```
Maybe print a warning?
And does this effectively amount to `if "cuda" not in str(device): device = "cuda"`?
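A minimal sketch combining both suggestions, under the assumption that any non-CUDA device string should trigger the fallback; `resolve_device` is a hypothetical name:

```python
import warnings

def resolve_device(device) -> str:
    # Tightened version of the check above: warn on any non-CUDA device and
    # fall back to "cuda", which subsumes the explicit (None, "auto", "meta",
    # "cpu") list.
    if device is None or "cuda" not in str(device):
        warnings.warn(f"Got device={device!r} from device_map; falling back to 'cuda'.")
        return "cuda"
    return str(device)

print(resolve_device("meta"))    # warns, returns 'cuda'
print(resolve_device("cuda:0"))  # returns 'cuda:0'
```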
Signed-off-by: ajrasane <[email protected]>
Signed-off-by: ajrasane <[email protected]>
Signed-off-by: ajrasane <[email protected]>
Signed-off-by: ajrasane <[email protected]>
Signed-off-by: ajrasane <[email protected]>
```python
model_is_already_quantized = is_quantized(model)

model_type = get_model_type(model)
if model_type == "qwen3omni" and os.environ.get("DISABLE_TALKER", "0") == "1":
```
I think we probably need to find a better way for configurations like this
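One possible direction, sketched as an assumption rather than a decision from this thread: promote the environment variable to an explicit CLI flag so the behavior shows up in --help:

```python
import argparse

# Hypothetical replacement for the DISABLE_TALKER environment variable:
# an explicit flag on the hf_ptq.py argument parser.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--disable-talker",
    action="store_true",
    help="Skip quantizing the qwen3omni talker submodule.",
)
args = parser.parse_args(["--disable-talker"])
assert args.disable_talker is True
```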
Signed-off-by: ajrasane <[email protected]>
Comment out import and registration of Qwen3OmniMoe classes.
Signed-off-by: Chenjie Luo <[email protected]>
Force-pushed 8410674 to 7f80e6f
Signed-off-by: ajrasane <[email protected]>
Force-pushed 7f80e6f to 0c4b38f
What does this PR do?

Type of change: Model support

Overview:

Usage

Testing

Able to quantize the model and generate output.

Before your PR is "Ready for review"