We have completed the initial Ascend NPU enablement in vllm-omni v0.11.0rc1 and v0.12.0rc1, with support for most mainstream models such as Qwen3-Omni and the Qwen-Image series.
Building on this foundation, the next phase will focus on systematically expanding model coverage and prioritizing performance optimization efforts, with a clear roadmap to improve scalability, stability, and overall serving efficiency on Ascend NPU.
Version match
Currently, vLLM-Omni’s NPU support depends on vLLM-Ascend, the Ascend support plugin of vLLM. The AR (auto-regressive) path is jointly supported by vLLM and vLLM-Ascend.
Meanwhile, MindIE-SD serves as a standalone Ascend-optimized diffusion operator library. It is currently integrated through the FlashAttentionBackend and a set of CustomOp, delivering Ascend-native operators to improve the performance of diffusion models.
We are also building a separate plugin platform in vLLM-Omni to better support additional hardware in the future.
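Because the NPU path spans several packages, it can help to list what is installed before comparing versions. The following is a minimal sketch, not an official tool: the distribution names `vllm`, `vllm-ascend`, `vllm-omni`, and `mindiesd` are assumptions about how the packages are named in pip, and `pkg_version` is a hypothetical helper introduced only for this example.

```shell
# Print the installed version of a pip distribution, or nothing if it is absent.
# pkg_version is a hypothetical helper, not part of vLLM-Omni.
pkg_version() {
    python3 -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# The distribution names below are assumptions; adjust them to your environment.
for pkg in vllm vllm-ascend vllm-omni mindiesd; do
    ver=$(pkg_version "$pkg")
    echo "${pkg}: ${ver:-not installed}"
done
```

Each line of output has the form `name: version`, or `name: not installed` when pip cannot find the distribution.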
We are actively working to simplify the installation of mindie-sd. Eventually, it will be available via pip install mindie-sd. For now, it has to be built from source with the following steps:
git clone https://gitcode.com/Ascend/MindIE-SD.git && cd MindIE-SD
# Comment out the line `source ${current_script_dir}/build_tik_ops.sh` in build/build_ops.sh before building
sed -i 's|^\(\s*\)source ${current_script_dir}/build_tik_ops.sh|\1# source ${current_script_dir}/build_tik_ops.sh|' build/build_ops.sh
python setup.py bdist_wheel
cd dist
pip install mindiesd-*.whl
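To see what the `sed` step does before running it against the real build script, the same expression can be tried on a throwaway file. This is a sketch that assumes GNU sed (for `-i` without a suffix and for `\s`); the sample file contents are invented for illustration and are not the actual contents of build/build_ops.sh.

```shell
# Create a temporary stand-in for build/build_ops.sh (contents are illustrative only).
tmp=$(mktemp)
printf '%s\n' \
    'echo start' \
    '    source ${current_script_dir}/build_tik_ops.sh' \
    'echo done' > "$tmp"

# Same expression as in the install steps: keep the indentation (\1) and
# prefix the matching line with "# " so it becomes a comment.
sed -i 's|^\(\s*\)source ${current_script_dir}/build_tik_ops.sh|\1# source ${current_script_dir}/build_tik_ops.sh|' "$tmp"

# Show the patched file, then clean up.
patched=$(cat "$tmp")
printf '%s\n' "$patched"
rm -f "$tmp"
```

Only the `source` line should change; the surrounding lines and the original indentation are left as they were. Because the expression is single-quoted, `${current_script_dir}` is matched literally and is not expanded by the shell.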
Memory usage: currently, Qwen2.5-Omni and Qwen3-Omni have to place the talker on a different device from the thinker. We expect to co-locate them so that Qwen2.5-Omni and Qwen3-Omni would need only 2 and 4 cards, respectively.
How to install MindIE-SD
Official Link: MindIE-SD
Feature Support
Omni(AR+Generator) Pipeline
Diffusion Pipeline
Qwen-Image-Edit-2511
Optimization
Others (UX & Hardware Scalable)
Docs
Known Issues
conv3d, nn.Linear, and others.
Model Support List
| Architecture | Model ID |
|---|---|
| Qwen3OmniMoeForConditionalGeneration | Qwen/Qwen3-Omni-30B-A3B-Instruct |
| Qwen2_5OmniForConditionalGeneration | Qwen/Qwen2.5-Omni-7B, Qwen/Qwen2.5-Omni-3B |
| BagelForConditionalGeneration | ByteDance-Seed/BAGEL-7B-MoT |
| QwenImagePipeline | Qwen/Qwen-Image |
| QwenImagePipeline | Qwen/Qwen-Image-2512 |
| QwenImageEditPipeline | Qwen/Qwen-Image-Edit |
| QwenImageEditPlusPipeline | Qwen/Qwen-Image-Edit-2509 |
| QwenImageLayeredPipeline | Qwen/Qwen-Image-Layered |
| ZImagePipeline | Tongyi-MAI/Z-Image-Turbo |
| WanPipeline | Wan-AI/Wan2.2-T2V-A14B-Diffusers, Wan-AI/Wan2.2-TI2V-5B-Diffusers |
| WanImageToVideoPipeline | Wan-AI/Wan2.2-I2V-A14B-Diffusers |
| OvisImagePipeline | OvisAI/Ovis-Image |
| LongcatImagePipeline | meituan-longcat/LongCat-Image |
| LongCatImageEditPipeline | meituan-longcat/LongCat-Image-Edit |
| StableDiffusion3Pipeline | stabilityai/stable-diffusion-3.5-medium |
| Flux2KleinPipeline | black-forest-labs/FLUX.2-klein-4B, black-forest-labs/FLUX.2-klein-9B |
| StableAudioPipeline | stabilityai/stable-audio-open-1.0 |
| Qwen3TTSForConditionalGeneration | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice |
| Qwen3TTSForConditionalGeneration | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign |
| Qwen3TTSForConditionalGeneration | Qwen/Qwen3-TTS-12Hz-0.6B-Base |