Last updated: 2025-05-10
Overview
HunyuanImage-3.0-Instruct uses a two-stage AR (Auto-Regressive) + DIT (Diffusion Transformer) architecture: AR handles token generation, DIT handles image denoising. This document serves as the unified tracker for feature implementation, optimization, and maintenance of this model in vllm-omni.
1. AR Module
1.1 Functional Support
1.2 Performance Features
| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Performance analysis | - | @TaffyOfficial | P1 | 2026/5/19 |
2. DIT Module
2.1 Functional Support
2.2 Performance Features
2.3 Quantization
3. AR + DIT Joint Inference
| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| AR + DIT with KV recompute | #3107 | @skf-1999 | P0 | ✅ |
| AR + DIT with KV reuse | #3346 | @Bounty-hunter | P0 | ✅ |
| Online mode adaptation | #3410 | @skf-1999 | P0 | ✅ |
| Offline-online accuracy alignment check/fix | | @skf-1999 | P0 | 2026/5/13 |
| YR connector (NPU) | #3180 | @yangsonglin13 | P1 | |
| Skip re-encoding in the DIT stage the parts already encoded during the AR stage (e.g. system prompt and image tokens) | | | P1 | Closed: no obvious benefit |
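
To make the difference between the "KV recompute" and "KV reuse" rows above concrete, here is a minimal toy sketch in Python. It is not the vllm-omni implementation: `ToyStage`, `run_with_kv_recompute`, and `run_with_kv_reuse` are hypothetical names, and the sketch assumes that "recompute" means the DIT stage re-encodes the full context itself, while "reuse" means it consumes the KV cache already produced by the AR stage.

```python
from dataclasses import dataclass


@dataclass
class ToyStage:
    """Hypothetical stand-in for a model stage; counts the tokens it encodes."""
    encoded_tokens: int = 0

    def encode(self, tokens):
        # Pretend each token yields exactly one KV entry.
        self.encoded_tokens += len(tokens)
        return [("kv", t) for t in tokens]


def run_with_kv_recompute(prompt, generated):
    """DIT stage re-encodes the whole context (assumed meaning of 'KV recompute')."""
    ar, dit = ToyStage(), ToyStage()
    ar.encode(prompt + generated)            # AR stage builds its own cache
    dit_kv = dit.encode(prompt + generated)  # DIT stage encodes everything again
    return dit_kv, dit.encoded_tokens


def run_with_kv_reuse(prompt, generated):
    """DIT stage consumes the AR stage's cache (assumed meaning of 'KV reuse')."""
    ar, dit = ToyStage(), ToyStage()
    ar_kv = ar.encode(prompt + generated)    # AR stage builds the cache once
    dit_kv = ar_kv                           # cache is handed over unchanged
    return dit_kv, dit.encoded_tokens
```

In this toy model, a 3-token prompt plus 2 generated tokens costs the DIT stage 5 encoded tokens on the recompute path and 0 on the reuse path, while both paths hand the DIT stage an identical cache. A real system would also have to account for the cost of transferring the cache between stages, which this sketch ignores.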
3.1 Large-scale Deployment
Production readiness for large-scale deployment, focusing on multi-replica and high-concurrency scenarios.
| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Single-node, multi-replica, uniform AR/DIT config (e.g. both TP2) | - | - | P0 | - |
| Multi-node, multi-replica, uniform AR/DIT config (e.g. both TP2) | - | - | P0 | - |
| Single-node, multi-replica, heterogeneous AR/DIT config | - | - | P0 | - |
| Multi-node, multi-replica, heterogeneous AR/DIT config | - | - | P0 | - |
3.2 Baseline
4. Cross-cutting / Maintenance
4.1 Known Bugs/Issues
4.2 CI & Quality
| Category | Test case | Test file | Covered scenario |
| --- | --- | --- | --- |
| Accuracy | AR + DIT accuracy CI | tests\e2e\accuracy\test_hunyuan_image3.py | kv_reuse, AR accuracy, online/offline |
| Accuracy | DIT accuracy CI | - | DIT accuracy |
| Performance | DIT performance CI | run_diffusion_benchmark.py | TP, SP, CFG parallel |
| Performance | End-to-end performance CI | - | end2end, AR performance |
5. Appendix
5.1 Performance Data (L20x)