[RFC]: HunyuanImage Model deployment optimization #2015

@Bounty-hunter

Last updated: 2025-05-10

Overview

HunyuanImage-3.0-Instruct uses a two-stage AR (Auto-Regressive) + DIT (Diffusion Transformer) architecture: AR handles token generation, DIT handles image denoising. This document serves as the unified tracker for feature implementation, optimization, and maintenance of this model in vllm-omni.
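
The two-stage flow described above can be sketched as a plain function. This is a minimal illustration, not vllm-omni code: `two_stage_generate`, `toy_ar`, and `toy_dit_step` are hypothetical names, and the toy stages only stand in for the real AR model and DIT denoiser.

```python
from typing import Callable, List, Tuple

Latent = List[float]


def two_stage_generate(
    prompt: str,
    ar_generate: Callable[[str], List[int]],
    dit_denoise_step: Callable[[Latent, List[int], int], Latent],
    latent: Latent,
    num_steps: int,
) -> Tuple[List[int], Latent]:
    """Run the AR stage once, then iterate the DIT denoising step."""
    tokens = ar_generate(prompt)  # stage 1: token generation
    for step in range(num_steps):  # stage 2: image denoising
        latent = dit_denoise_step(latent, tokens, step)
    return tokens, latent


def toy_ar(prompt: str) -> List[int]:
    # Hypothetical stand-in for the AR model: one small integer per character.
    return [ord(ch) % 7 for ch in prompt]


def toy_dit_step(latent: Latent, tokens: List[int], step: int) -> Latent:
    # Hypothetical stand-in for one denoising step: move each latent value
    # halfway toward a target derived from the AR tokens.
    target = sum(tokens) / len(tokens)
    return [x + 0.5 * (target - x) for x in latent]


tokens, image_latent = two_stage_generate(
    "ab", toy_ar, toy_dit_step, latent=[1.0, 5.0], num_steps=2
)
```

The point of the shape is that the AR stage runs once per request while the DIT stage runs once per denoising step, which is why the two stages are tracked and optimized separately below.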


1. AR Module

1.1 Functional Support

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| HunyuanImage-3.0 AR | #759 | @usberkeley | - | |
| HunyuanImage-3.0-Instruct AR | #2713 | @TaffyOfficial | - | |
| AR accuracy bench | #3332 | @TaffyOfficial | P0 | |
| Multi-image input support | #3444 | @TaffyOfficial | P0 | |
| Config refactor | #3172 | @Fishermanykx | P0 | |

1.2 Performance Features

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Performance analysis | - | @TaffyOfficial | P1 | 2026/5/19 |

2. DIT Module

2.1 Functional Support

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| HunyuanImage-3.0 DIT | #1085 | @ElleElleWu | P0 | |

2.2 Performance Features

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| TP/EP | - | - | - | |
| SP (Sequence Parallelism) | #2163 | @Bounty-hunter | P1 | |
| Timing tool | #1757 | @Bounty-hunter | P0 | |
| CFG Parallel | #1751 | @nussejzz | P1 | |
| TeaCache | #1927 | @nussejzz | P1 | |
| VAE Parallel | #3091 | @Fishermanykx | P1 | - |
| Flash Attention | #2981 | @Bounty-hunter | P1 | |

2.3 Quantization

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| NPU offline quantization | #2979 | @jiangmengyu18 | P0 | |

3. AR + DIT Joint Inference

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| AR + DIT with KV recompute | #3107 | @skf-1999 | P0 | |
| AR + DIT with KV reuse | #3346 | @Bounty-hunter | P0 | |
| Online mode adaptation | #3410 | @skf-1999 | P0 | |
| Offline-online accuracy alignment check/fix | | @skf-1999 | P0 | 2026/5/13 |
| YR connector (NPU) | #3180 | @yangsonglin13 | P1 | |
| Skip re-encoding in the DIT stage the parts already encoded in the AR stage, such as the system prompt and image tokens | | | P1 | Closed: no obvious benefit |
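
The difference between the KV recompute and KV reuse tasks above can be illustrated with a toy model. This is a minimal sketch, assuming the DIT stage needs key/value entries for the same prefix tokens the AR stage has already processed; `ToyStage` and its fake key/value arithmetic are hypothetical and do not reflect how vllm-omni implements either path.

```python
from typing import Dict, List, Optional, Tuple

KV = Tuple[int, int]  # toy (key, value) pair for one token position


class ToyStage:
    """Hypothetical stage that counts how many KV entries it computes."""

    def __init__(self) -> None:
        self.kv_computed = 0

    def encode(
        self, tokens: List[int], cache: Optional[Dict[int, KV]] = None
    ) -> Dict[int, KV]:
        # Start from a copy of the supplied cache (if any) so the caller's
        # cache is never mutated.
        kv: Dict[int, KV] = dict(cache) if cache else {}
        for pos, tok in enumerate(tokens):
            if pos in kv:
                continue  # entry already present: nothing to compute
            kv[pos] = (tok * 2, tok * 3)  # stand-in for real K/V projections
            self.kv_computed += 1
        return kv


prefix = [11, 12, 13, 14]  # e.g. system prompt + image tokens seen by AR
dit_only = [21, 22]        # tokens that only the DIT stage sees

ar = ToyStage()
ar_cache = ar.encode(prefix)

# KV recompute: the DIT stage encodes the full sequence from scratch.
dit_recompute = ToyStage()
kv_recomputed = dit_recompute.encode(prefix + dit_only)

# KV reuse: the DIT stage starts from the AR cache and fills in the rest.
dit_reuse = ToyStage()
kv_reused = dit_reuse.encode(prefix + dit_only, cache=ar_cache)
```

Both paths produce identical entries here by construction; the reuse path simply skips the four prefix computations. The toy says nothing about transfer cost between stages, which a real connector would also have to account for.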

3.1 Large-scale Deployment

Production readiness for large-scale deployment, focusing on multi-replica and high-concurrency scenarios.

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Single-node, multi-replica, uniform AR/DIT config (e.g. both TP2) | - | - | P0 | - |
| Multi-node, multi-replica, uniform AR/DIT config (e.g. both TP2) | - | - | P0 | - |
| Single-node, multi-replica, heterogeneous AR/DIT config | - | - | P0 | - |
| Multi-node, multi-replica, heterogeneous AR/DIT config | - | - | P0 | - |
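
The four scenarios above differ only in node count and in whether AR and DIT share the same parallel degree. A rough capacity calculation can be sketched as follows. It assumes, purely for illustration, that one replica is one AR instance plus one DIT instance on disjoint devices and that a replica never spans nodes; `ReplicaSpec` and `max_replicas` are hypothetical helpers, not vllm-omni APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSpec:
    """Hypothetical description of one AR + DIT replica."""

    ar_tp: int   # tensor-parallel degree of the AR stage
    dit_tp: int  # tensor-parallel degree of the DIT stage

    @property
    def devices(self) -> int:
        # Assumption: AR and DIT occupy disjoint devices.
        return self.ar_tp + self.dit_tp

    @property
    def uniform(self) -> bool:
        return self.ar_tp == self.dit_tp


def max_replicas(spec: ReplicaSpec, devices_per_node: int, num_nodes: int = 1) -> int:
    """Upper bound on replicas when a replica may not span nodes."""
    if spec.ar_tp < 1 or spec.dit_tp < 1:
        raise ValueError("parallel degrees must be >= 1")
    if devices_per_node < 1 or num_nodes < 1:
        raise ValueError("devices_per_node and num_nodes must be >= 1")
    return (devices_per_node // spec.devices) * num_nodes


uniform = ReplicaSpec(ar_tp=2, dit_tp=2)        # "both TP2" from the table
heterogeneous = ReplicaSpec(ar_tp=1, dit_tp=4)  # illustrative values only

single_node_uniform = max_replicas(uniform, devices_per_node=8)
multi_node_uniform = max_replicas(uniform, devices_per_node=8, num_nodes=2)
single_node_hetero = max_replicas(heterogeneous, devices_per_node=8)
```

With these assumptions the heterogeneous example strands three devices per node, which is the kind of packing question the heterogeneous rows would need to settle.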

3.2 Baseline

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Baseline evaluation and analysis | - | @Bounty-hunter @fake0fan | P0 | - |

4. Cross-cutting / Maintenance

4.1 Known bugs/issues

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| Rebase 0.20.0 accuracy issue | #3373 | @Bounty-hunter | P0 | |
| #3477 | | | | |
| #3499 | #3500 | @Bounty-hunter | P0 | |
| #3503 | | @Fishermanykx | P0 | |
| DIT-only accuracy issue | | | P0 | |
| AR+DIT + multi-image-input + kv_reuse accuracy issue | | | P0 | |
| AR + DIT startup issue | | | P0 | |
| Recaption issue | | | P0 | |

4.2 CI & Quality

| Category | Test case | test_file | Covered Scenario |
| --- | --- | --- | --- |
| Accuracy | AR + DIT accuracy CI | `tests\e2e\accuracy\test_hunyuan_image3.py` | kv_reuse AR accuracy online/offline |
| Accuracy | DIT accuracy CI | - | DIT accuracy |
| Performance | DIT performance CI | `run_diffusion_benchmark.py` | TP, SP, CFG parallel |
| Performance | end2end performance CI | - | end2end AR performance |

| Task | PR | Author | Priority | Deadline |
| --- | --- | --- | --- | --- |
| AR+DIT+kv_reuse+offline accuracy CI | #3655 | @Bounty-hunter | P0 | |
| AR+DIT+kv_reuse+online accuracy CI | - | @BLANKETusers | P0 | |
| DIT performance CI | #2495 | @Bounty-hunter | P0 | |
| Add DIT performance CI to local test | - | @TaffyOfficial | P0 | |
| DIT accuracy CI | - | @BLANKETusers | P0 | |
| end2end performance CI (currently depends on benchmark) | | | P0 | |

5. Appendix

5.1 Performance Data (L20x)

Performance data here
