This repository implements multi-GPU parallelization of TeaCache within the xDiT framework. The supported models include:
Please run the appropriate script (run_[your_model].sh) to get started.
Note: All of the following experiments were conducted on A800 GPUs.

| Model | Method | 8x |
| --- | --- | --- |
| Wan 2.1-T2V-14B | USP | 473 |
| | TeaCache(0.2) | 241 |


| Model | Method | 1x | 2x | 4x | 8x |
| --- | --- | --- | --- | --- | --- |
| CogvideoX | Ulysses | 133.5 | 77.7 | 59.41 | 36.68 |
| | TeaCache(slow) | 103.3 | 60.9 | 46.8 | 29.6 |
| | TeaCache(fast) | 78 | 46.7 | 36.5 | 23.7 |

Performance on HunyuanVideo

| Model | Method | 1x | 2x | 4x | 8x |
| --- | --- | --- | --- | --- | --- |
| HunyuanVideo | Ulysses | 3086.5 | 1624.4 | 856.6 | 480.4 |
| | TeaCache(slow) | 1884.5 | 1001 | 540.8 | 312.9 |
| | TeaCache(fast) | 1395.2 | 753.9 | 418.4 | 254.1 |


| Model | Method | 1x | 2x | 4x | 8x |
| --- | --- | --- | --- | --- | --- |
| Flux | Ulysses | 12.9 | 7.91 | 4.4 | 3.61 |
| | TeaCache(slow) | 12.1 | 4.61 | 2.68 | 2.24 |
| | TeaCache(fast) | 7.9 | 2.45 | 1.53 | 1.3 |


| Model | Method | 1x | 2x | 3x | 6x |
| --- | --- | --- | --- | --- | --- |
| ConsistID | Ulysses | 218.64 | 123.84 | 88.6 | 54.09 |
| | TeaCache(slow)-0.1 | 142.18 | 81.39 | 60.71 | 39.77 |
| | TeaCache(fast)-0.2 | 90.34 | 53.82 | 41.34 | 27.18 |

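
To compare the methods at a glance, the ConsistID numbers above can be turned into ratios. The sketch below is not part of this repository. It assumes the table values are latencies (lower is better) and that the column labels are GPU counts; the names `RESULTS`, `speedup_over_baseline`, and `scaling_speedup` are made up for illustration.

```python
# Values copied from the ConsistID table above.
# Assumption: they are latencies, so a larger ratio means a faster run.
GPU_COUNTS = [1, 2, 3, 6]
RESULTS = {
    "Ulysses": [218.64, 123.84, 88.6, 54.09],
    "TeaCache(slow)-0.1": [142.18, 81.39, 60.71, 39.77],
    "TeaCache(fast)-0.2": [90.34, 53.82, 41.34, 27.18],
}


def speedup_over_baseline(results, baseline="Ulysses"):
    """Return baseline / method at each GPU count, for every non-baseline row."""
    base = results[baseline]
    return {
        method: [round(b / v, 2) for b, v in zip(base, values)]
        for method, values in results.items()
        if method != baseline
    }


def scaling_speedup(values):
    """Return the ratio of the first (1x) value to each value in the same row."""
    return [round(values[0] / v, 2) for v in values]


if __name__ == "__main__":
    for method, ratios in speedup_over_baseline(RESULTS).items():
        print(method, dict(zip(GPU_COUNTS, ratios)))
    print("Ulysses scaling", dict(zip(GPU_COUNTS, scaling_speedup(RESULTS["Ulysses"]))))
```

Under that assumption, TeaCache(fast)-0.2 comes out at roughly 2.42x over Ulysses at 1x and roughly 1.99x at 6x, which suggests the caching gain shrinks somewhat as more GPUs are added.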
This repository is built on TeaCache and xDiT. Thanks to their contributors!