[Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM #23693
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -327,6 +327,9 @@ class EngineArgs: | |
data_parallel_hybrid_lb: bool = False | ||
data_parallel_backend: str = ParallelConfig.data_parallel_backend | ||
enable_expert_parallel: bool = ParallelConfig.enable_expert_parallel | ||
enable_dbo: bool = ParallelConfig.enable_dbo | ||
dbo_decode_token_threshold: int = \ | ||
ParallelConfig.dbo_decode_token_threshold | ||
eplb_config: EPLBConfig = get_field(ParallelConfig, "eplb_config") | ||
enable_eplb: bool = ParallelConfig.enable_eplb | ||
expert_placement_strategy: ExpertPlacementStrategy = \ | ||
|
@@ -695,6 +698,11 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser: | |
parallel_group.add_argument( | ||
"--enable-expert-parallel", | ||
**parallel_kwargs["enable_expert_parallel"]) | ||
parallel_group.add_argument("--enable-dbo", | ||
**parallel_kwargs["enable_dbo"]) | ||
parallel_group.add_argument( | ||
"--dbo-decode-token-threshold", | ||
**parallel_kwargs["dbo_decode_token_threshold"]) | ||
Comment on lines +703 to +705
Review comment: What is the future plan for this argument? Will we add a separate
Reply: Yep, we are planning to add a prefill version of this argument. |
||
parallel_group.add_argument("--enable-eplb", | ||
**parallel_kwargs["enable_eplb"]) | ||
parallel_group.add_argument("--eplb-config", | ||
|
@@ -1339,6 +1347,8 @@ def create_engine_config( | |
data_parallel_backend=self.data_parallel_backend, | ||
data_parallel_hybrid_lb=self.data_parallel_hybrid_lb, | ||
enable_expert_parallel=self.enable_expert_parallel, | ||
enable_dbo=self.enable_dbo, | ||
dbo_decode_token_threshold=self.dbo_decode_token_threshold, | ||
enable_eplb=self.enable_eplb, | ||
eplb_config=self.eplb_config, | ||
expert_placement_strategy=self.expert_placement_strategy, | ||
|
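The three hunks above follow one pattern: a config default is mirrored on `EngineArgs`, exposed as a CLI flag, and then passed back into the config object. A minimal, self-contained sketch of that flow using only the standard library is shown here. `ParallelConfigSketch`, `add_cli_args_sketch`, and `create_config_sketch` are hypothetical stand-ins rather than vLLM's actual classes or helpers, and the default values (in particular `32` for the threshold) are placeholders, since the diff does not show the real defaults.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for ParallelConfig; defaults are placeholders."""
    enable_expert_parallel: bool = False
    enable_dbo: bool = False
    # Assumed placeholder value; the real default is not shown in the diff.
    dbo_decode_token_threshold: int = 32


def add_cli_args_sketch(
        parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register one flag per config field, e.g. enable_dbo -> --enable-dbo."""
    group = parser.add_argument_group("parallel")
    for f in fields(ParallelConfigSketch):
        flag = "--" + f.name.replace("_", "-")
        if isinstance(f.default, bool):
            # Boolean fields become on/off switches.
            group.add_argument(flag, action="store_true", default=f.default)
        else:
            # Other fields take a value parsed with the default's type.
            group.add_argument(flag, type=type(f.default), default=f.default)
    return parser


def create_config_sketch(args: argparse.Namespace) -> ParallelConfigSketch:
    """Copy the parsed values back into the config, field by field."""
    return ParallelConfigSketch(
        **{f.name: getattr(args, f.name)
           for f in fields(ParallelConfigSketch)})


if __name__ == "__main__":
    parser = add_cli_args_sketch(argparse.ArgumentParser())
    args = parser.parse_args(
        ["--enable-dbo", "--dbo-decode-token-threshold", "64"])
    print(create_config_sketch(args))
```

In this sketch the dataclass is the single source of truth for defaults, which is the same reason the diff writes `ParallelConfig.enable_dbo` on `EngineArgs` instead of repeating a literal value.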