[CB] Changes to increase max_batch_tokens by remi-or · Pull Request #46712 · huggingface/transformers

remi-or · 2026-06-17T08:05:58Z

Summary

This PR aims to improve the performance of continuous batching, especially for prefill batches. To that end, it makes the following changes:

Rework the cache size estimator by adding a default max_batch_tokens of 8192 and applying a VRAM-based bound. In practice, this means that prefill batches can now be much larger.
Along with this behavioral change, rewrite the cache size estimator to be more readable and self-contained.
Deprecate the max_cached_graphs attribute, which the graph pool made useless: since all graphs share the same memory pool, the memory footprint of each new graph is negligible. In fact, this parameter was hurting performance, because discarding and re-recording graphs led to fragmentation and, in the end, to more memory being used.
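A minimal sketch of what "a default of 8192 with a VRAM-based bound" could look like, assuming the bound is obtained by dividing the VRAM left for activations by a per-token activation cost. `MemoryBudget`, `bounded_max_batch_tokens`, the activation fraction and the per-token cost are all hypothetical; only the 8192 default comes from this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Default stated in the PR; everything else here is a hypothetical illustration.
DEFAULT_MAX_BATCH_TOKENS = 8192


@dataclass
class MemoryBudget:
    free_vram_bytes: int             # VRAM left after loading weights (assumed input)
    activation_fraction: float       # share kept for activations rather than KV cache (assumed)
    activation_bytes_per_token: int  # peak activation cost per batched token (assumed)


def bounded_max_batch_tokens(budget: MemoryBudget, requested: Optional[int] = None) -> int:
    """Return the requested (or default) token budget, capped by what fits in VRAM."""
    target = requested if requested is not None else DEFAULT_MAX_BATCH_TOKENS
    activation_bytes = int(budget.free_vram_bytes * budget.activation_fraction)
    vram_bound = activation_bytes // budget.activation_bytes_per_token
    # Never go below one token, even on a nearly full GPU.
    return max(1, min(target, vram_bound))


if __name__ == "__main__":
    GiB = 1 << 30
    roomy = MemoryBudget(8 * GiB, 0.1, 64 * 1024)
    tight = MemoryBudget(1 * GiB, 0.1, 64 * 1024)
    print(bounded_max_batch_tokens(roomy))  # the default applies
    print(bounded_max_batch_tokens(tight))  # the VRAM bound applies
```

With these made-up numbers, the 8 GiB case keeps the 8192 default while the 1 GiB case is capped at 1638 tokens.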

❗ This PR depends on #46587 being merged first.
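The argument for deprecating max_cached_graphs can be illustrated with a toy model. In PyTorch, several CUDA graphs can share one memory pool through the `pool` argument of `torch.cuda.graph` (for example a handle from `torch.cuda.graph_pool_handle()`), so keeping an extra graph is cheap, whereas evicting one forces it to be recorded again later. The sketch below only counts recordings under an LRU cap; `count_captures` and the padded batch sizes are hypothetical and are not the library's code.

```python
from collections import OrderedDict
from typing import Hashable, Iterable, Optional


def count_captures(shapes: Iterable[Hashable], max_cached: Optional[int]) -> int:
    """Count graph recordings for a stream of batch shapes with an optional LRU cap."""
    cache: "OrderedDict[Hashable, object]" = OrderedDict()
    captures = 0
    for shape in shapes:
        if shape in cache:
            cache.move_to_end(shape)  # cache hit: replay the existing graph
            continue
        captures += 1                 # cache miss: a new graph must be recorded
        cache[shape] = object()       # placeholder for a captured graph
        if max_cached is not None and len(cache) > max_cached:
            cache.popitem(last=False)  # evict the least recently used graph
    return captures


if __name__ == "__main__":
    workload = [256, 512, 1024, 2048] * 10  # hypothetical padded batch sizes
    print("unbounded:", count_captures(workload, None))
    print("capped at 3:", count_captures(workload, 3))
```

When the workload cycles through one more shape than the cap allows, every step evicts the graph that is needed next, so all 40 steps trigger a recording instead of 4.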

Performance ✅

label	main accuracy	PR accuracy	main throughput (tok/s)	PR throughput (tok/s)	throughput diff
gsm8k_default	0.825	0.822	2477.36	3301.09	+33.25%
gsm8k_sampling	0.790	0.788	1952.49	2936.00	+50.37%
gsm8k_compile	0.825	0.823	2477.76	3444.92	+39.03%
gsm8k_no_fast_decode	0.825	0.822	2364.45	2348.65	-0.67%
gsm8k_bare_bones	0.825	0.816	1886.61	1911.51	+1.32%
ifeval_default	0.457	0.445	8958.99	10163.10	+13.44%
few_blocks	-	-	666.69	674.44	+1.16%
multi_return_seq	-	-	1626.68	1764.44	+8.47%
rollouts_1024	-	-	3545.67	3609.38	+1.80%
rollouts_2048	-	-	3363.86	3394.07	+0.90%
rollouts_4096	-	-	2958.96	2984.04	+0.85%
rollouts_8192	-	-	2360.50	2362.29	+0.08%
rollouts_16384	-	-	1549.78	1537.00	-0.83%

Big throughput improvements for prefill-bound workloads (up to +50.37% on gsm8k_sampling). There is a small accuracy regression, most visible on gsm8k_bare_bones (0.825 vs 0.816), because the batch size affects the generated tokens.
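The diff column matches the relative throughput change, (new - main) / main. Recomputing it for a few rows copied from the table is a quick sanity check; the helper name `relative_diff_percent` is only illustrative.

```python
# Throughput figures (tok/s) copied from the Performance table: (main, new PR).
RESULTS = {
    "gsm8k_default": (2477.36, 3301.09),
    "gsm8k_sampling": (1952.49, 2936.00),
    "gsm8k_no_fast_decode": (2364.45, 2348.65),
}


def relative_diff_percent(main_tps: float, new_tps: float) -> float:
    """Relative throughput change of the PR over main, in percent."""
    return (new_tps - main_tps) / main_tps * 100.0


if __name__ == "__main__":
    for label, (main_tps, new_tps) in RESULTS.items():
        print(f"{label}: {relative_diff_percent(main_tps, new_tps):+.2f}%")
```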

Tests ✅

This PR adds a new test to verify that the actual memory footprint matches the expected one.
All tests pass.
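On GPU, a test like this would presumably compare allocator statistics such as `torch.cuda.max_memory_allocated()` with the estimator's prediction. The sketch below shows the same measure-then-compare pattern on CPU with Python's `tracemalloc`, so it runs anywhere; the helper names and the 5% tolerance are assumptions, not the PR's actual test.

```python
import tracemalloc


def measured_peak_bytes(n_bytes: int) -> int:
    """Peak traced memory while allocating a buffer of n_bytes."""
    tracemalloc.start()
    try:
        buf = bytearray(n_bytes)  # the allocation whose footprint we predicted
        _, peak = tracemalloc.get_traced_memory()
        del buf
    finally:
        tracemalloc.stop()
    return peak


def footprint_matches(expected: int, actual: int, rel_tol: float = 0.05) -> bool:
    """True if the measured footprint is within rel_tol of the expected one."""
    return abs(actual - expected) <= rel_tol * expected


if __name__ == "__main__":
    expected = 10 * 1024 * 1024
    actual = measured_peak_bytes(expected)
    assert footprint_matches(expected, actual), (expected, actual)
    print(f"expected={expected} actual={actual}")
```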

AI Review ✅

Reviewed and addressed.

remi-or added 18 commits June 16, 2026 06:37

Core (7d0bda3)
Fixes (aa25359)
Review (b839f1e)
Add compile flushing to test (96a64f8)
Fix serialization dropping attributes (9f3f876)
comments (1d32c75)
Starting to remodel (2f4b782)
Simplification 1/n (2976a80)
Simplification 2/n (4f74630)
Simplification 3/n (d29c9de)
Simplification 4/n (a963a89)
Simplification 5/n (613c63a)
Fix test (21332dc)
Enforce right order (fafa8fb)
Added a more down-to earth test (93ff18f)
Deprecated max_cached_graphs (422fc23)
Style (de41d2d)
Review (7e801db)

remi-or requested a review from ArthurZucker June 17, 2026 08:05

remi-or added this to Continuous batching Jun 17, 2026

github-project-automation Bot moved this to Backlog in Continuous batching Jun 17, 2026

remi-or moved this from Backlog to In review in Continuous batching Jun 17, 2026

remi-or self-assigned this Jun 17, 2026

remi-or wants to merge 18 commits into main from cb-more-prefill
