[CB] Fix offloading #46587

Open
remi-or wants to merge 4 commits into main from cb-fix-offloading

@remi-or remi-or commented Jun 12, 2026

Collaborator

Summary

This PR changes the offloading mechanism so that it can offload multiple requests at the same time.
Before, offloading triggered only when we could not schedule a batch: we offloaded ONE request and waited for the next forward pass to schedule the next batch. This led to an arbitrary number of requests stalling while at least one could be scheduled, and we had to wait for the next forward pass.
Now, offloading triggers whenever some requests are stalled, and we offload several requests at the same time: enough so that no request is stalled. Also, in the case where we cannot schedule a batch at all, we loop inside the batch preparation phase until either a batch is scheduled or we know for sure that no batch can be scheduled (error case). This way, we don't have to wait for the next forward pass.
This improves throughput for workloads where cache pressure is high, such as ifeval (+7.4% tok/s, cf. below).
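
The behaviour described above can be illustrated with a toy model. This is only a minimal sketch, not the code in this PR: `Request`, `ToyScheduler`, the block accounting and the "evict the most recent request" policy are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    # Hypothetical request record; the real classes in the PR may differ.
    request_id: str
    blocks_held: int    # cache blocks currently allocated to this request
    blocks_needed: int  # extra blocks required to make progress this step


@dataclass
class ToyScheduler:
    free_blocks: int
    active: list = field(default_factory=list)
    offloaded: list = field(default_factory=list)

    def stalled(self):
        """Greedily admit requests in order; return those that do not fit."""
        budget = self.free_blocks
        stalled = []
        for req in self.active:
            if req.blocks_needed <= budget:
                budget -= req.blocks_needed
            else:
                stalled.append(req)
        return stalled

    def offload_until_unstalled(self):
        """Offload as many requests as needed so no remaining request is stalled."""
        while self.active and self.stalled():
            # Assumption: the most recently added request is the eviction victim.
            victim = self.active.pop()
            self.free_blocks += victim.blocks_held
            self.offloaded.append(victim)

    def prepare_batch(self):
        """Return a batch, or raise if nothing fits even after offloading."""
        if not self.active:
            return []
        self.offload_until_unstalled()
        if not self.active:
            # Error case: every request was offloaded and none could be scheduled.
            raise RuntimeError("no request fits in the cache, even after offloading")
        return list(self.active)
```

In this toy model, a single call to `prepare_batch` may offload several requests before returning, instead of offloading one request and deferring the rest to a later forward pass.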

Performance ✅

| label | main acc | PR acc | main (tok/s) | PR (tok/s) | diff |
|---|---|---|---|---|---|
| gsm8k_default | 0.822 | 0.825 | 2434.72 | 2440.98 | +0.3% |
| gsm8k_sampling | 0.795 | 0.773 | 1956.93 | 1956.57 | -0.0% |
| gsm8k_compile | 0.822 | 0.825 | 2455.82 | 2464.62 | +0.4% |
| gsm8k_no_fast_decode | 0.822 | 0.825 | 2353.98 | 2357.43 | +0.1% |
| gsm8k_bare_bones | 0.821 | 0.825 | 1874.35 | 1872.97 | -0.1% |
| ifeval_default | 0.445 | 0.457 | 8927.97 | 9585.02 | +7.4% |
| few_blocks | - | - | 662.31 | 655.57 | -1.0% |
| multi_return_seq | - | - | 1571.01 | 1600.24 | +1.9% |
| rollouts_1024 | - | - | 3641.04 | 3646.78 | +0.2% |
| rollouts_2048 | - | - | 3451.27 | 3363.73 | -2.5% |
| rollouts_4096 | - | - | 2710.43 | 2712.28 | +0.1% |
| rollouts_8192 | - | - | 2401.28 | 2356.5 | -1.9% |
| rollouts_16384 | - | - | 1571.05 | 1560.67 | -0.7% |

The only workload that benefits from this change is ifeval: +7.4% throughput. The other workloads see no meaningful change: there is a slight throughput regression for the rollouts workloads of length 2048 and 8192 (-2.5% and -1.9%), but I re-ran the numbers and it went away. I blame torch compile.
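
For reference, the diff values are consistent with a plain relative change in throughput between main and the PR. The helper below is a small sketch under that assumption; `throughput_diff` is a hypothetical name and is not part of the benchmark script.

```python
def throughput_diff(main_tok_s: float, pr_tok_s: float) -> str:
    """Relative throughput change of the PR versus main, as a signed percentage."""
    change = (pr_tok_s - main_tok_s) / main_tok_s * 100
    return f"{change:+.1f}%"


print(throughput_diff(8927.97, 9585.02))  # ifeval_default → +7.4%
print(throughput_diff(3451.27, 3363.73))  # rollouts_2048  → -2.5%
```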

Tests ✅

All tests pass.

Agent review ✅

Reviewed by Claude using the compound engineering plugin.

@remi-or remi-or requested a review from ArthurZucker June 12, 2026 06:56
@remi-or remi-or self-assigned this Jun 12, 2026
@remi-or remi-or moved this from Backlog to In review in Continuous batching Jun 12, 2026
@remi-or remi-or moved this from In review to In progress in Continuous batching Jun 12, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@remi-or remi-or force-pushed the cb-fix-offloading branch from e616a33 to 58be02a Compare June 12, 2026 09:26
@github-actions

Contributor

CI Dashboard: View test results in Grafana

@remi-or remi-or moved this from In progress to In review in Continuous batching Jun 12, 2026
@remi-or remi-or force-pushed the cb-fix-offloading branch from 58be02a to 96a64f8 Compare June 16, 2026 06:37
@github-actions

Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=46587&sha=96a64f

Projects

Status: In review

2 participants