[Bugfix] DiffusionGemma: only pop a request's logprobs when it commits (#45689) by waynehacking8 · Pull Request #45754 · vllm-project/vllm

waynehacking8 · 2026-06-16T01:19:10Z

Purpose

In a mixed decode batch, DiffusionSampler.__call__ gated logprobs
reassembly on the batch-wide is_committing.any(), then popped
self._pending_logprobs[slot] for every decode request that had a stashed
entry — without checking whether that request was itself committing this
step:

if is_committing.any() and self._pending_logprobs:
    for i in range(num_reqs):
        ...
        if is_decode_np[i] and slot in self._pending_logprobs:   # no per-request committing check
            lp = self._pending_logprobs.pop(slot)

So when request A commits while a co-batched request B merely converged this
step (its logprobs stashed via just_converged, but is_committing[B] is
False), B's stash is consumed one step early. B's eventual committed response
then returns fewer logprob rows than tokens, crashing the OpenAI chat formatter
with IndexError: list index out of range (in _create_chat_logprobs).
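To make the interleaving concrete, here is a minimal CPU-only toy model of the pre-fix loop. It is a sketch under stated assumptions, not vLLM code. `buggy_pop_pass` is a hypothetical name. Logprob rows are modeled as plain strings. The batch is modeled as parallel Python lists rather than tensors. It shows B's stash being consumed while only A commits, which leaves nothing for B's own commit on the next step.

```python
# Hypothetical toy model of the pre-fix reassembly loop (not vLLM's real code).
# Logprob rows are stand-in strings; per-request flags are plain lists.

def buggy_pop_pass(
    pending: dict[int, list[str]],
    is_committing: list[bool],
    is_decode: list[bool],
    slots: list[int],
) -> dict[int, list[str]]:
    """Mimic the pre-fix loop: gate only on the batch-wide commit flag."""
    popped: dict[int, list[str]] = {}
    if any(is_committing) and pending:
        for i, slot in enumerate(slots):
            # Bug: nothing checks that request i itself is committing.
            if is_decode[i] and slot in pending:
                popped[slot] = pending.pop(slot)
    return popped


# Step t: request A (slot 0) commits; request B (slot 1) only converged, so its
# logprobs were stashed but it is not committing yet.
pending = {0: ["a0", "a1"], 1: ["b0", "b1"]}
popped = buggy_pop_pass(pending, [True, False], [True, True], [0, 1])
print(sorted(popped))  # [0, 1]: B's stash is consumed one step early
print(pending)         # {}

# Step t+1: B finally commits, but there is nothing left to pop for it.
print(buggy_pop_pass(pending, [True], [True], [1]))  # {}
```

In the real sampler, that empty pop at B's commit is what produces fewer logprob rows than tokens downstream.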

Credit to @masterFoad for the root-cause analysis and the suggested fix in the
issue.

Fix

Pop a request's stash only when that request commits: build the set of
committing slots from the pre-step is_committing snapshot and require
slot in committing_slots. A converged-but-not-committing request keeps its
stash until its own commit. The commit-loop assembly is extracted into a
_assemble_committed_logprobs static helper so the interleaving is unit-testable
without a GPU. No behavior change for the single-request / aligned cases.
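A rough sketch of the per-request guard follows. It is not the actual `_assemble_committed_logprobs`, whose signature and return type the description does not show. `assemble_committed_logprobs` and its list-based arguments are hypothetical. `cu_offsets` is an assumed prefix-sum layout, loosely modeled on the `cu_num_generated_tokens` mentioned in the tests. The sketch builds `committing_slots` from the pre-step snapshot. It pops a stash only for a decode request whose slot is in that set. It returns `None` when nothing was assembled.

```python
from typing import Optional


def assemble_committed_logprobs(
    pending: dict[int, list[str]],
    is_committing: list[bool],  # pre-step snapshot, one flag per request
    is_decode: list[bool],
    slots: list[int],
) -> Optional[tuple[list[str], list[int]]]:
    """Hypothetical helper: pop stashed rows only for committing requests.

    Returns (rows, cu_offsets) in batch order, or None if nothing was assembled.
    """
    # Which slots commit this step, taken from the snapshot before any update.
    committing_slots = {s for s, c in zip(slots, is_committing) if c}
    rows: list[str] = []
    cu_offsets = [0]  # assumed layout: running count of rows after each request
    for i, slot in enumerate(slots):
        # The fix: a stash is consumed only by its own request's commit.
        if is_decode[i] and slot in committing_slots and slot in pending:
            rows.extend(pending.pop(slot))
        cu_offsets.append(len(rows))
    return (rows, cu_offsets) if rows else None


pending = {0: ["a0", "a1"], 1: ["b0"]}
# A commits while B only converged: B keeps its stash.
print(assemble_committed_logprobs(pending, [True, False], [True, True], [0, 1]))
# (['a0', 'a1'], [0, 2, 2])
print(pending)  # {1: ['b0']}
# Next step B commits and receives its own rows.
print(assemble_committed_logprobs(pending, [True], [True], [1]))
# (['b0'], [0, 1])
```

Keeping the assembly in a pure function over plain containers is what lets this interleaving be exercised without a GPU, which is the motivation the description gives for extracting the helper.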

Not a duplicate

gh issue view 45689 --comments → unassigned, no prior PR.
gh pr list --state open --search "45689" → none.
The reporter's other open PRs are all in spec-decode (a different subsystem); no competing PR for this fix.

Test plan / results

New tests/models/test_diffusion_gemma_logprobs.py (pure CPU, no GPU):

$ python -m pytest tests/models/test_diffusion_gemma_logprobs.py -q
4 passed

Covers: only-committing-pops, both-committing-emit-in-order (incl.
cu_num_generated_tokens), no-committing-returns-None, non-decode-skipped.
Verified the test catches the bug: removing the slot in committing_slots
guard makes test_only_committing_request_pops_its_logprobs and
test_no_committing_request_returns_none fail (the early/erroneous pop).
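The mutation check above (drop the guard, confirm the tests fail) can be sketched in isolation. Everything below is hypothetical. The `guard` flag is a stand-in for deleting the `slot in committing_slots` condition. The `check_*` functions are simplified analogues of the two named tests, not copies of them. The helper is deliberately stripped down and returns only the concatenated rows.

```python
from typing import Optional


def assemble(
    pending: dict[int, list[str]],
    is_committing: list[bool],
    is_decode: list[bool],
    slots: list[int],
    guard: bool = True,  # hypothetical switch: False simulates removing the guard
) -> Optional[list[str]]:
    committing_slots = {s for s, c in zip(slots, is_committing) if c}
    rows: list[str] = []
    for i, slot in enumerate(slots):
        if is_decode[i] and slot in pending and (not guard or slot in committing_slots):
            rows.extend(pending.pop(slot))
    return rows or None


def check_only_committing_request_pops(guard: bool) -> bool:
    # A commits, B only converged: only A's rows may be popped.
    pending = {0: ["a0"], 1: ["b0"]}
    out = assemble(pending, [True, False], [True, True], [0, 1], guard=guard)
    return out == ["a0"] and pending == {1: ["b0"]}


def check_no_committing_returns_none(guard: bool) -> bool:
    # Nobody commits: nothing may be popped and the result is None.
    pending = {1: ["b0"]}
    out = assemble(pending, [False], [True], [1], guard=guard)
    return out is None and pending == {1: ["b0"]}


for guard in (True, False):
    print(guard, check_only_committing_request_pops(guard),
          check_no_committing_returns_none(guard))
# True True True
# False False False
```

Both checks pass with the guard and both fail without it, which mirrors the description's claim about the two named tests.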

pre-commit run --files vllm/model_executor/models/diffusion_gemma.py tests/models/test_diffusion_gemma_logprobs.py → all hooks pass (ruff check + format, mypy-3.10, SPDX, …).

Notes

AI-assisted: this change was developed with AI assistance (Claude) and reviewed
end-to-end by the submitter.

In a mixed decode batch, `DiffusionSampler.__call__` gated logprobs reassembly on the batch-wide `is_committing.any()`, then popped `_pending_logprobs[slot]` for every decode request that had a stashed entry, without checking whether that request was itself committing this step. So when request A commits while a co-batched request B merely converged this step (its logprobs stashed via `just_converged`), B's stash was consumed one step early. B's eventual committed response then returned fewer logprob rows than tokens, crashing the OpenAI chat formatter with `IndexError` (issue vllm-project#45689).

Pop a request's stash only when that request commits: build the set of committing slots from the pre-step `is_committing` snapshot and require `slot in committing_slots`. A converged-but-not-committing request keeps its stash until its own commit.

The commit-loop assembly is extracted into `_assemble_committed_logprobs` so the interleaving can be unit tested without a GPU.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Wayne Chiu <waynehacking8@gmail.com>

waynehacking8 requested review from AndreasKaratzas, DarkLight1337 and ywang96 as code owners June 16, 2026 01:19

mergify bot added the bug label ("Something isn't working") Jun 16, 2026

waynehacking8 wants to merge 1 commit into vllm-project:main from waynehacking8:wayne/fix-45689-diffgemma-logprobs
