[Bugfix] DiffusionGemma: only pop a request's logprobs when it commits (#45689) #45754

Open
waynehacking8 wants to merge 1 commit into vllm-project:main from waynehacking8:wayne/fix-45689-diffgemma-logprobs

@waynehacking8

Contributor

Purpose

Fixes #45689.

In a mixed decode batch, DiffusionSampler.__call__ gated logprobs
reassembly on the batch-wide is_committing.any(), then popped
self._pending_logprobs[slot] for every decode request that had a stashed
entry — without checking whether that request was itself committing this
step:

if is_committing.any() and self._pending_logprobs:
    for i in range(num_reqs):
        ...
        if is_decode_np[i] and slot in self._pending_logprobs:   # no per-request committing check
            lp = self._pending_logprobs.pop(slot)

So when request A commits while a co-batched request B has merely converged
this step (B's logprobs are stashed via just_converged, but is_committing[B] is
False), B's stash is consumed one step early. When B later commits, its
response carries fewer logprob rows than tokens, and the OpenAI chat formatter
crashes with IndexError: list index out of range (in _create_chat_logprobs).
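
To make the failure mode concrete, here is a minimal sketch of the pre-fix loop that uses plain Python lists and a dict in place of the sampler's tensors and self._pending_logprobs. The function name buggy_pop, the slot numbers, and the placeholder logprob values are hypothetical; only the batch-wide gate and the unchecked pop mirror the snippet above.

```python
# Illustrative stand-in for the pre-fix loop (not the vLLM code itself).
def buggy_pop(pending, slots, is_decode, is_committing):
    """Pop stashed logprobs whenever ANY request in the batch commits."""
    popped = {}
    if any(is_committing) and pending:
        for i, slot in enumerate(slots):
            # Bug: no check that request i is itself committing this step.
            if is_decode[i] and slot in pending:
                popped[slot] = pending.pop(slot)
    return popped


# Request A lives in slot 0 and commits; request B lives in slot 1 and has
# only converged, so its stash should survive until B's own commit.
pending = {0: ["lp_a0", "lp_a1"], 1: ["lp_b0", "lp_b1"]}
popped = buggy_pop(
    pending,
    slots=[0, 1],
    is_decode=[True, True],
    is_committing=[True, False],
)
```

After this call B's entry is gone from pending even though B did not commit, so nothing is left to emit when B's commit step arrives.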

Credit to @masterFoad for the root-cause analysis and the suggested fix in the
issue.

Fix

Pop a request's stash only when that request commits: build the set of
committing slots from the pre-step is_committing snapshot and require
slot in committing_slots. A converged-but-not-committing request keeps its
stash until its own commit. The commit-loop assembly is extracted into a
_assemble_committed_logprobs static helper so the interleaving is unit-testable
without a GPU. No behavior change for the single-request / aligned cases.
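
A minimal sketch of the per-request guard described above, using plain Python containers. The function name, signature, and return shape are assumptions made for illustration; the actual _assemble_committed_logprobs helper in this PR may differ, and the cumulative offsets here only stand in for cu_num_generated_tokens.

```python
# Illustrative sketch of the guarded assembly (hypothetical signature).
def assemble_committed_sketch(pending, slots, is_decode, is_committing):
    """Pop a slot's stashed logprobs only if that slot commits this step."""
    # Built from the pre-step is_committing snapshot.
    committing_slots = {s for s, c in zip(slots, is_committing) if c}
    if not committing_slots or not pending:
        return None

    rows = []
    cu_offsets = [0]  # cumulative row counts, one entry per emitting request
    for i, slot in enumerate(slots):
        if is_decode[i] and slot in committing_slots and slot in pending:
            rows.extend(pending.pop(slot))
            cu_offsets.append(len(rows))
    return (rows, cu_offsets) if rows else None


pending = {0: ["a0", "a1"], 1: ["b0", "b1", "b2"]}
slots, is_decode = [0, 1], [True, True]

# Step 1: only A (slot 0) commits, so B's stash must be left alone.
first = assemble_committed_sketch(pending, slots, is_decode, [True, False])
# Step 2: B (slot 1) commits and still finds its full stash.
second = assemble_committed_sketch(pending, slots, is_decode, [False, True])
```

Because B's entry survives step 1, its commit in step 2 yields one logprob row per token, which is the invariant the chat formatter relies on.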

Not a duplicate

  • gh issue view 45689 --comments → unassigned, no prior PR.
  • gh pr list --state open --search "45689" → none.
  • The reporter's other open PRs are all in spec-decode (a different subsystem); no competing PR for this fix.

Test plan / results

New tests/models/test_diffusion_gemma_logprobs.py (pure CPU, no GPU):

$ python -m pytest tests/models/test_diffusion_gemma_logprobs.py -q
4 passed

Covers: only-committing-pops, both-committing-emit-in-order (incl.
cu_num_generated_tokens), no-committing-returns-None, non-decode-skipped.
Verified the test catches the bug: removing the slot in committing_slots
guard makes test_only_committing_request_pops_its_logprobs and
test_no_committing_request_returns_none fail (the early/erroneous pop).

pre-commit run --files vllm/model_executor/models/diffusion_gemma.py tests/models/test_diffusion_gemma_logprobs.py → all hooks pass (ruff check + format, mypy-3.10, SPDX, …).

Notes

AI-assisted: this change was developed with AI assistance (Claude) and reviewed
end-to-end by the submitter.

In a mixed decode batch, `DiffusionSampler.__call__` gated logprobs
reassembly on the batch-wide `is_committing.any()`, then popped
`_pending_logprobs[slot]` for every decode request that had a stashed
entry — without checking whether that request was itself committing this
step. So when request A commits while a co-batched request B merely
converged this step (its logprobs stashed via `just_converged`), B's
stash was consumed one step early. B's eventual committed response then
returned fewer logprob rows than tokens, crashing the OpenAI chat
formatter with `IndexError` (issue vllm-project#45689).

Pop a request's stash only when that request commits: build the set of
committing slots from the pre-step `is_committing` snapshot and require
`slot in committing_slots`. A converged-but-not-committing request keeps
its stash until its own commit. The commit-loop assembly is extracted
into `_assemble_committed_logprobs` so the interleaving can be unit
tested without a GPU.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Wayne Chiu <waynehacking8@gmail.com>
@mergify mergify Bot added the bug Something isn't working label Jun 16, 2026
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: DiffusionGemma chat logprobs can crash in batched requests
