[Bug]: [HiggsAudioV3] Talker CUDA graph output differs from eager at the first decode step (parity follow-up)

### Describe the bug

Follow-up to #4562. With the talker capture fix and an up-to-date SM70 decode
kernel, the Stage-0 CUDA graph (`low_latency`) profile produces **correct,
intelligible** speech in real time: the output transcribes back to the input
prompt. However, it is **not bit-identical to the eager profile**, and the gap is
larger than rounding noise. The waveform is a different (valid) rendering of the
same words, with near-zero correlation against the eager output and a slightly
lower level.
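
For anyone reproducing the comparison, here is a minimal sketch of the two waveform metrics mentioned above (correlation and relative level). The issue does not say how they were measured, so the function names, the zero-lag Pearson correlation, and the RMS-based level are assumptions, and the signals in the demo are synthetic rather than real model output.

```python
import math


def pearson(a, b):
    """Zero-lag Pearson correlation of two equal-length sample sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant signal has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def rms_db_delta(reference, other):
    """Level of `other` relative to `reference` in dB (negative = quieter)."""
    rms_ref = math.sqrt(sum(x * x for x in reference) / len(reference))
    rms_other = math.sqrt(sum(x * x for x in other) / len(other))
    return 20 * math.log10(rms_other / rms_ref)


# Synthetic stand-ins: the second signal is the first at half amplitude.
eager_wave = [math.sin(0.1 * i) for i in range(1000)]
graph_wave = [0.5 * x for x in eager_wave]

corr = pearson(eager_wave, graph_wave)
delta_db = rms_db_delta(eager_wave, graph_wave)
```

Zero-lag correlation is sensitive to small time shifts, so two renderings of the same words can score near zero even when both are correct, which is consistent with the transcription check passing.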

The divergence is isolated to the **first decode step** (the prefill->decode
transition). Per-step talker LM-logit means:

```
eager:  step1 13.43   step2 0.312   step3 13.20   ...
graph:  step1 13.43   step2 13.56   step3 13.59   ...
```

Step 1 (prefill, which runs eagerly in both profiles) and steps 3+ match eager
within tolerance; only step 2 (the first captured decode) diverges. The
seed/feedback state is identical at step 2 in both modes (`count == 1`,
`has_codes == 1`), so it is not a seed-timing difference.
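
The per-step comparison can be automated with a small check like the one below. It is a sketch only: the helper name `divergent_steps` and the 5% relative tolerance are assumptions (the issue does not state the tolerance it used), and the inputs are the three logit means listed above.

```python
def divergent_steps(eager_means, graph_means, rel_tol=0.05):
    """Return 1-based step indices where the two profiles disagree.

    rel_tol is a hypothetical tolerance chosen for illustration.
    """
    bad = []
    for step, (e, g) in enumerate(zip(eager_means, graph_means), start=1):
        scale = max(abs(e), abs(g), 1e-12)  # avoid dividing by zero
        if abs(e - g) / scale > rel_tol:
            bad.append(step)
    return bad


# Per-step talker LM-logit means from the listing above.
eager_means = [13.43, 0.312, 13.20]
graph_means = [13.43, 13.56, 13.59]

print(divergent_steps(eager_means, graph_means))  # prints [2]
```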

Two candidate causes (not fully isolated):

- the audio-feedback embedding reading a stale `_decode_last_codes` (the BOC seed)
  under capture at the first decode — forcing `has_codes = 1` does not change the
  result, which is consistent with the codes themselves being stale rather than
  the gate;
- the prefill->decode attention transition under `FULL_DECODE_ONLY` (the first
  replay after an eager prefill).
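
The first hypothesis rests on a general property of CUDA graphs: a replay reads from the memory captured at record time, so an input that is rebound to a new tensor (rather than updated in place) is invisible to the graph. The sketch below is a pure-Python analogy of that hazard, not the HiggsAudioV3 code; `CapturedStep`, `Talker`, `BOC_SEED`, and both update methods are hypothetical names, and whether the talker actually updates `_decode_last_codes` this way has not been confirmed.

```python
BOC_SEED = [0, 0, 0, 0]  # hypothetical placeholder for the BOC seed codes


class CapturedStep:
    """Stand-in for a captured decode step.

    Like a CUDA graph, it keeps a reference to the exact buffer object
    it saw at capture time and never looks up the attribute again.
    """

    def __init__(self, codes_buffer):
        self._captured = codes_buffer

    def replay(self):
        # Stand-in for the feedback embedding: just sum the codes.
        return sum(self._captured)


class Talker:
    def __init__(self):
        self.last_codes = list(BOC_SEED)
        self.step = CapturedStep(self.last_codes)  # "capture"

    def update_rebind(self, new_codes):
        # Creates a new object; the captured step still sees the seed.
        self.last_codes = list(new_codes)

    def update_in_place(self, new_codes):
        # Writes into the captured buffer, so the replay sees new codes.
        self.last_codes[:] = new_codes


stale_talker = Talker()
stale_talker.update_rebind([1, 2, 3, 4])
stale_result = stale_talker.step.replay()  # still computed from BOC_SEED

fresh_talker = Talker()
fresh_talker.update_in_place([1, 2, 3, 4])
fresh_result = fresh_talker.step.replay()  # computed from the new codes
```

If the real buffer is updated by rebinding before the first replay, the captured step would see the seed regardless of `has_codes`, which would match the observation that forcing the gate changes nothing.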

### Impact

Parity / cosmetic only — the speech is correct and intelligible (verified by
transcription). Filing for awareness as a follow-up to the capture-crash fix;
not a blocker.

### Environment

Tesla V100 (SM70), `FLASH_ATTN_V100` backend.


[Bug]: [HiggsAudioV3] Talker CUDA graph output differs from eager at the first decode step (parity follow-up) #4564
