
fix(serve): don't classify Gemma 4 closed prefilled thinking block as reasoning #46589

Closed
Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:fix-gemma4-serve-closed-thinking-block

Conversation

@Anai-Guo

What

Fixes #46561. With transformers serve and a Gemma 4 model, a normal (thinking-disabled) chat completion comes back with empty content — the whole answer is misclassified as reasoning_content.

Root cause

Gemma 4's chat template prefills an empty, already-closed thinking block at the end of the prompt when thinking is disabled, so the prompt tail is <|channel>thought\n<channel|>, i.e. token ids [..., 100, 45518, 107, 101], where start_ids = [100, 45518, 107] and the trailing token 101 is the thinking end token (<channel|>).
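To make the token-level mismatch concrete, here is a minimal sketch of a tail matcher that, like the pre-fix heuristic, accepts the opener either at the very end of the prompt or followed by exactly one extra token of any kind. The function name `starts_in_thinking_old` and the leading filler ids are hypothetical; only `start_ids = [100, 45518, 107]` and the end token `101` come from this PR. It is an illustration, not the transformers implementation.

```python
from typing import Sequence

# Ids quoted in the PR for Gemma 4's prefilled thinking block.
START_IDS = [100, 45518, 107]  # opener, "<|channel>thought\n"
END_ID = 101                   # closer, "<channel|>"


def starts_in_thinking_old(input_ids: Sequence[int], start_ids: Sequence[int]) -> bool:
    """Hypothetical sketch of the pre-fix check (NOT the transformers code).

    Returns True when the prompt ends with start_ids, or with start_ids
    followed by exactly one extra token of any kind.
    """
    ids, n = list(input_ids), len(start_ids)
    for trailing in (0, 1):
        stop = len(ids) - trailing
        if stop >= n and ids[stop - n:stop] == list(start_ids):
            return True
    return False


# Hypothetical filler ids (0, 0) stand in for the "..." part of the prompt.
gemma_closed_tail = [0, 0, *START_IDS, END_ID]

# The closer 101 is accepted as the "one trailing token", so the
# already-closed block is wrongly reported as an open one.
print(starts_in_thinking_old(gemma_closed_tail, START_IDS))  # True
```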

_starts_in_thinking() matches start_ids at the tail while tolerating one trailing token (so genuine prefilled openers like DeepSeek-R1 / QwQ that emit <think>\n still match). The closing tag was accepted as that tolerated trailing token, so the heuristic wrongly reported start_in_thinking=True. Because the output contains no thinking markers, parse_reasoning() then takes the "prefilled opener truncated before close" branch and returns content="" with everything in reasoning.
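The downstream effect can be sketched with a simplified stand-in for parse_reasoning(). The function `parse_reasoning_sketch`, the `Parsed` dataclass, and the string-based tag handling are all hypothetical; they only mirror the branch described above, where a prompt reported as starting inside a thinking block, combined with output that has no closing tag, makes the whole output count as truncated reasoning.

```python
from dataclasses import dataclass


@dataclass
class Parsed:
    content: str
    reasoning: str


def parse_reasoning_sketch(text: str, start_in_thinking: bool, end_tag: str = "<channel|>") -> Parsed:
    """Hypothetical, simplified stand-in for parse_reasoning()."""
    if start_in_thinking:
        if end_tag in text:
            # Prefilled opener and the model closed the block: split on the closer.
            reasoning, _, content = text.partition(end_tag)
            return Parsed(content=content.strip(), reasoning=reasoning.strip())
        # Prefilled opener "truncated before close": everything becomes reasoning.
        return Parsed(content="", reasoning=text)
    # Not starting in thinking: this sketch skips marker parsing entirely.
    return Parsed(content=text, reasoning="")


answer = "The capital of France is Paris."
print(parse_reasoning_sketch(answer, start_in_thinking=True))
# Parsed(content='', reasoning='The capital of France is Paris.')
print(parse_reasoning_sketch(answer, start_in_thinking=False))
# Parsed(content='The capital of France is Paris.', reasoning='')
```

With the misclassified start_in_thinking=True, a plain non-thinking answer lands entirely in the reasoning field, which matches the empty content reported in #46561.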

Fix

Thread the already-computed end_id into _starts_in_thinking() and reject the one-trailing-token match when that trailing token is the thinking end token — a closed prefilled block means the prompt is not left inside a thinking block. The trailing == 0 (ends exactly on opener) and trailing == 1 with a non-end token (DeepSeek-R1 / QwQ \n) cases are unchanged.
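A minimal sketch of the fixed check follows, assuming a flat list of token ids. The name `starts_in_thinking` and its signature are illustrative rather than the transformers API; the logic follows the description above: accept trailing == 0, accept trailing == 1 unless the trailing token equals end_id, and keep the old behaviour when end_id is None. The DeepSeek-R1-style ids in the usage lines are made up.

```python
from typing import Optional, Sequence


def starts_in_thinking(
    input_ids: Sequence[int],
    start_ids: Sequence[int],
    end_id: Optional[int] = None,
) -> bool:
    """Hypothetical sketch of the fixed tail check (NOT the transformers code)."""
    ids, opener, n = list(input_ids), list(start_ids), len(start_ids)
    # trailing == 0: the prompt ends exactly on the opener.
    if len(ids) >= n and ids[len(ids) - n:] == opener:
        return True
    # trailing == 1: opener followed by a single extra token (e.g. "\n").
    if len(ids) >= n + 1 and ids[len(ids) - n - 1:-1] == opener:
        # If that token is the thinking end token, the block is already closed.
        return end_id is None or ids[-1] != end_id
    return False


GEMMA_START, GEMMA_END = [100, 45518, 107], 101  # ids quoted in the PR
THINK, NEWLINE, THINK_END = 500, 501, 502          # hypothetical DeepSeek-R1-style ids

print(starts_in_thinking([0, 100, 45518, 107, 101], GEMMA_START, GEMMA_END))  # False
print(starts_in_thinking([0, 100, 45518, 107, 101], GEMMA_START))             # True (no end_id)
print(starts_in_thinking([0, THINK, NEWLINE], [THINK], THINK_END))            # True
```

Making end_id optional keeps callers that do not pass it on the old behaviour, which is the back-compat case the tests below mention.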

Tests

Added TestStartsInThinking (CPU-only, no model download) covering: ends-exactly-on-opener, opener + whitespace trailing, the Gemma 4 closed-block case (returns False), the closed block without end_id (back-compat still matches), no-thinking tail, and batched input_ids.
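The listed cases map naturally onto CPU-only unit tests. The sketch below is not the PR's TestStartsInThinking: it re-declares a hypothetical `starts_in_thinking` helper so it runs on its own, uses made-up filler ids and a hypothetical whitespace token id around the Gemma 4 ids quoted above, and assumes that batched input_ids arrive as a list of rows of which only the first row (batch size 1) is inspected.

```python
import unittest
from typing import Optional, Sequence, Union

Ids = Union[Sequence[int], Sequence[Sequence[int]]]


def starts_in_thinking(input_ids: Ids, start_ids: Sequence[int], end_id: Optional[int] = None) -> bool:
    """Hypothetical helper under test (NOT the transformers implementation)."""
    ids = list(input_ids)
    if ids and isinstance(ids[0], (list, tuple)):
        ids = list(ids[0])  # assumption: batched input, inspect the first row only
    opener, n = list(start_ids), len(start_ids)
    if len(ids) >= n and ids[len(ids) - n:] == opener:
        return True
    if len(ids) >= n + 1 and ids[len(ids) - n - 1:-1] == opener:
        return end_id is None or ids[-1] != end_id
    return False


START, END = [100, 45518, 107], 101  # ids quoted in the PR
WHITESPACE = 108                     # hypothetical non-end trailing token


class TestStartsInThinkingSketch(unittest.TestCase):
    def test_ends_exactly_on_opener(self):
        self.assertTrue(starts_in_thinking([1, *START], START, END))

    def test_opener_plus_whitespace_trailing(self):
        self.assertTrue(starts_in_thinking([1, *START, WHITESPACE], START, END))

    def test_closed_prefilled_block_is_not_thinking(self):
        self.assertFalse(starts_in_thinking([1, *START, END], START, END))

    def test_closed_block_without_end_id_still_matches(self):
        self.assertTrue(starts_in_thinking([1, *START, END], START))

    def test_no_thinking_tail(self):
        self.assertFalse(starts_in_thinking([1, 2, 3, 4], START, END))

    def test_batched_input_ids(self):
        self.assertFalse(starts_in_thinking([[1, *START, END]], START, END))
        self.assertTrue(starts_in_thinking([[1, *START]], START, END))
```

Saved as, for example, test_starts_in_thinking_sketch.py, it runs with `python -m unittest test_starts_in_thinking_sketch` and needs no model download.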

🤖 Generated with Claude Code

Gemma 4 with thinking disabled prefills an empty, already-closed thinking
block (`<|channel>thought\n<channel|>`) at the prompt tail. `_starts_in_thinking`
tolerates one trailing token after the opener, so the closing tag was mistaken
for the `\n` trailing case and the whole completion was reclassified as
reasoning_content, leaving `content` empty.

Reject the tail match when the single trailing token is the thinking end token,
so closed prefilled blocks are not treated as in-thinking. The DeepSeek-R1 / QwQ
prefilled-opener case (trailing `\n`) is unchanged.

@github-actions (Contributor)

CI Dashboard: View test results in Grafana

@Rocketknight1 (Member)

Please stop the random drive-by Claude PRs! We can run Claude Code ourselves if we need it!



Development

Successfully merging this pull request may close these issues.

serve: Gemma 4 non-thinking responses returned as reasoning_content with empty content

2 participants