fix(serve): don't classify Gemma 4 closed prefilled thinking block as reasoning by Anai-Guo · Pull Request #46589 · huggingface/transformers

Anai-Guo · 2026-06-12T07:11:17Z

What

Fixes #46561. With `transformers serve` and a Gemma 4 model, a normal (thinking-disabled) chat completion comes back with empty `content`: the whole answer is misclassified as `reasoning_content`.

Root cause

Gemma 4's chat template prefills an empty, already-closed thinking block at the end of the prompt when thinking is disabled, so the prompt tail is `<|channel>thought\n<channel|>` → `[..., 100, 45518, 107, 101]`, where `start_ids = [100, 45518, 107]` and the trailing token `101` is the thinking end token (`<channel|>`).

`_starts_in_thinking()` matches `start_ids` at the tail while tolerating one trailing token (so genuine prefilled openers like DeepSeek-R1 / QwQ, whose templates emit `<think>\n`, still match). The closing tag was accepted as that tolerated trailing token, so the heuristic wrongly reported `start_in_thinking=True`. Because the output contains no thinking markers, `parse_reasoning()` then takes the "prefilled opener truncated before close" branch and returns `content=""` with everything in reasoning.
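To make the failure concrete, here is a minimal sketch of the tail match as the paragraph above describes it. It is a re-creation on plain Python lists, not the library's code: the function name `tail_match_before_fix` and the filler prompt tokens are hypothetical, and only the token ids `100, 45518, 107, 101` come from this PR.

```python
from typing import Sequence

START_IDS = [100, 45518, 107]  # <|channel>thought\n, per the PR description
END_ID = 101                   # <channel|>, the thinking end token


def tail_match_before_fix(prompt_ids: Sequence[int], start_ids: Sequence[int]) -> bool:
    """Hypothetical re-creation of the pre-fix heuristic (not the library's code)."""
    ids, start = list(prompt_ids), list(start_ids)
    n = len(start)
    if n == 0:
        return False
    # trailing == 0: opener flush with the end of the prompt.
    # trailing == 1: exactly one tolerated token after the opener.
    for trailing in (0, 1):
        stop = len(ids) - trailing
        if stop >= n and ids[stop - n:stop] == start:
            return True
    return False


# Filler ids 2 and 9 stand in for the rest of the prompt (made up for illustration).
gemma4_closed_tail = [2, 9, 100, 45518, 107, END_ID]
bug_reproduced = tail_match_before_fix(gemma4_closed_tail, START_IDS)
print(bug_reproduced)  # True: the closing tag is swallowed as the tolerated token
```

Under these assumptions the closed Gemma 4 tail is indistinguishable from an opener followed by whitespace, which is the misclassification reported in #46561.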

Fix

Thread the already-computed `end_id` into `_starts_in_thinking()` and reject the one-trailing-token match when that trailing token is the thinking end token: a closed prefilled block means the prompt is not left inside a thinking block. The `trailing == 0` (ends exactly on opener) and `trailing == 1` with a non-end token (DeepSeek-R1 / QwQ `\n`) cases are unchanged.
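The rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the actual patch: `starts_in_thinking_sketch` and its signature are hypothetical, it works on plain lists instead of whatever input type the real helper takes, and making `end_id` optional is a guess at how the back-compat path is kept.

```python
from typing import Optional, Sequence


def starts_in_thinking_sketch(
    prompt_ids: Sequence[int],
    start_ids: Sequence[int],
    end_id: Optional[int] = None,
) -> bool:
    """Hypothetical sketch of the fixed heuristic (not the library's code)."""
    ids, start = list(prompt_ids), list(start_ids)
    n = len(start)
    if n == 0 or len(ids) < n:
        return False
    # trailing == 0: the prompt ends exactly on the opener.
    if ids[-n:] == start:
        return True
    # trailing == 1: exactly one token follows the opener.
    if len(ids) > n and ids[-n - 1:-1] == start:
        # A closing tag right after the opener means the block is already closed.
        return end_id is None or ids[-1] != end_id
    return False


START_IDS, END_ID = [100, 45518, 107], 101
closed = [2, 9, 100, 45518, 107, END_ID]  # filler ids 2 and 9 are made up
print(starts_in_thinking_sketch(closed, START_IDS, END_ID))  # False
print(starts_in_thinking_sketch(closed, START_IDS))          # True (no end_id given)
```

Checking only the single tolerated token keeps the change narrow: prompts that end on the opener, or on the opener plus a non-end token, take the same path as before.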

Tests

Added `TestStartsInThinking` (CPU-only, no model download) covering: ends-exactly-on-opener, opener + whitespace trailing, the Gemma 4 closed-block case (returns `False`), the closed block without `end_id` (back-compat still matches), no-thinking tail, and batched `input_ids`.
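A rough picture of what such a test class might look like is below. It is not the PR's test file: `stand_in` is a hypothetical list-based substitute for the helper under test, the ids `7`, `198`, and `8` are made up to mimic a `<think>\n` template, and the batched `input_ids` case is left out because the description does not say how batches are handled.

```python
import unittest


def stand_in(ids, start_ids, end_id=None):
    """Hypothetical stand-in for the helper under test (not the library's code)."""
    n = len(start_ids)
    if n and ids[-n:] == start_ids:
        return True  # prompt ends exactly on the opener
    if n and len(ids) > n and ids[-n - 1:-1] == start_ids:
        return end_id is None or ids[-1] != end_id  # reject a closed block
    return False


START, END = [100, 45518, 107], 101  # Gemma 4 ids from the PR description


class TestStartsInThinkingSketch(unittest.TestCase):
    def test_ends_exactly_on_opener(self):
        self.assertTrue(stand_in([2, 9] + START, START, END))

    def test_opener_plus_whitespace(self):
        # 7 = opener, 198 = newline, 8 = end token (all made-up ids)
        self.assertTrue(stand_in([2, 7, 198], [7], 8))

    def test_gemma4_closed_block(self):
        self.assertFalse(stand_in([2, 9] + START + [END], START, END))

    def test_closed_block_without_end_id(self):
        self.assertTrue(stand_in([2, 9] + START + [END], START))

    def test_no_thinking_tail(self):
        self.assertFalse(stand_in([2, 9, 5, 6], START, END))


# Run the suite in-process so the script does not exit on completion.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStartsInThinkingSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because every case operates on literal token ids, nothing here needs a tokenizer, a model download, or a GPU, which matches the CPU-only claim above.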

🤖 Generated with Claude Code

Gemma 4 with thinking disabled prefills an empty, already-closed thinking block (`<|channel>thought\n<channel|>`) at the prompt tail. `_starts_in_thinking` tolerates one trailing token after the opener, so the closing tag was mistaken for the `\n` trailing case and the whole completion was reclassified as `reasoning_content`, leaving `content` empty. Reject the tail match when the single trailing token is the thinking end token, so closed prefilled blocks are not treated as in-thinking. The DeepSeek-R1 / QwQ prefilled-opener case (trailing `\n`) is unchanged.

github-actions · 2026-06-12T07:12:06Z

CI Dashboard: View test results in Grafana

Rocketknight1 · 2026-06-12T10:39:57Z

Please stop the random drive-by Claude PRs! We can run Claude Code ourselves if we need it!

Rocketknight1 closed this Jun 12, 2026

Rocketknight1 added the Code agent slop label Jun 12, 2026

Anai-Guo wants to merge 1 commit into huggingface:main from Anai-Guo:fix-gemma4-serve-closed-thinking-block
