fix(serve): don't classify Gemma 4 closed prefilled thinking block as reasoning #46589
Closed
Anai-Guo wants to merge 1 commit into
Gemma 4 with thinking disabled prefills an empty, already-closed thinking block (`<|channel>thought\n<channel|>`) at the prompt tail. `_starts_in_thinking` tolerates one trailing token after the opener, so the closing tag was mistaken for the `\n` trailing case and the whole completion was reclassified as `reasoning_content`, leaving `content` empty. Reject the tail match when the single trailing token is the thinking end token, so closed prefilled blocks are not treated as in-thinking. The DeepSeek-R1 / QwQ prefilled-opener case (trailing `\n`) is unchanged.
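To make the downstream effect concrete, here is a minimal sketch of how a wrong "started in thinking" flag empties `content` for a completion that contains no thinking markers. The function `split_reasoning` and its string-based markers are hypothetical simplifications for illustration only; they are not the parser used by `transformers serve`.

```python
# Hypothetical, simplified splitter: illustrates the failure mode only.
THINK_START = "<|channel>thought\n"
THINK_END = "<channel|>"


def split_reasoning(text: str, start_in_thinking: bool) -> dict:
    """Split generated text into reasoning and visible content."""
    if start_in_thinking:
        # The prompt is assumed to end inside an open thinking block.
        if THINK_END in text:
            reasoning, content = text.split(THINK_END, 1)
            return {"reasoning_content": reasoning.strip(), "content": content.strip()}
        # No close marker: treated as thinking that was truncated before closing.
        return {"reasoning_content": text.strip(), "content": ""}
    if THINK_START in text and THINK_END in text:
        before, rest = text.split(THINK_START, 1)
        reasoning, after = rest.split(THINK_END, 1)
        return {"reasoning_content": reasoning.strip(), "content": (before + after).strip()}
    # No markers and not in thinking: everything is visible content.
    return {"reasoning_content": "", "content": text.strip()}


answer = "Paris is the capital of France."
wrong = split_reasoning(answer, start_in_thinking=True)   # flag set by mistake
right = split_reasoning(answer, start_in_thinking=False)  # flag after the fix
```

With the flag wrongly set, the entire answer lands in `reasoning_content`; with the flag cleared, the same text is returned as `content`.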
Member
Please stop the random drive-by Claude PRs! We can run Claude Code ourselves if we need it!
What
Fixes #46561. With `transformers serve` and a Gemma 4 model, a normal (thinking-disabled) chat completion comes back with empty `content`: the whole answer is misclassified as `reasoning_content`.

Root cause
Gemma 4's chat template prefills an empty, already-closed thinking block at the end of the prompt when thinking is disabled, so the prompt tail is `<|channel>thought\n<channel|>` → `[..., 100, 45518, 107, 101]`, where `start_ids = [100, 45518, 107]` and the trailing token `101` is the thinking end token (`<channel|>`).

`_starts_in_thinking()` matches `start_ids` at the tail while tolerating one trailing token (so genuine prefilled openers like DeepSeek-R1 / QwQ that emit `<think>\n` still match). The closing tag was accepted as that tolerated trailing token, so the heuristic wrongly reported `start_in_thinking=True`. Because the output contains no thinking markers, `parse_reasoning()` then takes the "prefilled opener truncated before close" branch and returns `content=""` with everything in reasoning.

Fix
Thread the already-computed `end_id` into `_starts_in_thinking()` and reject the one-trailing-token match when that trailing token is the thinking end token: a closed prefilled block means the prompt is not left inside a thinking block. The `trailing == 0` (ends exactly on opener) and `trailing == 1` with a non-end token (DeepSeek-R1 / QwQ `\n`) cases are unchanged.

Tests
Added `TestStartsInThinking` (CPU-only, no model download) covering: ends-exactly-on-opener, opener + whitespace trailing, the Gemma 4 closed-block case (returns `False`), the closed block without `end_id` (back-compat still matches), no-thinking tail, and batched `input_ids`.

🤖 Generated with Claude Code
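The cases listed under Tests can be sketched against a simplified stand-in for the heuristic. Everything below is an assumption made for illustration: `starts_in_thinking` is a hypothetical single-sequence function (batched `input_ids` are not modeled), the prefix tokens and the whitespace token ID `198` are placeholders, and only `start_ids = [100, 45518, 107]` and the end token `101` come from the description above. It is not the `transformers` implementation or its test suite.

```python
# Hypothetical stand-in for the tail-match heuristic; not the library code.
from typing import Optional, Sequence

START_IDS = [100, 45518, 107]  # opener token IDs quoted in the description
END_ID = 101                   # thinking end token quoted in the description
NEWLINE_ID = 198               # placeholder ID for a trailing whitespace token


def starts_in_thinking(
    ids: Sequence[int],
    start_ids: Sequence[int],
    end_id: Optional[int] = None,
) -> bool:
    """True if the prompt ends on the opener, allowing one trailing token."""
    row, opener = list(ids), list(start_ids)
    n = len(opener)
    for trailing in (0, 1):
        stop = len(row) - trailing
        if stop < n or row[stop - n:stop] != opener:
            continue
        # A trailing end token means the prefilled block is already closed.
        if trailing == 1 and end_id is not None and row[-1] == end_id:
            continue
        return True
    return False


prefix = [2, 9, 9]  # arbitrary placeholder prompt tokens
cases = {
    "ends_on_opener": starts_in_thinking(prefix + START_IDS, START_IDS, END_ID),
    "opener_plus_whitespace": starts_in_thinking(prefix + START_IDS + [NEWLINE_ID], START_IDS, END_ID),
    "closed_block": starts_in_thinking(prefix + START_IDS + [END_ID], START_IDS, END_ID),
    "closed_block_no_end_id": starts_in_thinking(prefix + START_IDS + [END_ID], START_IDS),
    "no_thinking_tail": starts_in_thinking(prefix + [5, 6], START_IDS, END_ID),
}
```

Making `end_id` optional keeps the older call shape working: without it, the closed block still matches, which mirrors the back-compat case in the list above.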