fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant by guapisolo · Pull Request #949 · radixark/miles

guapisolo · 2026-04-07T20:36:40Z

Summary

Globally ban --tito-model=qwen3 — Qwen3's context-sensitive template (last_query_index, <think> block) breaks the dummy-prefix diff, producing wrong incremental tokens. Raises ValueError at arg validation with a clear message.
Remove the reasoning_content: " " workaround from _build_dummy_assistant (no longer needed with Qwen3 banned)
_build_dummy_assistant now counts all tool messages in appended (not just leading contiguous), using tool_indices for correct indexing
Adds test_dummy_prefix_match (GLM47-only) to guard the prefix invariant
TITO parametrized tests scoped to GLM47 only

Test plan

48 tests pass (GLM47 parametrized + Qwen3 boundary + factory)
Verified GLM47 prefix match for all trajectory × split combinations
Verified ValueError raised when --tito-model=qwen3
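
For readers who want to see the shape of the check, here is a minimal sketch of rejecting `--tito-model=qwen3` at argument validation. It is an illustration under stated assumptions, not the code in this PR: `validate_tito_model` and `parse_args` are hypothetical names, and the error text is paraphrased from the summary above.

```python
import argparse
from typing import Optional


def validate_tito_model(tito_model: Optional[str]) -> None:
    """Hypothetical helper: reject the banned qwen3 TITO model early."""
    if tito_model == "qwen3":
        # Message paraphrased from the PR summary; the real wording may differ.
        raise ValueError(
            "--tito-model=qwen3 is not supported: its context-sensitive chat "
            "template (last_query_index, <think> block) breaks the dummy-prefix "
            "diff. Only glm47 is currently supported."
        )


def parse_args(argv: list) -> argparse.Namespace:
    """Hypothetical entry point that validates right after parsing."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tito-model", default=None)
    args = parser.parse_args(argv)
    validate_tito_model(args.tito_model)
    return args
```

Failing at validation time means a misconfigured run stops before any tokens are produced, rather than silently training on wrong incremental tokens.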

Made with Cursor

🤖 Generated with Claude Code

Previously `_build_dummy_assistant` assumed all appended messages were tool responses and always generated matching `tool_calls`. When a non-tool message (e.g. `user`) was appended, the template rendered incorrect turn-transition tokens (`<|observation|>` instead of `<|user|>`). Now the function only generates `tool_calls` for the leading contiguous tool messages. If the first appended message is not a tool role, no `tool_calls` are emitted so the template picks the correct boundary. Made-with: Cursor
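
The "leading contiguous tool messages" rule described in this commit message can be sketched as a small counting helper. This is a hedged illustration only: `count_leading_tools` is a hypothetical name, and the real function in `tito_tokenizer.py` may compute `num_leading_tools` differently.

```python
from typing import Any


def count_leading_tools(appended_messages: list[dict[str, Any]]) -> int:
    """Count tool messages at the start of the list, stopping at the first non-tool role."""
    count = 0
    for msg in appended_messages:
        if msg.get("role") != "tool":
            break
        count += 1
    return count
```

A result of 0 (for example, when the first appended message has role `user`) is the case where no `tool_calls` are emitted.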

gemini-code-assist

Code Review

This pull request updates the _build_dummy_assistant function to correctly handle leading tool messages within appended_messages, ensuring proper turn-transition tokens are rendered. Feedback suggests restoring the reasoning_content field to maintain consistency with reasoning models and simplifying the tool_calls list comprehension using enumerate on a slice of the messages.

gemini-code-assist · 2026-04-07T20:38:30Z

miles/utils/chat_template_utils/tito_tokenizer.py

+    assistant: dict[str, Any] = {"role": "assistant", "content": ""}
+    if num_leading_tools > 0:
+        assistant["tool_calls"] = [
            {
-                "id": resp.get("tool_call_id") or f"call0000{i}",
+                "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
-                    "name": resp.get("name") or "dummy_func",
+                    "name": appended_messages[i].get("name") or "dummy_func",
                    "arguments": {},
                },
            }
-            for i, resp in enumerate(tool_responses)
-        ],
-    }
+            for i in range(num_leading_tools)
+        ]


The reasoning_content field was removed from the dummy assistant message. This field was present in the previous implementation (line 32) and is often necessary for reasoning models to correctly render turn boundaries (e.g., to ensure the reasoning block is closed). Unless its removal was intentional to fix a specific issue, it should be restored.

Additionally, the tool_calls generation can be simplified using enumerate on a slice of appended_messages.

Suggested change

-    assistant: dict[str, Any] = {"role": "assistant", "content": ""}
-    if num_leading_tools > 0:
-        assistant["tool_calls"] = [
-            {
-                "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
-                "type": "function",
-                "function": {
-                    "name": appended_messages[i].get("name") or "dummy_func",
-                    "arguments": {},
-                },
-            }
-            for i in range(num_leading_tools)
-        ]
+    assistant: dict[str, Any] = {
+        "role": "assistant",
+        "content": "",
+        "reasoning_content": " ",
+    }
+    if num_leading_tools > 0:
+        assistant["tool_calls"] = [
+            {
+                "id": msg.get("tool_call_id") or f"call0000{i}",
+                "type": "function",
+                "function": {
+                    "name": msg.get("name") or "dummy_func",
+                    "arguments": {},
+                },
+            }
+            for i, msg in enumerate(appended_messages[:num_leading_tools])
+        ]

…ix stability Without `reasoning_content: " "`, Qwen3's template inserts an empty <think> block only when the assistant is the last message, breaking the prefix-diff invariant. Add test_dummy_prefix_match to guard this. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Non-contiguous tool messages (e.g. tool, system, tool) now all get matching tool_calls in the dummy assistant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
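
A minimal sketch of the non-contiguous behavior described in this commit follows. It assumes a standalone function named `build_dummy_assistant` (hypothetical; the real one is `_build_dummy_assistant`) and assumes the fallback id reuses the message's position in the appended list, which the source does not confirm.

```python
from typing import Any


def build_dummy_assistant(appended_messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a dummy assistant with one tool_call per tool message, wherever it appears."""
    # Positions of every tool message, not just the leading contiguous run.
    tool_indices = [
        i for i, msg in enumerate(appended_messages) if msg.get("role") == "tool"
    ]
    assistant: dict[str, Any] = {"role": "assistant", "content": ""}
    if tool_indices:
        assistant["tool_calls"] = [
            {
                # Assumption: the fallback id is derived from the original index.
                "id": appended_messages[i].get("tool_call_id") or f"call0000{i}",
                "type": "function",
                "function": {
                    "name": appended_messages[i].get("name") or "dummy_func",
                    "arguments": {},
                },
            }
            for i in tool_indices
        ]
    return assistant
```

With a `tool, system, tool` sequence this yields two `tool_calls`; with only a `user` message it yields none, which is what lets the template pick the non-tool turn boundary.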

Qwen3's context-sensitive template (last_query_index / <think> block) breaks the dummy-prefix diff when user messages are appended. Raise ValueError at arg validation if --tito-allowed-append-roles includes 'user' with a non-glm47 --tito-model. Add test_dummy_prefix_match (GLM47-only) to guard the prefix invariant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Qwen3's context-sensitive template (<think> block / last_query_index) breaks the dummy-prefix diff. Ban --tito-model=qwen3 at arg validation, remove the reasoning_content workaround, and scope TITO tests to GLM47. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
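
To make the prefix invariant concrete, here is a toy sketch. The two renderers are invented stand-ins, not the real GLM47 or Qwen3 chat templates, and `dummy_prefix_matches` is a hypothetical helper rather than the `test_dummy_prefix_match` added in this PR. The context-sensitive renderer only imitates the behavior the commit messages describe: an empty `<think>` block is added when the assistant is the last message.

```python
from typing import Any, Callable

Message = dict[str, Any]


def render_context_free(messages: list[Message]) -> str:
    """Toy template: each message renders the same regardless of position."""
    return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)


def render_context_sensitive(messages: list[Message]) -> str:
    """Toy template: adds an empty <think> block only to a trailing assistant."""
    last = len(messages) - 1
    parts = []
    for i, m in enumerate(messages):
        body = m["content"]
        if m["role"] == "assistant" and i == last:
            body = "<think></think>" + body
        parts.append(f"<|{m['role']}|>{body}")
    return "".join(parts)


def dummy_prefix_matches(
    render: Callable[[list[Message]], str],
    base: list[Message],
    appended: list[Message],
) -> bool:
    """Check that rendering up to the dummy assistant is a prefix of the full render."""
    dummy = {"role": "assistant", "content": ""}
    prefix = render(base + [dummy])
    full = render(base + [dummy] + appended)
    return full.startswith(prefix)
```

When the check fails, the text after the shared prefix can no longer be isolated by a simple diff, which is the failure mode the ban is meant to avoid.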

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

maocheng23 · 2026-04-08T21:53:01Z

miles/utils/chat_template_utils/tito_tokenizer.py

+    All ``tool`` messages in *appended_messages* (not just leading contiguous
+    ones) get matching ``tool_calls``.  If there are no tool messages the dummy
+    assistant has no ``tool_calls`` — so the template renders the correct
+    turn-transition tokens (e.g. ``<|user|>`` instead of ``<|observation|>``).


As a sanity check: does our TITO now only support GLM-style templates? Can we still keep it flexible somehow?

If supporting the Qwen3 chat template would be too hard, I think this is still fine.

And we don't need reasoning_content any more?

This workaround for GLM 4.7 breaks the Qwen implementation. Supporting Qwen3 needs a fix to the Qwen3 chat template. This change unblocks TITO development.

guapisolo requested review from fzyzcjy, maocheng23 and yueming-yuan as code owners April 7, 2026 20:36

gemini-code-assist bot reviewed Apr 7, 2026

guapisolo and others added 7 commits April 7, 2026 20:46

fix: count all tool messages in appended, not just leading contiguous

8043fcb

fix: restore warning log for user in tito_allowed_append_roles

090d56f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: clarify error message — only glm47 is currently supported

b2d62d2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: remove qwen3 from e2e TITO test registries, default to glm47

afcea1f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guapisolo added the run-ci-sglang label Apr 7, 2026

guapisolo changed the title ~~fix: handle non-tool appended messages in TITO dummy assistant~~ fix: Ban qwen3 tito model and handle non-tool appended messages in TITO dummy assistant Apr 7, 2026

ci: remove qwen3 e2e TITO test jobs from CI matrix

900274c

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guapisolo requested a review from yushengsu-thu as a code owner April 7, 2026 23:01

maocheng23 reviewed Apr 8, 2026

guapisolo wants to merge 9 commits into main from fix/tito-non-tool-append
