[v0.5.10][2] Fix apply_chat_template behavior for transformers >=5.0 #926
yueming-yuan wants to merge 5 commits into bump-sglang-v0.5.10
Conversation
…ng-v0.5.10" This reverts commit d549b26.
Code Review
This pull request introduces a utility function _apply_chat_template_ids to handle changes in the transformers library (version 5.0+) where apply_chat_template may return a dictionary instead of a list. All existing calls to the tokenizer have been updated to use this wrapper. Feedback suggests adding type hints to the new function and explicitly setting return_dict=False to improve robustness and maintainability.
miles/utils/mask_utils.py (Outdated)

```python
def _apply_chat_template_ids(tokenizer, messages, **kwargs) -> list[int]:
    """Wrapper that always returns list[int] from apply_chat_template(tokenize=True).

    transformers >=5.0 returns BatchEncoding instead of list[int]."""
    result = tokenizer.apply_chat_template(messages, tokenize=True, **kwargs)
    if isinstance(result, list):
        return result
    return result["input_ids"]
```
The `_apply_chat_template_ids` wrapper is a good addition for compatibility with transformers 5.0. However, it can be improved by adding type hints for better maintainability and consistency with the rest of the file. Also, explicitly setting `return_dict=False` via `kwargs.setdefault` ensures that current versions of transformers return the expected list type, while the `isinstance` check provides a robust fallback for future versions where the default might change or the flag might be ignored.
Note: passing `tokenize` in `kwargs` to this function will cause a `TypeError`, because it is already explicitly passed to `apply_chat_template`.
Suggested change:

```diff
-def _apply_chat_template_ids(tokenizer, messages, **kwargs) -> list[int]:
+def _apply_chat_template_ids(tokenizer: AutoTokenizer, messages: list[dict], **kwargs) -> list[int]:
     """Wrapper that always returns list[int] from apply_chat_template(tokenize=True).
     transformers >=5.0 returns BatchEncoding instead of list[int]."""
+    kwargs.setdefault("return_dict", False)
     result = tokenizer.apply_chat_template(messages, tokenize=True, **kwargs)
     if isinstance(result, list):
         return result
     return result["input_ids"]
```
Remove models broken by transformers v5 tokenizer unification (DeepSeek-V3, step3, glm-4-9b-chat) and track them in a TOOL_CALL_KNOWN_FAILURES list with root cause comments. Add new passing models: Qwen3.5, Qwen3-Coder-Next, GLM-4.7-Flash, Kimi-K2.5, MiniMax-M2.5, Nemotron-3-Super. Clean up debug helpers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
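A minimal sketch of what such a known-failures list might look like. The name `TOOL_CALL_KNOWN_FAILURES` comes from the commit message; the structure and helper are assumptions, not copied from the actual diff:

```python
# Hypothetical shape of the known-failures tracking described in the
# commit message; entries mirror the root causes listed in the report.
TOOL_CALL_KNOWN_FAILURES = {
    "deepseek-ai/DeepSeek-V3": "LlamaTokenizer overwrites ByteLevel with Metaspace (transformers#43066)",
    "stepfun-ai/step3": "tool-call template expects string function.arguments, tests pass a dict",
    "THUDM/glm-4-9b-chat": "v5 removed legacy _decode segmentation, exposing a custom tokenizer bug",
}

def is_known_tool_call_failure(model_id: str) -> bool:
    # Skip models with documented transformers-v5 incompatibilities.
    return model_id in TOOL_CALL_KNOWN_FAILURES
```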
transformers >=5.0 changed apply_chat_template(tokenize=True) to return BatchEncoding instead of list[int]. Pass return_dict=False to all 6 call sites in mask_utils.py to ensure list[int] on both v4 and v5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 775a86c to f058d37.
Move Step-3.5-Flash from known failures into active tool-call test models, and clarify comments for remaining transformers v5 tokenizer/template incompatibilities. Made-with: Cursor
A report generated by cc & codex and briefly reviewed by me; it generally makes sense.

Transformers v5 Tokenizer Compatibility Analysis

Background
i.e. the decode of the token delta should equal the text delta. For a few models, that assumption effectively relies on This document separates:
Root Cause 1: LlamaTokenizer Overwrites ByteLevel with Metaspace

Directly affected models:

What changed

In v4,
In v5,

```python
self._tokenizer.pre_tokenizer = Metaspace(replacement="▁", prepend_scheme="always")
self._tokenizer.decoder = Sequence([Replace("▁", " "), ByteFallback(), Fuse(), Strip()])
```

This happens because

Consequences

Encoding changed -- the Metaspace pre_tokenizer handles spaces differently from ByteLevel:
Decoding changed -- Note: the token's stored text (

Why this explains the failing models
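The space-handling difference can be illustrated with a crude pure-Python approximation. These functions are NOT the real Rust pre-tokenizers from the tokenizers library, only a sketch of the visible behavior:

```python
# Crude approximation of the visible encoding difference, not the real
# Rust implementations: ByteLevel fuses a leading space into the next
# token as 'Ġ', while Metaspace(prepend_scheme="always") marks every
# word with '▁', including the first one.
def bytelevel_like(text: str) -> list[str]:
    words = text.split(" ")
    return [words[0]] + ["\u0120" + w for w in words[1:]]  # 'Ġ'

def metaspace_like(text: str) -> list[str]:
    return ["\u2581" + w for w in text.split(" ")]  # '▁'

print(bytelevel_like("Hello world"))  # ['Hello', 'Ġworld']
print(metaspace_like("Hello world"))  # ['▁Hello', '▁world']
```

Because the pre-tokenization units differ, both the produced token ids and the way decoded fragments reassemble around spaces change when the pre_tokenizer is swapped.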
Upstream references
Root Cause 2: Legacy _decode Segmentation Removed in v5
| Model | Root Cause | Upstream Issue |
|---|---|---|
| deepseek-ai/DeepSeek-V3 | LlamaTokenizer overwrites ByteLevel with Metaspace | #43066 |
| deepseek-ai/DeepSeek-V3.1 | Tool-call chat template expects string function.arguments; current dummy tool-call shape provides a dict | Model-side template issue |
| stepfun-ai/step3 | Same as above | Same |
| THUDM/glm-4-9b-chat | v5 removed legacy _decode segmentation, exposing custom tokenizer bug | N/A (model-side bug) |
Summary of Passing Models
| Model | Tokenizer Class | Backend | Decoder | Why Unaffected |
|---|---|---|---|---|
| Qwen2.5-0.5B-Instruct | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Hardcoded ByteLevel matches tokenizer.json |
| Qwen3-0.6B | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Same |
| Qwen3-4B-Instruct-2507 | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Same |
| Qwen3-Coder-30B-A3B-Instruct | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Same |
| Qwen3.5-0.8B | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Same |
| Qwen3-Coder-Next | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Same |
| Mistral-7B-Instruct-v0.3 | TokenizersBackend | TokenizersBackend | Metaspace | Direct load from tokenizer.json, no overwrite |
| GLM-4.7-Flash | TokenizersBackend | TokenizersBackend | ByteLevel | Direct load from tokenizer.json; does not use the old ChatGLM Python decode path |
| Step-3.5-Flash | TokenizersBackend | TokenizersBackend | ByteLevel-compatible | Direct load; passes tool-response round-trip as of April 8, 2026 |
| Nemotron-3-Super-120B | TokenizersBackend | TokenizersBackend | ByteLevel | Direct load from tokenizer.json, no overwrite |
| MiniMax-M2 | GPT2Tokenizer | TokenizersBackend | ByteLevel | Hardcoded ByteLevel matches tokenizer.json |
| MiniMax-M2.5 | GPT2Tokenizer | TokenizersBackend | ByteLevel | Same |
| internlm3-8b-instruct | InternLM3Tokenizer | PythonBackend | N/A | Pure Python tokenizer, no Rust decode, no bug |
| Kimi-K2-Instruct | TikTokenTokenizer | PythonBackend | N/A | Pure Python tokenizer, no Rust decode, no bug |
| Kimi-K2.5 | TikTokenTokenizer | PythonBackend | N/A | Same |
| MiMo-7B-RL | Qwen2Tokenizer | TokenizersBackend | ByteLevel | Hardcoded ByteLevel matches tokenizer.json |
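The "Why Unaffected" column comes down to the streaming round-trip assumption from the Background section: decoding the incremental token ids must reproduce the incremental text. A hedged sketch of that check, using a toy tokenizer since real model tokenizers are not loadable here:

```python
# Sketch of the streaming round-trip invariant the tool-call tests rely
# on. ToyTokenizer is a stand-in; the real check would run against a
# Hugging Face tokenizer's decode().
class ToyTokenizer:
    vocab = {1: "Hello", 2: " world", 3: "!"}

    def decode(self, ids):
        return "".join(self.vocab[i] for i in ids)

def incremental_decode_matches(tokenizer, ids) -> bool:
    prev_text = ""
    for i in range(1, len(ids) + 1):
        full_text = tokenizer.decode(ids[:i])
        delta = tokenizer.decode(ids[i - 1:i])
        # The decoded token delta must equal the text delta.
        if prev_text + delta != full_text:
            return False
        prev_text = full_text
    return True

assert incremental_decode_matches(ToyTokenizer(), [1, 2, 3])
```

Models where decode of a token slice depends on surrounding context (e.g. after the Metaspace swap) violate this and land in the failing table above.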
ci-sglang-pr: sglang-miles-v0.5.10
Summary
- transformers 5.x changed `apply_chat_template(tokenize=True)` to return `BatchEncoding` instead of `list[int]`
- Adds an `_apply_chat_template_ids()` wrapper that normalizes the return type
- Replaces #925 (was merged then reverted).