Fix regression in ProcessorMixin._load_tokenizer_from_pretrained for tokenizers at root #46592
Open
punyamodi wants to merge 1 commit into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a fallback path for loading an “additional tokenizer” when the expected subfolder load fails, while warning that this fallback behavior is deprecated.
Changes:
- Wrap `from_pretrained(..., subfolder=tokenizer_subfolder)` in a `try/except` and fall back to `subfolder=subfolder` on failure
- Emit a warning message indicating the fallback is deprecated
7a4a264 to 39ffe6d
39ffe6d to 8e3e062
Summary
In `transformers` v5.x, a change was introduced to automatically look in subfolder directories named after the sub-processor attribute when loading additional/non-primary tokenizers (e.g. searching for files in `bpe_tokenizer/` when the sub-processor name is `bpe_tokenizer`).
This change is a regression: it breaks loading for older/existing model repositories where the tokenizer files are placed at the root of the repository but the sub-processor attribute has a different name (for example, the UniversalActionProcessor of `physical-intelligence/fast`, which uses `bpe_tokenizer`). Loading such a processor with `AutoProcessor.from_pretrained()` fails with a ValueError because the files cannot be located in the subfolder.
This PR wraps the subfolder loading in a `try-except` block. If loading from the subfolder fails, it logs a deprecation warning and falls back to loading from the root of the repository (or from the passed `subfolder` directory).
Testing
Verified by loading `physical-intelligence/fast` with `AutoProcessor.from_pretrained("physical-intelligence/fast", trust_remote_code=True)` and confirming that it falls back to the root and loads the processor.
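For reviewers who want to reason about the fallback without opening the diff, here is a minimal sketch of the pattern described in the Summary. It is not the code from this PR: `load_with_root_fallback` and its `load_fn` parameter are hypothetical names, `load_fn` stands in for a tokenizer class's `from_pretrained`, and catching `OSError` and `ValueError` is an assumption about which errors a missing subfolder produces.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_root_fallback(load_fn, pretrained_name, attribute_name, subfolder="", **kwargs):
    """Try `<subfolder>/<attribute_name>` first, then fall back to `subfolder`.

    `load_fn` is a hypothetical stand-in for a tokenizer's `from_pretrained`,
    so that this sketch runs without importing transformers.
    """
    # Subfolder named after the sub-processor attribute, e.g. "bpe_tokenizer".
    tokenizer_subfolder = f"{subfolder}/{attribute_name}" if subfolder else attribute_name
    try:
        return load_fn(pretrained_name, subfolder=tokenizer_subfolder, **kwargs)
    except (OSError, ValueError) as err:  # assumed failure types
        logger.warning(
            "Could not load tokenizer from subfolder %r (%s); falling back to %r. "
            "This fallback is deprecated.",
            tokenizer_subfolder,
            err,
            subfolder or "<repository root>",
        )
        # Older repositories keep the tokenizer files at the root (or in the
        # caller-supplied subfolder), so retry there.
        return load_fn(pretrained_name, subfolder=subfolder, **kwargs)
```

With a real tokenizer class, `load_fn` would be something like `AutoTokenizer.from_pretrained`. One consequence of this design is that a repository with a broken tokenizer in the attribute-named subfolder and a valid one at the root would silently load the root copy, apart from the warning.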