Hi!
I'm running the latest version of Kohya SS with transformers==4.44.2, and tried to caption images using Salesforce/blip2-opt-2.7b.
However, I encountered this error:
tokenizer = cls(*init_inputs, **init_kwargs)
File ".../tokenization_gpt2_fast.py", line 99, in __init__
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: data did not match any variant of untagged enum ModelWrapper at line 250373 column 3
It seems like Kohya is trying to load GPT2TokenizerFast instead of OPTTokenizer, which causes the crash.
I already verified that the model was downloaded completely, including:
config.json
model-00001-of-00002.safetensors
model.safetensors.index.json
tokenizer_config.json
vocab.json, merges.txt, etc.
The problem doesn’t seem to be corrupted files, but rather a tokenizer mismatch.
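One way to check the mismatch theory is to read what the downloaded snapshot itself declares. Below is a minimal sketch, assuming the model files sit in a local directory. `describe_tokenizer_files` is a hypothetical helper, not part of Kohya SS or transformers. It reports the `tokenizer_class` field from tokenizer_config.json and whether a tokenizer.json is present, since the traceback fails inside TokenizerFast.from_file while parsing that file:

```python
import json
from pathlib import Path


def describe_tokenizer_files(model_dir):
    """Summarize the tokenizer-related files in a local model directory.

    Hypothetical debugging helper; not part of Kohya SS or transformers.
    """
    root = Path(model_dir)
    config_path = root / "tokenizer_config.json"

    declared_class = None
    if config_path.is_file():
        with config_path.open(encoding="utf-8") as fh:
            # "tokenizer_class" is the field AutoTokenizer consults when
            # deciding which tokenizer class to instantiate.
            declared_class = json.load(fh).get("tokenizer_class")

    return {
        "declared_class": declared_class,
        # The fast tokenizer is deserialized from tokenizer.json ...
        "has_fast_file": (root / "tokenizer.json").is_file(),
        # ... while the slow GPT-2 style tokenizer reads vocab.json and merges.txt.
        "has_slow_files": all(
            (root / name).is_file() for name in ("vocab.json", "merges.txt")
        ),
    }
```

If `declared_class` comes back as `GPT2Tokenizer`, then `GPT2TokenizerFast` is its normal fast counterpart, and the failure would more likely lie in parsing tokenizer.json itself (for example, a file written by a newer tokenizers release than the one installed) than in the class choice.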
✅ Temporary workaround:
I switched to Salesforce/blip2-flan-t5-xl and it works perfectly, because it uses T5Tokenizer and doesn’t have this tokenizer-class ambiguity.
🙏 Feature request:
Can we make the tokenizer class selection more robust, or allow a manual override?
Even adding support for OPTTokenizer auto-detection when blip2-opt is used would fix this.
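A minimal sketch of what such an override could look like, assuming the captioning code loads its tokenizer through `AutoTokenizer`. `pick_tokenizer_kwargs` and `load_caption_tokenizer` are hypothetical names, not existing Kohya functions, and I haven't verified that the slow path avoids this particular crash. The idea is to pass `use_fast=False` for blip2-opt checkpoints so the tokenizer is built from vocab.json and merges.txt rather than going through TokenizerFast.from_file:

```python
def pick_tokenizer_kwargs(model_id, force_slow=None):
    """Choose keyword arguments for AutoTokenizer.from_pretrained.

    Hypothetical helper. If force_slow is None, default to the slow
    tokenizer for blip2-opt checkpoints (an assumption, not Kohya behavior);
    an explicit True/False acts as the manual override.
    """
    if force_slow is None:
        force_slow = "blip2-opt" in model_id.lower()
    return {"use_fast": not force_slow}


def load_caption_tokenizer(model_id, force_slow=None):
    """Load a tokenizer, optionally bypassing the fast (tokenizer.json) path."""
    # Imported lazily so the selection logic above works without transformers.
    from transformers import AutoTokenizer

    kwargs = pick_tokenizer_kwargs(model_id, force_slow)
    return AutoTokenizer.from_pretrained(model_id, **kwargs)
```

Calling `load_caption_tokenizer("Salesforce/blip2-opt-2.7b")` would request the slow tokenizer, while `load_caption_tokenizer("Salesforce/blip2-flan-t5-xl")` keeps the default fast one. Passing `force_slow=True` or `force_slow=False` overrides the auto-detection either way.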
Thanks a lot!
Let me know if I can help test or debug further.