[v0.5.10][3] Fix processor return_tensors duplicate kwarg for transformers >=5.0#927
yueming-yuan wants to merge 3 commits into fix/mask-utils-transformers-v5-v2
Conversation
Code Review
This pull request updates the build_processor_kwargs function in miles/utils/processing_utils.py to move the return_tensors parameter from the top-level dictionary into modality-specific sub-dictionaries, specifically text_kwargs. This change is intended to prevent duplicate keyword argument errors in transformers >= 5.0. Feedback suggests explicitly removing any top-level return_tensors key from the result dictionary to ensure no conflicts occur if the key was included in the original input.
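The shape of the change can be illustrated with a minimal sketch. Note that the function body below is reconstructed from the review summary: the `forced` dict and the `images_kwargs` key are assumptions for illustration, not the actual code in `miles/utils/processing_utils.py`.

```python
def build_processor_kwargs(multimodal_inputs=None):
    """Build kwargs for a HF processor, nesting return_tensors per modality.

    Sketch based on the PR description; `forced` and the exact modality keys
    are hypothetical, not the real miles implementation.
    """
    # Hypothetical forced settings: tensors for modality-specific outputs.
    forced = {"images_kwargs": {"return_tensors": "pt"}}

    result = dict(multimodal_inputs) if multimodal_inputs else {}
    # Drop any caller-supplied top-level return_tensors so it cannot clash
    # with the nested per-modality settings (the reviewer's suggestion).
    result.pop("return_tensors", None)
    result.update(forced)
    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
    return result
```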
```diff
 result = dict(multimodal_inputs) if multimodal_inputs else {}

 result.update(forced)

-# set return_tensors="pt" for modality-specific outputs
+# return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
+# Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
 result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
```
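The `{**result.get("text_kwargs", {}), ...}` merge preserves any caller-supplied `text_kwargs` entries while overriding only `return_tensors`, since later keys in a dict literal win. A quick illustration:

```python
# Caller already passed text_kwargs with its own settings.
result = {"text_kwargs": {"padding": True, "return_tensors": "pt"}}

# Merge keeps padding but forces return_tensors=None for the text branch.
result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}

print(result["text_kwargs"])  # {'padding': True, 'return_tensors': None}
```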
To fully resolve the duplicate keyword argument issue in transformers >= 5.0, any existing top-level return_tensors should also be removed from the result dictionary. The logic now explicitly sets return_tensors within modality-specific dictionaries (such as text_kwargs), so if a top-level return_tensors arrives via the input multimodal_inputs and is left in place, the call will still raise a ValueError because the global and nested settings conflict.
```python
result = dict(multimodal_inputs) if multimodal_inputs else {}
result.pop("return_tensors", None)
# return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
# Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
```
Suggested change:

```diff
-result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
+result.pop("return_tensors", None)
+result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
```
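The failure mode the suggestion guards against can be sketched with a toy stand-in for the transformers >= 5.0 validation (this is not the actual transformers code, only a simulation of the conflict described in the review):

```python
# Toy stand-in: a processor call that, like transformers >=5.0 as described
# in the review, rejects return_tensors given both globally and per-modality.
def fake_processor_call(**kwargs):
    nested = kwargs.get("text_kwargs", {})
    if "return_tensors" in kwargs and "return_tensors" in nested:
        raise ValueError("return_tensors set both globally and in text_kwargs")
    return "ok"

kwargs = {"return_tensors": "pt",
          "text_kwargs": {"return_tensors": None}}

try:
    fake_processor_call(**kwargs)  # conflict: global + nested setting
except ValueError:
    pass

kwargs.pop("return_tensors", None)  # the suggested fix
fake_processor_call(**kwargs)       # now succeeds
```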
ci-sglang-pr: sglang-miles-v0.5.10