[v0.5.10][3] Fix processor return_tensors duplicate kwarg for transformers >=5.0 #927

Open
yueming-yuan wants to merge 3 commits into fix/mask-utils-transformers-v5-v2 from fix/processor-return-tensors

Conversation

Collaborator

@yueming-yuan yueming-yuan commented Apr 6, 2026

ci-sglang-pr: sglang-miles-v0.5.10

@gemini-code-assist (bot) left a comment

Code Review

This pull request updates the build_processor_kwargs function in miles/utils/processing_utils.py to move the return_tensors parameter out of the top-level dictionary and into modality-specific sub-dictionaries (e.g. text_kwargs). This prevents duplicate keyword argument errors in transformers >= 5.0, which rejects the same kwarg being supplied both globally and per modality. Review feedback suggests also removing any top-level return_tensors key from the result dictionary, so no conflict remains if the key was present in the original input.

Comment on lines 33 to +37

    result = dict(multimodal_inputs) if multimodal_inputs else {}

    result.update(forced)

    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    # Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
Contributor
Severity: high

To fully resolve the duplicate keyword argument issue in transformers >= 5.0, any existing top-level return_tensors should be removed from the result dictionary. Since the logic now explicitly sets return_tensors within modality-specific dictionaries (like text_kwargs), a top-level key left over from the input multimodal_inputs would still trigger a ValueError, because the global and nested settings conflict.

    result = dict(multimodal_inputs) if multimodal_inputs else {}
    result.pop("return_tensors", None)

    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    # Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
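For context, here is a minimal self-contained sketch of the fixed helper with the suggested pop applied. The signature, the `forced` merge, and the `images_kwargs` handling are assumptions based on the diff above; the real function in miles/utils/processing_utils.py may differ.

```python
def build_processor_kwargs(multimodal_inputs=None, forced=None):
    # Hypothetical sketch of the helper discussed in this PR.
    result = dict(multimodal_inputs) if multimodal_inputs else {}
    result.update(forced or {})

    # Drop any top-level return_tensors so it cannot clash with the
    # per-modality settings below (transformers >=5.0 raises on duplicates).
    result.pop("return_tensors", None)

    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    # Use per-modality dicts to avoid the transformers >=5.0 duplicate kwarg error.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
    result["images_kwargs"] = {**result.get("images_kwargs", {}), "return_tensors": "pt"}
    return result


kwargs = build_processor_kwargs({"return_tensors": "np", "images_kwargs": {"size": 224}})
# The top-level return_tensors is dropped; only the nested per-modality
# settings survive, so the processor call sees no duplicate kwarg.
```

Merging with `result.get(...)` rather than overwriting keeps any caller-supplied per-modality options (like `size` above) intact while still forcing the return_tensors values.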

Contributor
This makes sense to me

@yueming-yuan yueming-yuan changed the base branch from bump-sglang-v0.5.10 to fix/mask-utils-transformers-v5-v2 April 8, 2026 21:36
@maocheng23 (Contributor) left a comment


LGTM but nit

Comment on lines 33 to +37

    result = dict(multimodal_inputs) if multimodal_inputs else {}

    result.update(forced)

    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    # Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
Contributor
This makes sense to me

    # return_tensors=None for text (input_ids), "pt" for modality-specific outputs.
    # Use per-modality dicts to avoid transformers >=5.0 duplicate kwarg error.
    result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
Contributor
Suggested change

    - result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}
    + result.pop("return_tensors", None)
    + result["text_kwargs"] = {**result.get("text_kwargs", {}), "return_tensors": None}


2 participants