Conversation

Contributor

@ywx217 commented Jul 18, 2025

Motivation

Two bugs in the Qwen3 reasoning parser are fixed:

  • In non-stream mode, when thinking is disabled, the whole output is wrongly returned as reasoning_content instead of content.
  • In non-stream mode, when the thinking content exceeds max_tokens, the parser cannot handle output that is truncated before the </think> token. (Mentioned in issue [Bug] Qwen3 Reasoning Parser parsing error #3664)

Modification

  • qwen_qwq_reasoning_parser.py
    • When stream=False and enable_thinking=False, there is no <think> tag, so the whole model output should be treated as content, not reasoning_content.
    • When stream=False, enable_thinking=True and max_tokens is small, incomplete think content is now parsed correctly.
  • test_qwen3_parser.py
    • related test cases added
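The two non-stream rules above can be sketched roughly as follows. This is a simplified, hypothetical illustration of the fixed behavior, not the actual lmdeploy parser code; the function name and return convention are assumptions:

```python
def extract_reasoning(text: str, enable_thinking: bool):
    """Sketch of the fixed non-stream parsing rules (hypothetical)."""
    start, end = "<think>", "</think>"
    if not enable_thinking:
        # Fix 1: thinking disabled, no <think> tag is emitted,
        # so the whole output is normal content.
        return None, text
    if end in text:
        # Normal case: split reasoning from content at </think>.
        reasoning, _, content = text.partition(end)
        return reasoning.removeprefix(start).strip("\n"), content.strip("\n")
    # Fix 2: output truncated by max_tokens before </think>;
    # everything generated so far is (incomplete) reasoning content.
    return text.removeprefix(start).strip("\n"), None
```

For example, a response truncated mid-thought, such as `"<think>\npartial"`, now yields the incomplete reasoning instead of failing to parse.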

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

# no </think> in previous or delta, reasoning content continues
return DeltaMessage(reasoning_content=delta_text)
# no <think> in previous or delta, all content
return DeltaMessage(content=delta_text)
Collaborator

@RunningLeon commented Jul 21, 2025

What happens if the model does not have reasoning ability, but its output ends up returned as reasoning_content?

Contributor Author

@ywx217 commented Jul 21, 2025

Do you have a specific model name and model output for this case?

Contributor Author

When the model does not have reasoning ability, the output should be normal content, not reasoning_content.

Collaborator

Hi, when enable_thinking=False, the parser is already disabled on the main branch:

if VariableInterface.reasoning_parser is not None and request.enable_thinking is not False:

return reasoning_content, final_output

@classmethod
def _trim_newlines(cls, text: str):
Collaborator

Why perform _trim_newlines?

Contributor Author

The model output is formatted as <think>\n{reasoning_content}\n</think>, so _trim_newlines removes the \n before and after reasoning_content.

@lvhan028
Collaborator

Could you provide code to reproduce the issue?

@lvhan028
Collaborator

lvhan028 commented Aug 6, 2025

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

@RunningLeon
Collaborator

@RunningLeon Is it fixed when you supported interns1 reasoning parser?

The first problem below should be fixed.
@ywx217 Hi, as for the second one, does a user really need the incomplete thinking_content in that case? If it is reasonable, this would be included.

  • When stream=False and enable_thinking=False, there's no <think> tag, so the whole model output should be regarded as content, not reasoning_content.
  • When stream=False, enable_thinking=True and max_token is a small value, incomplete think content can be correctly parsed.
