
Support evaluation of models with interleaved thinking #14

@lucenzhong

Description


To support evaluation of models with interleaved thinking, the input messages should preserve the reasoning_content from previous turns so the model's reasoning stays consistent across turns.

The relevant code changes are as follows:

  1. Enable thinking when calling the model (a hedged sketch of such a call follows below)
  2. Return reasoning_content from llm.py and expose it on the message schema in schema.py
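
For change 1, this is a minimal sketch of what enabling thinking at call time can look like, assuming an OpenAI-compatible endpoint; the extra_body flag names (chat_template_kwargs / enable_thinking), the base URL, and the model name are assumptions that differ between serving backends and are not taken from llm.py:

    # Hedged sketch for change 1: enable thinking on an OpenAI-compatible endpoint.
    # The exact flag passed through `extra_body` depends on the serving backend
    # (the names below are assumptions, not code from this repository).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # hypothetical endpoint

    response = client.chat.completions.create(
        model="my-reasoning-model",  # hypothetical model name
        messages=[{"role": "user", "content": "What is 17 * 23?"}],
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},  # assumed flag
    )

    message = response.choices[0].message
    # Backends that expose interleaved thinking return the trace in
    # `reasoning_content` next to the visible `content`.
    print(getattr(message, "reasoning_content", None))
    print(message.content)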

The current code in llm.py builds the assistant message as:

    assistant_message = AssistantMessage(
        role="assistant",
        content=response.choices[0].message.content,
        tool_calls=tool_calls,
    )

and the message schema in schema.py is:

    class AssistantMessage(BaseModel):
        """Assistant message."""

        role: Literal["assistant"]
        content: Optional[str] = None
        tool_calls: Optional[List[ToolCall]] = None

Modified to:

    assistant_message = AssistantMessage(
        role="assistant",
        content=response.choices[0].message.content,
        tool_calls=tool_calls,
        reasoning_content=response.choices[0].message.reasoning_content,
    )

    class AssistantMessage(BaseModel):
        """Assistant message."""

        role: Literal["assistant"]
        content: Optional[str] = None
        tool_calls: Optional[List[ToolCall]] = None
        reasoning_content: Optional[str] = None
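
With the field preserved on AssistantMessage, the previous turn's reasoning can be fed back to the model on the next turn. A minimal sketch, assuming the agent loop serializes the AssistantMessage defined above into a request dict; the helper name to_request_message is hypothetical and not part of llm.py, and it assumes ToolCall is also a pydantic model:

    def to_request_message(msg: AssistantMessage) -> dict:
        """Hypothetical helper: serialize an AssistantMessage for the next request."""
        payload: dict = {"role": msg.role, "content": msg.content}
        if msg.tool_calls:
            payload["tool_calls"] = [tc.model_dump() for tc in msg.tool_calls]
        if msg.reasoning_content is not None:
            # Carrying the earlier reasoning back keeps the thinking trace
            # consistent across interleaved turns.
            payload["reasoning_content"] = msg.reasoning_content
        return payload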
