Conversation

@madclaws
Member

No description provided.

@coderabbitai

coderabbitai bot commented Nov 17, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This PR introduces streaming chat functionality with Python code execution capabilities across the backend and API layers. Changes include:

  • adding reqwest streaming support and new Rust dependencies (owo-colors, futures-util) to Cargo.toml;
  • extending the ChatCompletionRequest model with chat_start and python_code fields in server/api.py;
  • implementing a new generate_chat_stream async generator for token-by-token streaming responses;
  • refactoring the MLX Rust runner to support streaming, with a new ChatResponse struct containing think, reply, and code fields;
  • adding logging infrastructure throughout the server modules;
  • improving error handling in thought extraction for malformed responses.
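For orientation, a minimal sketch of the extended request model, assuming server/api.py uses Pydantic (not stated in this summary). Only the chat_start and python_code field names come from the walkthrough; the types and defaults are guesses:

```python
# Hypothetical sketch of the extended model in server/api.py. The chat_start and
# python_code names come from the walkthrough; types and defaults are assumptions.
from typing import Optional

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[dict]                # OpenAI-style role/content pairs
    streaming: bool = False             # flag name as labeled in the diagram below
    chat_start: bool = False            # new: marks the start of a chat session
    python_code: Optional[str] = None   # new: code for the sandboxed execution path
```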

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Server as Server/API
    participant Runner as MLX Runner
    participant Sandbox as Sandbox

    Client->>Server: ChatCompletionRequest<br/>(streaming=true, python_code?, chat_start?)
    Server->>Server: validate & log request

    alt Streaming Path
        Server->>Runner: chat_stream(input, chat_start, python_code)
        activate Runner

        loop Stream tokens from model
            Runner->>Runner: parse markers<br/>(<think>, <reply>, <python>)
            Runner-->>Server: stream delta
        end

        alt python_code provided
            Runner->>Sandbox: execute_sandboxed_code(code)
            activate Sandbox
            Sandbox-->>Runner: execution result
            deactivate Sandbox
            Runner-->>Server: code execution delta
        end

        Runner-->>Server: final ChatResponse<br/>(think, reply, code)
        deactivate Runner

        Server->>Server: yield JSON chunks<br/>(OpenAI format)
        Server-->>Client: stream responses<br/>(token deltas, metadata, DONE)
    else Non-Streaming Path
        Server->>Runner: chat(input, chat_start, python_code)
        Runner-->>Server: ChatResponse
        Server->>Server: format response
        Server-->>Client: ChatCompletionResponse<br/>(single message)
    end
```
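To make the streaming leg above concrete, here is a minimal sketch of an OpenAI-compatible SSE generator. The chat.completion.chunk layout and the [DONE] sentinel follow OpenAI's public streaming format; runner.chat_stream, its argument order, and the string deltas it yields are assumptions drawn from the diagram:

```python
# Hypothetical sketch of generate_chat_stream (server/api.py); runner.chat_stream
# and the shape of its deltas are assumptions, while the SSE framing and the
# chat.completion.chunk schema follow OpenAI's public streaming format.
import json
import time
import uuid


async def generate_chat_stream(request, runner):
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    # Per the sequence diagram, the runner takes the input plus the two new fields.
    async for delta in runner.chat_stream(
        request.messages, request.chat_start, request.python_code
    ):
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": request.model,
            "choices": [
                {"index": 0, "delta": {"content": delta}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"

    # Terminal sentinel expected by OpenAI-compatible streaming clients.
    yield "data: [DONE]\n\n"
```

If the server is built on FastAPI/Starlette (an assumption; this summary only names server/main.py and server/api.py), the generator would be wrapped in StreamingResponse(..., media_type="text/event-stream").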

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • server/api.py: Core streaming implementation in generate_chat_stream; message formatting and token counting logic; Python code execution flow and error handling in streaming context
  • src/runner/mlx.rs: Streaming response parsing and ChatResponse struct construction; marker extraction logic (<think>, <reply>, <python>; see the sketch after this list); integration with chat_start and python_code parameters
  • Cross-module integration: Synchronization between Python server streaming and Rust runner streaming; OpenAI-compatible response format; graceful error handling and DONE signal
  • Backward compatibility: Non-streaming path changes to create_chat_completion; ensure existing clients unaffected
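As a rough illustration of the marker extraction flagged above, here is a Python rendering (kept in Python for consistency with the other sketches; the PR implements this in Rust in src/runner/mlx.rs). The think, reply, and code fields mirror the ChatResponse struct named in the walkthrough; the fallback to end-of-string when a closing tag is missing reflects the note about improved handling of malformed responses. Everything else is an assumption:

```python
# Hypothetical Python rendering of the marker extraction done in src/runner/mlx.rs.
# Tag names come from the sequence diagram; the parsing details are assumptions.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResponse:
    think: Optional[str] = None
    reply: Optional[str] = None
    code: Optional[str] = None


def parse_markers(raw: str) -> ChatResponse:
    def section(tag: str) -> Optional[str]:
        # Accept a missing closing tag so malformed output still yields text.
        m = re.search(rf"<{tag}>(.*?)(?:</{tag}>|$)", raw, re.DOTALL)
        return m.group(1).strip() if m else None

    return ChatResponse(
        think=section("think"),
        reply=section("reply"),
        code=section("python"),
    )
```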

Possibly related PRs

  • Adding local memory management #7: Modifies MLX runner and server chat integration (src/runner/mlx.rs and server/api.py) to add streaming and interactive chat flows; this PR extends that foundation with stateful chat_start parameter and Python code execution capabilities.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/display-thinking-token

📜 Recent review details

Configuration used: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8fd2ab and 4ccb284.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • Cargo.toml (1 hunks)
  • server/api.py (7 hunks)
  • server/cache_utils.py (0 hunks)
  • server/main.py (1 hunks)
  • server/mem_agent/engine.py (1 hunks)
  • server/mem_agent/utils.py (1 hunks)
  • server/system_prompt.txt (1 hunks)
  • src/runner/mlx.rs (5 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@madclaws madclaws marked this pull request as ready for review December 14, 2025 11:09
@madclaws madclaws merged commit fab93ce into main Dec 14, 2025
0 of 2 checks passed
@madclaws madclaws deleted the feat/display-thinking-token branch December 14, 2025 11:09