Conversation

@madclaws
Member

No description provided.

@coderabbitai

coderabbitai bot commented Nov 17, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This PR introduces streaming chat functionality with Python code execution capabilities across the backend and API layers. Changes include:

  • adding reqwest streaming support and new Rust dependencies (owo-colors, futures-util) to Cargo.toml;
  • extending the ChatCompletionRequest model with chat_start and python_code fields in server/api.py;
  • implementing a new generate_chat_stream async generator for token-by-token streaming responses;
  • refactoring the MLX Rust runner to support streaming, with a new ChatResponse struct containing think, reply, and code fields;
  • adding logging infrastructure throughout the server modules;
  • improving error handling in thought extraction for malformed responses.
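For orientation, a minimal sketch of the extended request model, assuming server/api.py uses Pydantic (not stated in this summary). Only the chat_start and python_code field names come from the walkthrough; the types and defaults are guesses:

```python
# Hypothetical sketch of the extended model in server/api.py. The chat_start and
# python_code names come from the walkthrough; types and defaults are assumptions.
from typing import Optional

from pydantic import BaseModel


class ChatCompletionRequest(BaseModel):
    model: str
    messages: list[dict]                # OpenAI-style role/content pairs
    streaming: bool = False             # flag name as labeled in the diagram below
    chat_start: bool = False            # new: marks the start of a chat session
    python_code: Optional[str] = None   # new: code for the sandboxed execution path
```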

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Server as Server/API
    participant Runner as MLX Runner
    participant Sandbox as Sandbox

    Client->>Server: ChatCompletionRequest<br/>(streaming=true, python_code?, chat_start?)
    Server->>Server: validate & log request

    alt Streaming Path
        Server->>Runner: chat_stream(input, chat_start, python_code)
        activate Runner

        loop Stream tokens from model
            Runner->>Runner: parse markers<br/>(<think>, <reply>, <python>)
            Runner-->>Server: stream delta
        end

        alt python_code provided
            Runner->>Sandbox: execute_sandboxed_code(code)
            activate Sandbox
            Sandbox-->>Runner: execution result
            deactivate Sandbox
            Runner-->>Server: code execution delta
        end

        Runner-->>Server: final ChatResponse<br/>(think, reply, code)
        deactivate Runner

        Server->>Server: yield JSON chunks<br/>(OpenAI format)
        Server-->>Client: stream responses<br/>(token deltas, metadata, DONE)
    else Non-Streaming Path
        Server->>Runner: chat(input, chat_start, python_code)
        Runner-->>Server: ChatResponse
        Server->>Server: format response
        Server-->>Client: ChatCompletionResponse<br/>(single message)
    end
```
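To make the streaming leg above concrete, here is a minimal sketch of an OpenAI-compatible SSE generator. The chat.completion.chunk layout and the [DONE] sentinel follow OpenAI's public streaming format; runner.chat_stream, its argument order, and the string deltas it yields are assumptions drawn from the diagram:

```python
# Hypothetical sketch of generate_chat_stream (server/api.py); runner.chat_stream
# and the shape of its deltas are assumptions, while the SSE framing and the
# chat.completion.chunk schema follow OpenAI's public streaming format.
import json
import time
import uuid


async def generate_chat_stream(request, runner):
    completion_id = f"chatcmpl-{uuid.uuid4().hex}"
    created = int(time.time())

    # Per the sequence diagram, the runner takes the input plus the two new fields.
    async for delta in runner.chat_stream(
        request.messages, request.chat_start, request.python_code
    ):
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": request.model,
            "choices": [
                {"index": 0, "delta": {"content": delta}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"

    # Terminal sentinel expected by OpenAI-compatible streaming clients.
    yield "data: [DONE]\n\n"
```

If the server is built on FastAPI/Starlette (an assumption; this summary only names server/main.py and server/api.py), the generator would be wrapped in StreamingResponse(..., media_type="text/event-stream").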

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • server/api.py: Core streaming implementation in generate_chat_stream; message formatting and token counting logic; Python code execution flow and error handling in streaming context
  • src/runner/mlx.rs: Streaming response parsing and ChatResponse struct construction; marker extraction logic (<think>, <reply>, <python>; see the sketch after this list); integration with chat_start and python_code parameters
  • Cross-module integration: Synchronization between Python server streaming and Rust runner streaming; OpenAI-compatible response format; graceful error handling and DONE signal
  • Backward compatibility: Non-streaming path changes to create_chat_completion; ensure existing clients unaffected
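As a rough illustration of the marker extraction flagged above, here is a Python rendering (kept in Python for consistency with the other sketches; the PR implements this in Rust in src/runner/mlx.rs). The think, reply, and code fields mirror the ChatResponse struct named in the walkthrough; the fallback to end-of-string when a closing tag is missing reflects the note about improved handling of malformed responses. Everything else is an assumption:

```python
# Hypothetical Python rendering of the marker extraction done in src/runner/mlx.rs.
# Tag names come from the sequence diagram; the parsing details are assumptions.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatResponse:
    think: Optional[str] = None
    reply: Optional[str] = None
    code: Optional[str] = None


def parse_markers(raw: str) -> ChatResponse:
    def section(tag: str) -> Optional[str]:
        # Accept a missing closing tag so malformed output still yields text.
        m = re.search(rf"<{tag}>(.*?)(?:</{tag}>|$)", raw, re.DOTALL)
        return m.group(1).strip() if m else None

    return ChatResponse(
        think=section("think"),
        reply=section("reply"),
        code=section("python"),
    )
```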

Possibly related PRs

  • Adding local memory management #7: Modifies MLX runner and server chat integration (src/runner/mlx.rs and server/api.py) to add streaming and interactive chat flows; this PR extends that foundation with stateful chat_start parameter and Python code execution capabilities.
✨ Finishing touches
  • 📝 Generate docstrings

🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/display-thinking-token

📜 Recent review details

Configuration used: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d8fd2ab and 4ccb284.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • Cargo.toml (1 hunks)
  • server/api.py (7 hunks)
  • server/cache_utils.py (0 hunks)
  • server/main.py (1 hunks)
  • server/mem_agent/engine.py (1 hunks)
  • server/mem_agent/utils.py (1 hunks)
  • server/system_prompt.txt (1 hunks)
  • src/runner/mlx.rs (5 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

@madclaws madclaws marked this pull request as ready for review December 14, 2025 11:09
@madclaws madclaws merged commit fab93ce into main Dec 14, 2025
0 of 2 checks passed
@madclaws madclaws deleted the feat/display-thinking-token branch December 14, 2025 11:09