Fix JSON stream parsing in subprocess transport #22

Closed

Conversation

@eminemkun commented Jun 18, 2025

Fix: Robust JSON Stream Parsing for Large CLI Responses

Problem

The current subprocess transport implementation fails when the Claude CLI outputs JSON responses large enough to be split across stdout buffer boundaries. This results in JSONDecodeError: Unterminated string errors when processing substantial file contents or complex operations.

Root Cause

The issue is caused by stdout buffering at the OS level, not by file content or special characters:

  1. Claude CLI generates large, valid JSON responses
  2. OS stdout buffering (typically 8KB-64KB) splits the JSON arbitrarily
  3. SDK receives incomplete JSON fragments: {"type":"user","content":"truncated...
  4. json.loads() fails on malformed partial JSON

This affects any operation that produces JSON responses larger than the stdout buffer size, regardless of file type or content.
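
A minimal illustration of the failure mode (the message content and the 8 KiB cut point are illustrative, not taken from the SDK):

import json

# A complete JSON message, as the CLI would emit it on a single line.
complete = '{"type": "user", "content": "' + "x" * 100_000 + '"}'

# A single read() from the pipe may return only the first chunk;
# 8192 bytes matches a typical stdout buffer size.
fragment = complete[:8192]

try:
    json.loads(fragment)
except json.JSONDecodeError as e:
    print(e)  # e.g. "Unterminated string starting at: ..."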

Solution

This PR implements a robust JSONStreamParser class that:

  • Handles incomplete JSON streams with intelligent buffering
  • Addresses the PR #5 issue (newline-separated JSON objects)
  • Addresses the PR #16 issue ("Extra data" from concatenated JSON)
  • Uses brace counting to detect complete JSON objects
  • Provides fallback parsing for edge cases
  • Maintains backward compatibility with existing functionality

Key Features

  • Smart Buffering: Accumulates partial JSON until complete objects are formed
  • Brace Tracking: Intelligently detects JSON object boundaries (see the sketch below)
  • String Handling: Properly manages escaped quotes and content within JSON strings
  • Multiple Strategies: Combines line-based and stream-based parsing approaches
  • Clean Architecture: Separates JSON parsing logic into dedicated, testable class
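
For reference, a minimal sketch of the brace-counting approach described above (a simplified illustration, not the exact class in this PR):

import json

class JSONStreamParser:
    """Accumulate stdout chunks and emit each top-level JSON object once
    its braces balance; quote and escape tracking keeps braces inside
    strings from being counted."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Append a chunk; return a list of newly completed objects."""
        self._buffer += chunk
        objects = []
        depth = 0
        in_string = False
        escaped = False
        start = None
        consumed = 0
        for i, ch in enumerate(self._buffer):
            if escaped:                 # previous character was a backslash
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = not in_string
            elif not in_string:
                if ch == "{":
                    if depth == 0:
                        start = i
                    depth += 1
                elif ch == "}":
                    depth -= 1
                    if depth == 0 and start is not None:
                        objects.append(json.loads(self._buffer[start:i + 1]))
                        consumed = i + 1
        self._buffer = self._buffer[consumed:]  # keep any incomplete tail
        return objects

parser = JSONStreamParser()
print(parser.feed('{"type": "user", "content": "trunc'))  # [] -- incomplete
print(parser.feed('ated"}'))  # [{'type': 'user', 'content': 'truncated'}]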

Reproduction Case

The following script attempts to reproduce the issue by creating a large file intended to trigger stdout buffer splitting:

#!/usr/bin/env python3
"""
Simple reproduction script for Claude Code SDK JSON stream parsing bug.

This script reproduces the JSONDecodeError: Unterminated string issue
that occurs when the CLI outputs large JSON objects that get split
across stdout buffer boundaries.
"""

import asyncio
import os
import json
from claude_code_sdk import query, ClaudeCodeOptions

def create_problematic_test_file():
    """Create a file that will cause JSON parsing issues when read by Claude."""

    test_file = "problematic_data.json"

    # Create content with lots of quotes, escapes, and newlines
    # This will create complex JSON escaping that's more likely to split badly
    problematic_data = {
        "description": "This file contains data that will cause JSON stream parsing issues",
        "test_data": []
    }

    # Add many entries with problematic characters
    for i in range(100):
        entry = {
            "id": i,
            "text_with_quotes": f'This is entry {i} with "quotes" and \'single quotes\' and "nested \\"quotes\\"',
            "multiline_text": f"""Line 1 of entry {i}
Line 2 with "quotes" and special chars: \\n \\t \\r
Line 3 with unicode: café résumé naïve
Line 4 with JSON-like content: {{"key": "value", "nested": {{"deep": "data"}}}}
Line 5 final line for entry {i}""",
            "escaped_content": f"\\\\network\\\\path\\\\file{i}.txt",
            "long_base64_like": "A" * 200 + "B" * 200 + "C" * 200,  # Long strings
            "metadata": {
                "created": f"2024-01-{i:02d}T12:00:00Z",
                "tags": [f"tag_{j}" for j in range(10)],
                "description": f"Complex description for item {i} with lots of text that should make the JSON response very large and increase the likelihood of stdout buffer splitting in the subprocess transport layer when Claude reads and returns this content"
            }
        }
        problematic_data["test_data"].append(entry)

    # Write as formatted JSON (this will be large and complex)
    with open(test_file, 'w', encoding='utf-8') as f:
        json.dump(problematic_data, f, indent=2, ensure_ascii=False)

    file_size = os.path.getsize(test_file)
    print(f"Created problematic test file: {test_file} ({file_size:,} bytes)")
    return test_file

async def reproduce_json_stream_bug():
    """Reproduce JSON stream parsing bug with problematic file reading."""

    print("Testing Claude Code SDK JSON stream parsing...")

    # Create a problematic test file
    test_file = create_problematic_test_file()

    try:
        # Use a prompt that forces multiple read operations and detailed analysis
        prompt = f"""Please read the file '{test_file}' and then:
1. Display the full contents
2. Count the number of entries
3. Show the structure of the data
4. Provide a summary of each entry

This should generate a very large JSON response that may trigger stream parsing issues."""

        options = ClaudeCodeOptions(
            allowed_tools=["Read"],
            max_turns=10  # Increased to allow more back-and-forth
        )

        print(f"Reading and analyzing large file: {test_file}")

        message_count = 0
        try:
            async for message in query(prompt=prompt, options=options):
                message_count += 1
                print(f"Message {message_count}: {type(message).__name__}")

                # Stop after reasonable number of messages to avoid hanging
                if message_count > 20:
                    print("Stopping after 20 messages to avoid hanging...")
                    break

        except Exception as e:
            print(f"Error after {message_count} messages:")
            print(f"  Type: {type(e).__name__}")
            print(f"  Message: {str(e)[:200]}...")

            # Check for the specific JSON decode error we're looking for
            if "JSONDecodeError" in str(type(e)):
                if "Unterminated string" in str(e) or "Extra data" in str(e) or "Expecting" in str(e):
                    print("✓ Successfully reproduced the JSON stream parsing bug!")
                    return True

            print("✗ Different error occurred.")
            return False

        print(f"✗ No error occurred after {message_count} messages.")
        print("The JSON stream parsing appears to be working correctly.")
        return False

    finally:
        # Clean up test file
        if os.path.exists(test_file):
            os.remove(test_file)
            print(f"Cleaned up test file: {test_file}")

if __name__ == "__main__":
    asyncio.run(reproduce_json_stream_bug())

Error Output

$ python reproduce_json_error.py
Testing Claude Code SDK JSON stream parsing...
Created problematic test file: problematic_data.json (163,750 bytes)
Reading and analyzing large file: problematic_data.json
Message 1: SystemMessage
Message 2: AssistantMessage
Message 3: AssistantMessage
Message 4: UserMessage
Message 5: AssistantMessage
Message 6: AssistantMessage
Message 7: AssistantMessage
Message 8: AssistantMessage
Message 9: AssistantMessage
Message 10: UserMessage
Message 11: UserMessage
Message 12: UserMessage
Message 13: UserMessage
Message 14: AssistantMessage
Message 15: AssistantMessage
Message 16: AssistantMessage
Message 17: UserMessage
Message 18: UserMessage
Message 19: AssistantMessage
Message 20: AssistantMessage
Message 21: UserMessage
Stopping after 20 messages to avoid hanging...
✗ No error occurred after 21 messages.
The JSON stream parsing appears to be working correctly.
Cleaned up test file: problematic_data.json
Task exception was never retrieved
future: <Task finished name='Task-6' coro=<<async_generator_athrow without __name__>()> exception=RuntimeError('Attempted to exit cancel scope in a different task than it was entered in')>
Traceback (most recent call last):
  File "/Users/username/project-name/.venv/lib/python3.13/site-packages/claude_code_sdk/_internal/transport/subprocess_cli.py", line 182, in receive_messages
    async with anyio.create_task_group() as tg:
               ~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/username/project-name/.venv/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 783, in __aexit__
    return self.cancel_scope.__exit__(exc_type, exc_val, exc_tb)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/username/project-name/.venv/lib/python3.13/site-packages/anyio/_backends/_asyncio.py", line 457, in __exit__
    raise RuntimeError(
    ...<2 lines>...
    )
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

@ltawfik (Collaborator) commented Jun 27, 2025

Thanks @eminemkun. This issue is already fixed in main via PR #5. The CLI outputs complete JSON objects per line, so we only need to split on newlines when multiple lines arrive together due to buffering; the complex streaming parser isn't needed.
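
For reference, a minimal sketch of that line-delimited approach (the class name and buffering shown here are illustrative, not the SDK's actual code):

import json

class LineBuffer:
    """Reassemble newline-delimited JSON from arbitrarily split chunks."""

    def __init__(self):
        self._partial = ""

    def feed(self, chunk):
        """Append a chunk; return one parsed object per complete line."""
        self._partial += chunk
        *lines, self._partial = self._partial.split("\n")
        return [json.loads(line) for line in lines if line.strip()]

buf = LineBuffer()
print(buf.feed('{"type": "assistant"}\n{"type": "us'))  # [{'type': 'assistant'}]
print(buf.feed('er"}\n'))                               # [{'type': 'user'}]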

@ltawfik closed this Jun 27, 2025