Performance Issues with Server-side Multi-instance Deployment #333

@konglinghai123

Summary

When deploying the Claude Agent SDK in a server-side environment with multiple instances, we are seeing significant performance bottlenecks that affect production use cases.

Issues

1. Slow Initialization for Each Instance

Every new SDK instance takes a long time to initialize (20-30+ seconds, based on related issues #2166 and #3044). In a server environment handling multiple concurrent requests, this startup time is prohibitive.

Impact:

  • High latency for initial requests
  • Poor user experience
  • Resources wasted while waiting for initialization
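
To make the startup cost reproducible, a small timing harness along these lines could be used. This is only a sketch: `measure_startup` and `fake_connect` are hypothetical names, and `fake_connect` is a stand-in that sleeps instead of starting the SDK. In a real measurement the factory would wrap whatever code creates and connects an SDK instance.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def measure_startup(
    factory: Callable[[], Awaitable[object]], runs: int = 3
) -> List[float]:
    """Return the wall-clock seconds taken by each call to `factory`."""
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        await factory()  # one full initialization
        timings.append(time.perf_counter() - start)
    return timings


async def fake_connect() -> object:
    # Hypothetical stand-in for SDK initialization; replace with real setup code.
    await asyncio.sleep(0.01)
    return object()
```

Running `asyncio.run(measure_startup(fake_connect))` returns one duration per run, which makes it easier to compare cold starts across SDK versions or machines.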

2. Process Resource Consumption with Multiple Sessions

With multiple sessions active at the same time, memory and CPU consumption accumulates significantly, leading to:

  • Increased server costs
  • Potential memory leaks or performance degradation over time (similar to #10881)
  • Difficulty in horizontal scaling

3. No Way to Maintain "Warm" Instances

Currently, there's no official mechanism to:

  • Keep SDK processes in a "warm" state between requests
  • Reuse initialized instances across different sessions
  • Dynamically switch tool/working directories for a running instance

Feature Request: Dynamic Tool Directory Switching

To address these performance issues, we would like to request a feature that allows dynamic switching of tool directories for an already-initialized SDK instance.

Proposed Solution:

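# Proposed API sketch: set_tool_directory() is the requested method and does not
# exist today; the other call names are illustrative.
# Initialize once (warm instance)
client = ClaudeSDKClient()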

# Dynamically switch working directory per session
client.set_tool_directory("/path/to/session1/tools")
response1 = client.send_message("Task for session 1")

# Reuse same instance for different session
client.set_tool_directory("/path/to/session2/tools")
response2 = client.send_message("Task for session 2")

Benefits:

  1. Instance Pooling: Create a pool of pre-warmed SDK instances that can be reused
  2. Reduced Latency: Eliminate the 20-30s initialization delay for each request
  3. Better Resource Utilization: Maintain fewer processes while handling more sessions
  4. Improved Scalability: Enable efficient horizontal scaling in server environments
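
The instance pooling idea could look roughly like the sketch below. It does not use the SDK: `WarmPool`, `FakeClient`, and `demo` are hypothetical names, and `FakeClient` stands in for an initialized SDK instance. The sketch only covers the pooling mechanics (pay the startup cost once, then hand instances out to requests). It assumes an instance can safely be reused across sessions, which is exactly what the requested feature would need to provide.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Awaitable, Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Fixed-size pool of pre-initialized instances (hypothetical helper)."""

    def __init__(self, factory: Callable[[], Awaitable[T]], size: int) -> None:
        self._factory = factory
        self._size = size
        self._idle: "asyncio.Queue[T]"

    async def start(self) -> None:
        # Create the queue inside the running loop, then warm all instances concurrently.
        self._idle = asyncio.Queue()
        instances = await asyncio.gather(*(self._factory() for _ in range(self._size)))
        for instance in instances:
            self._idle.put_nowait(instance)

    @asynccontextmanager
    async def acquire(self) -> AsyncIterator[T]:
        # Wait for an idle instance and return it to the pool afterwards.
        instance = await self._idle.get()
        try:
            yield instance
        finally:
            self._idle.put_nowait(instance)


class FakeClient:
    """Stand-in for an initialized SDK instance."""

    def __init__(self) -> None:
        self.handled: List[str] = []


async def demo() -> Tuple[int, int]:
    created: List[FakeClient] = []

    async def make_client() -> FakeClient:
        await asyncio.sleep(0.01)  # simulated startup cost
        client = FakeClient()
        created.append(client)
        return client

    pool = WarmPool(make_client, size=2)
    await pool.start()

    async def handle(task: str) -> None:
        async with pool.acquire() as client:
            await asyncio.sleep(0)  # simulated work
            client.handled.append(task)

    await asyncio.gather(*(handle(f"task-{i}") for i in range(6)))
    return len(created), sum(len(c.handled) for c in created)
```

In `demo`, six tasks are served by two instances, so the startup cost is paid twice rather than six times. Requests beyond the pool size wait in `acquire()` instead of spawning new processes.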

Current Workarounds Attempted

  • Creating instances on-demand: Too slow (20-30s per instance)
  • Keeping long-lived instances: Resource consumption becomes problematic
  • One instance per session: Not scalable for high-concurrency scenarios
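
For the long-lived-instance workaround, one way to bound resource growth is to evict instances that have been idle for too long. The sketch below is an assumption-laden illustration, not an SDK feature: `IdleTracker` is a hypothetical helper, and the caller is responsible for actually shutting down whatever it returns.

```python
import time
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class IdleTracker(Generic[T]):
    """Tracks last-use times and hands back instances idle longer than `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so eviction can be tested without sleeping
        self._instances: Dict[str, T] = {}
        self._last_used: Dict[str, float] = {}

    def touch(self, session_id: str, instance: T) -> None:
        # Register the instance (or refresh its timestamp) on every use.
        self._instances[session_id] = instance
        self._last_used[session_id] = self._clock()

    def evict_idle(self) -> List[T]:
        # Remove and return every instance whose idle time exceeds the TTL.
        now = self._clock()
        expired = [sid for sid, used in self._last_used.items() if now - used > self._ttl]
        evicted: List[T] = []
        for sid in expired:
            evicted.append(self._instances.pop(sid))
            del self._last_used[sid]
        return evicted
```

A periodic background task could call `evict_idle()` and close each returned instance. This trades a cold start for the next request on that session against lower steady-state memory use.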

Alternative Suggestions

If dynamic directory switching is not feasible, any of the following would also help:

  1. Significantly faster initialization (< 1s target)
  2. Built-in instance pooling/warm-up mechanism
  3. Lightweight "reset" method to reuse instances for different contexts
  4. Better process lifecycle management APIs

Use Case

Our server handles multiple users concurrently, each potentially starting new coding tasks. We need to:

  • Respond quickly to initial requests (< 2s target)
  • Maintain reasonable resource usage
  • Scale horizontally as user load increases

Currently, the SDK's architecture seems optimized for single-user CLI usage rather than multi-tenant server deployments.

Question for Maintainers

Is server-side multi-instance deployment a supported use case? If so, what are the recommended patterns for:

  • Instance lifecycle management
  • Resource optimization
  • Minimizing initialization overhead

Related Issues

  • #2166 - Initialization extremely slow
  • #3044 - SDK mode startup time
  • #10881 - Performance degradation in long sessions

We would appreciate any guidance or roadmap information on improving server-side deployment patterns. Thank you!
