No heartbeat event during LLM-streamed tool-argument construction; watchdogs can't distinguish working from hung #1274

@kevinlims

Summary

When the model streams a large tool-call argument value (e.g., a 20KB file_text for the create tool), the event stream is effectively silent for 10–30 minutes between tool.execution_start and tool.execution_complete. A stall watchdog monitoring the event stream cannot distinguish "model is productively building a large output" from "session has hung."

Repro

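import { CopilotClient } from "@github/copilot-sdk";

// Client setup is not shown in the original report; this construction is assumed.
const client = new CopilotClient();
const session = await client.createSession({ /* ... */ });
// Log a timestamp with every event so the silent gap is visible.
session.on((evt) => console.log(new Date().toISOString(), evt.type, evt.data));
await session.sendAndWait({
  prompt: "Create a file at ./report.json containing a JSON document " +
          "with at least 20KB of structured analysis data. Use the create tool.",
});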

Observe the event stream between tool.execution_start (with empty arguments) and tool.execution_complete (with the full argument). Empirically, no events fire for the duration of argument construction.
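
To put a number on the silence rather than eyeballing the log, the gap can be measured from recorded arrival times. The following is a minimal sketch, not SDK code: TimedEvent, Gap, and largestGap are hypothetical names, and it assumes the consumer records Date.now() for each event inside its own session.on handler.

```typescript
// Hypothetical helper types; not part of @github/copilot-sdk.
interface TimedEvent {
  type: string; // event type, e.g. "tool.execution_start"
  atMs: number; // arrival time in milliseconds, recorded by the consumer
}

interface Gap {
  from: string; // type of the event before the silence
  to: string; // type of the event that ended the silence
  silenceMs: number;
}

// Returns the longest silence between two consecutive events,
// or null when fewer than two events were recorded.
function largestGap(events: TimedEvent[]): Gap | null {
  let worst: Gap | null = null;
  let prev: TimedEvent | undefined;
  for (const evt of events) {
    if (prev !== undefined) {
      const silenceMs = evt.atMs - prev.atMs;
      if (worst === null || silenceMs > worst.silenceMs) {
        worst = { from: prev.type, to: evt.type, silenceMs };
      }
    }
    prev = evt;
  }
  return worst;
}
```

If the behaviour described here holds, the largest gap for the repro should run from tool.execution_start to tool.execution_complete.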

Expected

A heartbeat event during argument streaming — byte count, token count, or even a simple tool.argument_progress ping — so consumers can tell a working session from a hung one.

Actual

tool.execution_start fires once, then silence for the duration of LLM argument streaming, then tool.execution_complete fires once when the full tool invocation finishes (which includes both arg construction AND tool execution).

assistant.streaming_delta exists with cumulative byte counts (totalResponseSizeBytes) and fires during response streaming, but it has not been verified whether it continues to fire during the tool-argument construction phase. This would be useful to clarify: if streaming_delta does cover that phase, the gap is consumer-side documentation; if it doesn't, the gap is upstream.
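
One way to settle the open question is to count assistant.streaming_delta events that arrive between each tool.execution_start and its tool.execution_complete. The sketch below is illustrative only: SeenEvent and deltasPerToolCall are hypothetical names, and it assumes at most one tool call is in flight at a time, which the SDK may not guarantee.

```typescript
// Hypothetical minimal event shape; only the type string is inspected.
interface SeenEvent {
  type: string;
}

// For each start/complete window, counts the assistant.streaming_delta
// events observed inside it. Assumes tool calls do not overlap.
function deltasPerToolCall(events: SeenEvent[]): number[] {
  const counts: number[] = [];
  let current: number | null = null; // null means no tool call in flight
  for (const evt of events) {
    if (evt.type === "tool.execution_start") {
      current = 0;
    } else if (evt.type === "tool.execution_complete") {
      if (current !== null) counts.push(current);
      current = null;
    } else if (evt.type === "assistant.streaming_delta" && current !== null) {
      current += 1;
    }
  }
  return counts;
}
```

A count of zero for the large create call would point to an upstream gap. A non-zero count would suggest a documentation-only fix.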

Evidence (SDK source)

nodejs/src/generated/session-events.ts: the SessionEvent union includes:

  • assistant.streaming_delta (AssistantStreamingDeltaData.totalResponseSizeBytes) — described as "Streaming response progress with cumulative byte count"
  • assistant.reasoning_delta, assistant.message_delta — response-side
  • tool.execution_progress, tool.execution_partial_result — fire AFTER execution_start for tools that emit stdout (bash, powershell)

There is no tool.argument_progress or equivalent event specifically scoped to in-flight tool-argument streaming.

Consumer impact

Consumers with long-running tool calls must either raise stall-watchdog timeouts to ceilings of 15+ minutes, which masks real failures for just as long, or route artifact completion through filesystem polling instead of session events.
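
The trade-off can be seen in a minimal event-driven watchdog. This is a sketch under stated assumptions, not code from the SDK or from any particular consumer: StallWatchdog is a hypothetical class, and the clock is injectable only so the behaviour can be exercised without waiting.

```typescript
// Hypothetical consumer-side watchdog; not part of @github/copilot-sdk.
class StallWatchdog {
  private readonly timeoutMs: number;
  private readonly now: () => number;
  private lastSeenMs: number;

  constructor(timeoutMs: number, now: () => number = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;
    this.lastSeenMs = now();
  }

  // Call from the session event handler whenever any event arrives.
  touch(): void {
    this.lastSeenMs = this.now();
  }

  // True once no event has been seen for longer than timeoutMs.
  isStalled(): boolean {
    return this.now() - this.lastSeenMs > this.timeoutMs;
  }
}
```

With a 60-second timeout, a silent 20-minute argument-construction phase is reported as a stall even though the session is working. With a 30-minute timeout, a session that truly hangs is not flagged until 30 minutes have passed. Without an event during that phase, no single timeout value avoids both outcomes.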

Suggested fix

Either:

  • Confirm and document that assistant.streaming_delta continues firing during in-flight tool-argument construction. If so, this is a doc-only fix.
  • Emit a tool.argument_progress event during LLM-streamed argument construction carrying byte/token count or a small delta. Even one event every ~10 seconds would let consumers tell working sessions from hung ones.
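
For the second option, the following sketch shows what a throttled progress event might look like. Everything in it is hypothetical: no tool.argument_progress event exists in the SDK today, and the payload fields (toolCallId, argumentBytes) and the ProgressThrottle class are invented here purely for illustration.

```typescript
// Hypothetical payload for the proposed event; field names are assumptions.
interface ToolArgumentProgress {
  type: "tool.argument_progress";
  data: {
    toolCallId: string; // assumed identifier for the in-flight tool call
    argumentBytes: number; // cumulative bytes of argument text streamed so far
  };
}

// Emits at most one progress event per intervalMs, however often it is offered.
class ProgressThrottle {
  private readonly intervalMs: number;
  private lastEmitMs = Number.NEGATIVE_INFINITY;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Returns an event when at least intervalMs has elapsed since the last
  // emitted one (or on the first call); otherwise returns null.
  offer(toolCallId: string, argumentBytes: number, nowMs: number): ToolArgumentProgress | null {
    if (nowMs - this.lastEmitMs < this.intervalMs) return null;
    this.lastEmitMs = nowMs;
    return { type: "tool.argument_progress", data: { toolCallId, argumentBytes } };
  }
}
```

With an interval of 10,000 ms this matches the "one event every ~10 seconds" cadence suggested above, which keeps the event volume low while still giving a watchdog something to reset on.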

Related

Environment

- SDK: @github/copilot-sdk@0.3.0
- CLI: @github/copilot@1.0.45
- Node: 22 LTS
- OS: Windows 11
- Model: claude-sonnet-4-6
