
perf(langchain): stop inlining agent state into tool-dispatch Sends #36960

Merged — Sydney Runkle (sydney-runkle) merged 5 commits into master from sr/tool-call-context-fix on Apr 27, 2026

Conversation

Sydney Runkle (sydney-runkle) commented Apr 22, 2026

Summary

Stop inlining the full agent state into every tool-dispatch `Send` in `create_agent`. Dispatch with the bare list form `Send("tools", [tool_call])` and let `ToolNode` hydrate `ToolRuntime.state` from graph channels at tool-execution time.
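
As a minimal sketch (not the actual `factory.py` code, which also handles routing and structured-output paths; the function name here is illustrative), the new edge boils down to:

```python
# Minimal sketch of the bare-list dispatch; function name is illustrative.
from langgraph.types import Send

def model_to_tools(state: dict) -> list[Send]:
    last_message = state["messages"][-1]  # AIMessage from the model super-step
    # One Send per pending tool call. The payload is just the bare call;
    # no agent state is inlined. ToolNode hydrates state at execution time.
    return [Send("tools", [tool_call]) for tool_call in last_message.tool_calls]
```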

Depends on langchain-ai/langgraph#7594, which teaches `ToolNode` to read channel state via `CONFIG_KEY_READ` when given a bare tool-call list. `uv.lock` pins that branch for CI while the langgraph PR is in flight; the pin will be reverted to a published langgraph version before merge.

What was happening

Before this change, every pending tool call produced a Send whose payload was:

```python
ToolCallWithContext(
    __type="tool_call_with_context",
    tool_call=tool_call,
    state=state,  # ← the FULL agent state dict, including the messages list
)
```

For any agent that runs many turns, `state["messages"]` grows linearly with the conversation. Every super-step that dispatches tools serializes that whole list into every `Send`, and those Sends live forever in the checkpointer's `__pregel_tasks` writes. The result is O(N²) `__pregel_tasks` storage across a run.
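
A back-of-envelope sketch of that growth, using stand-in payloads (illustrative sizes only; this is not the benchmark script):

```python
import json

MESSAGE = {"role": "ai", "content": "x" * 200}  # stand-in message
TOOL_CALL = {"name": "write_file", "args": {"path": "a.txt"}, "id": "call_1"}

def tasks_bytes(n_turns: int, inline_state: bool) -> int:
    """Approximate total __pregel_tasks bytes across a run."""
    total = 0
    for turn in range(1, n_turns + 1):
        payload = {"tool_call": TOOL_CALL}
        if inline_state:
            # Before the fix: every Send carries the whole messages list,
            # which has grown linearly with the turn number.
            payload["state"] = {"messages": [MESSAGE] * (2 * turn)}
        total += 2 * len(json.dumps(payload))  # 2 tool dispatches per turn
    return total

print(tasks_bytes(100, inline_state=True))   # grows ~O(N^2)
print(tasks_bytes(100, inline_state=False))  # grows ~O(N)
```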

What changed

  • libs/langchain_v1/langchain/agents/factory.py:
    • _make_model_to_tools_edge now returns Send("tools", [tool_call]) — no inlined state.
    • Drops the ToolCallWithContext import.
  • libs/langchain_v1/pyproject.toml + libs/langchain_v1/uv.lock:
    • Temporary [tool.uv.sources] pin on langgraph, langgraph-prebuilt, langgraph-checkpoint to the companion PR branch so CI exercises both changes end-to-end. Revert after langgraph release.

Why it's safe

  • Same snapshot semantics as before. Send is emitted at the end of the model super-step and consumed at the start of the tools super-step; channels by that point reflect every write from the model super-step (including the new AIMessage). Parallel tool tasks all see the same values since sibling writes don't land until end-of-super-step.
  • Legacy ToolCallWithContext input path is preserved in ToolNode — no-op for any external caller still constructing it by hand.

Test plan

  • tests/unit_tests/agents/ — 738 passed, 2 skipped, 1 xfailed
  • ruff check . / ruff format . — clean
  • mypy langchain/agents/factory.py — clean
  • Before/after benchmark (below)

Benchmark

Script runs create_agent with a mock GenericFakeChatModel and two tools (write_file, edit_file). Each of the N turns dispatches 2 tool calls. After the run, the InMemorySaver is inspected for bytes stored under __pregel_tasks — the channel that carries the tool-dispatch Send payloads.
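
For reference, a hedged sketch of the inspection step, assuming `InMemorySaver`'s private `writes` layout in which each pending write is a `(task_id, channel, serialized_value)` tuple (internal API; the exact shape may differ across langgraph versions):

```python
# Sketch only: this pokes at InMemorySaver internals, which are not public API.
from langgraph.checkpoint.memory import InMemorySaver

def pregel_tasks_bytes(saver: InMemorySaver) -> int:
    total = 0
    for pending_writes in saver.writes.values():  # one dict per checkpoint
        for _task_id, channel, value in pending_writes.values():
            if channel == "__pregel_tasks":
                _type_tag, data = value           # serde.dumps_typed() pair
                total += len(data)
    return total
```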

| N | TASKS before | TASKS after | Ratio |
|---:|---:|---:|---:|
| 5 | 87.6 KB | 4.7 KB | 18.6× smaller |
| 10 | 335 KB | 9.4 KB | 35.7× smaller |
| 25 | 2.05 MB | 23.7 KB | 86.5× smaller |
| 50 | 8.14 MB | 47.6 KB | 171× smaller |
| 100 | 32.5 MB | 95.3 KB | 341× smaller |
| 200 | 130 MB | 192 KB | 677× smaller |
| 500 | 815 MB | 482 KB | 1,691× smaller |

Growth shape:

  • Before: per-Send bytes scale with current messages length (full state is inlined), so total TASKS across N turns = Σ(2 × k) for k=1..N ≈ O(N²).
  • After: per-Send bytes are constant — just the tool_call dict. Total TASKS is O(#dispatches), completely independent of conversation length. In this bench with ~2 dispatches/turn: 940–964 bytes per turn across N=5..500, essentially flat.

An agent that makes 100 tool calls in a single turn pays the same TASKS cost as one that makes 100 across 50 turns — which is the semantically correct behavior.

Note: the messages channel is unchanged by this PR — it's still the dominant storage term (growing O(N²) via add_messages). TASKS was a second, compounding cost sitting on top of it; at N=100 it added 40% on top of messages, at N=500 it added 67%. After the fix, TASKS is a rounding error regardless of N.

In create_agent's model_to_tools edge, dispatch each tool call via the
bare list form `Send("tools", [tool_call])` instead of wrapping it in
ToolCallWithContext(state=state, ...). The tool node now hydrates
ToolRuntime.state from graph channels at tool-execution time (see
langchain-ai/langgraph#7594), so inlining the full state dict into
every Send is no longer needed.

This eliminates an O(N^2) storage term on __pregel_tasks checkpoint
writes: previously each turn's Sends carried a serialized snapshot of
the entire messages list at dispatch time. For a 500-turn agent run,
this drops __pregel_tasks storage from ~815 MB to ~482 KB (1,691x
reduction). See benchmark in PR description.

Back-compat: the legacy ToolCallWithContext input shape is still
accepted by ToolNode for any external dispatcher that hasn't migrated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot added the internal, langchain (`langchain` package issues & PRs), performance, and size: XS (< 50 LOC) labels on Apr 22, 2026
Temporary [tool.uv.sources] overrides so CI exercises this PR together
with langchain-ai/langgraph#7594 (which ships the ToolNode state_keys
hydration path this PR relies on). Revert once a langgraph release
containing #7594 is published.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot added the dependencies (Pull requests that update a dependency file, e.g. `pyproject.toml` or `uv.lock`) label on Apr 22, 2026

codspeed-hq Bot commented Apr 22, 2026

Merging this PR will not alter performance

✅ 2 untouched benchmarks
⏩ 13 skipped benchmarks¹


Comparing sr/tool-call-context-fix (9b4ca6e) with master (78546e9)²

Open in CodSpeed

Footnotes

  1. 13 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on master (aac258e) during the generation of this report, so 78546e9 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Sydney Runkle (sydney-runkle) changed the title from "perf(agents): stop inlining agent state into tool-dispatch Sends" to "perf(langchain): stop inlining agent state into tool-dispatch Sends" on Apr 22, 2026
Paired langgraph PR no longer takes state_keys — ToolNode now reads the
full channel state via CONFIG_KEY_READ internally. Revert create_agent's
factory to its original state schema resolution location and drop the
state_keys argument.

Lockfile updated to the latest commit on sr/tool-call-no-state-inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sydney Runkle (sydney-runkle) force-pushed the sr/tool-call-context-fix branch 2 times, most recently from a53c614 to 93177e8, on April 23, 2026 at 00:24
Picks up the simplified inline ToolNode read (no longer uses a helper,
drops the unused ChannelRead import).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sydney Runkle (sydney-runkle) added a commit to langchain-ai/langgraph that referenced this pull request on Apr 27, 2026 (#7594):

## Summary

When `ToolNode` receives a bare `[tool_call]` list via the Send API (the
dispatch shape `create_agent` will use once langchain-ai/langchain#36960
lands), hydrate `ToolRuntime.state` from the current channel values
instead of requiring the dispatcher to inline the full agent state dict
into every `Send.arg`.

Motivation: the paired langchain PR drops the `ToolCallWithContext`
wrapper from `create_agent`'s tool dispatch, which eliminates an O(N²)
storage term on `__pregel_tasks` checkpoint writes. Without this
companion change there would be no path for the tool node to see the
graph state.

## What changed

- `libs/prebuilt/langgraph/prebuilt/tool_node.py` — `_extract_state`
grows a third branch for list-form input. When the input is a list whose
last entry is a `ToolCall` dict, read the current channel values via
`CONFIG_KEY_READ` and return them as the state dict.

The new logic is a handful of lines inline in `_extract_state`:

```python
read = config.get(CONF, {}).get(CONFIG_KEY_READ)
if read is None:
    return {}
# Pregel installs CONFIG_KEY_READ as
# `functools.partial(local_read, scratchpad, channels, managed, task)`.
channels = read.args[1]
return cast("dict[str, Any]", read(list(channels), False))
```

- No changes to the pregel read machinery (`local_read`, `ChannelRead`).
- Only channel values are read; managed values have their own injection
path (`ToolRuntime.context`, `InjectedContext`) and were never in the
pre-fix inlined state dict, so we don't add them here.
- Falls back to `{}` when invoked outside a Pregel context (e.g. direct
`ToolNode(...).invoke([tool_call])` from a test harness), which
preserves existing `ToolNode` direct-invocation test behavior.

- `libs/prebuilt/tests/test_on_tool_call.py` — two new tests covering
the list-form hydration path (sync + async). They build a
`functools.partial` that matches Pregel's real `CONFIG_KEY_READ` shape
and assert `ToolRuntime.state` reflects the current channel values; a
rough sketch of that shape follows below.
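
A rough sketch (names here are illustrative, not the actual test code), assuming the partial layout `functools.partial(local_read, scratchpad, channels, managed, task)` described above:

```python
import functools
from typing import Any

channel_values = {"messages": ["hi"], "scratch": 42}

def fake_local_read(
    scratchpad: Any, channels: dict[str, Any], managed: Any, task: Any,
    select: list[str], fresh: bool,
) -> dict[str, Any]:
    # Mimic Pregel's local_read: return the requested channel values.
    return {key: channels[key] for key in select}

# Same partial shape Pregel installs under CONFIG_KEY_READ.
read = functools.partial(fake_local_read, None, channel_values, None, None)

assert read.args[1] is channel_values  # what _extract_state peeks at
assert read(list(channel_values), False) == channel_values
```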

## Why it's safe

- **Same snapshot semantics as before.** `Send` is emitted at
end-of-super-step-N; consumed at start-of-super-step-N+1. Channels at
that point reflect every write from super-step N (including the new
AIMessage the tool calls originated from). Parallel tool tasks in the
tools super-step all read the same values since sibling writes don't
land until end-of-super-step.
- **Legacy `ToolCallWithContext` path preserved.** External dispatchers
that still inline state continue to work unchanged — `_extract_state`
checks that branch first.

## Test plan

- [x] `make test` in `libs/prebuilt` — **204 pass**
- [x] Two new hydration tests (sync + async) green
- [x] `make format` / `make lint` / `mypy` clean

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Merges origin/master to pick up langchain-core 1.3.2 (required by
langgraph-prebuilt 1.0.12), removes temporary git source pins for
langgraph packages (fix is now in the published 1.1.10 release), and
updates the langgraph lower bound to >=1.1.10.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sydney Runkle (sydney-runkle) merged commit 3b945d0 into master on Apr 27, 2026
54 checks passed
Sydney Runkle (sydney-runkle) deleted the sr/tool-call-context-fix branch on April 27, 2026 at 17:32