fix(core)!: include multimodal blocks in get_buffer_string prefix format#38174

Open
Mason Daugherty (mdrxy) wants to merge 1 commit into master from mdrxy/core/get-buffer-string-multimodal

@mdrxy Mason Daugherty (mdrxy) commented Jun 15, 2026

Member

Breaking change - v2 candidate

get_buffer_string now includes image/audio/video block references (e.g. [image: <url>]) in the default prefix format instead of dropping them.


The default (non-XML) get_buffer_string only kept text content and silently dropped image, audio, and video blocks. That is lossy: a user who asks "what does this screenshot show?" with an attached image URL ends up with a summary that no longer references the image at all. Only the format="xml" path preserved that information, so callers such as SummarizationMiddleware had to opt into XML to avoid the loss.

This updates the shared utility so every caller of the default format benefits: non-base64 image, image_url (OpenAI-style), audio, and video blocks are appended as a concise human-readable reference. Plain string content and the XML format are unchanged, and base64/data: media is still omitted to avoid dumping payloads.

Before / After

Given a multimodal HumanMessage:

from langchain_core.messages import HumanMessage, get_buffer_string

msg = HumanMessage(content=[
    {"type": "text", "text": "What does this screenshot show?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
])
get_buffer_string([msg])

Before (image URL silently dropped):

Human: What does this screenshot show?

After (image reference preserved):

Human: What does this screenshot show? [image: https://example.com/screenshot.png]
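
For readers who want to reason about the new output without reading the diff, here is a minimal, dependency-free sketch of the behavior described above. It is not the code from this PR. The helper names (`media_reference`, `render_content`, `sketch_buffer_string`) are hypothetical, messages are modeled as plain `(prefix, content)` tuples rather than LangChain message objects, and the assumption that non-OpenAI media blocks carry their location under a `url` key is mine. The single-space separator is taken from the "After" output shown above.

```python
from typing import Any, List, Optional, Tuple, Union

# Block types treated as media in this sketch (assumed from the PR description).
MEDIA_TYPES = ("image", "audio", "video")


def media_reference(block: dict) -> Optional[str]:
    """Return "[kind: url]" for a URL-backed media block, otherwise None."""
    block_type = block.get("type")
    if block_type == "image_url":
        # OpenAI-style block: {"type": "image_url", "image_url": {"url": ...}}
        inner = block.get("image_url")
        url = inner.get("url") if isinstance(inner, dict) else inner
        kind = "image"
    elif block_type in MEDIA_TYPES:
        # Assumption: the location lives under "url"; base64 blocks have no "url".
        url = block.get("url")
        kind = block_type
    else:
        return None
    # Omit missing URLs and inline data: payloads so they are never dumped.
    if not isinstance(url, str) or not url or url.startswith("data:"):
        return None
    return f"[{kind}: {url}]"


def render_content(content: Union[str, List[Any]]) -> str:
    """Flatten message content into text plus media references."""
    if isinstance(content, str):
        return content  # plain string content is returned unchanged
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict):
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
            else:
                reference = media_reference(block)
                if reference is not None:
                    parts.append(reference)
    return " ".join(part for part in parts if part)


def sketch_buffer_string(messages: List[Tuple[str, Any]]) -> str:
    """Join "Prefix: content" lines, one per message."""
    return "\n".join(f"{prefix}: {render_content(c)}" for prefix, c in messages)
```

Passing `[("Human", [...])]` with the two blocks from the example above produces the same line as the "After" output. A block that carries only base64 data, or an `image_url` whose URL starts with `data:`, contributes nothing to the result.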

Made by Open SWE

…rmat

Non-XML `get_buffer_string` dropped image/audio/video content blocks,
losing references like image URLs the user explicitly mentioned. The
prefix path now appends non-base64 image, audio, and video blocks as a
human-readable reference (e.g. `[image: <url>]`) so all callers of the
default format benefit. Plain string content and XML output are
unchanged.

Co-authored-by: open-swe[bot] <open-swe@users.noreply.github.com>
@github-actions github-actions Bot added the core (`langchain-core` package issues & PRs), fix (For PRs that implement a fix), internal, and size: S (50-199 LOC) labels Jun 15, 2026
@mdrxy Mason Daugherty (mdrxy) marked this pull request as ready for review June 15, 2026 19:39

@open-swe open-swe Bot left a comment

✅ Open SWE Review: No issues found

Open SWE reviewed this PR and found no potential bugs to report.


@mdrxy Mason Daugherty (mdrxy) changed the title from "fix(core): include multimodal blocks in get_buffer_string prefix format" to "fix(core)!: include multimodal blocks in get_buffer_string prefix format" Jun 15, 2026
@github-actions github-actions Bot added the breaking (Breaking changes) label Jun 15, 2026