Skip to content

Optimize InputSource handling with Cow and add benchmarks#1832

Merged
mre merged 5 commits intomasterfrom
cow
Sep 5, 2025
Merged

Optimize InputSource handling with Cow and add benchmarks#1832
mre merged 5 commits intomasterfrom
cow

Conversation

@mre
Copy link
Copy Markdown
Member

@mre mre commented Aug 29, 2025

Out of curiosity, I finally tackled this TODO from the code:

// TODO: consider using Cow (to avoid one .clone() for String types)

I took a look and it turns out it doesn't touch as many places as I thought it would, so I went with it.

The result is lesser allocations for places where we handle raw input strings.

I did add a micro-benchmark for input content creation and it turns out to be decently fast.

On my machine:

  • Extract from large docs: 28% improvement (21.4ms vs 29.3ms baseline)
  • InputContent::from_string: 28% improvement (5.3µs vs 7.4µs baseline)
  • InputContent::from_owned_string: 9% improvement (14.2µs vs 15.7µs baseline)
  • from_string with owned baseline: 14% improvement (8.1µs vs 9.4µs baseline)

Not earth-shaking, but we're already operating at a decent level of performance by now, so I take it.

The question is: what do y'all think? Does it make the code too complex or is the complexity manageable? Feedback very welcome!

mre added 4 commits August 30, 2025 00:21
…the String without an unnecessary allocation when it's already `Cow::Owned`
Copy link
Copy Markdown
Member

@thomas-zahner thomas-zahner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an improvement as it doesn't increase complexity very much 👍

@mre mre merged commit da89fd6 into master Sep 5, 2025
6 checks passed
@mre mre deleted the cow branch September 5, 2025 08:30
@mre mre mentioned this pull request Sep 3, 2025
@katrinafyi
Copy link
Copy Markdown
Member

Should this todo also be changed?

/// Truncate the source in case it gets too long
///
/// This is only needed for string inputs.
/// For other inputs, the source is simply a "label" (an enum variant).
// TODO: This would not be necessary if we used `Cow` for the source.
fn truncate_source(source: &InputSource) -> InputSource {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants