[BigString] Harvest some low-hanging performance fruit #486

lorentey · 2025-06-18T03:18:02Z

This builds on (and temporarily includes the commits of) #485, adds some basic benchmarks, and uses those to make and verify one set of performance improvements: using the known character/scalar/UTF-16 counts to speed up intra-chunk distance calculations.

This makes a roughly 40-50% improvement for distance measurements over BigString’s character and Unicode Scalar views. Interestingly, we don’t see an improvement for UTF-16 views — I’m guessing this is due to String’s breadcrumbing. (And it will go away with the upcoming UTF8Spanification of this type.)
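To illustrate the general idea of using known counts for intra-chunk distances, here is a minimal sketch. It is not the code in this PR: `Chunk`, its `characterCount` field, and the "measure the shorter side" rule are all hypothetical stand-ins, assuming only that a chunk caches its total character count and that UTF-8 offsets are cheap to compare.

```swift
// Hypothetical stand-in for a BigString chunk: a small string plus a
// character count that is computed once and cached.
struct Chunk {
    let text: String
    let characterCount: Int

    init(_ text: String) {
        self.text = text
        self.characterCount = text.count
    }

    /// Counts the characters between two character-aligned indices.
    /// Assumption for this sketch: if the requested range covers more than
    /// half of the chunk's UTF-8 bytes, it is cheaper to count the characters
    /// outside the range and subtract them from the cached total.
    func characterDistance(from i: String.Index, to j: String.Index) -> Int {
        precondition(i <= j, "indices must be ordered")
        let utf8 = text.utf8
        let inside = utf8.distance(from: i, to: j)  // bytes inside the range
        let outside = utf8.count - inside           // bytes outside the range
        if inside <= outside {
            // Short range: count its characters directly.
            return text.distance(from: i, to: j)
        }
        // Long range: count the prefix and suffix, then use the cached total.
        let prefix = text.distance(from: text.startIndex, to: i)
        let suffix = text.distance(from: j, to: text.endIndex)
        return characterCount - prefix - suffix
    }
}
```

In this sketch, a range that spans a whole chunk needs no grapheme breaking at all, because the prefix and suffix are both empty and the cached total is returned directly.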

$ ./Utils/run-benchmarks.sh results compare before.json after.json --output cmp.html
Tasks with difference scores larger than 1.05:
  Score   Sum     Improvements Regressions  Name
  1.506   1.506   1.513(#48)   0.9925(#8)   BigString.distance(from:to:) (*)
  1.400   1.400   1.405(#51)   0.9950(#5)   BigString.unicodeScalars.distance(from:to:) (*)
2 images written to cmp.html

(Image: 02 BigString unicodeScalars distance(from:to:))

(The benchmarks I added only show an improvement for multi-chunk big string instances, but this is merely an artifact of the specific algorithm that is being measured — for single-chunk strings, the old measurement method was already optimal on that payload, so the new heuristic never triggers.)

rdar://153701624

Checklist

I've read the Contribution Guidelines
My contributions are licensed under the Swift license.
I've followed the coding style of the rest of the project.
I've added tests covering all new code paths my change adds to the project (if appropriate).
I've added benchmarks covering new functionality (if appropriate).
I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
I've updated the documentation if necessary.

lorentey · 2025-06-18T03:24:42Z

For reference, here is where we stand versus the standard String type:

It's pretty good, but not quite perfect! BigString's curves flatten out very nicely after a chunk's worth of data.

The beneficial effect of String's breadcrumbs shows up very clearly in the UTF-16 baseline. I don't know why String's UTF-8 view would have a slower distance operation than BigString (optimizer interference from the bridging paths?), but it's nice we have work to do there too.

lorentey · 2025-06-18T03:34:03Z

(Full results are attached below -- it's a single HTML file)

results.tar.gz

This implements a roughly 40-45% improvement for distance measurements over `BigString`’s character and Unicode Scalar views. Interestingly, we don’t see an improvement for UTF-16 views — I’m guessing this is due to String’s breadcrumbing. (And it will go away with the upcoming UTF8Span-ification of this type.) The benchmarks I added only show an improvement for multi-chunk big string instances, but this is merely an artifact of the specific algorithm that is being measured — for single-chunk strings, the existing measurement method is already optimal, so the new heuristic never triggers for this particular payload.

lorentey added this to the 1.2.1 milestone Jun 18, 2025

lorentey requested a review from Azoy June 18, 2025 03:18

lorentey added the RopeModule Positional B-trees label Jun 18, 2025

lorentey added 2 commits June 23, 2025 15:21

[BigString] Add some basic benchmarks

f1b0eba

lorentey force-pushed the BigString-easy-speedups branch from 819124a to fc41185 Compare June 23, 2025 22:22

lorentey merged commit 96bf8b8 into apple:release/1.2 Jun 24, 2025
20 checks passed

lorentey deleted the BigString-easy-speedups branch June 24, 2025 22:05
