@lorentey lorentey commented Jun 18, 2025

This builds on (and temporarily includes the commits of) #485, adds some basic benchmarks, and uses them to make and verify one set of performance improvements: using the known character/scalar/UTF-16 counts to speed up intra-chunk distance calculations.

This makes a roughly 40-50% improvement for distance measurements over BigString’s character and Unicode Scalar views. Interestingly, we don’t see an improvement for UTF-16 views — I’m guessing this is due to String’s breadcrumbing. (And it will go away with the upcoming UTF8Spanification of this type.)
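
As a rough illustration of how a known count can speed up an intra-chunk measurement, here is a minimal sketch. It is not the code in this PR: `Chunk`, `characterCount`, `characterOffset(of:)` and `characterDistance(from:to:)` are hypothetical names, and the sketch assumes both indices fall on character boundaries. The idea shown is that when the total is already known, only the shorter side of an index needs to be walked; the other side follows by subtraction.

```swift
// Hypothetical stand-in for a chunk: a small string plus a cached character
// count. Illustrative only; these are not the library's actual types.
struct Chunk {
    let text: String
    let characterCount: Int

    init(_ text: String) {
        self.text = text
        self.characterCount = text.count  // computed once, when the chunk is built
    }

    /// Number of characters in `text[..<index]`.
    /// Assumes `index` is a valid, character-aligned index of `text`.
    func characterOffset(of index: String.Index) -> Int {
        let utf8 = text.utf8
        let bytesBefore = utf8.distance(from: utf8.startIndex, to: index)
        let bytesAfter = utf8.distance(from: index, to: utf8.endIndex)
        if bytesBefore <= bytesAfter {
            // The prefix is the shorter side: count it directly.
            return text.distance(from: text.startIndex, to: index)
        }
        // The suffix is the shorter side: count it and subtract from the
        // cached total instead of walking the longer prefix.
        return characterCount - text.distance(from: index, to: text.endIndex)
    }

    /// Character distance between two character-aligned indices.
    func characterDistance(from start: String.Index, to end: String.Index) -> Int {
        characterOffset(of: end) - characterOffset(of: start)
    }
}
```

The same shape would apply to Unicode scalar counts by swapping `text` for `text.unicodeScalars` and caching that view's count instead.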

$ ./Utils/run-benchmarks.sh results compare before.json after.json --output cmp.html
Tasks with difference scores larger than 1.05:
  Score   Sum     Improvements Regressions  Name
  1.506   1.506   1.513(#48)   0.9925(#8)   BigString.distance(from:to:) (*)
  1.400   1.400   1.405(#51)   0.9950(#5)   BigString.unicodeScalars.distance(from:to:) (*)
2 images written to cmp.html
[Images: 01 BigString distance(from:to:); 02 BigString unicodeScalars distance(from:to:)]

(The benchmarks I added only show an improvement for multi-chunk big string instances, but this is merely an artifact of the specific algorithm that is being measured — for single-chunk strings, the old measurement method was already optimal on that payload, so the new heuristic never triggers.)

rdar://153701624

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

@lorentey lorentey added this to the 1.2.1 milestone Jun 18, 2025
@lorentey lorentey requested a review from Azoy June 18, 2025 03:18
@lorentey
Member Author

For reference, here is where we stand versus the standard String type:

[Images: 09 Character-based distance; 10 UnicodeScalar-based distance; 12 UTF-16 distance; 11 UTF-8 distance]

It's pretty good, but not quite perfect! BigString's curves flatten out very nicely after a chunk's worth of data.

The beneficial effect of String's breadcrumbs shows up very clearly in the UTF-16 baseline. I don't know why String's UTF-8 view would have a slower distance operation than BigString (optimizer interference from the bridging paths?), but it's nice we have work to do there too.

@lorentey
Member Author

(Full results are attached below -- it's a single HTML file)

results.tar.gz

@lorentey lorentey added the RopeModule Positional B-trees label Jun 18, 2025
lorentey added 2 commits June 23, 2025 15:21
This implements a roughly 40-45% improvement for distance measurements over `BigString`’s character and Unicode Scalar views.

Interestingly, we don’t see an improvement for UTF-16 views — I’m guessing this is due to String’s breadcrumbing. (And it will go away with the upcoming UTF8Span-ification of this type.)

The benchmarks I added only show an improvement for multi-chunk big string instances, but this is merely an artifact of the specific algorithm that is being measured — for single-chunk strings, the existing measurement method is already optimal, so the new heuristic never triggers for this particular payload.
@lorentey lorentey force-pushed the BigString-easy-speedups branch from 819124a to fc41185 Compare June 23, 2025 22:22
@lorentey lorentey merged commit 96bf8b8 into apple:release/1.2 Jun 24, 2025
20 checks passed
@lorentey lorentey deleted the BigString-easy-speedups branch June 24, 2025 22:05