Skip to content

Conversation

@lorentey
Copy link
Member

@lorentey lorentey commented Jul 15, 2024

When ingesting a String instance, BigString assumes that the input string has a reasonably efficient UTF-8 view.

Unfortunately, that is very much not the case when the input happens to be backed by a bridged NSString object — it appears that in this case, the ingester loop invokes some operation(s) with linear complexity in the size of the entire input, rendering the ingester’s overall complexity quadratic.

The BigString ingester is only expected to operate within a single chunk at the time. It’s unclear precisely which operation triggers the quadratic behavior; ideally we should figure it out and resolve it with a more targeted fix.

In the meantime, a blunt stopgap fix is to force-transcode the input string to UTF-8 at the time the ingester is initialized. This unnecessarily wastes some (temporary) memory on holding the transcoded string, but it avoids the quadratic cliff.

Before:
results-before

After:
results-after

rdar://131176282

Checklist

  • I've read the Contribution Guidelines
  • My contributions are licensed under the Swift license.
  • I've followed the coding style of the rest of the project.
  • I've added tests covering all new code paths my change adds to the project (if appropriate).
  • I've added benchmarks covering new functionality (if appropriate).
  • I've verified that my change does not break any existing tests or introduce unexplained benchmark regressions.
  • I've updated the documentation if necessary.

@lorentey lorentey added the RopeModule Positional B-trees label Jul 15, 2024
@lorentey lorentey added this to the 1.1.3 milestone Jul 15, 2024
@lorentey lorentey requested a review from jmschonfeld July 15, 2024 20:08
When ingesting a `String` instance, `BigString` assumes that the input string has a reasonably efficient UTF-8 view.

Unfortunately, that is very much not the case when the input happens to be backed by a bridged NSString object — it appears that in this case, the ingester loop invokes some operation(s) with linear complexity in the size of the entire input, rendering the ingester’s overall complexity quadratic.

The BigString ingester is only expected to operate within a single chunk at the time. It’s unclear precisely which operation triggers the quadratic behavior; ideally we should figure it out and resolve it with a more targeted fix.

In the meantime, a blunt stopgap fix is to force-transcode the input string to UTF-8 at the time the ingester is initialized. This unnecessarily wastes some (temporary) memory on holding the transcoded string, but it avoids the quadratic cliff.
@lorentey lorentey force-pushed the accidentally-quadratic-BigString-init branch from f3e6ec1 to 5e1fe6e Compare July 15, 2024 20:13
@lorentey
Copy link
Member Author

@swift-ci test

@lorentey lorentey merged commit 2db08b3 into release/1.1 Jul 15, 2024
@lorentey lorentey deleted the accidentally-quadratic-BigString-init branch July 15, 2024 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

RopeModule Positional B-trees

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants