Skip to content

Avoid repeated copying in clean_tokens. #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 16, 2022

Conversation

nnethercote
Copy link
Contributor

clean_tokens takes a vector of tokens and removes some. The decision
of which tokens to remove is somewhat complicated, so it can't use
retain or filter, but instead does everyting itself. This includes
calling remove on every individual removed token. Unfortunately, this
results in quadratic-ish behaviour when the number of tokens removed is
large, due to all the shuffling down of tokens after each removed token.
The closer the removed tokens are to the start of the vector, the worse
it is.

This commit switches to a two pass approach. The first pass records
which tokens should be removed, using an auxiliary vector of bools. The
second pass does the removal, using Vec::retain. When running
rustdoc (which uses minifier) on helloworld, this reduces memcpy
traffic from this:

297,675,803 bytes in 869,856 blocks, avg size 342.21 bytes

to this:

179,253,407 bytes in 868,510 blocks, avg size 206.39 bytes

`clean_tokens` takes a vector of tokens and removes some. The decision
of which tokens to remove is somewhat complicated, so it can't use
`retain` or `filter`, but instead does everyting itself. This includes
calling `remove` on every individual removed token. Unfortunately, this
results in quadratic-ish behaviour when the number of tokens removed is
large, due to all the shuffling down of tokens after each removed token.
The closer the removed tokens are to the start of the vector, the worse
it is.

This commit switches to a two pass approach. The first pass records
which tokens should be removed, using an auxiliary vector of bools. The
second pass does the removal, using `Vec::retain`. When running
`rustdoc` (which uses `minifier`) on `helloworld`, this reduces `memcpy`
traffic from this:
```
297,675,803 bytes in 869,856 blocks, avg size 342.21 bytes
```
to this:
```
179,253,407 bytes in 868,510 blocks, avg size 206.39 bytes
```
@GuillaumeGomez
Copy link
Owner

Very interesting approach! Thanks!

@GuillaumeGomez GuillaumeGomez merged commit 8f0f4df into GuillaumeGomez:master Aug 16, 2022
@nnethercote nnethercote deleted the clean_tokens branch August 16, 2022 11:45
@nnethercote
Copy link
Contributor Author

Thank you for the fast response. Would you be able to make a new release so we can take advantage of the speedup in rustdoc?

@GuillaumeGomez
Copy link
Owner

Already on it: #100. ;)

bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 21, 2022
…on, r=nnethercote

Update minifier version to 0.2.2

Following [this PR](GuillaumeGomez/minifier-rs#98), the CSS minification should be much faster now (thanks to `@nnethercote).`

r? `@nnethercote`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 20, 2024
…thercote

Update minifier version to 0.2.2

Following [this PR](GuillaumeGomez/minifier-rs#98), the CSS minification should be much faster now (thanks to `@nnethercote).`

r? `@nnethercote`
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
…thercote

Update minifier version to 0.2.2

Following [this PR](GuillaumeGomez/minifier-rs#98), the CSS minification should be much faster now (thanks to `@nnethercote).`

r? `@nnethercote`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants