Improve TLS codegen by marking the panic/init path as cold #143511

Open: 3 commits to merge into base branch master

orlp
Contributor

@orlp orlp commented Jul 5, 2025

This is an extension of the performance improvements seen from #141685. I noticed that the non-const TLS still didn't have the #[cold] attribute for the uninit/panic path, and I also realized that neither implementation should have the initialization or panic path inlined, ever.

These paths are taken either only once per thread (init) or never (panic, in a well-behaving Rust program), thus they don't deserve to litter the code generated each time you access a thread-local variable. So in addition to #[cold] I added the more aggressive #[inline(never)] to both cold paths as well.
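To illustrate the hot/cold split described above, here is a minimal sketch, not the code from this PR or from `std`. The names `CACHED`, `get`, `init_slow`, and `expensive_init` are hypothetical; the only real pieces are the `thread_local!` macro and the `#[cold]` and `#[inline(never)]` attributes. The idea is that the per-access path stays a load plus a branch, while the once-per-thread initializer lives in its own out-of-line function.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread cache; `None` means "not initialized yet".
    static CACHED: Cell<Option<u64>> = const { Cell::new(None) };
}

// Hypothetical stand-in for an expensive one-time computation (10!).
fn expensive_init() -> u64 {
    (1..=10u64).product()
}

// Taken at most once per thread, so it is hinted as unlikely (`#[cold]`)
// and kept out of every call site (`#[inline(never)]`).
#[cold]
#[inline(never)]
fn init_slow() -> u64 {
    let value = expensive_init();
    CACHED.with(|c| c.set(Some(value)));
    value
}

// Hot path: a load and a branch; the slow path is only a call.
#[inline]
pub fn get() -> u64 {
    match CACHED.with(|c| c.get()) {
        Some(v) => v,
        None => init_slow(),
    }
}

fn main() {
    // The first call on this thread takes the cold path, the second the hot path.
    assert_eq!(get(), 3_628_800);
    assert_eq!(get(), 3_628_800);
}
```

Whether the attributes actually change the generated code depends on the optimizer and the call site, so checking the assembly is the only way to be sure.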

@rustbot
Collaborator

rustbot commented Jul 5, 2025

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Jul 5, 2025
@compiler-errors
Member

Not sure if this will show up at all on perf but 🤷

@bors2 try @rust-timer queue

Do you have any local benchmarks?


rust-bors bot added a commit that referenced this pull request Jul 5, 2025
@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jul 5, 2025
@rust-bors

rust-bors bot commented Jul 5, 2025

⌛ Trying commit db7b096 with merge 9f2c18a

To cancel the try build, run the command @bors2 try cancel.

@orlp
Contributor Author

orlp commented Jul 5, 2025

@compiler-errors No, I don't have any local benchmarks. But I look at assembly output a lot, and trust me when I say these code paths should never get inlined.

Could you restart the benchmark with my second commit included?

@compiler-errors
Member

@bors2 try @rust-timer queue


@rust-bors

rust-bors bot commented Jul 5, 2025

⌛ Trying commit cf4669e with merge 8b17150

(The previously running try build was automatically cancelled.)

To cancel the try build, run the command @bors2 try cancel.

rust-bors bot added a commit that referenced this pull request Jul 5, 2025
@rust-bors

rust-bors bot commented Jul 6, 2025

☀️ Try build successful (CI)
Build commit: 8b17150 (8b17150009e237f23856ea93eb9b208049d8a621, parent: 175e04331be56c5b4bdf77478434b1a5e0556770)


@rust-timer
Collaborator

Finished benchmarking commit (8b17150): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 1 |
| Improvements ✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 1 |
| All ❌✅ (primary) | -0.3% | [-0.3%, -0.3%] | 1 |

Max RSS (memory usage)

Results (primary 5.4%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 5.4% | [4.3%, 7.1%] | 3 |
| Regressions ❌ (secondary) | 2.4% | [2.4%, 2.4%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 5.4% | [4.3%, 7.1%] | 3 |

Cycles

Results (primary 2.6%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 2.6% | [2.6%, 2.6%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.8% | [-2.8%, -2.8%] | 1 |
| All ❌✅ (primary) | 2.6% | [2.6%, 2.6%] | 1 |

Binary size

Results (primary 0.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.1% | [0.0%, 0.5%] | 15 |
| Regressions ❌ (secondary) | 0.1% | [0.0%, 0.1%] | 37 |
| Improvements ✅ (primary) | -0.2% | [-0.7%, -0.0%] | 5 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.0% | [-0.7%, 0.5%] | 20 |

Bootstrap: 459.09s -> 461.518s (0.53%)
Artifact size: 372.18 MiB -> 372.13 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jul 6, 2025
@orlp
Contributor Author

orlp commented Jul 6, 2025

I removed some of the #[inline(never)] attributes because they pessimized codegen. I had forgotten that the get() call which returns the TLS pointer still gets wrapped inside LocalKey, where the result is checked again to see if a panic is required. Now this PR only marks the fallback paths with #[cold].

Codegen is still nicer just from the addition of #[cold]: it at least moves the initialization out of the hot path (and the compiler may still decide not to inline it).
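The double check mentioned above can be modeled roughly as follows. This is a simplified sketch under my own assumptions, not the actual `std` implementation: `Slot`, `get`, `init`, `with`, and `panic_access_error` here are hypothetical stand-ins. The inner accessor returns an `Option`, and the outer wrapper inspects that `Option` again to decide whether to panic. If the inner accessor were forced out of line, the optimizer presumably could not fold the two checks together, which is one plausible reading of why a blanket `#[inline(never)]` hurt.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a lazily initialized thread-local slot.
struct Slot {
    value: Cell<Option<u32>>,
}

impl Slot {
    const fn new() -> Self {
        Slot { value: Cell::new(None) }
    }

    /// Inner accessor: left inlinable so the caller's check can merge with this one.
    #[inline]
    fn get(&self) -> Option<u32> {
        match self.value.get() {
            Some(v) => Some(v),
            None => self.init(),
        }
    }

    /// Runs once; `#[cold]` alone hints that this branch is unlikely.
    #[cold]
    fn init(&self) -> Option<u32> {
        self.value.set(Some(42));
        self.value.get()
    }
}

/// Never expected to run in a well-behaved program.
#[cold]
#[inline(never)]
fn panic_access_error() -> ! {
    panic!("cannot access the slot");
}

/// Outer wrapper: checks the inner result a second time before calling `f`.
fn with<R>(slot: &Slot, f: impl FnOnce(u32) -> R) -> R {
    match slot.get() {
        Some(v) => f(v),
        None => panic_access_error(),
    }
}

fn main() {
    let slot = Slot::new();
    assert_eq!(with(&slot, |v| v + 1), 43); // first access initializes
    assert_eq!(with(&slot, |v| v * 2), 84); // later accesses take the hot path
}
```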

@lqd
Member

lqd commented Jul 6, 2025

@bors2 try @rust-timer queue


@rust-bors

rust-bors bot commented Jul 6, 2025

⌛ Trying commit 92fa8e8 with merge 9782d0a

To cancel the try build, run the command @bors2 try cancel.

rust-bors bot added a commit that referenced this pull request Jul 6, 2025
@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jul 6, 2025
@rust-bors

rust-bors bot commented Jul 6, 2025

☀️ Try build successful (CI)
Build commit: 9782d0a (9782d0a1d99759de86b20e0863061637a0a3c245, parent: c83e217d268d25960a0c79c6941bcb3917a6a0af)


@rust-timer
Collaborator

Finished benchmarking commit (9782d0a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 2 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 1 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 9 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.0% | [-0.0%, -0.0%] | 1 |
| All ❌✅ (primary) | 0.0% | [0.0%, 0.0%] | 1 |

Bootstrap: 461.809s -> 462.209s (0.09%)
Artifact size: 372.19 MiB -> 372.13 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Jul 6, 2025
Labels
S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.), T-libs (Relevant to the library team, which will review and decide on the PR/issue.)
6 participants