Improve TLS codegen by marking the panic/init path as cold #143511

orlp · 2025-07-05T23:17:16Z

This is an extension of the performance improvements seen from #141685. I noticed that the non-const TLS still didn't have the #[cold] attribute for the uninit/panic path, and I also realized that neither implementation should have the initialization or panic path inlined, ever.

These paths are taken either only once per thread (init) or never (panic, in a well-behaving Rust program), thus they don't deserve to litter the code generated each time you access a thread-local variable. So in addition to #[cold] I added the more aggressive #[inline(never)] to both cold paths as well.
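For readers unfamiliar with the pattern, here is a minimal sketch of the hot/cold split described above. It is not the standard library's TLS implementation: `LazySlot`, `init_slow`, `expensive_init`, and the `inits` counter are hypothetical names invented for illustration, and the real code deals with destructors and per-platform storage that this sketch ignores.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a lazily initialized thread-local slot.
struct LazySlot {
    value: Cell<Option<u64>>,
    /// Counts how often the slow path ran (illustration only).
    inits: Cell<u32>,
}

impl LazySlot {
    const fn new() -> Self {
        LazySlot { value: Cell::new(None), inits: Cell::new(0) }
    }

    /// Hot path: a single check, cheap enough to inline at every access.
    #[inline]
    fn get(&self) -> u64 {
        match self.value.get() {
            Some(v) => v,
            None => self.init_slow(),
        }
    }

    /// Cold path: runs at most once per slot, so it is kept out of line
    /// and hinted as unlikely.
    #[cold]
    #[inline(never)]
    fn init_slow(&self) -> u64 {
        let v = expensive_init();
        self.value.set(Some(v));
        self.inits.set(self.inits.get() + 1);
        v
    }
}

/// Hypothetical initializer; in real code this would be the user's
/// thread-local initialization expression.
fn expensive_init() -> u64 {
    42
}
```

With this shape, each caller of `get()` only needs the check and a call to `init_slow` in its own code, while the initialization body lives in one out-of-line function.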

rustbot · 2025-07-05T23:17:20Z

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

compiler-errors · 2025-07-05T23:30:07Z

Not sure if this will show up at all on perf but 🤷

@bors2 try @rust-timer queue

Do you have any local benchmarks?

rust-bors · 2025-07-05T23:30:15Z

⌛ Trying commit db7b096 with merge 9f2c18a…

To cancel the try build, run the command @bors2 try cancel.

orlp · 2025-07-05T23:32:01Z

@compiler-errors No I don't have any local benchmarks. But I look at assembly output a lot, and trust me when I say these code paths should never get inlined.

Could you restart the benchmark with my second commit included?

compiler-errors · 2025-07-05T23:32:53Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-05T23:32:57Z

⌛ Trying commit cf4669e with merge 8b17150…

(The previously running try build was automatically cancelled.)

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-07-06T01:46:59Z

☀️ Try build successful (CI)
Build commit: 8b17150 (8b17150009e237f23856ea93eb9b208049d8a621, parent: 175e04331be56c5b4bdf77478434b1a5e0556770)

rust-timer · 2025-07-06T10:56:21Z

Finished benchmarking commit (8b17150): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	1
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	1
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	1
All ❌✅ (primary)	-0.3%	[-0.3%, -0.3%]	1

Max RSS (memory usage)

Results (primary 5.4%, secondary 2.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	5.4%	[4.3%, 7.1%]	3
Regressions ❌ (secondary)	2.4%	[2.4%, 2.4%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	5.4%	[4.3%, 7.1%]	3

Cycles

Results (primary 2.6%, secondary -2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	2.6%	[2.6%, 2.6%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.8%	[-2.8%, -2.8%]	1
All ❌✅ (primary)	2.6%	[2.6%, 2.6%]	1

Binary size

Results (primary 0.0%, secondary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.5%]	15
Regressions ❌ (secondary)	0.1%	[0.0%, 0.1%]	37
Improvements ✅ (primary)	-0.2%	[-0.7%, -0.0%]	5
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.0%	[-0.7%, 0.5%]	20

Bootstrap: 459.09s -> 461.518s (0.53%)
Artifact size: 372.18 MiB -> 372.13 MiB (-0.01%)

orlp · 2025-07-06T13:43:47Z

I removed some of the #[inline(never)] attributes because they pessimized codegen. I had forgotten that the get() call, which returns the TLS pointer, is still wrapped inside LocalKey, where the result is checked again to see whether a panic is required. This PR now only keeps the hot paths as they are and marks the fallback paths as #[cold].

Codegen is still nicer just due to the addition of #[cold]: it at least moves the initialization out of the hot path (and the compiler may still decide not to inline it).
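A rough sketch of the layering mentioned here, under stated assumptions: `Slot`, `State`, `try_get`, `try_get_slow`, and `panic_access_error` are illustrative names and not the actual std internals. The inner accessor returns an `Option`, and the outer wrapper checks that result again to decide whether to panic. When the inner accessor can be inlined, the optimizer has a chance to merge the two checks; only the fallbacks carry `#[cold]`.

```rust
use std::cell::Cell;

/// Hypothetical slot states, loosely modelled on a TLS key with a destructor.
#[derive(Clone, Copy, PartialEq)]
enum State {
    Uninit,
    Alive,
    Destroyed,
}

struct Slot {
    state: Cell<State>,
    value: Cell<u32>,
}

impl Slot {
    const fn new() -> Self {
        Slot { state: Cell::new(State::Uninit), value: Cell::new(0) }
    }

    /// Inner accessor: the hot check stays inlinable, the fallback is cold.
    #[inline]
    fn try_get(&self) -> Option<u32> {
        if self.state.get() == State::Alive {
            Some(self.value.get())
        } else {
            self.try_get_slow()
        }
    }

    /// Fallback: initialize on first use, or report that the slot is gone.
    /// Marked only #[cold]; the compiler still decides about inlining.
    #[cold]
    fn try_get_slow(&self) -> Option<u32> {
        match self.state.get() {
            State::Uninit => {
                self.value.set(7); // placeholder initial value
                self.state.set(State::Alive);
                Some(7)
            }
            State::Alive => Some(self.value.get()),
            State::Destroyed => None,
        }
    }

    /// Outer wrapper: checks the accessor's result a second time.
    #[inline]
    fn with<R>(&self, f: impl FnOnce(u32) -> R) -> R {
        match self.try_get() {
            Some(v) => f(v),
            None => panic_access_error(),
        }
    }
}

/// Panic path, never taken in a well-behaved program.
#[cold]
fn panic_access_error() -> ! {
    panic!("cannot access the slot after it was destroyed");
}
```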

lqd · 2025-07-06T15:01:45Z

@bors2 try @rust-timer queue

rust-bors · 2025-07-06T15:01:48Z

⌛ Trying commit 92fa8e8 with merge 9782d0a…

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-07-06T17:15:29Z

☀️ Try build successful (CI)
Build commit: 9782d0a (9782d0a1d99759de86b20e0863061637a0a3c245, parent: c83e217d268d25960a0c79c6941bcb3917a6a0af)

rust-timer · 2025-07-06T22:56:58Z

Finished benchmarking commit (9782d0a): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.3%]	2
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.0%	[0.0%, 0.0%]	1
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	9
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.0%	[-0.0%, -0.0%]	1
All ❌✅ (primary)	0.0%	[0.0%, 0.0%]	1

Bootstrap: 461.809s -> 462.209s (0.09%)
Artifact size: 372.19 MiB -> 372.13 MiB (-0.02%)

Improve TLS codegen by marking the panic/init path as cold

db7b096

rustbot assigned workingjubilee Jul 5, 2025

rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Jul 5, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2025

Also apply opt to OS-specific TLS impls

cf4669e

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

Don't use inline(never)

92fa8e8

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 6, 2025
