subscriber: skip EnvFilter RwLock in span callbacks when no dynamic directives are configured#3487

Open
kv-cache wants to merge 1 commit into tokio-rs:main from kv-cache:fix/envfilter-skip-rwlock-without-dynamics

Conversation

kv-cache commented Mar 7, 2026

Motivation

EnvFilter maintains two RwLock-protected HashMaps (by_id and by_cs) for tracking per-span field-level match state. These are only populated when has_dynamics is true — i.e., when the filter includes span-name or field-level directives like my_crate[span_name]=debug.

However, the span lifecycle callbacks (on_new_span, on_enter, on_exit, on_close, on_record) unconditionally acquire the by_id read lock (or write lock for on_close) on every invocation, even when has_dynamics is false and the maps are guaranteed empty.

For the common case of pure target=level directives (e.g. RUST_LOG=info,hyper=warn), these locks are acquired and released on every span enter/exit/close/record with no effect.
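The static-vs-dynamic distinction above can be sketched as a tiny classifier. This is a hypothetical illustration, not EnvFilter's real directive parser: it only captures the idea that a directive with a bracketed span/field part (like `my_crate[span_name]=debug`) is what would turn `has_dynamics` on, while plain `target=level` pairs would not.

```rust
// Hypothetical sketch (not EnvFilter's actual parser): a directive is
// "dynamic" when its target carries a span-name/field part in square
// brackets, e.g. `my_crate[span_name]=debug`. Plain `target=level`
// pairs such as `hyper=warn` are static.
fn is_dynamic(directive: &str) -> bool {
    directive
        .split('=')
        .next()
        .map_or(false, |target| target.contains('['))
}

fn main() {
    // RUST_LOG=info,hyper=warn: every directive is static, so a filter
    // built from it would (conceptually) have `has_dynamics == false`.
    assert!("info,hyper=warn".split(',').all(|d| !is_dynamic(d)));
    // A span/field-level directive enables the dynamic machinery.
    assert!(is_dynamic("my_crate[span_name]=debug"));
    println!("ok");
}
```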

Problem

Under high concurrency this causes measurable lock contention. A GDB backtrace captured from a Tokio-based HTTP server (80+ async workers, 3 EnvFilter instances as per-layer filters) during a load test showed ~25 threads contending on RwLock::read inside EnvFilter::on_enter, on_exit, and cares_about_span, with threads blocked in read_contended by on_close's write lock:

#0 std::sys::sync::rwlock::futex::RwLock::read_contended
#1 RwLock::read (self=0x55e0c7694178)
#2 RwLock<HashMap<SpanId, MatchSet<SpanMatch>>>::read
#3 EnvFilter::on_enter at filter/env/mod.rs:585

Solution

Add an early if !self.has_dynamics { return; } guard to each of the five span lifecycle methods. When dynamic directives are present, the existing code path runs unchanged. When they're absent (the common case), the lock is never touched.
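The shape of the guard can be modeled with a minimal std-only sketch. `has_dynamics` and `by_id` follow the names used in this PR, but the surrounding types are illustrative, not tracing-subscriber's actual internals:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Minimal model of the guard pattern: span callbacks consult a
// RwLock-protected map only when dynamic directives are configured.
struct Filter {
    has_dynamics: bool,
    by_id: RwLock<HashMap<u64, &'static str>>,
}

impl Filter {
    fn on_enter(&self, id: u64) {
        // Early out: with no span/field-level directives, `by_id` is
        // guaranteed empty, so the read lock is never taken.
        if !self.has_dynamics {
            return;
        }
        if let Some(_state) = self.by_id.read().unwrap().get(&id) {
            // ... update per-span match state ...
        }
    }

    fn on_close(&self, id: u64) {
        // Same guard in front of the write lock taken on close.
        if !self.has_dynamics {
            return;
        }
        self.by_id.write().unwrap().remove(&id);
    }
}

fn main() {
    let filter = Filter {
        has_dynamics: false,
        by_id: RwLock::new(HashMap::new()),
    };
    // Static-only configuration: callbacks return before touching the lock.
    filter.on_enter(1);
    filter.on_close(1);
    assert!(filter.by_id.read().unwrap().is_empty());
    println!("ok");
}
```

When `has_dynamics` is true, both methods fall through to the existing locking path unchanged, which is why the patch is behavior-preserving for dynamic configurations.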

@kv-cache changed the base branch from main to v0.1.x on March 7, 2026 19:46
@kv-cache force-pushed the fix/envfilter-skip-rwlock-without-dynamics branch from b8d6b03 to 91a9c77 on March 7, 2026 19:48
@kv-cache changed the base branch from v0.1.x back to main on March 7, 2026 19:49
@kv-cache force-pushed the fix/envfilter-skip-rwlock-without-dynamics branch from 91a9c77 to ef1310f on March 7, 2026 19:53
…irectives

EnvFilter maintains `by_id: RwLock<HashMap<Id, SpanMatcher>>` and
`by_cs: RwLock<HashMap<Identifier, CallsiteMatcher>>` for tracking
per-span field-level match state. These maps are only populated when
`has_dynamics` is true (i.e., when the filter has span/field-level
directives like `my_crate[span_name]=debug`).

However, the span lifecycle callbacks (`on_new_span`, `on_enter`,
`on_exit`, `on_close`, `on_record`) unconditionally acquire the
`by_id` read lock (or write lock for `on_close`) on every call. When
`has_dynamics` is false — which is the common case for pure
target=level configurations like `RUST_LOG=info,hyper=warn` — these
maps are always empty, and the lock acquisition is wasted work.

Under high concurrency this becomes a bottleneck: a GDB backtrace of a
Tokio-based HTTP server under load showed ~25 worker threads contending
on these RwLock acquisitions, with threads in `read_contended` blocked
by `on_close`'s write lock. The contention scales with the number of
EnvFilter instances in the subscriber layer stack (one per layer when
using per-layer filtering).

This commit adds an early `if !self.has_dynamics { return; }` guard to
each of the five span lifecycle methods. This is a pure no-op when
dynamic directives are present (the existing code path runs unchanged),
and completely eliminates the lock traffic for the common static-only
configuration.
@kv-cache force-pushed the fix/envfilter-skip-rwlock-without-dynamics branch from ef1310f to 446ca4b on March 7, 2026 19:56
@kv-cache marked this pull request as ready for review on March 7, 2026 20:27
@kv-cache requested review from a team, hawkw and hds as code owners on March 7, 2026 20:27
hds (Contributor) commented Mar 7, 2026

Thank you for submitting this PR!

Would it be possible to add regression tests for this change?

Normally we try to avoid unit tests, but this might be a case where they're the only way to test.

Also, do you have any perf timings from before and after the change? I'd be interested to see them in the PR message or comments. Thanks!

kv-cache (Author) commented Mar 8, 2026

Thanks! I had originally added two tests, but they seemed redundant with existing tests. Let me take another look.
Re perf: with this change I hope not to see those locks in CPU profiles again, but I don't expect an explicit end-to-end gain from it; it's more about mechanical sympathy on hot data paths. It may be more visible with criterion benchmarks. I'll post an update.
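A rough std-only microbenchmark can make the per-call cost visible without pulling in criterion (which would give more trustworthy numbers). Everything here is modeled, not the real EnvFilter code: `locked` takes the read lock on every call, as the callbacks did before the patch, while `guarded` checks a `has_dynamics` flag first.

```rust
use std::collections::HashMap;
use std::hint::black_box;
use std::sync::RwLock;
use std::time::Instant;

// Pre-patch shape: acquire the read lock on every callback.
fn locked(map: &RwLock<HashMap<u64, ()>>, id: u64) -> bool {
    map.read().unwrap().contains_key(&id)
}

// Post-patch shape: check the flag and skip the lock when it's false.
fn guarded(has_dynamics: bool, map: &RwLock<HashMap<u64, ()>>, id: u64) -> bool {
    if !has_dynamics {
        return false; // maps are guaranteed empty; never touch the lock
    }
    map.read().unwrap().contains_key(&id)
}

fn main() {
    const ITERS: u64 = 1_000_000;
    let map = RwLock::new(HashMap::new());
    let has_dynamics = black_box(false);

    let t = Instant::now();
    for i in 0..ITERS {
        black_box(locked(&map, black_box(i)));
    }
    println!("locked:  {:?}", t.elapsed());

    let t = Instant::now();
    for i in 0..ITERS {
        black_box(guarded(has_dynamics, &map, black_box(i)));
    }
    println!("guarded: {:?}", t.elapsed());
}
```

Note that a single-threaded loop understates the win: the backtrace in the PR description shows the real cost is cross-thread contention on the lock's futex, which only appears with many concurrent spans.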
