subscriber: skip EnvFilter RwLock in span callbacks when no dynamic directives are configured#3487
Open
kv-cache wants to merge 1 commit intotokio-rs:mainfrom
Open
Conversation
b8d6b03 to
91a9c77
Compare
91a9c77 to
ef1310f
Compare
…irectives
EnvFilter maintains `by_id: RwLock<HashMap<Id, SpanMatcher>>` and
`by_cs: RwLock<HashMap<Identifier, CallsiteMatcher>>` for tracking
per-span field-level match state. These maps are only populated when
`has_dynamics` is true (i.e., when the filter has span/field-level
directives like `my_crate[span_name]=debug`).
However, the span lifecycle callbacks (`on_new_span`, `on_enter`,
`on_exit`, `on_close`, `on_record`) unconditionally acquire the
`by_id` read lock (or write lock for `on_close`) on every call. When
`has_dynamics` is false — which is the common case for pure
target=level configurations like `RUST_LOG=info,hyper=warn` — these
maps are always empty, and the lock acquisition is wasted work.
Under high concurrency this becomes a bottleneck: a GDB backtrace of a
Tokio-based HTTP server under load showed ~25 worker threads contending
on these RwLock acquisitions, with threads in `read_contended` blocked
by `on_close`'s write lock. The contention scales with the number of
EnvFilter instances in the subscriber layer stack (one per layer when
using per-layer filtering).
This commit adds an early `if !self.has_dynamics { return; }` guard to
each of the five span lifecycle methods. This is a pure no-op when
dynamic directives are present (the existing code path runs unchanged),
and completely eliminates the lock traffic for the common static-only
configuration.
ef1310f to
446ca4b
Compare
Contributor
|
Thank you for submitting this PR! Would it be possible to add regressions tests for this change? Normally we run and avoid unit tests, but this might be a case where it's rhetorical only way to test. Also, do you have any perf timings before and after the change? Is be interested to see them in the PR message or comments. Thanks! |
Author
|
Thanks! I had originally added 2 tests but they seemed redundant to existing tests. Let me take a look again. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
EnvFiltermaintains twoRwLock-protectedHashMaps (by_idandby_cs) for tracking per-span field-level match state. These are only populated whenhas_dynamicsistrue— i.e., when the filter includes span-name or field-level directives likemy_crate[span_name]=debug.However, the span lifecycle callbacks (
on_new_span,on_enter,on_exit,on_close,on_record) unconditionally acquire theby_idread lock (or write lock foron_close) on every invocation, even whenhas_dynamicsisfalseand the maps are guaranteed empty.For the common case of pure
target=leveldirectives (e.g.RUST_LOG=info,hyper=warn), these locks are acquired and released on every span enter/exit/close/record with no effect.Problem
Under high concurrency this causes measurable lock contention. A GDB backtrace captured from a Tokio-based HTTP server (80+ async workers, 3
EnvFilterinstances as per-layer filters) during a load test showed ~25 threads contending onRwLock::readinsideEnvFilter::on_enter,on_exit, andcares_about_span, with threads blocked inread_contendedbyon_close's write lock:Solution
Add an early
if !self.has_dynamics { return; }guard to each of the five span lifecycle methods. When dynamic directives are present, the existing code path runs unchanged. When they're absent (the common case), the lock is never touched.