[api] Fix memory leaks in TracerProvider.GetTracer API#4906
CodeBlanch merged 13 commits into open-telemetry:main
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main    #4906      +/-   ##
==========================================
+ Coverage   83.21%   83.51%   +0.29%
==========================================
  Files         295      295
  Lines       12294    12324      +30
==========================================
+ Hits        10231    10292      +61
+ Misses       2063     2032      -31
```
Flags with carried forward coverage won't be shown.
```
{
    if (this.tracers == null)
    {
        // Note: We check here for a race with Dispose and return a
```
I believe we need to set this.tracers = null inside the same lock. Else we could still run into a situation where some thread calling Dispose sets this.tracers to null after this if check and before the new entry is added to the dictionary. We would want to return a no-op tracer in that case, but we would end up returning a valid tracer.
I just checked it a couple times. I think it is good! Could be I'm not seeing something though. Can you write out a flow for me that you think is flawed? Here are a couple flows I'm imagining.
Case where Dispose runs in the middle of the writer and gets the lock...

- Writer thread reads `this.tracers` on Line 58. It is valid so it begins its work.
- Dispose thread sets `this.tracers` to `null`.
- Dispose thread takes the lock.
- Reader thread misses the cache and tries to take the lock. It has to wait.
- Dispose thread finishes its clean up and releases the lock.
- Writer thread gets the lock. Now it checks `this.tracers == null`. This will be `true` now and it will return a no-op instance.
Case where Dispose runs in the middle of the writer and waits on the lock...

- Writer thread reads `this.tracers` on Line 58. It is valid so it begins its work.
- Reader thread misses the cache and takes the lock. Inside the lock it checks `this.tracers == null`, which is `false`. It begins to do its work.
- Dispose thread sets `this.tracers` to `null`.
- Dispose thread tries to take the lock. It has to wait.
- Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that `this.tracers` is now actually `null` because it is working on a local copy.
- Dispose thread gets the lock and makes all the tracers in the cache no-ops, including the one that was just added.
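The two flows above can be written out as code. The following is a hypothetical reconstruction of the pattern under discussion, not the exact code merged in this PR; the field names (`this.tracers`, `this.lockObject`) and the no-op `Tracer` construction are assumptions:

```csharp
// Hypothetical sketch of the GetTracer pattern discussed above.
public Tracer GetTracer(string name)
{
    // "Line 58": take a local copy of the cache reference.
    var tracers = this.tracers;
    if (tracers == null)
    {
        // Raced with Dispose before we started: hand back a no-op.
        return new Tracer(activitySource: null);
    }

    if (!tracers.TryGetValue(name, out var tracer))
    {
        lock (this.lockObject)
        {
            // Re-check under the lock. If Dispose won the race and has
            // already run, this.tracers is null and we return a no-op.
            if (this.tracers == null)
            {
                return new Tracer(activitySource: null);
            }

            if (!tracers.TryGetValue(name, out tracer))
            {
                tracer = new Tracer(new ActivitySource(name));
                tracers[name] = tracer;
                // Even if Dispose nulls this.tracers right now, it must
                // still wait for this lock; once it gets it, it will
                // no-op every tracer in the cache, including this one.
            }
        }
    }

    return tracer;
}
```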
For case 2,
> Writer thread adds a new tracer to the cache and releases the lock. It doesn't care that `this.tracers` is now actually `null` because it is working on a local copy.
I think this is more of a design choice. Yes, it doesn't care that `this.tracers` is now actually `null`, but it could care about it 😄.

I was thinking we could offer a stronger guarantee that we would never return a `Tracer` when `TracerProvider` is disposed or being disposed. We could avoid this limbo state where the `Dispose` method may or may not have marked the newly returned `Tracer` no-op while it's being used.
I merged the PR because I think what's there will work well enough. I'll circle back to this comment when I have a sec to see if I can simplify it or clean it up in a way that doesn't introduce a bunch of contention.
utpilla left a comment
Left a non-blocking comment: #4906 (comment)
Changes
`TracerProvider` now maintains a cache of the `Tracer`s it has issued. When disposed it will turn them into no-op instances and release their associated `ActivitySource`s.
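The dispose side of that cache might look roughly like this. This is an illustrative sketch, not the merged implementation; the mutable `ActivitySource` member on `Tracer` and the field names are assumptions:

```csharp
// Illustrative sketch of the dispose-time no-op conversion.
public void Dispose()
{
    lock (this.lockObject)
    {
        var tracers = this.tracers;
        if (tracers != null)
        {
            // Null the cache first so racing GetTracer calls fall back
            // to returning no-op instances.
            this.tracers = null;

            foreach (var tracer in tracers.Values)
            {
                // Release the ActivitySource so it drops out of the
                // static list of active sources, and turn the issued
                // Tracer into a no-op.
                tracer.ActivitySource?.Dispose();
                tracer.ActivitySource = null;
            }
        }
    }
}
```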
Consider the following simple application:
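The original snippet did not survive extraction; the following is a hypothetical sketch of the kind of application being described, repeatedly requesting a tracer from a `TracerProvider`:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Trace;

// Hypothetical repro sketch: before this PR, every GetTracer call
// created a new Tracer with its own ActivitySource, and ActivitySource
// instances are tracked in a static list, so memory grows on every
// iteration and is never reclaimed.
using var tracerProvider = Sdk.CreateTracerProviderBuilder().Build();

while (true)
{
    var tracer = tracerProvider.GetTracer("MyTracer");
    GC.Collect();
    Console.WriteLine(GC.GetTotalMemory(forceFullCollection: true));
}
```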
Running that, we will see memory growing per iteration that is never released:
What's going on here?
Today we create a `Tracer` each time `GetTracer` is called, and each one is handed its own `ActivitySource`. Creating spurious `ActivitySource`s is dangerous because there is a static list of all active sources. `Tracer` does NOT implement `IDisposable`, so users aren't given a chance to do this correctly.

After the cache introduced on this PR the graph looks like this:
Merge requirement checklist
- `CHANGELOG.md` files updated for non-trivial changes