-
Notifications
You must be signed in to change notification settings - Fork 6k
Fix crash about 'SkiaUnrefQueue::Drain' is called after 'IOManager' reset #32106
Fix crash about 'SkiaUnrefQueue::Drain' is called after 'IOManager' reset #32106
Conversation
Gold has detected about 1 new digest(s) on patchset 1. |
Gold has detected about 1 new digest(s) on patchset 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nit.
@chinmaygarde FYI
|
||
@pragma('vm:entry-point') | ||
void frameCallback(_Image, int) { | ||
print('called back'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the print
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and consider adding a comment about why this method is empty and what test is using it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
PostTaskSync(runners.GetIOTaskRunner(), [&]() { | ||
// 'SkiaUnrefQueue.Drain' will be called after 'io_manager.reset()' in this | ||
// test, If the resource context has been destroyed at that time, it will | ||
// crash. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising to me. But tracing through it seems to make sense. Adding some notes here so I dont' forget this later:
- Drain() currently checks whether the weak pointer is still valid or not before trying to call anything on it
- However, calling
unref
on the SkImage_Lazy ends up freeing a GrBackendTexture. That object seems to assume that something else is keeping the context alive. This seems like it might be a bad assumption on Skia's part, but in Skia's defense we're doing something pretty weird here by keeping GPU resident objects alive without keeping the GrDirectContext alive ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
This PR changes the Does that create a risk that the resource
|
Ahh @jason-simmons I forgot about #13237. In that case it seems like we'r emissing a drain call somewhere. |
Here's what seems like it's happening:
I'm thining we could fix this by having another method on |
During At that point the assumption is that no further objects will be queued to the In particular, image decode tasks run on the Without that there is a risk of a race like this:
The If that is not possible, then maybe this might work?
|
Jason is right - I was incorrect, we're guarding everything we need to with a @jason-simmons I think what you're suggesting replacing the DCHECK added in #13237 with a task schedule to make sure one final Drain is called and the GrDirectContext is released on the IO task runner (i.e. the task runner it's holding). |
Thanks for your suggestions! @jason-simmons @dnfield |
…e task runner's thread
(Add a new test to verify that the resource context is destroyed in the task runner's thread 44be8cd) |
void Unref(SkRefCnt* object); | ||
using ResourceContext = T; | ||
|
||
void Unref(SkRefCnt* object) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove these back to their own implementation file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test TEST_F(SkiaGpuObjectTest, UnrefResourceContextInTaskRunnerThread)
in this commit 44be8cd to make sure the context is released in the io thread, but in order to write this test , I had to change UnrefQueue
to a template class. So need to move all the code into the header file. I have no idea how to change my new test if the code is removed back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe my lack of C++ knowledge is showing, but shouldn't we be able to define the template methods in the cpp file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the template is defined in the implementation file, the test file cannot included it, so the compiler cannot help us generate the class UnrefQueue<TestResourceContext>
, so the test code cannot be compiled.
If we really need to hide the implementation, we can consider introducing a skia_gpu_object_impl.h
and put the implementation code in that file. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave this as is in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move more of the impl of SkiaUnrefQueue back to its own C++ file. Otherwise, thsi change LGTM if it LGT @jason-simmons
We believe flutter/flutter#48062, #32106, flutter/flutter#50959, and flutter/flutter#100122 are related. |
Great work, @ColdPaleLight! Thanks for fixing this. And thanks to @dnfield and @jason-simmons for the thorough review! |
my pleasure. |
Is there any plan release in new version? |
This fix will be in the April beta release, and in the Q2 stable release. Unfortunately, I don't think we're able to cherry-pick it into the 2.10 stable. |
@ColdPaleLight @zanderso |
@Sheng20152 Yes the patches mentioned above are in the 3.0 stable release. |
fml::RefPtr<fml::TaskRunner> unref_queue_task_runner); | ||
fml::RefPtr<fml::TaskRunner> unref_queue_task_runner, | ||
fml::TimeDelta unref_queue_drain_delay = | ||
fml::TimeDelta::FromMilliseconds(8)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ColdPaleLight I'm wondering whether number 8
here has any potential meaning?
I'm facing a problem when Supporting another platform because the delayed task posted in UnrefQueue#Unref
in release mode (In debug mode, everything goes as expected)
When I change 8
to 0
, it goes as expected in release mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ColdPaleLight I'm wondering whether number
8
here has any potential meaning? I'm facing a problem when Supporting another platform because the delayed task posted inUnrefQueue#Unref
in release mode (In debug mode, everything goes as expected) When I change8
to0
, it goes as expected in release mode.
I guess you can find the answer in the PR #9486. My commit did not change the value of delay, it just moved it to another place.
fix flutter/flutter#87895
change the type of
context_
ofSkiaUnrefQueue
fromfml::WeakPtr<GrDirectContext>
tosk_sp<GrDirectContext> context
We should ensure that resource context is valid when
SkiaUnrefQueue::Drain
is called, otherwise a crash may occur.See flutter/flutter#87895
The new test will crash if without this patch, stack trace is following
Stack Trace
cc @dnfield
Pre-launch Checklist
writing and running engine tests.
///
).