Skip to content

Conversation

@999pingGG
Copy link
Contributor

sigh Here we go again. This is a follow-up to my previous PR. The comment in the code explains it all. Because I had been testing in debug mode with a simple clear screen app, I didn't notice this happened. This time I've thoroughly tested and I can switch apps, use picture-in-picture mode, resize the window and everything is good.

@thatcosmonaut
Copy link
Collaborator

Surely there's a better way that apps are supposed to handle this? Ignoring surface errors and hoping the swapchain gets recreated later smells really fishy to me.

@999pingGG
Copy link
Contributor Author

999pingGG commented Dec 20, 2025

I thought about that, but it would have to happen somehow after acquiring the swapchain texture and before submitting the command buffer. When you successfully acquire an image you reasonably assume you will be able to present it, but it's not the case here because Android destroys the surface asynchronously. The way to do it like you describe would probably be to asynchronously signal a flag when the activity generates a surface destroyed event and checking it before presenting, but I don't think it's practical to do that. EDIT: At least not with the current SDL GPU API.

@999pingGG
Copy link
Contributor Author

Another possibly "more robust" way to work around this would be to ignore the error just once when it happens; two surface lost errors in a row is a hard error. Tho I agree all of this is hacky and I wish there was a better option.

@thatcosmonaut
Copy link
Collaborator

Could we use AddEventWatch to detect the surface destroy event? We're using that to flag when the swapchain should be recreated on resize, so maybe we could detect it and cancel presenting. Mostly I'd like to avoid treating errors as expected occurrences, that could lead to all kinds of problems.

@999pingGG
Copy link
Contributor Author

999pingGG commented Dec 20, 2025

I just did some tests. Adding an event watch doesn't help because the app's thread acquires the swapchain texture > records command buffer > submits. No chance for event handlers to run. But I see messages from SDL in the Android logcat, presumably from another thread, showing a sequence roughly like this when the app goes into the background: App acquires swapchain texture > SDL logs onPause() > surfaceDestroyed() > nativePause() > onTrimMemory() > onStop() > vkQueuePresentKHR VK_ERROR_SURFACE_LOST_KHR. So maybe there's a good way after all, but requires bigger changes which I'm not sure how to best implement. Can someone give some pointers?
EDIT: To be clear, those messages come from the SDL Java code. SDLSurface.java in particular has the function surfaceDestroyed() that calls onNativeSurfaceDestroyed() in SDL_android.c. I think we should signal SDL GPU from here somehow, using a mutex. How exactly could this be done?

@thatcosmonaut
Copy link
Collaborator

Android's awfulness continues to amaze me.

OK, since we've established that Android can destroy our presentation surface literally any time it wants, how about this: if we get SURFACE_LOST_KHR, we clean the command buffer and do not submit it, and mark the swapchain and surface as needing to be recreated. That way the app will eventually proceed to acquiring the swapchain where either the swapchain will be properly recreated or an error can be properly detected.

@slouken
Copy link
Collaborator

slouken commented Dec 21, 2025

OK, since we've established that Android can destroy our presentation surface literally any time it wants, how about this: if we get SURFACE_LOST_KHR, we clean the command buffer and do not submit it, and mark the swapchain and surface as needing to be recreated. That way the app will eventually proceed to acquiring the swapchain where either the swapchain will be properly recreated or an error can be properly detected.

This seems like a good approach to me. @999pingGG, are you able to implement and test this?

@999pingGG
Copy link
Contributor Author

Sure, I'm absolutely interested in fixing this! But I'm afraid something is not entirely clear to me. Isn't this what is kind of already happening? In my PR we treat VK_SUCCESS and VK_ERROR_SURFACE_LOST_KHR equally, which means VULKAN_INTERNAL_ReleaseCommandBuffer() will be called at the end of the function (is that what you mean by "clean the command buffer and not submit it" since I see the call to vkQueueSubmit() before vkQueuePresentKHR(), and the only thing throwing an error is the latter, no way to know beforehand if it will fail?). Then, assuming the app gives the chance for event handlers to run, this code here will run and mark the surface and swapchain for recreation: https://github.com/999pingGG/SDL/blob/d202a7ba79ead0435deb530d55e52085d14cab2a/src/gpu/vulkan/SDL_gpu_vulkan.c#L9663-L9669 (the surface recreation bits were previously contributed by me because Android needs it)
So they are recreated the next time we acquire an image, after returning from the background, and presentation succeeds having failed just once.
If there's a misunderstanding in my part, please let me know!

@thatcosmonaut
Copy link
Collaborator

You're right, I forgot that Present happens separately after Submit. (I wonder what happens to the draw commands that write to the swapchain if the surface has been destroyed...) I still think we should explicitly mark the swapchain for recreate if the surface is lost on Present and not wait for the event to do it.

@999pingGG
Copy link
Contributor Author

999pingGG commented Dec 21, 2025

Sorry, I did a mistake and the PR closed, I'll open another one. EDIT: Nevermind, this thing fixed itself. How about my latest changes? Your idea is much better.

@999pingGG 999pingGG reopened this Dec 21, 2025
@thatcosmonaut
Copy link
Collaborator

Looks good, thanks.

@slouken slouken merged commit 5a25720 into libsdl-org:main Dec 22, 2025
46 checks passed
@slouken
Copy link
Collaborator

slouken commented Dec 22, 2025

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants