Description
Status update: SEH stack unwinding has been implemented in go1.20 for windows/amd64. Still missing windows/arm64.
Background
Go binaries can't be reliably debugged or profiled with native Windows tools, such as WinDbg or the Windows Performance Analyzer, because Go does not generate PE files that contain the static data that Win32 functions like RtlVirtualUnwind and StackWalk need to unwind the stack.
Delve and go tool pprof are great tools for developing on Windows, but production environments that run on Windows tend to rely on language-agnostic tools provided by Microsoft for profiling and troubleshooting. Stack unwinding is fundamental to all of these tools, and Go not supporting it is a major pain point in the Windows ecosystem, at least when running production workloads.
Proposal
The Go compiler and linker should emit the necessary SEH static data for each Go function to reliably unwind and walk the stack using the Windows stack unwind conventions for each architecture:
- amd64: https://learn.microsoft.com/en-us/cpp/build/exception-handling-x64#unwind-data-for-exception-handling-debugger-support
- arm64: https://learn.microsoft.com/en-us/cpp/build/arm64-exception-handling#arm64-exception-handling-information
- arm: https://learn.microsoft.com/en-us/cpp/build/arm-exception-handling#arm-exception-handling-1
- 386: there is no official way to do stack unwinding on x86; tools use obscure heuristics, sometimes helped by data in the PDBs. Let's exclude windows/386 from this proposal.
This new information will slightly increase the size of the final binary, by around 12 bytes per non-leaf function.
Stack unwinding overview
Note: each architecture has a slightly different design; the following explanation is based on x64.
Stack unwinding normally takes place in these three cases:
- When a hardware or software exception is raised, the SEH mechanism steps in to find an appropriate exception handler for it. It first looks in the registered vectored exception handlers, and if none of them traps the exception, the stack is unwound looking for a function that has a registered SEH exception handler that traps the exception. This last part of the process internally calls RtlVirtualUnwind.
- Debuggers and profilers directly call RtlVirtualUnwind to unwind the stack, e.g. see "runtime: WinDbg fails to unwind the Go stack" (#57404).
- When writing minidump files, e.g. with MiniDumpWriteDump, which also calls RtlVirtualUnwind. Note that Go will support creating minidumps on crash once "runtime: use MiniDumpWriteDump for GOTRACEBACK=crash on Windows" (#49471) is implemented.
RtlVirtualUnwind unwinds exactly one frame from the stack and has two important parameters: ControlPC and FunctionEntry. The former is the PC from which to start unwinding, and the latter is the frame information of the current function. This frame information comes from the static data in the PE file, more specifically from the .pdata and .xdata sections. It contains the following pieces of information: function length, prolog length, frame pointer location (if used), how the stack grows to accommodate local variables, how to restore non-volatile registers, and the exception handler address (if any). RtlVirtualUnwind uses this information to restore the context of the calling function without physically walking the stack. If this information is not present (the current situation in Go binaries), it naively takes the return address from the last 4/8 bytes of the stack. That only works for leaf functions; for non-leaf functions it means that the return address points to whatever value the last local variable happens to contain.
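To make the role of .pdata more concrete, here is a minimal sketch of how a tool could decode the x64 function table and find the entry covering a given PC. The 12-byte layout (begin RVA, end RVA, unwind info RVA) follows the documented x64 RUNTIME_FUNCTION structure; the Go names runtimeFunction, parsePdata and lookupFunctionEntry are hypothetical and only illustrate the idea, they are not part of the Go toolchain or of this proposal. The lookup is an assumption-level stand-in for what the OS does when it searches the function table, and it assumes entries are sorted by begin address and do not overlap.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"sort"
)

// runtimeFunction mirrors the 12-byte x64 RUNTIME_FUNCTION entry stored in
// .pdata. All fields are image-relative addresses (RVAs).
type runtimeFunction struct {
	BeginAddress uint32 // first byte of the function
	EndAddress   uint32 // one past the last byte of the function
	UnwindData   uint32 // RVA of the unwind info in .xdata
}

// parsePdata decodes the raw contents of a .pdata section into entries.
// Trailing bytes that do not form a full entry are ignored.
func parsePdata(pdata []byte) []runtimeFunction {
	const entrySize = 12
	funcs := make([]runtimeFunction, 0, len(pdata)/entrySize)
	for off := 0; off+entrySize <= len(pdata); off += entrySize {
		funcs = append(funcs, runtimeFunction{
			BeginAddress: binary.LittleEndian.Uint32(pdata[off:]),
			EndAddress:   binary.LittleEndian.Uint32(pdata[off+4:]),
			UnwindData:   binary.LittleEndian.Uint32(pdata[off+8:]),
		})
	}
	return funcs
}

// lookupFunctionEntry returns the entry whose [BeginAddress, EndAddress)
// range contains rva. It assumes funcs is sorted and non-overlapping.
func lookupFunctionEntry(funcs []runtimeFunction, rva uint32) (runtimeFunction, bool) {
	i := sort.Search(len(funcs), func(i int) bool { return funcs[i].EndAddress > rva })
	if i < len(funcs) && funcs[i].BeginAddress <= rva {
		return funcs[i], true
	}
	return runtimeFunction{}, false
}

func main() {
	// Synthetic table with two functions; the values are made up.
	pdata := make([]byte, 24)
	for i, v := range []uint32{0x1000, 0x1040, 0x5000, 0x1040, 0x10a0, 0x500c} {
		binary.LittleEndian.PutUint32(pdata[i*4:], v)
	}
	funcs := parsePdata(pdata)
	if fe, ok := lookupFunctionEntry(funcs, 0x1050); ok {
		fmt.Printf("PC 0x1050 is covered by %#x-%#x, unwind info at %#x\n",
			fe.BeginAddress, fe.EndAddress, fe.UnwindData)
	}
	if _, ok := lookupFunctionEntry(funcs, 0x2000); !ok {
		fmt.Println("PC 0x2000 has no entry: the unwinder treats it as a leaf function")
	}
}
```

A PC with no entry is exactly the case Go binaries are in today for every function, which is why the fallback behavior described above applies to all of them.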
There is one important outcome of this explanation: tools using RtlVirtualUnwind will unwind Go binaries even if no unwind information is present in the PE (the current situation), but this process will never work correctly unless unwinding a leaf function. So, whatever we do, even if not perfect, will be an improvement over the current situation.
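The difference between the two cases can be illustrated with a toy model of the stack. This is a deliberately simplified sketch, not how RtlVirtualUnwind is implemented: the stack is modeled as a slice of 8-byte slots with index 0 at the lowest address, and the function names and the fixed frame size are assumptions made for the example.

```go
package main

import "fmt"

// naiveReturnAddress models the fallback used when no unwind info exists:
// the function is assumed to be a leaf, so the return address is read from
// the slot the stack pointer currently points at.
func naiveReturnAddress(stack []uint64, sp int) uint64 {
	return stack[sp]
}

// unwindWithFrameSize models unwinding with frame information: the locals
// (frameSize bytes) are skipped first, then the return address is read.
// It also returns the stack pointer of the caller after the return.
func unwindWithFrameSize(stack []uint64, sp, frameSize int) (ret uint64, callerSP int) {
	slot := sp + frameSize/8
	return stack[slot], slot + 1
}

func main() {
	// A non-leaf function with 16 bytes of locals, followed by the real
	// return address. All values are made up for illustration.
	stack := []uint64{0xdeadbeef, 0x2a, 0x401234}
	sp := 0

	fmt.Printf("without unwind info: %#x\n", naiveReturnAddress(stack, sp))
	ret, callerSP := unwindWithFrameSize(stack, sp, 16)
	fmt.Printf("with unwind info:    %#x (caller sp slot %d)\n", ret, callerSP)
}
```

Without frame information the unwinder ends up "returning" to the contents of a local variable, which is why stack traces of Go binaries break after the first non-leaf frame.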
Implementation
I would rather keep the implementation details out of this discussion. It is doable and there are many ways to implement it, from naively generating the info in the linker (see prototype CL 457455) to a detailed stack frame representation generated by the compiler and the linker.
If this proposal is accepted, I would suggest implementing it incrementally, starting by just enabling stack walking and finishing with an accurate representation of the non-volatile registers at every frame of the call stack.
@golang/windows @golang/compiler