Two-phase unwinding and its implications #123
Description
Foreword
Recently, the need for two-phase stack unwinding has been suggested. Two-phase unwinding is a useful feature for debugging, because the stack can be preserved when an exception is not caught and the program crashes. Also, it is necessary to implement the full semantics of languages such as C# (due to its when
clause). While each language may implement it in their own library if necessary, supporting it in the spec level can be more convenient or faster. We didn’t think it was an essential feature to support in the spec before, and I think whether we should support this in the spec level or not is also something we should discuss.
In this issue, I’ll address the implication to the current proposal if we decide to support it. This issue is not only about adding syntax for two-phase unwinding itself; it is more about the extensibility of the current proposal in case we support the two-phase later.
Two-phase unwinding
Two-phase unwinding consists of two phases. The first phase is the ‘search’ phase; it walks up the stack without unwinding it and checks each catch
instruction (= landing pad) to see if it catches the current exception. If it does not catch the current exception, we continue the search. If the search phase ends and none of the catches catches the current exception, we end the unwinding there without running the second phase, leaving the call stack intact. If we found a catch
that catches the exception, we remember the result, and begin the second, ‘cleanup’ phase. In this phase, we actually unwind the stack, while running all cleanup (= destructor) code, until we reach the catch
that catches the exception. After we arrive at the destination catch
, we stop the second phase unwinding there, and transfer the control flow to that catch
.
Two-phase unwinding will require catch
instruction to have a way of filtering an exception. That can be an index to a wasm function, or a code block attached to a catch
instruction. And the VM should be able to run these functions or code blocks without disturbing the call stack during the first ‘search’ phase of two phase unwinding.
But other than adding a filter function to catch
, we may need more changes to the current proposal. The most important one of them is exnref
, which I will elaborate on next.
Background on first-class exnref
We introduced first-class exnref
type in 2018, mainly to support more expressive code transformation in the toolchain.
The specific problem that sparked this discussion was that, after the “CFG stackification” phase in the toolchain, in which we place marker instructions like block
, try
, loop
, and end
, it is possible that there can be mismatches in calls’ or throws’ unwind destinations. This problem was first discussed in #29 (with different solutions then). To re-summarize the problem here, suppose we have the following CFG:
bb0:
call @foo (if it throws, unwind to bb2)
bb1:
call @bar (if it throws, unwind to bb3)
bb2 (ehpad):
catch
...
bb3 (ehpad):
catch
handler body
And the CFG is sorted in this order. Then after placing try
markers, the wasm code will look like:
try $label1
try
call @foo
call @bar (if it throws, it should unwind to catch2, but is caught by catch1)
catch1 <- ehpad (bb2)
...
end_try
catch2 <- ehpad (bb3)
handler body
end_try
The first-class exnref
allows us to fix this kind of mismatches because with that now exnref
s can escape their original catch
blocks they are originally caught and we can freely branch to the right handler to use them.
Incompatibility of first-class exnref
and two-phase unwinding
Suppose we extend our current proposal to support two-phase stack unwinding. In the first phase of the unwinding, we walk up the stack to examine each catch
’s filter function without actually unwinding the stack, meaning we shouldn’t run any catch
bodies. In order to do that, we can use the internal EH pad stack maintained in the VM. But the problem when we have exnref
s that can escape catch
bodies is, we don’t have a way to find out the next catch
in this first search phase.
try
try
call @foo
catch1
end
catch2
end
In this example, when foo
throws, the first phase should check catch1
first, and if it does not catch the current exception, it should check catch2
next. But with exnref
, the code can look like this:
try
try
call @foo
catch1
local.set 0
br 1
end
catch2
end
In this case, semantically the program shouldn’t run catch2
body anymore, because we branch out of both try
blocks. But the first phase, which does not run any catch
bodies and only check filter functions, will check catch2
after checking catch1
. This was not a problem when we only have a single phase, because in that case we run the actual program and unwind the stack as we go. But with two-phase unwinding, we need a way to examine a sequence of catches in the first phase without running the program.
Recently #118 and #122 claimed the current proposal is not extensible to support the two-phase unwinding feature. I, @dschuff, @tlively and @RossTate had a video meeting, and while I’m not sure Ross’s reasoning was the same as the things I described here, we agreed that escaping of first-class exnref
can cause problems for future two-phase unwinding.
Necessary changes to the current proposal
In short, we need to remove the first-class exnref
, which pretty much means going back to the first version of the exception handling proposal. catch
instruction now can be back to catch $tag
and catch_all
(but this may not be a must). We may not need br_on_exn
anymore if we restore tagged catches, but having it can also increase code generation flexibility, in which case, it will assume the current exception as an implicit argument. But if we remove first-class exnref
, we need a way to solve the unwind destination mismatch problem I described above.
Adding catch_br
instruction
One possible addition to the current spec I and @dschuff thought about is catch_br
instruction. So now we have two kinds of try-catches: one is the same as the current one:
try
…
catch
…
end
On top of that, we now have one additional syntax:
try
…
catch_br $label
catch_br
does not have a body, so it does not need an end
at the end. If any instruction between try
and catch_br
throws, we transfer control flow to an outer catch
that is specified by the label. The label will be written as an immediate in binary, as in the case of br
. The unwind mismatch example above can be solved using this instruction:
try $label2
try
call @foo (unwinds to catch1)
try
call @bar (should unwind to catch2)
catch_br $label2
catch1
...
end_try
catch2
handler body
end_try
Now we introduce an internal try
-catch_br
, so when bar
throws, we can transfer the control flow to catch2
, bypassing catch1
. We can follow these labels in the first phase search too. Because these labels (= immediates) are statically determined, we don’t need to run or refer to any catch bodies to search the stack.
Splitting of catch
and unwind
The second, ‘cleanup’ phase involves running destructor code and unwinding the stack. But after running destructor code, we need a way to resume the second phase unwind until we arrive at the destination catch found by the search phase. rethrow
will not solve this problem, because rethrow
is the same as throw
except it retains auxiliary information, such as stack traces, so it will trigger a full two-phase search from scratch. One way to fix this is splitting catch
and unwind
block, so that we add try
-unwind
, and assume at the end of unwind
block the second phase search is automatically resumed:
try
…
unwind
destructor code
end
This was one of changes suggested in #118. Another alternative to this is to add another instruction resume
. This is different from rethrow
; this does not initiate a full two-phase search. This merely resumes the second phase unwinding. The toolchain will be responsible for generating resume
after destructor code.
Concluding remarks
Apparently this is a lot to put in a single issue post, but I’d like to start discussions from here. There are many things we need to discuss:
- Should we add two-phase stack unwinding to the spec?
- If we add two-phase unwinding, what will the end goal look like? (This post is mostly dedicated to this part)
- If we add two-phase unwinding, should we do that in the current EH proposal, or the follow-on proposal?
- If we decide to split the EH proposals into current and follow-on, what is the splitting point?
- If we decide to split the EH proposals into current and follow-on, how should two modules compiled with first and follow-on proposals work together?
Apparently we can’t discuss all these at once; I think we should start from 1 and 2.
Also I’d like to hear from VM people as well, because two-phase unwinding, especially running filter functions in a separate stack, will likely to be not a simple matter from the VM side.