Unwinding as a modifier to branching #124
Description
I think there's a way to view unwinding as a modifier to branching. But before I get into that, I first want to make sure I'm on the same page as everyone with respect to branching.
Branching
Contrary to what their names suggest, branching instructions like br
do not correspond simply to jump instructions at the assembly level. Jumping is only half of the story. The other half has to do with the stack.
Suppose an engine has some type that is managed using reference counting—call it refcounted
. Then consider the branch in the following code that calls some function $foo : [] -> [refcounted]
:
(block $target ;; [] -> []
(call $foo)
(br $target)
)
This branch does not compile to simply a jump to the assembly-code location corresponding to $target
. It also cleans up the portion of the stack that is not relevant to $target
. In this case, that involves decrementing the reference count of the refcounted
value returned by $foo
that is sitting on the stack. In a single-threaded engine, that might in turn result in freeing the corresponding memory space and consequently require recursively decrementing the counts of any values contained therein. Depending on how large this structure is and how much of it is no longer necessary, this might take a while. So a simple instruction like br
can actually execute a fair amount behind the scenes depending on how the engine manages resources and the stack.
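As a rough illustration of the cleanup described above, here is a minimal Python sketch of a value stack holding reference-counted values. The `RefCounted` class and the `push`, `decref`, and `br` helpers are hypothetical names invented for this sketch, not part of any engine; the "jump" half of the branch is not modelled.

```python
class RefCounted:
    """Hypothetical engine-managed value; not part of any real engine API."""

    def __init__(self, name, children=()):
        self.name = name
        self.count = 0
        self.children = list(children)
        for child in self.children:
            child.count += 1  # the parent holds a reference to each child


freed = []  # names of freed values, in the order they are released


def decref(value):
    """Drop one reference; free the value and recurse once nothing refers to it."""
    value.count -= 1
    if value.count == 0:
        freed.append(value.name)
        for child in value.children:
            decref(child)


def push(stack, value):
    value.count += 1  # the stack slot holds a reference
    stack.append(value)


def br(stack, label_height):
    """Clean up the stack down to the label's recorded height.

    The jump to the label's code location would follow; it is not modelled here.
    """
    while len(stack) > label_height:
        decref(stack.pop())


# (block $target (call $foo) (br $target))
stack = []
target_height = len(stack)  # stack location recorded for $target
push(stack, RefCounted("root", [RefCounted("leaf")]))  # result of (call $foo)
br(stack, target_height)
print(freed)
```

Even in this tiny model the branch frees two values, which is the sense in which a branch can do a fair amount of work behind the scenes.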
All this is to illustrate that a label like $target
more accurately corresponds to an assembly-code location and a location within the stack, and likewise an instruction like br $target
more accurately corresponds to "clean up the stack up to the stack location corresponding to $target
and then jump to the assembly-code location corresponding to $target
".
Note, though, that "clean up" here only corresponds to the engine's resources on the stack. But what about the application's resources? That's where unwinding comes into play.
Unwinding
For the sake of this discussion, I am going to say that unwinders are specified using try instr1* unwind instr2* end
, which executes instr1*
but indicates that instr2* : [] -> []
should be used to "unwind the stack", i.e. to perform application-level clean up. In a second, I'll get to how one causes the stack to be "unwound" rather than just "cleaned up".
Now consider some surface-level code using the common finally
construct:
while (true) {
File file = open(...);
try {
try {
if (...)
break;
} catch (SomeException) {}
} finally {
close(file);
}
}
Normally a break
in a while
loop would translate to a br
in WebAssembly, but the finally
clause in this snippet requires that its body be executed no matter how control leaves the try
body. We could consider inlining the body of the finally
at the break
, but that results in code duplication, plus it would result in incorrectly catching SomeException
if one gets thrown by close(file)
(nor does it work as well in other examples where there are other try
/catch
clauses surrounding the break
).
Really what we want to do is to extend the semantics of "clean up the stack" that is already part of branching to incorporate "and execute unwinders". That is, we want to modify the branch instruction so that it also unwinds the stack.
One way we could enable this is to introduce an unwinding
instruction that must precede a branching instruction, and its semantics is to modify that branching instruction to execute the unwinders in unwind
clauses as it cleans up the stack. With this, the break
instruction in the example above would translate to the instruction sequence unwinding (br $loop_exit)
.
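To make the idea of a branch modifier concrete, here is a small Python sketch, assuming a control stack of `("label", name)` and `("unwind", callback)` frames. The frame layout and the `branch` helper are assumptions made for illustration only; popping a frame stands in for engine-level cleanup.

```python
def branch(stack, label, log, unwinding=False):
    """Pop frames up to and including the frame for `label`, then 'jump'.

    Popping a frame stands in for engine-level cleanup. Unwinders met on
    the way run only when the branch carries the `unwinding` modifier.
    """
    while stack:
        kind, payload = stack.pop()
        if kind == "label" and payload == label:
            log.append(f"jump to {label}")
            return
        if kind == "unwind" and unwinding:
            payload(log)
    raise ValueError(f"label {label} is not on the stack")


def loop_body_stack():
    # Stack shape at the `break` in the finally example: the loop's exit
    # label, then the unwinder generated for `finally { close(file); }`.
    return [
        ("label", "$loop_exit"),
        ("unwind", lambda log: log.append("close(file)")),
    ]


plain, modified = [], []
branch(loop_body_stack(), "$loop_exit", plain)                     # br $loop_exit
branch(loop_body_stack(), "$loop_exit", modified, unwinding=True)  # unwinding (br $loop_exit)
print(plain)
print(modified)
```

The plain branch skips the unwinder, while the modified branch runs it before transferring control, without duplicating the body of the unwinder at the branch site.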
Exception Handling
So far I haven't talked about exception handling, just unwinding. This illustrates that unwinding the stack, like cleaning up the stack, is its own concept. And although unwinding is an integral part of exception handling, bundling it with exception handling as the current proposal does is a misunderstanding of the concept.
But if unwinding is a separable component of exception handling, what is the other component? The answer to that depends on whether you're talking about single-phase exception handling or two-phase exception handling.
Single-Phase
Again, for the sake of this discussion, I am going to say that single-phase exception handling is done using try instr* catch $event $label
, which indicates that any $event
exceptions thrown from instr*
should be caught and handled by $label
(where the types of $event
and $label
match).
Now, consider the following WebAssembly program:
(try
(try
(throw $event)
unwind
...
end)
catch $event $label)
We can reduce this program to the following:
(try
(try
unwinding (br $label)
unwind
...
end)
catch $event $label)
When we can see the contents of the stack, we can replace a throw $event
with an unwinding (br $label)
to whatever label the event is currently bound to in the stack. That is, events are dynamically scoped variables that get bound to labels, and throw
means "branch-and-unwind to whatever label the event is bound to in the current stack". (Of course, an important optimization is to unwind the stack as you search for these dynamically-scoped bindings.)
This suggests that we can break throw
up into two parts: unwinding
and br_stack $event
. The latter is an instruction that just transfers control to some label determined by the current stack, performing the necessary cleanup up to that label. This instruction on its own could even have utility, say for more severe exceptions that want to bypass unwinders or guarantee transfer.
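The decomposition can be sketched in Python as follows, assuming a control stack in which `("catch", event, label)` frames bind events to labels. All helper names here (`branch`, `bound_label`) are hypothetical, and for simplicity the sketch looks up the binding first and unwinds afterwards rather than unwinding during the search.

```python
def branch(stack, label, log, unwinding):
    """Pop frames up to and including `label`; run unwinders only if asked."""
    while stack:
        frame = stack.pop()
        if frame[0] == "label" and frame[1] == label:
            log.append(f"jump to {label}")
            return
        if frame[0] == "unwind" and unwinding:
            frame[1](log)
    raise ValueError(f"label {label} is not on the stack")


def bound_label(stack, event):
    """Events act as dynamically scoped variables: the innermost binding wins."""
    for frame in reversed(stack):
        if frame[0] == "catch" and frame[1] == event:
            return frame[2]
    raise LookupError(f"no handler bound for {event}")


def br_stack(stack, event, log, unwinding=False):
    """Branch to whatever label the event is bound to in the current stack."""
    branch(stack, bound_label(stack, event), log, unwinding)


def throw(stack, event, log):
    """throw $event, modelled as: unwinding (br_stack $event)."""
    br_stack(stack, event, log, unwinding=True)


def make_stack():
    return [
        ("label", "$label"),
        ("catch", "$event", "$label"),
        ("unwind", lambda log: log.append("run unwinder")),
    ]


thrown, bypass = [], []
throw(make_stack(), "$event", thrown)     # unwinders run on the way out
br_stack(make_stack(), "$event", bypass)  # same destination, unwinders skipped
print(thrown)
print(bypass)
```

Both calls reach the same dynamically determined label; only the modified one executes the unwinder.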
Two-Phase
In two-phase exception handling, you use some form of stack inspection to determine the target label before you execute the unwinding
branch to that label.
For the sake of this discussion, I'll say that an inspection is begun by using the instruction call_stack $call_tag
, which looks up the stack for contexts of the form answer $call_tag instr1* within instr2* end
(where execution is currently within instr2*
), in which case the instructions instr1*
are executed as the body of a dynamically-scoped function (see WebAssembly/design#1356 for more info).
As an example of unwinding in two-phase exception handling, consider the following C# code:
bool flag = …;
try {
…
} catch (Exception) when (flag = !flag) {
println("caught first");
} catch (Exception) {
println("caught second");
}
This would be compiled to the following WebAssembly code (assuming C# throw
compiles to call_stack $csharp_throw
):
(call_tag $csharp_throw : [csharp_ref] -> [])
(block $outer
(block $first
(block $second
(answer $csharp_throw ;; [csharp_ref] -> []
(local.set $flag (i32.xor (i32.const 1) (local.get $flag)))
unwinding (br_if $first (local.get $flag))
unwinding (br $second)
within
…
(br $outer)
)
) ;; $second : [csharp_ref]
(call $println "caught second")
(br $outer)
) ;; $first : [csharp_ref]
(call $println "caught first")
(br $outer)
) ;; $outer : []
Notice that the answer $csharp_throw
has a bunch of unwinding
branches. Which of these gets executed depends on the state of the flag
variable at the time the exception reaches the try
in the C# source code. (Note that there is no try
nor events in the compiled WebAssembly.) Depending on that flag
, we'll either have an unwinding
branch to $first
or an unwinding
branch to $second
. In either case, the semantics is "clean up and unwind the stack up to the stack location corresponding to the chosen label and then jump to the assembly-code location corresponding to the chosen label". The difference between here and the original examples using finally
is that the portion of the stack that needs to be cleaned up and unwound is not known statically. That is important for implementation (e.g. because it requires stack walking), but semantically speaking it is straightforward and aligns well with the existing abstractions in WebAssembly.
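A minimal Python sketch of this two-phase behaviour follows, assuming `("answer", tag, handler)` frames on a control stack. As a simplification, the handler returns the chosen label instead of executing the unwinding branch itself; `call_stack`, `csharp_answer`, and the frame layout are hypothetical names for this sketch only.

```python
def branch(stack, label, log, unwinding):
    """Pop frames up to and including `label`; run unwinders only if asked."""
    while stack:
        frame = stack.pop()
        if frame[0] == "label" and frame[1] == label:
            log.append(f"jump to {label}")
            return
        if frame[0] == "unwind" and unwinding:
            frame[1](log)
    raise ValueError(f"label {label} is not on the stack")


def csharp_answer(state):
    # Models: catch (Exception) when (flag = !flag) ... catch (Exception) ...
    state["flag"] = not state["flag"]
    return "$first" if state["flag"] else "$second"


def call_stack(stack, tag, state, log):
    """Phase 1: inspect the intact stack to pick a label. Phase 2: unwind to it."""
    for frame in reversed(stack):
        if frame[0] == "answer" and frame[1] == tag:
            label = frame[2](state)  # runs before any frame is popped
            break
    else:
        raise LookupError(f"no answer for {tag}")
    log.append(f"answer chose {label}")
    branch(stack, label, log, unwinding=True)
    return label


def run(flag):
    state, log = {"flag": flag}, []
    stack = [
        ("label", "$outer"),
        ("label", "$first"),
        ("label", "$second"),
        ("answer", "$csharp_throw", csharp_answer),
        ("unwind", lambda log: log.append("run unwinder")),
    ]
    call_stack(stack, "$csharp_throw", state, log)
    return log


print(run(False))
print(run(True))
```

In both runs the destination is fixed before any unwinder executes, which is the property the two-phase discussion relies on.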
Summary
Regardless of whether we want to actually make an unwinding
instruction, the important thing to note here is that unwinding is always done with a destination. How that destination is determined varies, but in most of the examples above the destination is known before unwinding begins.
The current proposal is about single-phase exception handling. But as I've tried to illustrate here, single-phase exception handling is really two concepts combined: dynamically-scoped branching and unwinding. So for the proposal to be extensible, it is important that its design for unwinding is compatible with other notions of destination, even if this proposal on its own solely enables dynamically-scoped destination labels (i.e. events). Ideas like those in #123 would help achieve this goal.