Two-phase unwinding and its implications

## Foreword
Recently, the need for two-phase stack unwinding has been suggested. Two-phase unwinding is a useful feature for debugging, because the stack can be preserved when an exception is not caught and the program crashes. Also, it is necessary to implement the full semantics of languages such as C# (due to its `when` clause). While each language may implement it in their own library if necessary, supporting it in the spec level can be more convenient or faster. We didn’t think it was an essential feature to support in the spec before, and I think whether we should support this in the spec level or not is also something we should discuss.

In this issue, I’ll address the implication to the current proposal if we decide to support it. This issue is not only about adding syntax for two-phase unwinding itself; it is more about the extensibility of the current proposal in case we support the two-phase later.


## Two-phase unwinding
Two-phase unwinding consists of two phases. The first phase is the ‘search’ phase; it walks up the stack without unwinding it and checks each `catch` instruction (= landing pad) to see if it catches the current exception. If it does not catch the current exception, we continue the search. If the search phase ends and none of the catches catches the current exception, we end the unwinding there without running the second phase, leaving the call stack intact. If we found a `catch` that catches the exception, we remember the result, and begin the second, ‘cleanup’ phase. In this phase, we actually unwind the stack, while running all cleanup (= destructor) code, until we reach the `catch` that catches the exception. After we arrive at the destination `catch`, we stop the second phase unwinding there, and transfer the control flow to that `catch`.

Two-phase unwinding will require `catch` instruction to have a way of _filtering_ an exception. That can be an index to a wasm function, or a code block attached to a `catch` instruction. And the VM should be able to run these functions or code blocks without disturbing the call stack during the first ‘search’ phase of two phase unwinding.

But other than adding a filter function to `catch`, we may need more changes to the current proposal. The most important one of them is `exnref`, which I will elaborate on next.


## Background on first-class `exnref`
We introduced first-class `exnref` type in 2018, mainly to support more expressive code transformation in the toolchain.

The specific problem that sparked this discussion was that, after the “CFG stackification” phase in the toolchain, in which we place marker instructions like `block`, `try`, `loop`, and `end`, it is possible that there can be mismatches in calls’ or throws’ unwind destinations. This problem was first discussed in #29 (with different solutions then). To re-summarize the problem here, suppose we have the following CFG:
```
bb0:
  call @foo (if it throws, unwind to bb2)
bb1:
  call @bar (if it throws, unwind to bb3)
bb2 (ehpad):
  catch
  ...
bb3 (ehpad):
  catch
  handler body
```

And the CFG is sorted in this order. Then after placing `try` markers, the wasm code will look like:
```
try $label1
  try
    call @foo
    call @bar   (if it throws, it should unwind to catch2, but is caught by catch1)
  catch1         <- ehpad (bb2)
    ...
  end_try
catch2           <- ehpad (bb3)
  handler body
end_try
```

The first-class `exnref` allows us to fix this kind of mismatches because with that now `exnref`s can escape their original `catch` blocks they are originally caught and we can freely branch to the right handler to use them.


## Incompatibility of first-class `exnref` and two-phase unwinding
Suppose we extend our current proposal to support two-phase stack unwinding. In the first phase of the unwinding, we walk up the stack to examine each `catch`’s filter function without actually unwinding the stack, meaning we shouldn’t run any `catch` bodies. In order to do that, we can use the internal EH pad stack maintained in the VM. But the problem when we have `exnref`s that can escape `catch` bodies is, we don’t have a way to find out the next `catch` in this first search phase.
```
try
  try
    call @foo
  catch1
  end
catch2
end
```
In this example, when `foo` throws, the first phase should check `catch1` first, and if it does not catch the current exception, it should check `catch2` next. But with `exnref`, the code can look like this:
```
try
  try
    call @foo
  catch1
    local.set 0
    br 1
  end
catch2
end
```
In this case, semantically the program shouldn’t run `catch2` body anymore, because we branch out of both `try` blocks. But the first phase, which does not run any `catch` bodies and only check filter functions, will check `catch2` after checking `catch1`. This was not a problem when we only have a single phase, because in that case we run the actual program and unwind the stack as we go. But with two-phase unwinding, we need a way to examine a sequence of catches in the first phase without running the program.

Recently #118 and #122 claimed the current proposal is not extensible to support the two-phase unwinding feature. I, @dschuff, @tlively and @RossTate had a video meeting, and while I’m not sure Ross’s reasoning was the same as the things I described here, we agreed that escaping of first-class `exnref` can cause problems for future two-phase unwinding.


## Necessary changes to the current proposal
In short, we need to remove the first-class `exnref`, which pretty much means going back to the [first version](https://github.com/WebAssembly/exception-handling/blob/master/proposals/old/Exceptions.md) of the exception handling proposal. `catch` instruction now can be back to `catch $tag` and `catch_all` (but this may not be a must). We may not need `br_on_exn` anymore if we restore tagged catches, but having it can also increase code generation flexibility, in which case, it will assume the current exception as an implicit argument. But if we remove first-class `exnref`, we need a way to solve the unwind destination mismatch problem I described above.

### Adding `catch_br` instruction
One possible addition to the current spec I and @dschuff thought about is `catch_br` instruction. So now we have two kinds of try-catches: one is the same as the current one:
```
try
  …
catch
  …
end
```

On top of that, we now have one additional syntax:
```
try
  …
catch_br $label
```
`catch_br` does not have a body, so it does not need an `end` at the end. If any instruction between `try` and `catch_br` throws, we transfer control flow to an outer `catch` that is specified by the label. The label will be written as an immediate in binary, as in the case of `br`. The unwind mismatch example above can be solved using this instruction:
```
try $label2
  try
    call @foo (unwinds to catch1)
    try
      call @bar    (should unwind to catch2)
    catch_br $label2
  catch1
    ...
  end_try
catch2
  handler body
end_try
```
Now we introduce an internal `try`-`catch_br`, so when `bar` throws, we can transfer the control flow to `catch2`, bypassing `catch1`. We can follow these labels in the first phase search too. Because these labels (= immediates) are statically determined, we don’t need to run or refer to any catch bodies to search the stack.

### Splitting of `catch` and `unwind`
The second, ‘cleanup’ phase involves running destructor code and unwinding the stack. But after running destructor code, we need a way to _resume_ the second phase unwind until we arrive at the destination catch found by the search phase. `rethrow` will not solve this problem, because `rethrow` is the same as `throw` except it retains auxiliary information, such as stack traces, so it will trigger a full two-phase search from scratch. One way to fix this is splitting `catch` and `unwind` block, so that we add `try`-`unwind`, and assume at the end of `unwind` block the second phase search is automatically resumed:
```
try
  …
unwind
  destructor code
end
```

This was one of changes suggested in #118. Another alternative to this is to add another instruction `resume`. This is different from `rethrow`; this does not initiate a full two-phase search. This merely resumes the second phase unwinding. The toolchain will be responsible for generating `resume` after destructor code.


## Concluding remarks
Apparently this is a lot to put in a single issue post, but I’d like to start discussions from here. There are many things we need to discuss:

1. Should we add two-phase stack unwinding to the spec?
2. If we add two-phase unwinding, what will the end goal look like? (This post is mostly dedicated to this part)
3. If we add two-phase unwinding, should we do that in the current EH proposal, or the follow-on proposal?
4. If we decide to split the EH proposals into current and follow-on, what is the splitting point?
5. If we decide to split the EH proposals into current and follow-on, how should two modules compiled with first and follow-on proposals work together?

Apparently we can’t discuss all these at once; I think we should start from 1 and 2.

Also I’d like to hear from VM people as well, because two-phase unwinding, especially running filter functions in a separate stack, will likely to be not a simple matter from the VM side.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Two-phase unwinding and its implications #123

Foreword

Two-phase unwinding

Background on first-class `exnref`

Incompatibility of first-class `exnref` and two-phase unwinding

Necessary changes to the current proposal

Adding `catch_br` instruction

Splitting of `catch` and `unwind`

Concluding remarks

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Two-phase unwinding and its implications #123

Description

Foreword

Two-phase unwinding

Background on first-class exnref

Incompatibility of first-class exnref and two-phase unwinding

Necessary changes to the current proposal

Adding catch_br instruction

Splitting of catch and unwind

Concluding remarks

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Background on first-class `exnref`

Incompatibility of first-class `exnref` and two-phase unwinding

Adding `catch_br` instruction

Splitting of `catch` and `unwind`