Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

Two-phase unwinding and its implications #123

Closed
@aheejin

Description

@aheejin

Foreword

Recently, the need for two-phase stack unwinding has been suggested. Two-phase unwinding is a useful feature for debugging, because the stack can be preserved when an exception is not caught and the program crashes. Also, it is necessary to implement the full semantics of languages such as C# (due to its when clause). While each language may implement it in their own library if necessary, supporting it in the spec level can be more convenient or faster. We didn’t think it was an essential feature to support in the spec before, and I think whether we should support this in the spec level or not is also something we should discuss.

In this issue, I’ll address the implication to the current proposal if we decide to support it. This issue is not only about adding syntax for two-phase unwinding itself; it is more about the extensibility of the current proposal in case we support the two-phase later.

Two-phase unwinding

Two-phase unwinding consists of two phases. The first phase is the ‘search’ phase; it walks up the stack without unwinding it and checks each catch instruction (= landing pad) to see if it catches the current exception. If it does not catch the current exception, we continue the search. If the search phase ends and none of the catches catches the current exception, we end the unwinding there without running the second phase, leaving the call stack intact. If we found a catch that catches the exception, we remember the result, and begin the second, ‘cleanup’ phase. In this phase, we actually unwind the stack, while running all cleanup (= destructor) code, until we reach the catch that catches the exception. After we arrive at the destination catch, we stop the second phase unwinding there, and transfer the control flow to that catch.

Two-phase unwinding will require catch instruction to have a way of filtering an exception. That can be an index to a wasm function, or a code block attached to a catch instruction. And the VM should be able to run these functions or code blocks without disturbing the call stack during the first ‘search’ phase of two phase unwinding.

But other than adding a filter function to catch, we may need more changes to the current proposal. The most important one of them is exnref, which I will elaborate on next.

Background on first-class exnref

We introduced first-class exnref type in 2018, mainly to support more expressive code transformation in the toolchain.

The specific problem that sparked this discussion was that, after the “CFG stackification” phase in the toolchain, in which we place marker instructions like block, try, loop, and end, it is possible that there can be mismatches in calls’ or throws’ unwind destinations. This problem was first discussed in #29 (with different solutions then). To re-summarize the problem here, suppose we have the following CFG:

bb0:
  call @foo (if it throws, unwind to bb2)
bb1:
  call @bar (if it throws, unwind to bb3)
bb2 (ehpad):
  catch
  ...
bb3 (ehpad):
  catch
  handler body

And the CFG is sorted in this order. Then after placing try markers, the wasm code will look like:

try $label1
  try
    call @foo
    call @bar   (if it throws, it should unwind to catch2, but is caught by catch1)
  catch1         <- ehpad (bb2)
    ...
  end_try
catch2           <- ehpad (bb3)
  handler body
end_try

The first-class exnref allows us to fix this kind of mismatches because with that now exnrefs can escape their original catch blocks they are originally caught and we can freely branch to the right handler to use them.

Incompatibility of first-class exnref and two-phase unwinding

Suppose we extend our current proposal to support two-phase stack unwinding. In the first phase of the unwinding, we walk up the stack to examine each catch’s filter function without actually unwinding the stack, meaning we shouldn’t run any catch bodies. In order to do that, we can use the internal EH pad stack maintained in the VM. But the problem when we have exnrefs that can escape catch bodies is, we don’t have a way to find out the next catch in this first search phase.

try
  try
    call @foo
  catch1
  end
catch2
end

In this example, when foo throws, the first phase should check catch1 first, and if it does not catch the current exception, it should check catch2 next. But with exnref, the code can look like this:

try
  try
    call @foo
  catch1
    local.set 0
    br 1
  end
catch2
end

In this case, semantically the program shouldn’t run catch2 body anymore, because we branch out of both try blocks. But the first phase, which does not run any catch bodies and only check filter functions, will check catch2 after checking catch1. This was not a problem when we only have a single phase, because in that case we run the actual program and unwind the stack as we go. But with two-phase unwinding, we need a way to examine a sequence of catches in the first phase without running the program.

Recently #118 and #122 claimed the current proposal is not extensible to support the two-phase unwinding feature. I, @dschuff, @tlively and @RossTate had a video meeting, and while I’m not sure Ross’s reasoning was the same as the things I described here, we agreed that escaping of first-class exnref can cause problems for future two-phase unwinding.

Necessary changes to the current proposal

In short, we need to remove the first-class exnref, which pretty much means going back to the first version of the exception handling proposal. catch instruction now can be back to catch $tag and catch_all (but this may not be a must). We may not need br_on_exn anymore if we restore tagged catches, but having it can also increase code generation flexibility, in which case, it will assume the current exception as an implicit argument. But if we remove first-class exnref, we need a way to solve the unwind destination mismatch problem I described above.

Adding catch_br instruction

One possible addition to the current spec I and @dschuff thought about is catch_br instruction. So now we have two kinds of try-catches: one is the same as the current one:

try
  …
catch
  …
end

On top of that, we now have one additional syntax:

try
  …
catch_br $label

catch_br does not have a body, so it does not need an end at the end. If any instruction between try and catch_br throws, we transfer control flow to an outer catch that is specified by the label. The label will be written as an immediate in binary, as in the case of br. The unwind mismatch example above can be solved using this instruction:

try $label2
  try
    call @foo (unwinds to catch1)
    try
      call @bar    (should unwind to catch2)
    catch_br $label2
  catch1
    ...
  end_try
catch2
  handler body
end_try

Now we introduce an internal try-catch_br, so when bar throws, we can transfer the control flow to catch2, bypassing catch1. We can follow these labels in the first phase search too. Because these labels (= immediates) are statically determined, we don’t need to run or refer to any catch bodies to search the stack.

Splitting of catch and unwind

The second, ‘cleanup’ phase involves running destructor code and unwinding the stack. But after running destructor code, we need a way to resume the second phase unwind until we arrive at the destination catch found by the search phase. rethrow will not solve this problem, because rethrow is the same as throw except it retains auxiliary information, such as stack traces, so it will trigger a full two-phase search from scratch. One way to fix this is splitting catch and unwind block, so that we add try-unwind, and assume at the end of unwind block the second phase search is automatically resumed:

try
  …
unwind
  destructor code
end

This was one of changes suggested in #118. Another alternative to this is to add another instruction resume. This is different from rethrow; this does not initiate a full two-phase search. This merely resumes the second phase unwinding. The toolchain will be responsible for generating resume after destructor code.

Concluding remarks

Apparently this is a lot to put in a single issue post, but I’d like to start discussions from here. There are many things we need to discuss:

  1. Should we add two-phase stack unwinding to the spec?
  2. If we add two-phase unwinding, what will the end goal look like? (This post is mostly dedicated to this part)
  3. If we add two-phase unwinding, should we do that in the current EH proposal, or the follow-on proposal?
  4. If we decide to split the EH proposals into current and follow-on, what is the splitting point?
  5. If we decide to split the EH proposals into current and follow-on, how should two modules compiled with first and follow-on proposals work together?

Apparently we can’t discuss all these at once; I think we should start from 1 and 2.

Also I’d like to hear from VM people as well, because two-phase unwinding, especially running filter functions in a separate stack, will likely to be not a simple matter from the VM side.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions