cranelift: stack-switching support #11003

posborne · 2025-06-10T17:47:43Z

These changes pull in the cranelift changes from #10177 with some additional stacked changes to resolve conflicts, align with previous changes in the stack-switching series, and address feedback items which were raised on previous iterations of the PR (but mostly not changing anything of significant substance). Tracking Issue: #10248.

The stack-switching feature flag is retained and used minimally in this changeset in order to avoid compilation problems, but not really used beyond that. There is at least one item in the tracking issue already related to finding a way to drop these compilation flags in most places, but I think it is worth deferring that here as it will require touching code more broadly.

CC @frank-emrich @dhil

This initial commit represents the "pr2" base commit with minimal merge conflicts resolved. Due to OOB conflicts, this commit is not functional as-is, but it is used as a base to make it easier to review the delta from this commit to what will be used for the PR against upstream. Co-authored-by: Daniel Hillerström <[email protected]> Co-authored-by: Paul Osborne <[email protected]>

This first set of changes updates the base PR so that it compiles and passes basic checks (compile, clippy, fmt), with the biggest part of the change being to eliminate injection of tracing/assertions in JIT'ed code.

…c_environ members

At this point, the only bit we really branch on is what we do in order to avoid problems tying into wasmtime_environ. This is based on the approach and macro used by the gc code for converting presence/absence of the cranelift feature flag to cranelift compile time. This is a bit of a half-measure for now as we still compile most stack-switching code in cranelift, but this does enough to avoid causing problems with missing definitions in wasmtime_environ.

Replace each cast with either the infallible From or, where required, the fallible, panicking TryFrom alternative.
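As a rough illustration of the pattern this commit message describes (not code taken from the PR), the sketch below contrasts an `as` cast with `From` and `TryFrom`. The function names `widen` and `narrow` are hypothetical.

```rust
use std::convert::TryFrom;

/// Widening a `u8` to `u32` can never lose information, so the
/// infallible `From` conversion applies.
fn widen(offset: u8) -> u32 {
    u32::from(offset)
}

/// Narrowing a `usize` to `u8` can fail. `try_from` reports that, and
/// `unwrap` turns an out-of-range value into a panic, whereas
/// `value as u8` would silently truncate.
fn narrow(value: usize) -> u8 {
    u8::try_from(value).unwrap()
}
```

The trade-off is that an out-of-range value becomes a loud panic at the conversion site rather than a wrapped value that surfaces later as a wrong offset.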

After removing emission of runtime trace logging and assertions, there were several unused parameters. Remove those from the ControlEffect signatures completely.

This matches a change to the mirrored runtime type in the upstream changes.

crates/cranelift/src/stack_switching/instructions.rs

fitzgen

Thanks! Lots of comments below but for the most part these should be pretty straightforward to fix.

cranelift/codegen/src/ir/function.rs

crates/cranelift/src/stack_switching/mod.rs

crates/cranelift/src/func_environ.rs

crates/cranelift/src/translate/code_translator.rs

crates/cranelift/src/stack_switching/fatpointer.rs

crates/cranelift/src/stack_switching/instructions.rs

crates/cranelift/src/stack_switching/mod.rs

Co-authored-by: Daniel Hillerström <[email protected]>

The extra parameters here used to be used for emitting runtime assertions, but with those gone we just had unused params and lifetimes, so clean those out.

There's already a stub elsewhere and this is not called. When exceptions are added and it is time to revisit, this method can be restored.

Rename VMHostArray -> VMHostArrayRef. Change impl to compute address with offset upfront rather than on each load.

posborne · 2025-06-16T16:29:24Z

Pushed most of the updates but still need to make the suggested updates around the fat pointer stuff and resolve conflicts with upstream.

This matches the directory structure for gc and aids in visibility for a few members required by stack-switching code in cranelift.

…nelift As part of this, updated translate_ref_is_null to use the wasm type rather than branching on the ir type being an i128.

posborne · 2025-06-23T16:15:48Z

@fitzgen I believe this is ready for review again, let me know if there are pieces I missed in my previous updates. As noted, I chose to defer fatpointer changes for now to be potentially addressed as part of (or following) changes to the translation stack.

fitzgen

LGTM modulo a few final comments below, we should be able to merge this and move on to follow ups after they are addressed.

Thanks!

fitzgen · 2025-06-24T15:56:52Z

crates/cranelift/src/func_environ.rs

@@ -138,7 +139,7 @@ pub struct FuncEnvironment<'module_environment> {
    pcc_vmctx_memtype: Option<ir::MemoryType>,

    /// Caches of signatures for builtin functions.
-    builtin_functions: BuiltinFunctions,
+    pub(crate) builtin_functions: BuiltinFunctions,


This shouldn't need to be pub(crate) after the stack_switching module moved to func_environ::stack_switching, right?

fitzgen · 2025-06-24T15:56:57Z

crates/cranelift/src/func_environ.rs

+
+    /// Used by the stack switching feature. If set, we have a allocated a
+    /// slot on this function's stack to be used for the
+    /// current stack's `handler_list` field.
+    pub(crate) stack_switching_handler_list_buffer: Option<ir::StackSlot>,
+
+    /// Used by the stack switching feature. If set, we have a allocated a
+    /// slot on this function's stack to be used for the
+    /// current continuation's `values` field.
+    pub(crate) stack_switching_values_buffer: Option<ir::StackSlot>,


crates/cranelift/src/func_environ.rs

fitzgen · 2025-06-24T16:27:01Z

crates/cranelift/src/func_environ/stack_switching/fatpointer.rs

+    let (lsbs, msbs) = pos.ins().isplit(contobj);
+
+    let (revision_counter, contref) = match env.isa().endianness() {
+        ir::Endianness::Little => (lsbs, msbs),
+        ir::Endianness::Big => {
+            let pad_bits = 64 - env.pointer_type().bits();
+            let contref = pos.ins().ushr_imm(lsbs, pad_bits as i64);
+            (msbs, contref)
+        }
+    };
+    let contref = if env.pointer_type().bits() < I64.bits() {
+        pos.ins().ireduce(env.pointer_type(), contref)
+    } else {
+        contref
+    };
+    (revision_counter, contref)


Can this just be something like

Suggested change

-    let (lsbs, msbs) = pos.ins().isplit(contobj);
-
-    let (revision_counter, contref) = match env.isa().endianness() {
-        ir::Endianness::Little => (lsbs, msbs),
-        ir::Endianness::Big => {
-            let pad_bits = 64 - env.pointer_type().bits();
-            let contref = pos.ins().ushr_imm(lsbs, pad_bits as i64);
-            (msbs, contref)
-        }
-    };
-    let contref = if env.pointer_type().bits() < I64.bits() {
-        pos.ins().ireduce(env.pointer_type(), contref)
-    } else {
-        contref
-    };
-    (revision_counter, contref)
+    let ptr_ty = env.pointer_type();
+    assert!(ptr_ty.bits() <= 64);
+    let contref = pos.ins().ireduce(ptr_ty, contobj);
+    let shifted = pos.ins().ushr_imm(contobj, 64);
+    let revision_counter = pos.ins().ireduce(ir::types::I64, shifted);
+    (revision_counter, contref)

?

That is, we define the fat pointer as 128 bits where the upper 64 are the revision counter and the bottom sizeof(pointer) bits are the pointer to the continuation. This way I don't think we ever have to branch on endianness or pointer width.

Yeah, this makes sense to me, will adopt this approach (with a quick round of testing) until we can split into individual ir values.

fitzgen · 2025-06-24T16:36:40Z

crates/cranelift/src/func_environ/stack_switching/fatpointer.rs

+    let contref_addr = if env.pointer_type().bits() < I64.bits() {
+        pos.ins().uextend(I64, contref_addr)
+    } else {
+        contref_addr
+    };
+    let (msbs, lsbs) = match env.isa().endianness() {
+        ir::Endianness::Little => (contref_addr, revision_counter),
+        ir::Endianness::Big => {
+            let pad_bits = 64 - env.pointer_type().bits();
+            let lsbs = pos.ins().ishl_imm(contref_addr, pad_bits as i64);
+            (revision_counter, lsbs)
+        }
+    };
+
+    let lsbs = pos.ins().uextend(ir::types::I128, lsbs);
+    let msbs = pos.ins().uextend(ir::types::I128, msbs);
+    let msbs = pos.ins().ishl_imm(msbs, 64);
+    pos.ins().bor(lsbs, msbs)


And then construction would become something like this:

Suggested change

-    let contref_addr = if env.pointer_type().bits() < I64.bits() {
-        pos.ins().uextend(I64, contref_addr)
-    } else {
-        contref_addr
-    };
-    let (msbs, lsbs) = match env.isa().endianness() {
-        ir::Endianness::Little => (contref_addr, revision_counter),
-        ir::Endianness::Big => {
-            let pad_bits = 64 - env.pointer_type().bits();
-            let lsbs = pos.ins().ishl_imm(contref_addr, pad_bits as i64);
-            (revision_counter, lsbs)
-        }
-    };
-
-    let lsbs = pos.ins().uextend(ir::types::I128, lsbs);
-    let msbs = pos.ins().uextend(ir::types::I128, msbs);
-    let msbs = pos.ins().ishl_imm(msbs, 64);
-    pos.ins().bor(lsbs, msbs)
+    assert!(env.pointer_type().bits() <= 64);
+    let contref_addr = pos.ins().uextend(ir::types::I128, contref_addr);
+    let revision_counter = pos.ins().uextend(ir::types::I128, revision_counter);
+    let shifted_counter = pos.ins().ishl_imm(revision_counter, 64);
+    pos.ins().bor(shifted_counter, contref_addr)

Note: if you want to switch the order of the contref pointer and the counter in the fat pointer, that's fine (I think I unwittingly switched it from what is in the code now), but the important point is that we should be able to do all of this fat pointer construction and destruction without branching on endianness and pointer width.
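For readers who want to check the bit layout outside of Cranelift IR, here is a minimal sketch in plain Rust integer arithmetic of the layout proposed in the comments above: the revision counter in the upper 64 bits and the continuation pointer zero-extended into the low bits. The names `pack_contobj` and `unpack_contobj` are hypothetical, and the `as` casts stand in for the `uextend` and `ireduce` instructions; this is a model of the arithmetic, not the PR's implementation.

```rust
/// Build the 128-bit fat pointer: revision counter in bits 64..128,
/// continuation pointer zero-extended into the low bits.
fn pack_contobj(contref: usize, revision: u64) -> u128 {
    assert!(usize::BITS <= 64);
    // `contref as u128` zero-extends, playing the role of `uextend`.
    (u128::from(revision) << 64) | (contref as u128)
}

/// Split the fat pointer back into (revision counter, pointer).
fn unpack_contobj(fat: u128) -> (u64, usize) {
    // Truncating casts keep the low bits, playing the role of `ireduce`.
    let contref = fat as usize;
    let revision = (fat >> 64) as u64;
    (revision, contref)
}
```

Because both directions are pure shifts, ORs and truncations on a single 128-bit value, neither needs to know the target's endianness or pointer width beyond the `<= 64` assertion.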

…nelift

posborne · 2025-07-01T17:03:28Z

@fitzgen I am hunting down one regression with the latest, probably around the changes to table ops based on wasm types (though I haven't found a smoking gun yet). These tests passed prior to some of the latest changes on this branch and the test is doing a table.grow of continuations followed by a resume of the first added to the table which traps on a null reference.

In the course of the various runtime updates, the layout of the runtime VMContObj got switched around. This resulted in failures when doing certain table operations on continuations. This change fixes that layout problem and adds some tests with offsets to avoid the problem. Due to the way that we interact with the VMContObj in cranelift, we don't use these offsets outside of the tests.
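A layout test of the kind mentioned above might look roughly like the following sketch. `ContObjModel` is a hypothetical stand-in for the runtime `VMContObj`, and the field order (pointer first, then revision) is an assumption inferred from the `vmcontobj_revision` offset discussed later in this review, not copied from the runtime source.

```rust
use std::mem::offset_of;

/// Hypothetical model of the runtime type: pointer first, revision second.
#[repr(C)]
struct ContObjModel {
    contref: *mut u8,
    revision: u64,
}

/// Byte offset of the continuation pointer within the struct.
fn contref_offset() -> usize {
    offset_of!(ContObjModel, contref)
}

/// Byte offset of the revision counter within the struct.
fn revision_offset() -> usize {
    offset_of!(ContObjModel, revision)
}
```

Pinning the offsets in a test means a later reordering of the fields fails at test time instead of showing up as a null-reference trap in table operations.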

…nelift

posborne · 2025-07-14T17:53:15Z

Pulled in and resolved conflicts (some pretty mild ones from #11216). @fitzgen Review would be appreciated; I think in our last convos I had played around with a different fatpointer impl but I'm fine with this as-is (possibly to revisit later on).

cranelift/codegen/src/ir/function.rs

bjorn3 · 2025-07-14T18:08:53Z

cranelift/codegen/src/isa/x64/abi.rs

@@ -194,6 +200,7 @@ impl ABIMachineSpec for X64ABIMachineSpec {
            if param.value_type.bits() > 64
                && !(param.value_type.is_vector() || param.value_type.is_float())
                && !flags.enable_llvm_abi_extensions()
+                && !is_tail


Why is this necessary? Wasm doesn't have 128bit integers.

At least in this iteration of the changes, the stack switching code is using an i128 for its vmcontobj fat pointer (consisting of vmcontref pointer and revision). @fitzgen and I have discussed a bit about how we can get rid of this and just have two ir values through the transformation, but it will require some additional changes to how we model the relationship between wasm and clif values to support having one of the former map to two distinct ir values.

This method isn't required as sized_stack_slots is already pub.

…nelift

fitzgen

LGTM! Couple final nitpicks before this lands, but once those are addressed, feel free to add it to the merge queue or ask me to do that (I forget if you have those permissions or not).

Thanks for sticking through with this one!

fitzgen · 2025-07-24T18:17:02Z

crates/cranelift/src/func_environ.rs

+    /// Used by the stack switching feature. If set, we have a allocated a
+    /// slot on this function's stack to be used for the
+    /// current stack's `handler_list` field.
+    pub(crate) stack_switching_handler_list_buffer: Option<ir::StackSlot>,


This field is only accessed in a submodule, so it does not need to be pub(crate).

fitzgen · 2025-07-24T18:18:45Z

crates/cranelift/src/func_environ.rs

+    /// Used by the stack switching feature. If set, we have a allocated a
+    /// slot on this function's stack to be used for the
+    /// current continuation's `values` field.
+    pub stack_switching_values_buffer: Option<ir::StackSlot>,


Similarly, this is only accessed in submodules as well, and not across crate boundaries at all, so it does not need to be pub.

fitzgen · 2025-07-24T18:22:53Z

crates/environ/src/vmoffsets.rs

+
+    /// Return the offset of `VMContObj::revision`
+    fn vmcontobj_revision(&self) -> u8 {
+        8


To work across non-64-bit ISAs, which we support today via, for example, the 32-bit Pulley interpreter, this needs to be

Suggested change

-        8
+        self.size()

fitzgen · 2025-07-24T18:25:40Z

crates/environ/src/vmoffsets.rs

+
+    /// Return the size of `VMHostArray`.
+    fn size_of_vmcontobj(&self) -> u8 {
+        16


And then this should be something like

Suggested change

-        16
+        u8::try_from(align(
+            u32::from(self.vmcontobj_revision())
+                + u32::try_from(core::mem::size_of::<u64>()).unwrap(),
+            u32::from(self.size()),
+        ))
+        .unwrap()
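To make the arithmetic in the two suggestions above concrete, here is a standalone sketch. `PtrWidth` is a hypothetical stand-in for whatever type carries the pointer size in vmoffsets.rs, and the `align` helper shown here is an assumed round-up-to-multiple function written for this example rather than the crate's own.

```rust
use std::convert::TryFrom;

/// Round `offset` up to the next multiple of `align` (a power of two).
fn align(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + (align - 1)) & !(align - 1)
}

/// Hypothetical pointer-size carrier: the target's pointer width in bytes.
struct PtrWidth(u8);

impl PtrWidth {
    fn size(&self) -> u8 {
        self.0
    }

    /// The revision counter follows the pointer, so its offset is one pointer.
    fn vmcontobj_revision(&self) -> u8 {
        self.size()
    }

    /// Pointer plus a 64-bit counter, rounded up to pointer alignment.
    fn size_of_vmcontobj(&self) -> u8 {
        u8::try_from(align(
            u32::from(self.vmcontobj_revision())
                + u32::try_from(core::mem::size_of::<u64>()).unwrap(),
            u32::from(self.size()),
        ))
        .unwrap()
    }
}
```

Under these assumptions a 64-bit target yields offset 8 and size 16, matching the hard-coded values in the diff, while a 32-bit target yields offset 4 and size 12.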

fitzgen · 2025-07-24T18:28:03Z

crates/wasmtime/src/runtime/vm/stack_switching.rs

-// FIXME(frank-emrich) Does this actually need to be 16-byte aligned any
-// more? Now that we use I128 on the Cranelift side (see
-// [wasmtime_cranelift::stack_switching::fatpointer::pointer_type]), it
-// should be fine to use the natural alignment of the type.
 #[repr(C, align(16))]


I don't think this align(16) needs to be here anymore, as described in the deleted FIXME comment? But if we are leaving it, then we should have a follow up item to investigate in the meta issue and also should adjust the size_of_vmcontobj method above to align to 16 instead of the natural alignment that my suggestion comment sketched out.

I wasn't sure if we wanted to pad out this struct to always be packed as 128 bits or as two words. I think two words makes more sense and aligns with what you are saying but means that the cranelift generated code will definitely need to be updated to be able to support 32-bit target architectures (probably along with changes discussed before to change how we do the fatpointer handling).

I'll land changes after thinking about this a bit more and update the meta-issue with any notes on 32-bit targets.
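The difference between the two options discussed here (always 16-byte aligned versus two naturally aligned fields) can be observed with a small sketch. Both structs are hypothetical models with an assumed field order, not the runtime's `VMContObj` definition.

```rust
use std::mem::{align_of, size_of};

/// Natural layout: alignment is whatever the fields require.
#[repr(C)]
struct NaturalLayout {
    contref: usize,
    revision: u64,
}

/// Forced 16-byte alignment, as in `#[repr(C, align(16))]`.
#[repr(C, align(16))]
struct Padded {
    contref: usize,
    revision: u64,
}

/// Returns `(size, alignment)` for the natural and the padded layout.
fn layouts() -> [(usize, usize); 2] {
    [
        (size_of::<NaturalLayout>(), align_of::<NaturalLayout>()),
        (size_of::<Padded>(), align_of::<Padded>()),
    ]
}
```

On a typical 64-bit target both layouts occupy 16 bytes and differ only in alignment; the sizes can diverge on 32-bit targets, which is why the choice affects `size_of_vmcontobj`.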

frank-emrich and others added 7 commits June 10, 2025 16:17

cranelift: stack-switching updates pass 1

0853a16

This first set of changes updates the base PR so that it compiles and passes basic checks (compile, clippy, fmt), with the biggest part of the change being to eliminate injection of tracing/assertions in JIT'ed code.

cranelift: stack-switching: restore original visibility for a few fun…

858b22a

…c_environ members

cranelift: avoid "as" casts in stack-switching

100d621

Replace each cast with either the infallible From or, where required, the fallible, panicking TryFrom alternative.

cranelift: cleanup stack-switching control_effect signatures

00c2b56

After removing emission of runtime trace logging and assertions, there were several unused parameters. Remove those from the ControlEffect signatures completely.

cranelift: rename stack-switching VMArray to VMHostArray

e9fa92d

This matches a change to the mirrored runtime type in the upstream changes.

posborne requested review from a team as code owners June 10, 2025 17:47

posborne requested review from alexcrichton and removed request for a team June 10, 2025 17:47

fitzgen requested review from fitzgen and removed request for alexcrichton June 10, 2025 18:13

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Jun 10, 2025

fitzgen reviewed Jun 11, 2025

dhil reviewed Jun 11, 2025

posborne and others added 10 commits June 11, 2025 13:27

stack-switching: fix typo

866fbff

Co-authored-by: Daniel Hillerström <[email protected]>

stack-switching: used Index impl for get_stack_slot_data

115e503

stack-switching: use smallvec over vec in several cases

11c6b4a

stack-switching: avoid resumetable naming confusion

fa8001e

stack-switching: cleanup unused params from unchecked_get_continuation

51017f2

The extra parameters here used to be used for emitting runtime assertions, but with those gone we just had unused params and lifetimes, so clean those out.

stack_switching: simplify store_data_entries assertion

44c2b34

stack-switching: simplify translate_table_{grow,fill} control flow

b8fb8b3

stack-switching: remove translate_resume_throw stub

a693350

There's already a stub elsewhere and this is not called. When exceptions are added and it is time to revisit, this method can be restored.

stack-switching: compute control_context_size based on target triple

eeab1b1

stack-switching: VMHostArrayRef updates

3bd138f

Rename VMHostArray -> VMHostArrayRef. Change impl to compute address with offset upfront rather than on each load.

stack-switching: move cranelift code to live under func_environ

49313bc

This matches the directory structure for gc and aids in visibility for a few members required by stack-switching code in cranelift.

posborne added 2 commits June 18, 2025 19:41

Merge remote-tracking branch 'upstream/main' into stack-switching-cra…

5ccd3f7

…nelift As part of this, updated translate_ref_is_null to use the wasm type rather than branching on the ir type being an i128.

stack-switching: formatting fix

c86b06a

fitzgen reviewed Jun 24, 2025

posborne added 4 commits June 30, 2025 17:04

stack-switching: reduce visibility on a few additional items

ab50f6c

stack-switching: simplify contobj fatptr con/de-struction

6797ea2

stack-switching: add disas tests to cover new instructions

a6a3ff8

Merge remote-tracking branch 'upstream/main' into stack-switching-cra…

71c1e6d

…nelift

github-actions bot added the wasmtime:api Related to the API of the `wasmtime` crate itself label Jul 2, 2025

Merge remote-tracking branch 'upstream/main' into stack-switching-cra…

3a26125

…nelift

Fix formatting of merge conflict resolution

8d06297

bjorn3 reviewed Jul 14, 2025

bjorn3 reviewed Jul 14, 2025

posborne added 2 commits July 14, 2025 19:10

cranelift: remove ir::function::get_stack_slot_data

cae4878

This method isn't required as sized_stack_slots is already pub.

Merge remote-tracking branch 'upstream/main' into stack-switching-cra…

2bdfb25

…nelift

fitzgen approved these changes Jul 24, 2025
