-
Notifications
You must be signed in to change notification settings - Fork 1.5k
cranelift: stack-switching support #11003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
cranelift: stack-switching support #11003
Conversation
This initial commit represents the "pr2" base commit with minimal merge conflicts resolved. Due to OOB conflicts, this commit is not functional as-is, but using it as a base in order to allow for easier reviewing of the delta from this commit to what will be used for the PR against upstream. Co-authored-by: Daniel Hillerström <[email protected]> Co-authored-by: Paul Osborne <[email protected]>
This first set of changes updates the base pr in order to compiled and pass basic checks (compile, clippy, fmt) with the biggest part of the change being to eliminate injection of tracing/assertions in JIT'ed code.
…c_environ members
At this point, the only bit we really branch on is what we do in order to avoid problems tying into wasmtime_environ. This is basd on the approach and macro used by the gc code for converting presence/absence of the cranelift feature flag to cranelift compile time. This is a bit of a half-measure for now as we still compile most stack-switching code in cranelift, but this does enough to avoid causing problems with missing definitions in wasmtime_environ.
Replace either with infallible From or fallible, panicing TryFrom alternatives where required.
After removing emission of runtime trace logging and assertions, there were several unused parameters. Remove those from the ControlEffect signatures completely.
This matches a change to the mirrored runtime type in the upstream changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Lots of comments below but for the most part these should be pretty straightforward to fix.
Co-authored-by: Daniel Hillerström <[email protected]>
The extra parameters here used to be used for emitting runtime assertions, but with those gone we just had unused params and lifetimes, clean those out.
There's already a stub elsewhere and this is not called, when exceptions are added and it is time to revisit, this method can be restored.
Rename VMHostArray -> VMHostArrayRef Change impl to compute address with offset upfront rather than on each load.
Pushed most of the updates but still need to make the suggested updates around the fat pointer stuff and resolve conflicts with upstream. |
This matches the directory structure for gc and aids in visibility for a few members required by stack-switching code in cranelift.
…nelift As part of this, updated translate_ref_is_null to use the wasm type rather than brancing on the ir type being an i128.
@fitzgen I believe this is ready for review again, let me know if there's pieces I missed in my previous updates. As noted, I chose to defer fatpointer changes for now to be potentially addressed as part of (or following) changes to the translation stack. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo a few final comments below, we should be able to merge this and move on to follow ups after they are addressed.
Thanks!
crates/cranelift/src/func_environ.rs
Outdated
@@ -138,7 +139,7 @@ pub struct FuncEnvironment<'module_environment> { | |||
pcc_vmctx_memtype: Option<ir::MemoryType>, | |||
|
|||
/// Caches of signatures for builtin functions. | |||
builtin_functions: BuiltinFunctions, | |||
pub(crate) builtin_functions: BuiltinFunctions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't need to be pub(crate)
after the stack_switching
module moved to func_environ::stack_switching
, right?
crates/cranelift/src/func_environ.rs
Outdated
|
||
/// Used by the stack switching feature. If set, we have a allocated a | ||
/// slot on this function's stack to be used for the | ||
/// current stack's `handler_list` field. | ||
pub(crate) stack_switching_handler_list_buffer: Option<ir::StackSlot>, | ||
|
||
/// Used by the stack switching feature. If set, we have a allocated a | ||
/// slot on this function's stack to be used for the | ||
/// current continuation's `values` field. | ||
pub(crate) stack_switching_values_buffer: Option<ir::StackSlot>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
let (lsbs, msbs) = pos.ins().isplit(contobj); | ||
|
||
let (revision_counter, contref) = match env.isa().endianness() { | ||
ir::Endianness::Little => (lsbs, msbs), | ||
ir::Endianness::Big => { | ||
let pad_bits = 64 - env.pointer_type().bits(); | ||
let contref = pos.ins().ushr_imm(lsbs, pad_bits as i64); | ||
(msbs, contref) | ||
} | ||
}; | ||
let contref = if env.pointer_type().bits() < I64.bits() { | ||
pos.ins().ireduce(env.pointer_type(), contref) | ||
} else { | ||
contref | ||
}; | ||
(revision_counter, contref) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just be something like
let (lsbs, msbs) = pos.ins().isplit(contobj); | |
let (revision_counter, contref) = match env.isa().endianness() { | |
ir::Endianness::Little => (lsbs, msbs), | |
ir::Endianness::Big => { | |
let pad_bits = 64 - env.pointer_type().bits(); | |
let contref = pos.ins().ushr_imm(lsbs, pad_bits as i64); | |
(msbs, contref) | |
} | |
}; | |
let contref = if env.pointer_type().bits() < I64.bits() { | |
pos.ins().ireduce(env.pointer_type(), contref) | |
} else { | |
contref | |
}; | |
(revision_counter, contref) | |
let ptr_ty = env.pointer_type(); | |
assert!(ptr_ty.bits() <= 64); | |
let contref = pos.ins().ireduce(ptr_ty, contobj); | |
let shifted = pos.ins().ushr_imm(contobj, 64); | |
let revision_counter = pos.ins().ireduce(it::types::I64, shifted); | |
(revision_counter, contref) |
?
That is, we define the fat pointer as 128 bits where the upper 64 are the revision counter and the bottom sizeof(pointer)
bits are the pointer to the continuation. This way I don't think we ever have to branch on endianness or pointer width.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes sense to me, will adopt this approach (with a quick round of testing) until we can split into individual ir values.
let contref_addr = if env.pointer_type().bits() < I64.bits() { | ||
pos.ins().uextend(I64, contref_addr) | ||
} else { | ||
contref_addr | ||
}; | ||
let (msbs, lsbs) = match env.isa().endianness() { | ||
ir::Endianness::Little => (contref_addr, revision_counter), | ||
ir::Endianness::Big => { | ||
let pad_bits = 64 - env.pointer_type().bits(); | ||
let lsbs = pos.ins().ishl_imm(contref_addr, pad_bits as i64); | ||
(revision_counter, lsbs) | ||
} | ||
}; | ||
|
||
let lsbs = pos.ins().uextend(ir::types::I128, lsbs); | ||
let msbs = pos.ins().uextend(ir::types::I128, msbs); | ||
let msbs = pos.ins().ishl_imm(msbs, 64); | ||
pos.ins().bor(lsbs, msbs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then construction would become something like this:
let contref_addr = if env.pointer_type().bits() < I64.bits() { | |
pos.ins().uextend(I64, contref_addr) | |
} else { | |
contref_addr | |
}; | |
let (msbs, lsbs) = match env.isa().endianness() { | |
ir::Endianness::Little => (contref_addr, revision_counter), | |
ir::Endianness::Big => { | |
let pad_bits = 64 - env.pointer_type().bits(); | |
let lsbs = pos.ins().ishl_imm(contref_addr, pad_bits as i64); | |
(revision_counter, lsbs) | |
} | |
}; | |
let lsbs = pos.ins().uextend(ir::types::I128, lsbs); | |
let msbs = pos.ins().uextend(ir::types::I128, msbs); | |
let msbs = pos.ins().ishl_imm(msbs, 64); | |
pos.ins().bor(lsbs, msbs) | |
assert!(env.pointer_type().bits() <= 64); | |
let contref_addr = pos.ins().uextend(ir::types::I28, contref_addr); | |
let revision_counter = pos.ins().uextend(ir::types::I128, revision_counter); | |
let shifted_counter = pos.ins().ishl_imm(revision_counter, 64); | |
pos.ins().bor(shifted_counter, contref_addr) |
Note: if you want to switch the order of the contref pointer and the counter in the fat pointer, that's fine (I think I unwittingly switched it from what is in the code now), but the important point is that we should be able do all of this fat pointer construction and destruction without branching on endianness and pointer width.
@fitzgen I am hunting down one regression with the latest, probably around the changes to table ops based on wasm types (though I haven't found a smoking gun yet). These tests passed prior to some of the latest changes on this branch and the test is doing a table.grow of continuations followed by a resume of the first added to the table which traps on a null reference. |
In the course of the various runtime updates, the layout of the runtime VMContObj got switched around. This resulted in failures when doing certain table operations on continuations. This change fixes that layout problem and adds some tests with offsets to avoid the problem. Due to the way that we interact with the VMContObj in cranelift, we don't use these offsets outside of the tests.
@@ -194,6 +200,7 @@ impl ABIMachineSpec for X64ABIMachineSpec { | |||
if param.value_type.bits() > 64 | |||
&& !(param.value_type.is_vector() || param.value_type.is_float()) | |||
&& !flags.enable_llvm_abi_extensions() | |||
&& !is_tail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary? Wasm doesn't have 128bit integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in this iteration of the changes, the stack switching code is using an i128 for it's vmcontobj fat pointer (consisting of vmcontref pointer and revision). @fitzgen and I have discussed a bit about how we can get rid of this and just have two ir values through the transformation, but it will require some additional changes to how we model the relationship between wasm and clif values to support having one of the former map to two distinct ir values.
This method isn't required as sized_stack_slots is already pub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Couple final nitpicks before this lands, but once those are addressed, feel free to add it to the merge queue or ask me to do that (I forget if you have those permissions or not).
Thanks for sticking through with this one!
/// Used by the stack switching feature. If set, we have a allocated a | ||
/// slot on this function's stack to be used for the | ||
/// current stack's `handler_list` field. | ||
pub(crate) stack_switching_handler_list_buffer: Option<ir::StackSlot>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field is only accessed in a submodule, so it does not need to be pub(crate)
.
/// Used by the stack switching feature. If set, we have a allocated a | ||
/// slot on this function's stack to be used for the | ||
/// current continuation's `values` field. | ||
pub stack_switching_values_buffer: Option<ir::StackSlot>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, this is only accessed in submodules as well, and not across crate boundaries at all, so it does not need to be pub
.
|
||
/// Return the offset of `VMContObj::revision` | ||
fn vmcontobj_revision(&self) -> u8 { | ||
8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To work across non-64-bit ISAs, which we support today via, for example, the 32-bit Pulley interpreter, this needs to be
8 | |
self.size() |
|
||
/// Return the size of `VMHostArray`. | ||
fn size_of_vmcontobj(&self) -> u8 { | ||
16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then this should be something like
16 | |
u8::try_from(align( | |
u32::from(self.vmcontobj_revision()) | |
+ u32::try_from(core::mem::size_of::<u64>()).unwrap(), | |
u32::from(self.size()), | |
)) | |
.unwrap() |
// FIXME(frank-emrich) Does this actually need to be 16-byte aligned any | ||
// more? Now that we use I128 on the Cranelift side (see | ||
// [wasmtime_cranelift::stack_switching::fatpointer::pointer_type]), it | ||
// should be fine to use the natural alignment of the type. | ||
#[repr(C, align(16))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this align(16) needs to be here anymore, as described in the deleted FIXME
comment? But if we are leaving it, then we should have a follow up item to investigate in the meta issue and also should adjust the size_of_vmcontobj
method above to align to 16 instead of the natural alignment that my suggestion comment sketched out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if we wanted to pad out this struct to always be packed as 128 bits or as two words. I think two words makes more sense and aligns with what you are saying but means that the cranelift generated code will definitely need to be updated to be able to support 32-bit target architectures (probably along with changes discussed before to change how we do the fatpointer handling).
I'll land changes after thinking this a bit more and update the meta-issue with any notes on 32-bit targets.
These changes pull in the cranelift changes from #10177 with some additional stacked changes to resolve conflicts, align with previous changes in the stack-switching series, and address feedback items which were raised on previous iterations of the PR (but mostly not changing anything of significant substance). Tracking Issue: #10248.
The
stack-switching
feature flag is retained and used minimally in this changeset in order to avoid compilation problems, but not really used beyond that. There is at least one item in the tracking issue already related to likely finding a way to drop these compilation flags in most places but I think it is worth deferring that here as it will required touching code more broadly.CC @frank-emrich @dhil