Add a new wit-dylib crate
#2304
Conversation
@dicej you're likely interested in this, and I'll also flag you for review on this. This is 9kloc so I don't expect a detailed review really, but I'm curious for your take on things at a high level.
Force-pushed from b13c611 to e65b005
tschneidereit left a comment
This is very very cool, and I really like how it turned out!
Some bits of feedback after an extremely superficial skim:
Terminology
One thing I could see as a potentially useful change is to move away from the lift/lower terminology: when looking into the ComponentizePy code base a while ago, that kept tripping me up, since it's so overloaded with the Component Model's use of those terms.
Maybe instead this could talk about encoding/decoding, or, even clearer, something like {to,from}_canon?
Partial bindings
How, if at all, will this be able to handle situations in which the language VM already has implementations for some interfaces, but not all?
An obvious example that'll be extremely common is the VM implementation itself using libc and through it wasi:filesystem and other interfaces. But for example in StarlingMonkey we'd always want to use handwritten bindings for some key functionality, such as HTTP I/O, but still provide the ability to generate other bindings automatically.
ComponentizeJS currently handles this by allowing (with some carefully chosen defaults) the selection of interfaces to retain from the input .wasm file. I don't think we'd need to do the inspection part here, and instead could perhaps add support for just saying "exclude this interface/world"—ideally with semver-compatible matching.
Concrete examples could be something like
# List All The Interfaces
wasm-tools wit-dylib --interpreter my.so --exclude-interfaces "wasi:cli/[email protected]","wasi:cli/[email protected]",[...]
# Exclude entire worlds
wasm-tools wit-dylib --interpreter my.so --exclude-worlds "wasi:[email protected]","wasi:[email protected]",[...]Multiple dylibs
Just to double-check: if the interpreter itself needs additional dylibs, that's taken care of by wasm-tools component link --dl-openable, right? If not, we might want to add something for that here.
// Entrypoint for WIT resource destructors.
//
// The `ty` poitns to `wit->resources` and `handle` is the value being
Suggested change:
- // The `ty` poitns to `wit->resources` and `handle` is the value being
+ // The `ty` points to `wit->resources` and `handle` is the value being
This crate seems quite useful outside of testing, too. Maybe we could make it its own thing that VM implementers using Rust could make use of?
Yeah I wasn't quite sure how to integrate this with publishing in this repository, but for now it's at least all copyable!
crates/wit-dylib/wit_dylib.h (Outdated)
// Note that during lowering a `uint64_t` is NOT an "owned" value meaning that
// these functions should not allocate memory as it otherwise won't get cleaned
// up. For example the return value of `wit_dylib_list_get` is considered to be
// "borrowed" and not needing cleanup. This is suitable if, for example, the
// returned value is a pointer into the original list.
I think it'd be good to expand on this comment a bit / point to some other source of documentation of how this is supposed to be used. Specifically, from this it's unclear what an implementer is supposed to do if their value representation doesn't allow them to follow the "don't allocate" restriction here.
Additionally, IIUC this applies to uint64_t * out params as well, right? And of course to returned const pointers, though making that explicit would be good, too.
If an engine couldn't implement wit_dylib_list_get without allocating then for now it just wouldn't be supported by wit-dylib. I'd have to dig into an example to figure out how better to support it.
Otherwise though I agree this header is severely under-documented. It's one though where documentation probably wouldn't be as useful as an example, so it's one where I was hoping over time that we could add an example or two (e.g. I'd probably be the one to bind the first 2-or-so usages of this crate).
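To make the "borrowed value, pointer into the original list" case concrete, here's a minimal sketch assuming a boxed-`Val` representation like the test "interpreters" described later in this PR; the `u64`-as-pointer encoding and the exact signature used for `wit_dylib_list_get` are assumptions for illustration, not the header's actual API:

```rust
// Illustrative only: value handles are assumed to be `u64`s that are really
// pointers to heap-allocated `Val`s owned by the interpreter.
#[derive(Debug)]
enum Val {
    U32(u32),
    Str(String),
    List(Vec<Val>),
}

// A non-allocating `list_get`: the returned handle is a *borrowed* pointer
// into the original list's storage, so the generated bindings have nothing
// to clean up afterwards.
#[no_mangle]
pub extern "C" fn wit_dylib_list_get(list: u64, index: u32) -> u64 {
    let list = unsafe { &*(list as *const Val) };
    match list {
        Val::List(items) => &items[index as usize] as *const Val as u64,
        other => unreachable!("generated code only calls this on lists, got {other:?}"),
    }
}
```

An engine whose native lists don't already store elements as individually addressable values (the JS/Python lists-of-structs case raised below) can't hand back a borrowed handle like this without allocating, which is exactly the limitation being discussed.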
If I'm not misunderstanding things, I don't think I can see a way to use this for things like lists of structs in either JS or Python: each struct would certainly have to change its memory representation in a to_canon operation.
Another thought: it might be nice to eventually make this extensible with components. Instead of the interpreter always being fully generic and all operations being dynamic and involving lots of (non-inlineable) calls, wouldn't it be nice if VM implementers could provide codegen components that export a world that ... E.g. instead of ...
I guess another option for partial bindings could be to provide an option to omit bindings for interfaces that the interpreter already has bindings for. I don't think that'd be sufficient on its own though: it wouldn't cover situations in which manual implementations are spread across multiple dylibs. This is something I would very much like to support in StarlingMonkey eventually, so that the runtime can be modularized without recompilation.
Heh, it took me authoring this, which is maybe like my sixth bindings generator, and I'm feeling pretty comfy with lift/lower terminology now. AKA I like your idea!
Asking about multiple dylibs makes me pause for this. That sort of works today but also doesn't: that'd effectively generate two in-memory ...
Two questions for you:
There should be no harm in giving interpreted content raw access to bindings (albeit it'd be slower than through the engine), but I also think it'd be reasonable to customize the generated dylib to skip functions.
I did some work to make ...
Okay, great. I guess exports are where things get more complicated, but they're also far less worrisome in this regard, and, if provided directly by the interpreter in addition to wanting to be able to generate bindings for them, need some amount of special casing in any case. So, that all seems fine.
If I understand what you're saying correctly, then at least for my envisioned use case this might not be too much of a problem: I don't think I'd want to have the interpreter part be spread across multiple dylibs. Instead, there'd always be an ...

For incoming http, this would I think also all work, as long as I'm either able to tell ...
Maybe I'm overly pessimistic about our codegen capabilities, but I'm imagining that even with inlining we'd still have the dynamic dispatch and effectively a dynamic interpreter over the arguments and return value. What I'm imagining is, I guess, almost something like a patching baseline compiler that'd eliminate the dynamic dispatch and turn the interpreter into wasm bytecode. If you think that Cranelift will actually see through all this and do something effectively very similar, then none of this is needed, of course! (To the degree it's needed at all: neither the dynamic dispatch nor the marshalling interpreter will probably be ridiculously slow. But I think the faster we can make these things, the nicer.)
I think pretty different: what I'm imagining would be general-purpose and not need to know anything about the specific function in question. The "might not always generate code" part was more meant to enable incremental adoption where someone could provide a codegen plugin for simple cases like "take an int, return a float", but punt on cases like "take a list of lists of structs, return an abomination."
The more I think about it, the more I realize that this really is only relevant to exports, not imports. I agree that for imports it's entirely fine to just generate the bindings, and if someone truly wants to subset things, they can do so by manually subsetting the wit.
dicej left a comment
Nice work! I'm excited to port componentize-py over to use this, and then add async support as well.
One thing I've been meaning to do in componentize-py's code generator is deduplicate the lifting and lowering code where appropriate. For example, if we notice we need to lift the same list<my-variant-with-complex-payloads> type in N different places, we could generate a helper function to do that instead of generating N copies of the same code. And that could be done recursively such that a given helper function might defer to other helper functions, inlining only where it makes sense. Not a blocker for this PR, of course, but something to consider later.
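Not something this PR does, but a minimal sketch of that deduplication idea (with made-up `TypeId`/`FuncId` stand-ins rather than the real generator's types): memoize one lift helper per type and emit calls to it, recursing through nested types so helpers can share sub-helpers.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for however the real generator identifies types and
// refers to generated functions.
type TypeId = u32;
type FuncId = u32;

struct LiftGen {
    helpers: HashMap<TypeId, FuncId>,
}

impl LiftGen {
    /// Return the helper that lifts `ty`, generating it on first use so that
    /// all N call sites needing it share one copy of the lifting code.
    fn lift_helper(&mut self, ty: TypeId) -> FuncId {
        if let Some(&f) = self.helpers.get(&ty) {
            return f;
        }
        // Generate the helper body; for aggregates this would recurse via
        // `lift_helper` for field/payload types, so sharing is recursive too.
        let f = self.emit_lift_body(ty);
        self.helpers.insert(ty, f);
        f
    }

    fn emit_lift_body(&mut self, _ty: TypeId) -> FuncId {
        // Placeholder: a real generator would append a function to the output
        // module here and return its index.
        self.helpers.len() as FuncId
    }
}
```

A size heuristic could still inline the trivial cases (lifting a bare u32, say) rather than emitting a call, matching the "inlining only where it makes sense" point above.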
crates/wit-dylib/src/bindgen.rs (Outdated)
lift_f32 : [ValType::F32] -> [ValType::I64] = "wit_dylib_lift_f32",
lift_f64 : [ValType::F64] -> [ValType::I64] = "wit_dylib_lift_f64",
lift_string : [ValType::I32; 2] -> [ValType::I64] = "wit_dylib_lift_string",
lift_record : [ValType::I32, ValType::I32] -> [ValType::I64] = "wit_dylib_lift_record",
Nit: might want to be consistent about choosing either the [ValType::I32; 2] style or the [ValType::I32, ValType::I32] style here and below.
}

// No other types with intrinsics at this time (futures/streams are
// relative to where they show up in function types.
Suggested change:
- // relative to where they show up in function types.
+ // relative to where they show up in function types).
Yeah, I think the next step is to move the repo to the ...
Or do we want to make it a subcrate of Wizer? @pchickey I forget whether you had a plan for this already; I'm fine with whatever.
SGTM!
Yep, that should work as-is!
One option would be to exclude it from the world passed to ...
Ok, makes sense, and I'm about to make a change which would defeat inlining even more in theory. In any case I'll file a follow-up issue for discussion on this.
We discussed this briefly, but to write this down here too -- I think adding async support will be a nontrivial, but very doable, amount of work in this framework. Basically I don't foresee any major issues beyond "just gotta get it done". Effectively it'd look like resource intrinsics where the type information would have more function pointers hanging off it.
Discussed offline with Joel as well, and I'm going to file a follow-up issue for this as I agree it would be nice to have.
This commit is the integration of a new crate into `wasm-tools` dubbed `wit-dylib`. This is additionally integrated under a new `wasm-tools wit-dylib` subcommand. The purpose of this crate is to create a shared-everything dynamic library from a WIT world which implements the world in terms of a static interface of functions. The main use case envisioned for this is for componentizing interpreted programming languages.

The approach taken here is that a shared-everything dynamic library is generated purely from the input of a WIT world and some configuration parameters. This generated dynamic library is then suitable to pass to `wasm-tools component link` to create a full component. This dynamic library might be further modified through means such as a GC pass or some form of pre-initialization. This overall architecture is lifted from componentize-py, which takes a very similar approach, but the support here is disentangled from any Python specifics. More information about this can be found in the README of the crate added here.

This new crate is integrated not only as-is but with a start at what is supposed to be a comprehensive test suite of the generated code. Specifically there is a suite of `src/bin/*.rs` files, each of which pretends to be an "interpreter" through a shared utility implementation amongst the crates. Effectively this boxes up all WIT values into a single `Val` representation (one plausible shape for such a representation is sketched below). This enables testing all the various runtime behaviors with high-level facilities like println-debugging, vectors, strings, etc. Tests are modeled after `wit-bindgen test` where there's a "caller" and a "callee": the caller imports an interface and the callee exports the interface. The `wasm-compose` crate composes these together to produce a component runnable with the `wasmtime` CLI to complete the test.

Some possible FAQ-style questions:

* **Why include this in `wasm-tools`?** - this is an interpreter-agnostic implementation of a component; for example, nothing is Python-specific. It's intended that this is neutral and low-level enough to include in `wasm-tools`. Developers won't be using this day-in-and-day-out but it's hoped to be an integral part of componentizing interpreted languages.
* **What languages will use this?** - for now, none, it's just starting. The Rust crate written at `crates/wit-dylib/test-programs` is intended to be suitable for external use but isn't published just yet. I hope to dabble with Lua after this lands with the `mlua` crate, and my hope is to work with Joel to integrate this into `componentize-py`. Longer-term I'd also like to integrate this into StarlingMonkey.
* **How is this used?** - the README contains a bit more information, but at a high level it's (a) write your interpreter and implement/use `wit_dylib.h`, (b) compile your interpreter as a shared library, (c) use `wit-dylib` for a WIT world, (d) link these together into a single component, and (e) profit. Various compiler flags are required to get this all passing, but those are the high-level bits.
* **How does Wizer work?** - it doesn't; Wizer only works with core modules and not components. The `component-init` phase of `componentize-py` will need to be extracted and put somewhere (probably Wizer itself). In the meantime this approach of using shared-everything dynamic linking is incompatible with Wizer.
* **How different is this from `componentize-py`?** - very: I started with the same basic structure but ended up evolving relatively far from the specific implementation details of `componentize-py`. At a high-enough level the two continue to look the same, but you don't have to go too far down to see how the implementations differ.
* **Fuzzing?** - I haven't figured this out and this implementation is not fuzzed yet. It's still TBD what exactly this would look like. It's easy enough to generate an arbitrary world and then generate a dylib and assert it's valid, but what really wants to happen is to validate that the actual generated code is correct. This'll take some more integration work. In the meantime it's intended that the test suite is comprehensive enough to be able to uncover and execute any bug found, to serve as a regression suite.
* **Why now?** - I had an itch and wanted to scratch it. It's expected that this will be a lynchpin of componentizing interpreted languages, but this is not all that's needed. For example `component-init` and/or Wizer integration is still needed. Basically this is a separable component I wanted to write, but there's yet more work to be done to fully integrate this everywhere.
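The commit message mentions boxing all WIT values into a single `Val` representation for the test "interpreters"; here is a hedged sketch of one plausible shape for such a type (the actual test-programs crate in this PR may structure it differently):

```rust
// One dynamically typed value covering every WIT shape, so a single test
// "interpreter" can exercise whatever the generated dylib throws at it.
// Illustrative layout only, not the PR's actual `Val` type.
enum Val {
    Bool(bool),
    S64(i64),
    U64(u64),
    Float64(f64),
    Char(char),
    String(String),
    List(Vec<Val>),
    Record(Vec<Val>),
    Tuple(Vec<Val>),
    Variant { case: u32, payload: Option<Box<Val>> },
    Enum(u32),
    Option(Option<Box<Val>>),
    Result(Result<Option<Box<Val>>, Option<Box<Val>>>),
    Flags(u32),
    Resource(u32),
}
```

Boxing everything like this trades speed for coverage, which is fine for a regression suite: the point is to hit every lift/lower path while keeping ordinary Rust collections and println-style debugging available.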
* More flexible in the representation of a value chosen by an interpreter
* Easier-to-understand interface: "just push and pop"
* External lift/lower terminology is now "push" and "pop"
* APIs were tweaked as necessary to minimize conversion functions and model the stack behavior.
Force-pushed from e7d37fa to bd42507
Ok I've now pushed up a new interface for doing all of this. The interpreter is now modeled as having a stack that's pushed/popped from for value conversion, which I'm subjectively saying is easier to understand and implement. This is also required to be more flexible for some other dabbling I was doing. I'm going to do some more testing tomorrow and probably merge this afterwards.
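For a sense of what the push/pop model could look like on the interpreter side, here is a speculative sketch; the method names, the `Val` type, and the shapes of these operations are illustrative guesses, not the interface actually added in this PR:

```rust
// The interpreter keeps one dynamically typed stack; the generated dylib
// sequences pushes and pops according to the WIT types involved.
struct Interp {
    stack: Vec<Val>,
}

enum Val {
    U32(u32),
    String(String),
    List(Vec<Val>),
}

impl Interp {
    // Generated bindings push values when handing them to the interpreter...
    fn push_u32(&mut self, v: u32) {
        self.stack.push(Val::U32(v));
    }

    // ...and pop them when taking values back out. The generated code drives
    // the order, so the interpreter never walks WIT type structure itself.
    fn pop_u32(&mut self) -> u32 {
        match self.stack.pop() {
            Some(Val::U32(v)) => v,
            _ => unreachable!("generated bindings keep the stack well-typed"),
        }
    }

    // Aggregates compose from their parts: e.g. after a list's elements have
    // been pushed, they're collected into a single list value on the stack.
    fn push_list(&mut self, len: usize) {
        let items = self.stack.split_off(self.stack.len() - len);
        self.stack.push(Val::List(items));
    }
}
```

The appeal, as described above, is that the interpreter-facing surface stays small and uniform: flat push/pop operations instead of per-shape conversion entry points.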