@alexcrichton
Member

This commit is the integration of a new crate into wasm-tools dubbed wit-dylib. This is additionally integrated under a new wasm-tools wit-dylib subcommand. The purpose of this crate is to create a shared-everything dynamic library from a WIT world which implements the world in terms of a static interface of functions. The main use case envisioned for this is for componentizing interpreted programming languages.

The approach taken here is that a shared-everything dynamic library is generated purely from the input of a WIT world and some configuration parameters. This generated dynamic library is then suitable to pass to wasm-tools component link to create a full component. This dynamic library might be further modified through means such as a GC pass or some form of pre-initialization. This overall architecture is lifted from componentize-py where it takes a very similar approach, but the support here is disentangled from any Python specifics. More information about this can be found in the README of the crate added here.

This new crate is integrated not only as-is but with a start at what is supposed to be a comprehensive test suite of the generated code. Specifically there is a suite of src/bin/*.rs files, each of which pretends to be an "interpreter" through a shared utility implementation amongst the crates. Effectively this boxes up all WIT values into a single Val representation. This enables testing all the various runtime behaviors with high-level facilities like println-debugging, vectors, strings, etc. Tests are modeled after wit-bindgen test where there's a "caller" and a "callee": the caller imports an interface and the callee exports it. The wasm-compose crate composes these together to produce a component runnable with a wasmtime CLI to complete the test.
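
As a rough illustration of what "boxing up all WIT values into a single Val representation" can look like, here is a minimal sketch. The variant names and the `echo` function are hypothetical and chosen for illustration; they are not the actual definitions in the test crate.

```rust
// Hypothetical sketch: one enum that can hold any WIT value.
// Variant names are illustrative, not the test crate's real `Val`.
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Bool(bool),
    U32(u32),
    F64(f64),
    String(String),
    List(Vec<Val>),
    Record(Vec<(String, Val)>),
    Option(Option<Box<Val>>),
}

/// Echoes a value back unchanged, the way a trivial "callee" export might.
pub fn echo(v: &Val) -> Val {
    v.clone()
}
```

Because every value is an ordinary Rust enum, a test can print it with `{:?}` or compare it with `==` without knowing the WIT type ahead of time.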

Some possible FAQ-style questions:

  • Why include this in wasm-tools? - this is an interpreter-agnostic implementation of a component, for example nothing is Python-specific. It's intended that this is neutral and low-level enough to include in wasm-tools. Developers won't be using this day-in-and-day-out but it's hoped to be an integral part of componentizing interpreted languages.

  • What languages will use this? - for now, none, it's just starting. The Rust crate written at crates/wit-dylib/test-programs is intended to be suitable for external use but isn't published just yet. I hope to dabble with Lua after this lands with the mlua crate and my hope is to work with Joel to integrate this into componentize-py. Longer-term I'd also like to integrate this into StarlingMonkey.

  • How is this used? - the README contains a bit more information, but at a high level it's (a) write your interpreter and implement/use wit_dylib.h, (b) compile your interpreter as a shared library, (c) use wit-dylib for a WIT world, (d) link these together into a single component, and (e) profit. Various compiler flags are required to get this all passing, but that's the high-level bits.

  • How does wizer work? - it doesn't, Wizer only works with core modules and not components. The component-init phase of componentize-py will need to be extracted and put somewhere (probably Wizer itself). In the meantime this approach of using shared-everything dynamic linking is incompatible with Wizer.

  • How different is this from componentize-py? - very, I started with the same basic structure but ended up evolving relatively far from the specific implementation details of componentize-py. At a high-enough level the two continue to look the same but you don't have to go too far down to see how the implementations differ.

  • Fuzzing? - I haven't figured this out and this implementation is not fuzzed yet. It's still TBD what exactly this would look like. It's easy enough to generate an arbitrary world and then generate a dylib and assert it's valid but what really wants to happen is to validate that the actual generated code is correct. This'll take some more integration work. In the meantime it's intended that the test suite is comprehensive enough to be able to uncover and execute any bug found to serve as a regression suite.

  • Why now? - I had an itch and wanted to scratch it. It's expected that this will be a lynchpin of componentizing interpreted languages, but this is not all that's needed. For example component-init and/or Wizer integration is still needed. Basically this is a separable component I wanted to write, but there's yet more work to be done to fully integrate this everywhere.

@alexcrichton alexcrichton requested a review from a team as a code owner September 14, 2025 19:34
@alexcrichton alexcrichton requested review from dicej and pchickey and removed request for a team and pchickey September 14, 2025 19:34
@alexcrichton
Member Author

@dicej you're likely interested in this, and I'll also flag you for review on this. This is 9kloc so I don't expect a detailed review really, but I'm curious for your take on things at a high level.

@alexcrichton alexcrichton force-pushed the wasm-interpreter-adapter branch from b13c611 to e65b005 Compare September 14, 2025 19:39
Member

@tschneidereit tschneidereit left a comment

This is very very cool, and I really like how it turned out!

Some bits of feedback after an extremely superficial skim:

Terminology

One thing I could see as a potentially useful change is to move away from the lift/lower terminology: when looking into the ComponentizePy code base a while ago that kept tripping me up since it's so overloaded with the Component Model's use of those terms.

Maybe instead this could talk about encoding/decoding, or even clearer something like {to,from}_canon?

Partial bindings

How, if at all, will this be able to handle situations in which the language VM already has implementations for some interfaces, but not all?

An obvious example that'll be extremely common is the VM implementation itself using libc and through it wasi:filesystem and other interfaces. But for example in StarlingMonkey we'd always want to use handwritten bindings for some key functionality, such as HTTP I/O, but still provide the ability to generate other bindings automatically.

ComponentizeJS currently handles this by allowing (with some carefully chosen defaults) the selection of interfaces to retain from the input .wasm file. I don't think we'd need to do the inspection part here, and instead could perhaps add support for just saying "exclude this interface/world"—ideally with semver-compatible matching.

Concrete examples could be something like

# List All The Interfaces
wasm-tools wit-dylib --interpreter my.so --exclude-interfaces "wasi:cli/[email protected]","wasi:cli/[email protected]",[...]

# Exclude entire worlds
wasm-tools wit-dylib --interpreter my.so --exclude-worlds "wasi:[email protected]","wasi:[email protected]",[...]
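
A minimal sketch of what "semver-compatible matching" for such exclusions could mean, assuming caret-style compatibility (same major version, or same minor version for 0.x releases). The function names, and the interface names used when calling it, are hypothetical; none of this is an existing wasm-tools API.

```rust
/// Parses "MAJOR.MINOR.PATCH"; returns None on malformed input.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Assumed rule: versions are compatible when they share the leftmost
/// non-zero component (1.x.y with 1.a.b, 0.2.x with 0.2.y).
pub fn semver_compatible(pattern: &str, actual: &str) -> bool {
    match (parse_version(pattern), parse_version(actual)) {
        (Some((p_major, p_minor, p_patch)), Some((a_major, a_minor, a_patch))) => {
            if p_major != a_major {
                false
            } else if p_major > 0 {
                true
            } else if p_minor != a_minor {
                false
            } else if p_minor > 0 {
                true
            } else {
                p_patch == a_patch
            }
        }
        _ => false,
    }
}

/// Returns true if `interface` (e.g. "ns:pkg/name@1.2.3") is covered by
/// any entry in `excludes`. Unversioned entries must match exactly.
pub fn is_excluded(excludes: &[&str], interface: &str) -> bool {
    excludes
        .iter()
        .any(|e| match (e.rsplit_once('@'), interface.rsplit_once('@')) {
            (Some((e_name, e_ver)), Some((name, ver))) => {
                e_name == name && semver_compatible(e_ver, ver)
            }
            _ => *e == interface,
        })
}
```

With this rule an exclusion written against one 0.2.x release would keep applying as the interpreter's own bindings move to later 0.2.x releases.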

Multiple dylibs

Just to double-check: if the interpreter itself needs additional dylibs, that's taken care of by wasm-tools component link --dl-openable, right? If not, we might want to add something for that here.


// Entrypoint for WIT resource destructors.
//
// The `ty` poitns to `wit->resources` and `handle` is the value being
Member

Suggested change
// The `ty` poitns to `wit->resources` and `handle` is the value being
// The `ty` points to `wit->resources` and `handle` is the value being

Member

This crate seems quite useful outside of testing, too. Maybe we could make it its own thing that VM implementers using Rust could make use of?

Member Author

Yeah I wasn't quite sure how to integrate this with publishing in this repository, but for now it's at least all copyable!

Comment on lines 233 to 237
// Note that during lowering a `uint64_t` is NOT an "owned" value meaning that
// these functions should not allocate memory as it otherwise won't get cleaned
// up. For example the return value of `wit_dylib_list_get` is considered to be
// "borrowed" and not needing cleanup. This is suitable if, for example, the
// returned value is a pointer into the original list.
Member

I think it'd be good to expand on this comment a bit / point to some other source of documentation of how this is supposed to be used. Specifically, from this it's unclear what an implementer is supposed to do if their value representation doesn't allow them to follow the "don't allocate" restriction here.

Additionally, IIUC this applies to uint64_t * out params as well, right? And of course to returned const pointers, though making that explicit would be good, too.

Member Author

If an engine couldn't implement wit_dylib_list_get without allocating then for now it just wouldn't be supported by wit-dylib. I'd have to dig into an example to figure out how better to support it.

Otherwise though I agree this header is severely under-documented. It's one though where documentation probably wouldn't be as useful as an example, so I was hoping that over time we could add an example or two (e.g. I'd probably be the one to bind the first 2-or-so usages of this crate).
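
To make the "borrowed, not owned" rule concrete, here is a minimal sketch of a list accessor that hands back a 64-bit handle pointing into the original list, so nothing is allocated and nothing needs cleanup. The `Val` type and the `list_get` and `borrow` functions are hypothetical stand-ins, not the signatures in `wit_dylib.h`.

```rust
// Hypothetical interpreter value; only what this sketch needs.
#[derive(Debug, PartialEq)]
pub enum Val {
    U32(u32),
    List(Vec<Val>),
}

/// Returns a "borrowed" handle to element `index`: the address of the
/// element inside the list's own storage. No allocation takes place.
pub fn list_get(list: &Val, index: usize) -> u64 {
    match list {
        Val::List(items) => &items[index] as *const Val as usize as u64,
        _ => panic!("not a list"),
    }
}

/// Reinterprets a borrowed handle as a reference.
///
/// # Safety
/// The list the handle came from must still be alive and unmodified.
pub unsafe fn borrow<'a>(handle: u64) -> &'a Val {
    unsafe { &*(handle as usize as *const Val) }
}
```

This only works when elements already live in memory in the form the handle expects, which is the limitation raised in this thread for engines whose values need conversion first.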

Member

If I'm not misunderstanding things, I don't think I can see a way to use this for things like lists of structs in either JS or Python: each struct would certainly have to change its memory representation in a to_canon operation.

@tschneidereit
Member

Another thought: it might be nice to eventually make this extensible with components. Instead of the interpreter always being fully generic and all operations being dynamic and involving lots of (non-inlineable) calls, wouldn't it be nice if VM implementers could provide codegen components that export a world that wit-dylib can use to generate specialized functions?

E.g. instead of FunctionCompiler::export generating code that dynamically calls intrinsics for decoding canonical representations of arguments, then calls a function that does a dynamic dispatch in the interpreter, then dynamically calls intrinsics for encoding a canon representation of the return value, it'd call exports on the codegen component to generate the respective operations directly and emit a function that doesn't call any interpreter intrinsics. Ideally the codegen component could bail on any operation, in which case codegen would fall back to the dynamic version, so this could be used to optimize specific cases, instead of being an all-or-nothing affair.
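
A minimal sketch of the "bail and fall back" shape described above, assuming a plugin interface where returning `None` means "no specialized code for this type". The trait, the types, and the emitted instruction strings are all hypothetical and only illustrate the control flow.

```rust
// Hypothetical, tiny type universe for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    U32,
    F32,
    ListOfRecords,
}

/// What the generator ends up emitting for one conversion.
#[derive(Debug, PartialEq)]
pub enum Lowering {
    /// Code supplied by the VM implementer's codegen plugin.
    Specialized(String),
    /// Generic path that calls back into the interpreter at runtime.
    Dynamic(&'static str),
}

/// Hypothetical plugin interface: return `None` to bail on a type.
pub trait Codegen {
    fn lift(&self, ty: Ty) -> Option<String>;
}

/// A plugin that only handles simple scalars.
pub struct ScalarsOnly;

impl Codegen for ScalarsOnly {
    fn lift(&self, ty: Ty) -> Option<String> {
        match ty {
            Ty::U32 => Some("local.get 0".to_string()),
            Ty::F32 => Some("local.get 0 ;; f32".to_string()),
            Ty::ListOfRecords => None,
        }
    }
}

/// Tries the plugin first and falls back to the dynamic path if it bails.
pub fn emit_lift(plugin: &dyn Codegen, ty: Ty) -> Lowering {
    match plugin.lift(ty) {
        Some(code) => Lowering::Specialized(code),
        None => Lowering::Dynamic("call $dynamic_lift"),
    }
}
```

Because the fallback is chosen per operation, a plugin can start with the easy cases and grow over time.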

@tschneidereit
Member

I guess another option for partial bindings could be to provide an option to omit bindings for interfaces that the interpreter already has bindings for. I don't think that'd be sufficient on its own though: it wouldn't cover situations in which manual implementations are spread across multiple dylibs. This is something I would very much like to support in StarlingMonkey eventually, so that the runtime can be modularized without recompilation.

@alexcrichton
Member Author

Maybe instead this could talk about encoding/decoding, or even clearer something like {to,from}_canon?

Heh, it took me authoring what's maybe my sixth bindings generator to feel pretty comfy with lift/lower terminology. AKA I like your idea!

How, if at all, will this be able to handle situations in which the language VM already has implementations for some interfaces, but not all?

...

if the interpreter itself needs additional dylibs

The wasm-tools component link subcommand should handle this well. It effectively unions all the worlds in all the input modules together for the final component. This means that for example wasi-libc would have its native bindings for things like wasi:filesystem. If a wit-dylib were generated for wasi:filesystem then in-content guests would also have access to wasi:filesystem via native language idioms (e.g. import { descriptor } from 'wasi:filesystem' in JS) too. When JS called it it'd go through the wit-dylib stub, but when wasi-libc calls it it'd go through the native path as well. Basically everything gets unioned together and everything should take the expected path through to the host.

Asking about multiple dylibs makes me pause for this. That sort of works today but also doesn't. That'd effectively generate two in-memory wit_t values and right now the interpreter hooks don't take a wit_t, meaning that an interpreter wouldn't know which type indices go where. I can fix this by updating the signature of all interpreter hooks to take a const wit_t *wit parameter everywhere, though.
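
A minimal sketch of why the extra parameter matters, assuming each generated dylib carries its own type table. The `Wit` struct and `describe_type` hook are hypothetical Rust stand-ins for the C-level `wit_t` and interpreter hooks.

```rust
/// Hypothetical stand-in for one generated dylib's `wit_t` metadata.
pub struct Wit {
    pub name: &'static str,
    pub type_names: Vec<&'static str>,
}

/// A hook that receives the metadata it should resolve indices against.
/// Without the `wit` argument, `type_index` alone would be ambiguous as
/// soon as two generated dylibs are linked into the same component.
pub fn describe_type(wit: &Wit, type_index: usize) -> String {
    format!("{}::{}", wit.name, wit.type_names[type_index])
}
```

The same index then resolves to different types depending on which table is passed in.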

wouldn't it be nice if VM implementers could provide codegen components that export a world that wit-dylib can use to generate specialized functions?

Two questions for you:

  • With the assumption of cross-component inlining if something like wit_dylib_lift_u32 were simple enough that might get us a big chunk of the way there?
  • How different are you imagining this would be vs "the interpreter can always natively bind what it wants, e.g. wasi:http for fetch in JS"

omit bindings for interfaces that the interpreter already has bindings for

There should be no harm in giving interpreted content raw access to bindings (albeit it'd be slower than through the engine), but I also think it'd be reasonable to customize the generated dylib to skip functions.

@pchickey
Contributor

pchickey commented Sep 15, 2025

How does wizer work? - it doesn't, Wizer only works with core modules and not components. The component-init phase of componentize-py will need to be extracted and put somewhere (probably Wizer itself). In the meantime this approach of using shared-everything dynamic linking is incompatible with Wizer.

I did some work to make component-init useful standalone, and the plan was eventually to integrate it into Wizer. (That's behind various other items on my todo list right now, so if anyone wants to take it, please do.) Is the component-init-cli crate, as it exists in that repo today, useful for this approach? It's not yet published on crates.io but I presume @dicej can easily remedy that if it's useful.

@tschneidereit
Member

The wasm-tools component link subcommand should handle this well. [..] When JS called it it'd go through the wit-dylib stub, but when wasi-libc calls it it'd go through the native path as well. Basically everything gets unioned together and everything should take the expected path through to the host.

Okay, great. I guess exports are where things get more complicated, but they're also far less worrisome in this regard, and, if provided directly by the interpreter in addition to wanting to be able to generate bindings for them, need some amount of special casing in any case. So, that all seems fine.

Asking about multiple dylibs makes me pause for this. That sort of works today but also doesn't. That'd effectively generate two in-memory wit_t values and right now the interpreter hooks don't take a wit_t, meaning that an interpreter wouldn't know which type indices go where. I can fix this by updating the signature of all interpreter hooks to take a const wit_t *wit parameter everywhere, though.

If I understand what you're saying correctly, then at least for my envisioned use case this might not be too much of a problem: I don't think I'd want to have the interpreter part be spread across multiple dylibs. Instead, there'd always be an interpreter.so and some additional dylibs that use their own bindings mechanism. To take a specific example, say I want to have (outgoing) http support (aka fetch) be a loadable module in StarlingMonkey: that'd reside in a dylib that uses C or Rust WIT bindings, but wouldn't ever be interested in having access to the wit-dylib.h types.

For incoming http, this would I think also all work, as long as I'm either able to tell wit-dylib to leave the incoming handler export alone and not generate its own version of it, or I have some kind of "if this is the http incoming handler, if fetch is available, do this, otherwise do that" thing in the interpreter. Ideally the former would be possible.

  • With the assumption of cross-component inlining if something like wit_dylib_lift_u32 were simple enough that might get us a big chunk of the way there?

Maybe I'm overly pessimistic about our codegen capabilities, but I'm imagining that even with inlining we'd still have the dynamic dispatch and effectively a dynamic interpreter over the arguments and return value. What I'm imagining is I guess almost something like a patching baseline compiler that'd eliminate the dynamic dispatch and turn the interpreter into wasm bytecode.

If you think that cranelift will actually see through all this and do something effectively very similar, then none of this is needed, of course! (To the degree it's needed at all: neither the dynamic dispatch nor the marshalling interpreter will probably be ridiculously slow. But I think the faster we can make these things, the nicer.)

  • How different are you imagining this would be vs "the interpreter can always natively bind what it wants, e.g. wasi:http for fetch in JS"

I think pretty different: what I'm imagining would be general-purpose and not need to know anything about the specific function in question. The "might not always generate code" part was more meant to enable incremental adoption where someone could provide a codegen plugin for simple cases like "take an int, return a float", but punt on cases like "take a list of lists of structs, return an abomination."

There should be no harm in giving interpreted content raw access to bindings (albeit it'd be slower than through the engine), but I also think it'd be reasonable to customize the generated dylib to skip functions.

The more I think about it, the more I realize that this really is only relevant to exports, not imports. I agree that for imports it's entirely fine to just generate the bindings, and if someone truly wants to subset things, they can do so by manually subsetting the wit.

Collaborator

@dicej dicej left a comment

Nice work! I'm excited to port componentize-py over to use this, and then add async support as well.

One thing I've been meaning to do in componentize-py's code generator is deduplicate the lifting and lowering code where appropriate. For example, if we notice we need to lift the same list<my-variant-with-complex-payloads> type in N different places, we could generate a helper function to do that instead of generating N copies of the same code. And that could be done recursively such that a given helper function might defer to other helper functions, inlining only where it makes sense. Not a blocker for this PR, of course, but something to consider later.
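
A minimal sketch of the deduplication idea, assuming helpers are memoized by type so each distinct type gets exactly one lifting helper and composite helpers defer to the helpers of their parts. The `Ty` enum and the pseudo-instruction strings are hypothetical; a real generator would emit wasm function bodies instead.

```rust
use std::collections::HashMap;

// Hypothetical, simplified WIT type tree.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Ty {
    U32,
    String,
    List(Box<Ty>),
    Variant(Vec<Ty>),
}

/// Memoizing helper generator: one helper body per distinct type.
#[derive(Default)]
pub struct Helpers {
    ids: HashMap<Ty, usize>,
    pub bodies: Vec<String>,
}

impl Helpers {
    /// Returns the helper index for `ty`, generating it (and the helpers
    /// for any nested types) only the first time the type is seen.
    pub fn lift_helper(&mut self, ty: &Ty) -> usize {
        if let Some(&id) = self.ids.get(ty) {
            return id;
        }
        let body = match ty {
            Ty::U32 => "lift u32".to_string(),
            Ty::String => "lift string".to_string(),
            Ty::List(elem) => {
                let inner = self.lift_helper(elem);
                format!("loop {{ call helper{} }}", inner)
            }
            Ty::Variant(cases) => {
                let calls: Vec<String> = cases
                    .iter()
                    .map(|c| format!("helper{}", self.lift_helper(c)))
                    .collect();
                format!("switch [{}]", calls.join(", "))
            }
        };
        let id = self.bodies.len();
        self.bodies.push(body);
        self.ids.insert(ty.clone(), id);
        id
    }
}
```

Requesting the same `list<variant>` type from N call sites then yields the same helper index every time instead of N copies of the code.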

lift_f32 : [ValType::F32] -> [ValType::I64] = "wit_dylib_lift_f32",
lift_f64 : [ValType::F64] -> [ValType::I64] = "wit_dylib_lift_f64",
lift_string : [ValType::I32; 2] -> [ValType::I64] = "wit_dylib_lift_string",
lift_record : [ValType::I32, ValType::I32] -> [ValType::I64] = "wit_dylib_lift_record",
Collaborator

Nit: might want to be consistent about choosing either the [ValType::I32; 2] style or the [ValType::I32, ValType::I32] style here and below.
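
For reference, the two spellings produce the same array, so picking one is purely a style choice. A minimal sketch using a hypothetical stand-in for `ValType` (the repeat form needs the element type to be `Copy`):

```rust
// Hypothetical stand-in for the real `ValType`; only what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ValType {
    I32,
    I64,
}

/// Array-repeat style: `[value; count]`.
pub fn repeat_style() -> [ValType; 2] {
    [ValType::I32; 2]
}

/// Explicit-list style: every element written out.
pub fn list_style() -> [ValType; 2] {
    [ValType::I32, ValType::I32]
}
```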

}

// No other types with intrinsics at this time (futures/streams are
// relative to where they show up in function types.
Collaborator

Suggested change
// relative to where they show up in function types.
// relative to where they show up in function types).

@dicej
Collaborator

dicej commented Sep 16, 2025

How does wizer work? - it doesn't, Wizer only works with core modules and not components. The component-init phase of componentize-py will need to be extracted and put somewhere (probably Wizer itself). In the meantime this approach of using shared-everything dynamic linking is incompatible with Wizer.

I did some work to make component-init useful standalone, and the plan was eventually to integrate it into Wizer. (That's behind various other items on my todo list right now, so if anyone wants to take it, please do.) Is the component-init-cli crate, as it exists in that repo today, useful for this approach? It's not yet published on crates.io but I presume @dicej can easily remedy that if it's useful.

Yeah, I think the next step is to move the repo to the bytecodealliance org and then start publishing it from there. I'll get that started.

@dicej
Collaborator

dicej commented Sep 16, 2025

Yeah, I think the next step is to move the repo to the bytecodealliance org and then start publishing it from there. I'll get that started.

Or do we want to make it a subcrate of Wizer? @pchickey I forget whether you had a plan for this already; I'm fine with whatever.

@pchickey
Contributor

@dicej If we are moving it to the bca org I would rather put it into the existing wizer repository, rather than its own, and later integrate it with the wizer CLI. I didn't have a concrete plan beyond that. @fitzgen does that sound ok?

@fitzgen
Member

fitzgen commented Sep 16, 2025

@fitzgen does that sound ok?

SGTM!

@alexcrichton
Member Author

@tschneidereit

To take a specific example, say I want to have (outgoing) http support (aka fetch) be a loadable module in StarlingMonkey: that'd reside in a dylib that uses C or Rust WIT bindings, but wouldn't ever be interested in having access to the wit-dylib.h types.

Yep, that should work as-is!

For incoming http, this would I think also all work, as long as I'm either able to tell wit-dylib to leave the incoming handler export alone and not generate its own version of it

One option would be to exclude it from the world passed to wasm-tools wit-dylib, but a --exclude-export wasi:http/incoming-handler flag or something similar would also be quite easy to add.

Maybe I'm overly pessimistic about our codegen capabilities, but I'm imagining that even with inlining we'd still have the dynamic dispatch and effectively a dynamic interpreter over the arguments and return value. What I'm imagining is I guess almost something like a patching baseline compiler that'd eliminate the dynamic dispatch and turn the interpreter into wasm bytecode.

Ok makes sense, and I'm about to make a change which would defeat inlining even more in theory. In any case I'll file a follow-up issue for discussion on this.


@dicej

and then add async support as well.

We discussed this briefly, but to write this down here too -- I think adding async support will be a nontrivial, but very doable, amount of work in this framework. Basically I don't foresee any major issues beyond "just gotta get it done". Effectively it'd look like resource intrinsics where the type information would have more function pointers hanging off it.

For example, if we notice we need to lift the same list<my-variant-with-complex-payloads> type in N different places, we could generate a helper function to do that instead of generating N copies of the same code. And that could be done recursively such that a given helper function might defer to other helper functions, inlining only where it makes sense. Not a blocker for this PR, of course, but something to consider later.

Discussed offline with Joel as well, and I'm going to file a follow-up issue for this as I agree it would be nice to have.

* More flexible in the representation of a value chosen by an interpreter
* Easier-to-understand interface: "just push and pop"
* External lift/lower terminology is now "push" and "pop"
* APIs were tweaked as necessary to minimize conversion functions and
  model the stack behavior.
@alexcrichton alexcrichton force-pushed the wasm-interpreter-adapter branch from e7d37fa to bd42507 Compare September 17, 2025 01:06
@alexcrichton
Member Author

Ok I've now pushed up a new interface for doing all of this. The interpreter is now modeled as having a stack that's pushed/popped from for value conversion which I'm subjectively saying is easier to understand and implement. This is also required to be more flexible for some other dabbling I was doing. I'm going to do some more testing tomorrow and probably merge this afterwards.
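
A minimal sketch of the "just push and pop" model, assuming the interpreter keeps a value stack that generated code drives one operation at a time. The `Interp` type and its method names are hypothetical and are not the hooks declared in `wit_dylib.h`.

```rust
// Hypothetical interpreter value; only what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    U32(u32),
    String(String),
    Record(Vec<Val>),
}

/// An interpreter modeled as a stack of values.
#[derive(Default)]
pub struct Interp {
    stack: Vec<Val>,
}

impl Interp {
    /// Generated code hands over a u32 it read from the canonical ABI.
    pub fn push_u32(&mut self, v: u32) {
        self.stack.push(Val::U32(v));
    }

    /// Generated code hands over a decoded string.
    pub fn push_string(&mut self, s: &str) {
        self.stack.push(Val::String(s.to_string()));
    }

    /// Replaces the top `fields` values with one record, keeping their
    /// original push order as the field order.
    pub fn push_record(&mut self, fields: usize) {
        let start = self.stack.len() - fields;
        let items = self.stack.split_off(start);
        self.stack.push(Val::Record(items));
    }

    /// Generated code takes back the value on top of the stack.
    pub fn pop(&mut self) -> Option<Val> {
        self.stack.pop()
    }
}
```

In this model the interpreter is free to pick whatever representation it likes for what sits on the stack, since generated code never holds a value handle across calls.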

@alexcrichton alexcrichton added this pull request to the merge queue Sep 17, 2025
Merged via the queue into bytecodealliance:main with commit 434da5e Sep 17, 2025
34 checks passed
@alexcrichton alexcrichton deleted the wasm-interpreter-adapter branch September 17, 2025 20:57
5 participants