externref/funcref distinction fundamental for toolchain? #150

Closed
@wingo

Hello,

While implementing ref.null support in LLVM, I ran into an interesting issue. In summary, I think it makes sense to define an LLVM-specific top type encompassing both externref and funcref; an anyref, if you will.

Concretely, ref.null needs a type operand. The instruction is specified as being ref.null REFTYPE (https://webassembly.github.io/reference-types/core/syntax/instructions.html#reference-instructions), and encodes as such in the binary.
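To make the encoding concrete, here is a minimal sketch of emitting ref.null with its type immediate. The opcode (0xD0) and the reftype bytes (0x70 for funcref, 0x6F for externref) follow the reference-types binary format; the `RefType` enum and the `encodeRefNull` helper are hypothetical names for illustration, not existing LLVM APIs.

```cpp
#include <cstdint>
#include <vector>

// Reftype bytes from the reference-types binary format.
// (Hypothetical enum name; not an LLVM type.)
enum class RefType : std::uint8_t {
  FuncRef = 0x70,
  ExternRef = 0x6F,
};

// Encode `ref.null t` as the opcode 0xD0 followed by the reftype byte.
std::vector<std::uint8_t> encodeRefNull(RefType type) {
  return {0xD0, static_cast<std::uint8_t>(type)};
}
```

The point is that the type is an immediate of the instruction, so a single ref.null definition can cover every reftype.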

Currently REFTYPE can only be externref or funcref. However, with typed function references, the set becomes unbounded, as users can define their own, e.g. (func (i32 i32) -> (i64 f32)) and similar. So at least on the MC layer we will need ref.null to have a reftype operand. Therefore I will probably add a Reftype operand kind, which is similar in a way to the Signature operand of block instructions.

For comparison, the approach taken in the implementation of the table instructions was to provide e.g. TABLE.GET_externref for tables returning externref, and TABLE.GET_funcref for those returning funcref. Given the reftype operand to ref.null, though, it is not necessary from a target point of view to have two kinds of ref.null.

Which leads me to my proposal: what good does it do us in LLVM to distinguish externref and funcref values as different MachineValueTypes? The distinction is not sufficient to provide the information needed by ref.null, and yet it is not necessary for instructions like table.get. It would be simpler if we could just treat all reference types the same.

In the case of table.get and similar instructions, it turns out that discriminating between externref and funcref is not necessary for the target encoding; the result of table.get is the type of the table. We could remove the duplicate instruction definitions, and define table.get as just returning a value of type anyref.
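As an illustration of why the element type does not affect the encoding, here is a minimal sketch of emitting table.get: the opcode 0x25 followed by the table index as an unsigned LEB128 value. The helper names `appendULEB128` and `encodeTableGet` are hypothetical and not taken from LLVM.

```cpp
#include <cstdint>
#include <vector>

// Append the unsigned LEB128 encoding of `value` to `out`.
void appendULEB128(std::uint32_t value, std::vector<std::uint8_t> &out) {
  do {
    std::uint8_t byte = value & 0x7F;
    value >>= 7;
    if (value != 0)
      byte |= 0x80; // More bytes follow.
    out.push_back(byte);
  } while (value != 0);
}

// Encode `table.get x`: opcode 0x25 followed by the table index.
// The table's element type (externref or funcref) never appears here.
std::vector<std::uint8_t> encodeTableGet(std::uint32_t tableIndex) {
  std::vector<std::uint8_t> bytes{0x25};
  appendULEB128(tableIndex, bytes);
  return bytes;
}
```

The same bytes are produced whether the table holds externref or funcref values, which is why one instruction definition would be enough for the target encoding.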

If this analysis is right, we should replace the externref and funcref MVTs with a single anyref. If the difference is important for the instruction encoding, the instruction will have to take a Reftype operand. I will work up a patch.

One question is how to represent ref.null at the IR level. Given that the set of types is unbounded (once we have typed function references), a quick-and-dirty way would be to define the intrinsic as anyref __builtin_wasm_ref_null(const char *type), and pass either "externref" or "funcref" as an immediate string. This is just a placeholder idea, I guess.
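A rough sketch of how a lowering step might map that string immediate to a reftype byte, assuming only the two names externref and funcref are accepted for now. `RefTypeByte` and `parseRefTypeName` are hypothetical names; nothing here reflects an actual Clang or LLVM interface.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Reftype bytes from the reference-types binary format.
// (Hypothetical enum name, for illustration only.)
enum class RefTypeByte : std::uint8_t { FuncRef = 0x70, ExternRef = 0x6F };

// Map the string immediate to a reftype byte, or return nullopt
// for a name this sketch does not recognize.
std::optional<RefTypeByte> parseRefTypeName(std::string_view name) {
  if (name == "externref")
    return RefTypeByte::ExternRef;
  if (name == "funcref")
    return RefTypeByte::FuncRef;
  return std::nullopt;
}
```

A string keeps the intrinsic signature fixed while the set of names grows, at the cost of validating the name at lowering time rather than in the type system.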

For context, the reference-types proposal used to have just anyref, but this was later changed to externref and funcref. This was essentially for run-time concerns, AFAIU: you might want to represent function references and GC objects differently, and forcing a top type onto them constrains the run-time in undesirable ways. I get that. But for the compiler, it doesn't seem to me like the difference buys us anything.

Cc @tlively @sbc100 @pmatos. If this discussion might be better elsewhere, happy to take it there :)
