Hello,
While implementing `ref.null` support in LLVM, I ran into an interesting issue. In summary, I think it makes sense to define an LLVM-specific top type encompassing both `externref` and `funcref`: an `anyref`, if you will.
Concretely, `ref.null` needs a type operand. The instruction is specified as `ref.null REFTYPE` (https://webassembly.github.io/reference-types/core/syntax/instructions.html#reference-instructions), and it is encoded that way in the binary format. Currently `REFTYPE` can only be `externref` or `funcref`. However, with typed function references the set becomes unbounded, because users can define their own types, e.g. `(func (i32 i32) -> (i64 f32))` and similar. So at least at the MC layer, `ref.null` will need a reftype operand. I will therefore probably add a `Reftype` operand kind, similar in some ways to the `Signature` operand of block instructions.
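To make the operand concrete, here is a minimal sketch of how a `ref.null` encoder could consume a reftype operand. It assumes the reference-types binary encodings (`ref.null` is opcode `0xD0`, followed by `0x70` for `funcref` or `0x6F` for `externref`) and, for typed function references, a heap type given as a non-negative type index encoded as signed LEB128. `RefTypeOperand` and `encodeRefNull` are hypothetical names for illustration, not existing LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical Reftype operand for ref.null (not LLVM's actual API).
// Abstract types use their single-byte codes; a concrete type index
// (typed function references, assumed encoding) is a non-negative
// integer written as signed LEB128.
struct RefTypeOperand {
  enum Kind { Func, Extern, TypeIndex } K;
  uint32_t Index = 0; // only meaningful when K == TypeIndex
};

// Append the signed LEB128 encoding of Value to Out.
static void encodeSLEB128(int64_t Value, std::vector<uint8_t> &Out) {
  bool More = true;
  while (More) {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    // Stop once the remaining bits are pure sign extension of Byte's bit 6.
    if ((Value == 0 && (Byte & 0x40) == 0) ||
        (Value == -1 && (Byte & 0x40) != 0))
      More = false;
    else
      Byte |= 0x80; // continuation bit
    Out.push_back(Byte);
  }
}

// Encode `ref.null <reftype>`: opcode 0xD0 followed by the type immediate.
std::vector<uint8_t> encodeRefNull(const RefTypeOperand &T) {
  std::vector<uint8_t> Out{0xD0};
  switch (T.K) {
  case RefTypeOperand::Func:
    Out.push_back(0x70);
    break;
  case RefTypeOperand::Extern:
    Out.push_back(0x6F);
    break;
  case RefTypeOperand::TypeIndex:
    encodeSLEB128(T.Index, Out);
    break;
  }
  return Out;
}
```

A single encoder covers all reference types because the type travels as an operand rather than being baked into the opcode or the value's MVT.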
For comparison, the implementation of the table instructions provides separate definitions per type, e.g. `TABLE.GET_externref` for tables holding `externref` and `TABLE.GET_funcref` for those holding `funcref`. Given a reftype operand on `ref.null`, though, there is no need from the target's point of view for two kinds of `ref.null`.
Which leads me to my proposal: what good does it do us in LLVM to distinguish `externref` and `funcref` values as different `MachineValueType`s? The distinction is not sufficient to provide the information `ref.null` needs, and it is not necessary for instructions like `table.get`. It would be simpler to treat all reference types the same.
In the case of `table.get` and similar instructions, it turns out that discriminating between `externref` and `funcref` is not necessary for the target encoding: the result of `table.get` has the type of the table. We could remove the duplicate instruction definitions and define `table.get` as simply returning a value of type `anyref`.
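A minimal sketch of why `table.get` needs no type discriminator: assuming the reference-types binary format, the instruction is opcode `0x25` followed only by the table index (unsigned LEB128), and its result type is read off the table declaration. `RefKind`, `TableDecl`, `encodeTableGet` and `tableGetResultType` are hypothetical names, not LLVM code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RefKind { Funcref, Externref };

// Hypothetical table declaration: only the element type matters here.
struct TableDecl {
  RefKind ElemType;
};

// Append the unsigned LEB128 encoding of Value to Out.
static void encodeULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// table.get: opcode 0x25 plus a table index; no reference type is encoded.
std::vector<uint8_t> encodeTableGet(uint32_t TableIdx) {
  std::vector<uint8_t> Out{0x25};
  encodeULEB128(TableIdx, Out);
  return Out;
}

// The result type comes from the table being accessed, not from the opcode.
RefKind tableGetResultType(const std::vector<TableDecl> &Tables,
                           uint32_t TableIdx) {
  return Tables.at(TableIdx).ElemType;
}
```

Since the encoding is identical for every table, one instruction definition returning a generic reference value would be enough; the per-type variants only restate what the table declaration already says.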
If this analysis is right, we should replace the `externref` and `funcref` MVTs with a single `anyref`. Where the difference matters for the instruction encoding, the instruction will have to take a `Reftype` operand. I will work up a patch.
One question is how to represent `ref.null` at the IR level. Given that the set of types is unbounded (once we have typed function references), a quick-and-dirty way would be to define the intrinsic as `anyref __builtin_wasm_ref_null(const char *type)` and pass either `"externref"` or `"funcref"` as an immediate string. This is just a placeholder idea, I guess.
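Under this placeholder design, lowering the intrinsic would only need to map the immediate string to the reftype byte that follows the `ref.null` opcode. A minimal sketch, assuming the two reference-types encodings (`0x70` for `funcref`, `0x6F` for `externref`); `parseRefTypeImmediate` is a hypothetical helper, and the code requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper for the placeholder builtin
// __builtin_wasm_ref_null(const char *type): map the immediate string to
// the reftype byte emitted after the ref.null opcode (0xD0).
std::optional<uint8_t> parseRefTypeImmediate(std::string_view Name) {
  if (Name == "funcref")
    return 0x70;
  if (Name == "externref")
    return 0x6F;
  return std::nullopt; // unknown type name: the frontend would diagnose this
}
```

Extending this to typed function references would mean accepting some textual form of a type index, which is where a string immediate starts to look like a stopgap.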
For context, the reference-types proposal originally had just `anyref`, but it was later changed to `externref` and `funcref`. AFAIU this was essentially for run-time concerns: you might want to represent function references and GC objects differently, and forcing a top type onto them constrains runtimes in undesirable ways. I get that. But for the compiler, it doesn't seem to me that the difference buys us anything.
Cc @tlively @sbc100 @pmatos. If this discussion might be better elsewhere, happy to take it there :)