Description
Currently in #17 it's suggested that the x86_64 lowering of the i16x8.relaxed_laneselect instruction is pblendvb, and this appears to be what v8 does today. In #115 (plus the current Overview.md), however, the English prose for the definition of this instruction is:
Relaxed lane selection is deterministic when all bits are set or unset in the
mask. Otherwise depending on the host, either only the top bit is examined, or
all bits are examined (i.e. it becomes a bit select).
I don't believe, though, that the pblendvb instruction correctly implements these semantics. Consider the lane selection mask 0x0080: it is neither 0x0000 nor 0xffff, and its high bit is zero, so according to the spec the output should be the element in the b vector (or, if all bits are examined, the b element with only bit 7 taken from a). The pblendvb instruction works at the byte level, though, so one byte will be chosen from the a vector and one will be chosen from the b vector, which matches neither allowed result.
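To make the mismatch concrete, here is a minimal sketch that models a single 16-bit lane in plain Python. The function names and the sample values for a and b are made up for illustration, and the pblendvb model assumes the lowering maps a set mask byte to the a operand; it is not taken from v8 or from the proposal text.

```python
def laneselect_top_bit(a, b, m):
    """Spec option 1: only the top bit of the 16-bit mask lane is examined."""
    return a if m & 0x8000 else b


def laneselect_bitselect(a, b, m):
    """Spec option 2: every mask bit is examined (bit select)."""
    return (a & m) | (b & ~m & 0xFFFF)


def laneselect_pblendvb(a, b, m):
    """Byte-granular model: each byte is picked by the top bit of its own mask byte."""
    result = 0
    for shift in (0, 8):  # low byte, then high byte
        mask_byte = (m >> shift) & 0xFF
        src = a if mask_byte & 0x80 else b
        result |= src & (0xFF << shift)
    return result


# Hypothetical lane values; the mask is the 0x0080 case discussed above.
a, b, m = 0x1234, 0xABCD, 0x0080

top_bit = laneselect_top_bit(a, b, m)
bitselect = laneselect_bitselect(a, b, m)
blended = laneselect_pblendvb(a, b, m)

print(hex(top_bit), hex(bitselect), hex(blended))
```

With these inputs the three models give 0xabcd, 0xab4d, and 0xab34 respectively, so the byte-wise blend produces a value that neither spec interpretation allows.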
I think that this is also an issue with v8's lowering of the i{32x4,64x2}.relaxed_laneselect instructions, since they all go through pblendvb right now, although I think the suggestion in #17 would work with blendvp{s,d} instead.
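The same kind of model can be applied to a 32-bit lane. The sketch below is an illustration only: the helper names and sample values are hypothetical, and blendvps is modeled simply as "select by the top bit of each 32-bit lane", which is the behavior the top-bit interpretation of the spec asks for.

```python
def pblendvb_lane(a, b, m, lane_bytes):
    """Byte-granular blend over one lane: each byte follows its own mask byte's top bit."""
    result = 0
    for i in range(lane_bytes):
        shift = 8 * i
        src = a if (m >> shift) & 0x80 else b
        result |= src & (0xFF << shift)
    return result


def blendv_lane(a, b, m, lane_bits):
    """Lane-granular blend (blendvps/blendvpd style): only the lane's top bit matters."""
    return a if (m >> (lane_bits - 1)) & 1 else b


# Hypothetical 32-bit lane values; the mask has the top bit clear but an inner byte's top bit set.
a32, b32, m32 = 0x11223344, 0xAABBCCDD, 0x00800000

byte_wise = pblendvb_lane(a32, b32, m32, lane_bytes=4)
lane_wise = blendv_lane(a32, b32, m32, lane_bits=32)

print(hex(byte_wise), hex(lane_wise))
```

Here the byte-wise model mixes one byte of a into b (0xaa22ccdd), while the lane-wise model returns b unchanged (0xaabbccdd), consistent with the top-bit reading of the spec.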