This repository was archived by the owner on Dec 22, 2021. It is now read-only.
move{32,64}_zero_{r,v} instructions #374
Closed
Introduction
@Maratyszcza has done a wonderful job describing the use cases and functionality of load64_zero and load32_zero in #237. This proposal extends load64_zero and load32_zero to functional completeness with the underlying architectures by adding their sister variants, which have essentially identical implementations but take their operand from a 32-bit or 64-bit general-purpose register, or from the low 32 or 64 bits of another vector, rather than from memory. The proposed instructions are move32_zero_r, move64_zero_r, move32_zero_v, and move64_zero_v, respectively. Since these are sister instructions, their applications and use cases are identical to those of the original proposal.
Mapping to Common Instruction Sets
This section illustrates how the new WebAssembly instructions can be lowered on common instruction sets. These patterns are provided only for convenience; compliant WebAssembly implementations do not have to follow the same code generation patterns.
x86/x86-64 processors with AVX instruction set
v128.move32_zero_r
v = v128.move32_zero_r(r32) is lowered to VMOVD xmm_v, r32
v128.move64_zero_r
v = v128.move64_zero_r(r64) is lowered to VMOVQ xmm_v, r64
v128.move32_zero_v
v = v128.move32_zero_v(v128) is lowered to VINSERTPS xmm_v, xmm, xmm, 0x0E (one possible lowering; the zeroing mask 0b1110 clears lanes 1-3 while lane 0 is copied from the source)
v128.move64_zero_v
v = v128.move64_zero_v(v128) is lowered to VMOVQ xmm_v, xmm
x86/x86-64 processors with SSE2 instruction set
v128.move32_zero_r
v = v128.move32_zero_r(r32) is lowered to MOVD xmm_v, r32
v128.move64_zero_r
v = v128.move64_zero_r(r64) is lowered to MOVQ xmm_v, r64
v128.move32_zero_v
v = v128.move32_zero_v(v128) is lowered to XORPS xmm_v, xmm_v followed by MOVSS xmm_v, xmm (one possible sequence; SSE2 has no single register-to-register instruction that zero-extends the low 32 bits)
v128.move64_zero_v
v = v128.move64_zero_v(v128) is lowered to MOVQ xmm_v, xmm
ARM64 Processors
v128.move32_zero_r
v = v128.move32_zero_r(r32) is lowered to fmov s0, w0
v128.move64_zero_r
v = v128.move64_zero_r(r64) is lowered to fmov d0, x0
v128.move32_zero_v
v = v128.move32_zero_v(v128) is lowered to fmov s0, s1
v128.move64_zero_v
v = v128.move64_zero_v(v128) is lowered to fmov d0, d1
ARMv7 with Neon
v128.move32_zero_r
v = v128.move32_zero_r(r32) is lowered to vmov.i32 q0, #0 followed by vmov s0, r0 (one possible two-instruction sequence; s0 is the low 32 bits of q0)
v128.move64_zero_r
v = v128.move64_zero_r(r64) is lowered to vmov d0, r0, r1 followed by vmov.i32 d1, #0 (one possible sequence; a 64-bit value occupies a GPR pair on ARMv7)
v128.move32_zero_v
v = v128.move32_zero_v(v128) is lowered to vmov.i32 q0, #0 followed by vmov.f32 s0, s4 (one possible sequence; s4 is the low 32 bits of q1)
v128.move64_zero_v
v = v128.move64_zero_v(v128) is lowered to vmov.i32 d1, #0 followed by vmov d0, d2 (one possible sequence; d2 is the low 64 bits of q1)