x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest #6224

alexcrichton · 2023-04-17T18:26:12Z

This commit adds lowerings that work with SSE2 for CLIF ceil, trunc, floor, and nearest instructions over vectors. To get these working insertlane for float vectors was also implemented for non-SSE4.1 instructions as well.

Note that the goal of these lowerings is not speed but rather "it works", so the decompose-to-call-libcalls logic for vector is probably horrendously slow but should at least be correct.

This commit adds lowerings that work with SSE2 for CLIF `ceil`, `trunc`, `floor`, and `nearest` instructions over vectors. To get these working `insertlane` for float vectors was also implemented for non-SSE4.1 instructions as well. Note that the goal of these lowerings is not speed but rather "it works", so the decompose-to-call-libcalls logic for vector is probably horrendously slow but should at least be correct.

alexcrichton · 2023-04-17T18:27:10Z

I'll also note that this doesn't enable the runtest for insertlane yet because there are other lowerings that use SSE4.1 instructions but a future commit will enable the insertlane test for testing without SSE4.1

abrown

LGTM!

cranelift/codegen/src/isa/x64/inst.isle

abrown · 2023-04-18T16:31:01Z

cranelift/codegen/src/isa/x64/lower.isle

-        (x64_roundss a (RoundImm.RoundUp)))
-
-(rule 1 (lower (ceil a @ (value_type $F64)))
+        (x64_roundss a imm))


I know we're going the other direction here but I started thinking, "what about the AVX versions of these?" Looking at inst.isle, they are emitted by x64_round*, so all is well.

Yeah so far I feel like the current pattern of having a low-level instruction constructor which has an avx and a non-avx rule works out pretty well. It is relatively verbose and may be ripe for some future refactoring (e.g. auto-encoding as AVX if the context's feature flags are set) but for now it's nice to not have to worry about AVX at all really as part of lower.isle

Co-authored-by: Andrew Brown <[email protected]>

…liance#6224) * x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest This commit adds lowerings that work with SSE2 for CLIF `ceil`, `trunc`, `floor`, and `nearest` instructions over vectors. To get these working `insertlane` for float vectors was also implemented for non-SSE4.1 instructions as well. Note that the goal of these lowerings is not speed but rather "it works", so the decompose-to-call-libcalls logic for vector is probably horrendously slow but should at least be correct. * Skip new tests on riscv64 * Update cranelift/codegen/src/isa/x64/inst.isle Co-authored-by: Andrew Brown <[email protected]> --------- Co-authored-by: Andrew Brown <[email protected]>

penzn · 2023-05-18T16:23:55Z

Curious - is this just for testing or are there important users that only have SSE2? Even low-power x86(32/64) CPUs have had SSE4 for a very long time, and the standard standard was written with SSE4 in mind, web implementations don't support anything lower. In case someone would try to accelerate their code using instructions in this PR on SSE2 machine they are likely to get surprised.

alexcrichton · 2023-05-18T16:40:31Z

Just for testing and completeness. We occasionally get reports about users getting an error message that their machine didn't support SSE4.2 which was the basline required by Wasmtime originally. Part of this is on Wasmtime since we require SSE4.2 even if the module doesn't actually use simd instructions and that error could probably be improved by first detecting SIMD and then requiring SSE4.2 or something like that.

Otherwise though I was basically just curious if it was even possible to take the requirement from SSE4.2 to SSE2. I wanted to learn x64 a bit more and the various instructions and it ended up not being too bad and it fits well in the instruction selection framework that Cranelift has, so I figured I'd land it.

My guess though is that this'll only be here for compatibility in niche cases. I definitely have no plans myself to use work like this to shape future standards or anything like that, I think that the SSE4.2 baseline is quite reasonable and well thought out.

cfallin · 2023-05-18T16:51:55Z

@penzn FWIW, the current discussion on this and Alex's work were spawned by a downstream user trying to run Wasmtime on a Penryn-era (Core 2) machine a month or so ago, so these users do still exist, even if rarely (I'm not finding the issue at the moment, but all the related ones are linked from #3810, and we've had pretty extensive discussions on that cluster of issues over the past 1.5 years). It's also just nice to be able to say that we support our full feature set on any x86_64 machine (i.e. only SSE2), and dynamically detect the presence of better options (almost always switch to our SSE4.2 lowerings when available), rather than the other way around where we have a dynamically-detected error path or a "hole" in our support matrix. Wasm is all about portability, after all!

penzn · 2023-05-18T17:31:45Z

Some instructions were added only because 4.1/4.2 support is assumed. Here is one example, also see notes from the meeting that approved it. On the other hand adding fallbacks makes sense.

The tricky part of SIMD is that it needs to be fast as well as correct, you probably don't want to get into a situation when someone would accelerate their code using SIMD operations only to find out it got slower (probably even worse if compiler would quietly do it for them). Performance is pretty hard to specify precisely, criteria for adding instructions was something along the lines of "no slowdown, speedup for at least one platform" (i.e. swizzle).

cfallin · 2023-05-18T18:13:26Z

Right, and that came up in earlier discussions here too. From my perspective at least, it makes sense to tune the feature for newer hardware (and we do), but it kind of also sucks to just abandon users with old hardware completely. Given the choice between "user with 2007 laptop cannot run Wasmtime at all" and "user with 2007 laptop can run Wasmtime, but it's slow just like everything else on this machine" I think we'll take the latter :-)

alexcrichton requested a review from abrown April 17, 2023 18:26

alexcrichton marked this pull request as ready for review April 17, 2023 18:26

alexcrichton requested a review from a team as a code owner April 17, 2023 18:26

alexcrichton removed the request for review from a team April 17, 2023 18:26

Skip new tests on riscv64

d0d6bd8

github-actions bot added cranelift Issues related to the Cranelift code generator cranelift:area:x64 Issues related to x64 codegen labels Apr 17, 2023

alexcrichton closed this Apr 18, 2023

alexcrichton reopened this Apr 18, 2023

abrown approved these changes Apr 18, 2023

View reviewed changes

Update cranelift/codegen/src/isa/x64/inst.isle

35af122

Co-authored-by: Andrew Brown <[email protected]>

alexcrichton enabled auto-merge April 18, 2023 16:51

alexcrichton added this pull request to the merge queue Apr 18, 2023

Merged via the queue into bytecodealliance:main with commit 62f8928 Apr 18, 2023

alexcrichton deleted the x64-round-sse41 branch April 18, 2023 18:03

cfallin mentioned this pull request Jul 12, 2023

Release notes for wasmtime-11 #6714

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest #6224

x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest #6224

Uh oh!

alexcrichton commented Apr 17, 2023

Uh oh!

alexcrichton commented Apr 17, 2023

Uh oh!

abrown left a comment

Uh oh!

Uh oh!

abrown Apr 18, 2023

Uh oh!

alexcrichton Apr 18, 2023

Uh oh!

penzn commented May 18, 2023

Uh oh!

alexcrichton commented May 18, 2023

Uh oh!

cfallin commented May 18, 2023

Uh oh!

penzn commented May 18, 2023

Uh oh!

cfallin commented May 18, 2023

Uh oh!

Uh oh!

x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest #6224

x64: Add non-SSE4.1 lowerings of ceil/trunc/floor/nearest #6224

Uh oh!

Conversation

alexcrichton commented Apr 17, 2023

Uh oh!

alexcrichton commented Apr 17, 2023

Uh oh!

abrown left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

abrown Apr 18, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Apr 18, 2023

Choose a reason for hiding this comment

Uh oh!

penzn commented May 18, 2023

Uh oh!

alexcrichton commented May 18, 2023

Uh oh!

cfallin commented May 18, 2023

Uh oh!

penzn commented May 18, 2023

Uh oh!

cfallin commented May 18, 2023

Uh oh!

Uh oh!