Each function below multiplies a vector by the result of adding a constant splat to a splatted scalar:
define <4 x i64> @f_v4i64(<4 x i64> %x, i64 %y) {
  %1 = insertelement <4 x i64> poison, i64 %y, i32 0
  %2 = shufflevector <4 x i64> %1, <4 x i64> poison, <4 x i32> zeroinitializer
  %3 = add <4 x i64> %2, <i64 3, i64 3, i64 3, i64 3>
  %4 = mul <4 x i64> %x, %3
  ret <4 x i64> %4
}
define <vscale x 4 x i64> @f_nxv4i64(<vscale x 4 x i64> %x, i64 %y) {
  %1 = insertelement <vscale x 4 x i64> poison, i64 %y, i32 0
  %2 = shufflevector <vscale x 4 x i64> %1, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
  %3 = add <vscale x 4 x i64> %2, shufflevector(<vscale x 4 x i64> insertelement(<vscale x 4 x i64> poison, i64 3, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
  %4 = mul <vscale x 4 x i64> %x, %3
  ret <vscale x 4 x i64> %4
}
When compiled with llc -o - -mattr=+v, the splatted add is scalarized for the scalable version:
f_nxv4i64:
  addi a0, a0, 3
  vsetvli a1, zero, e64, m4, ta, ma
  vmul.vx v8, v8, a0
  ret
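But it is not scalarized for the fixed-length version, which splats the scalar first (vmv.v.x), adds with a vector instruction (vadd.vi), and then needs a vector-vector multiply (vmul.vv):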
f_v4i64:
  vsetivli zero, 4, e64, m2, ta, ma
  vmv.v.x v10, a0
  vadd.vi v10, v10, 3
  vmul.vv v8, v8, v10
  ret
ret
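
For reference, here is a sketch of what "scalarizing the splatted add" would mean for the fixed-length case at the IR level: the add is done on the scalar %y, and only its result is splatted. The function name @f_v4i64_scalarized and the value names are illustrative, not taken from LLVM. The rewrite is written by hand and assumes that adding two splats is equivalent to splatting the scalar sum.

```llvm
; Hand-scalarized equivalent of @f_v4i64 (hypothetical name, for illustration only).
define <4 x i64> @f_v4i64_scalarized(<4 x i64> %x, i64 %y) {
  ; Do the add once on the scalar instead of on every vector lane.
  %sum = add i64 %y, 3
  ; Splat the scalar result across all four lanes.
  %head = insertelement <4 x i64> poison, i64 %sum, i32 0
  %splat = shufflevector <4 x i64> %head, <4 x i64> poison, <4 x i32> zeroinitializer
  ; Multiply by the splat; the multiplier is now a plain splat of a scalar register value.
  %res = mul <4 x i64> %x, %splat
  ret <4 x i64> %res
}
```

If the backend performed this rewrite itself, the fixed-length function would presumably lower to an addi followed by vmul.vx, mirroring the scalable output above. That outcome is an expectation and was not verified here.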