-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[RISCV] Improve casting between i1 scalable vectors and i8 fixed vectors for -mrvv-vector-bits #139190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ors for -mrvv-vector-bits For i1 vectors, we used an i8 fixed vector as the storage type. If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store from a fixed vector alloca. If X is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*X is less than or equal to 8. This means the load or store is outside the bounds of the fixed size alloca as far as DataLayout is concerned leading to undefined behavior. This patch avoids this by widening the i1 scalable vector type with zero elements until it is divisible by 8. This allows it be bitcasted to/from an i8 scalable vector. We then insert or extract the i8 fixed vector into this type. Hopefully this enables llvm#130973 to be accepted.
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-clang Author: Craig Topper (topperc) ChangesFor i1 vectors, we used an i8 fixed vector as the storage type. If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store from a fixed vector alloca. If X is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*X is less than or equal to 8. This means the load or store is outside the bounds of the fixed size alloca as far as DataLayout is concerned leading to undefined behavior. This patch avoids this by widening the i1 scalable vector type with zero elements until it is divisible by 8. This allows it be bitcasted to/from an i8 scalable vector. We then insert or extract the i8 fixed vector into this type. Hopefully this enables #130973 to be accepted. Patch is 41.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139190.diff 8 Files Affected:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 9dfd25f9a8d43..81dfc3884f1af 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1366,19 +1366,29 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
- ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
- ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+ llvm::divideCeil(
+ ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (ScalableDstTy->getElementType() == FixedSrcTy->getElementType()) {
auto *Load = CGF.Builder.CreateLoad(Src);
auto *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = CGF.Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Load, uint64_t(0), "cast.scalable");
- if (ScalableDstTy != Ty)
- Result = CGF.Builder.CreateBitCast(Result, Ty);
+ ScalableDstTy = cast<llvm::ScalableVectorType>(Ty);
+ if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+ !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+ FixedSrcTy->getElementType()->isIntegerTy(8))
+ ScalableDstTy = llvm::ScalableVectorType::get(
+ ScalableDstTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableDstTy->getElementCount().getKnownMinValue()));
+ if (Result->getType() != ScalableDstTy)
+ Result = CGF.Builder.CreateBitCast(Result, ScalableDstTy);
+ if (Result->getType() != Ty)
+ Result = CGF.Builder.CreateExtractVector(Ty, Result, uint64_t(0));
return Result;
}
}
@@ -1476,8 +1486,14 @@ CoerceScalableToFixed(CodeGenFunction &CGF, llvm::FixedVectorType *ToTy,
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, first bitcast the source.
if (FromTy->getElementType()->isIntegerTy(1) &&
- FromTy->getElementCount().isKnownMultipleOf(8) &&
ToTy->getElementType() == CGF.Builder.getInt8Ty()) {
+ if (!FromTy->getElementCount().isKnownMultipleOf(8)) {
+ FromTy = llvm::ScalableVectorType::get(
+ FromTy->getElementType(),
+ llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue()));
+ llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy);
+ V = CGF.Builder.CreateInsertVector(FromTy, ZeroVec, V, uint64_t(0));
+ }
FromTy = llvm::ScalableVectorType::get(
ToTy->getElementType(),
FromTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index f639a87e3ad0b..7639b8518db6e 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -2492,18 +2492,28 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
- ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
- ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+ llvm::divideCeil(
+ ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (FixedSrcTy->getElementType() == ScalableDstTy->getElementType()) {
llvm::Value *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Src, uint64_t(0), "cast.scalable");
+ ScalableDstTy = cast<llvm::ScalableVectorType>(DstTy);
+ if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+ !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+ FixedSrcTy->getElementType()->isIntegerTy(8))
+ ScalableDstTy = llvm::ScalableVectorType::get(
+ ScalableDstTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableDstTy->getElementCount().getKnownMinValue()));
+ if (Result->getType() != ScalableDstTy)
+ Result = Builder.CreateBitCast(Result, ScalableDstTy);
if (Result->getType() != DstTy)
- Result = Builder.CreateBitCast(Result, DstTy);
+ Result = Builder.CreateExtractVector(DstTy, Result, uint64_t(0));
return Result;
}
}
@@ -2517,8 +2527,17 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, bitcast the source and use a vector extract.
if (ScalableSrcTy->getElementType()->isIntegerTy(1) &&
- ScalableSrcTy->getElementCount().isKnownMultipleOf(8) &&
FixedDstTy->getElementType()->isIntegerTy(8)) {
+ if (!ScalableSrcTy->getElementCount().isKnownMultipleOf(8)) {
+ ScalableSrcTy = llvm::ScalableVectorType::get(
+ ScalableSrcTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableSrcTy->getElementCount().getKnownMinValue()));
+ llvm::Value *ZeroVec = llvm::Constant::getNullValue(ScalableSrcTy);
+ Src = Builder.CreateInsertVector(ScalableSrcTy, ZeroVec, Src,
+ uint64_t(0));
+ }
+
ScalableSrcTy = llvm::ScalableVectorType::get(
FixedDstTy->getElementType(),
ScalableSrcTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
index e2f02dc64f766..3ab065d34bcfb 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
@@ -15,24 +15,12 @@ typedef vbool64_t fixed_bool64_t __attribute__((riscv_rvv_vector_bits(__riscv_v_
// CHECK-64-LABEL: @call_bool32_ff(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 2)
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool32_ff(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 4)
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
@@ -41,24 +29,12 @@ fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
// CHECK-64-LABEL: @call_bool64_ff(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 1)
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool64_ff(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 2)
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
@@ -71,25 +47,13 @@ fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
// CHECK-64-LABEL: @call_bool32_fs(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
+// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool32_fs(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
+// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -97,25 +61,13 @@ fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
// CHECK-64-LABEL: @call_bool64_fs(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
+// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool64_fs(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
+// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
@@ -127,25 +79,13 @@ fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
// CHECK-64-LABEL: @call_bool32_ss(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool32_ss(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -153,25 +93,13 @@ fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
// CHECK-64-LABEL: @call_bool64_ss(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool64_ss(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
fixed_bool64_t call_bool64_ss(vbool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
index f0fa7e8d07b4d..8407c065adb21 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
@@ -29,46 +29,22 @@ fixed_bool8_t from_vbool8_t(vbool8_t type) {
// CHECK-64-LABEL: @from_vbool16_t(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT: store <vscale x 4 x i1> [[TYPE:%.*]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT: [[TMP0:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP0]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <vscale x 4 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 4 x i1> [[TMP1]]
+// CHECK-64-NEXT: ret <vscale x 4 x i1> [...
[truncated]
|
@llvm/pr-subscribers-backend-risc-v Author: Craig Topper (topperc) ChangesFor i1 vectors, we used an i8 fixed vector as the storage type. If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store from a fixed vector alloca. If X is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*X is less than or equal to 8. This means the load or store is outside the bounds of the fixed size alloca as far as DataLayout is concerned leading to undefined behavior. This patch avoids this by widening the i1 scalable vector type with zero elements until it is divisible by 8. This allows it be bitcasted to/from an i8 scalable vector. We then insert or extract the i8 fixed vector into this type. Hopefully this enables #130973 to be accepted. Patch is 41.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139190.diff 8 Files Affected:
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 9dfd25f9a8d43..81dfc3884f1af 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1366,19 +1366,29 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
- ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
- ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+ llvm::divideCeil(
+ ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (ScalableDstTy->getElementType() == FixedSrcTy->getElementType()) {
auto *Load = CGF.Builder.CreateLoad(Src);
auto *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = CGF.Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Load, uint64_t(0), "cast.scalable");
- if (ScalableDstTy != Ty)
- Result = CGF.Builder.CreateBitCast(Result, Ty);
+ ScalableDstTy = cast<llvm::ScalableVectorType>(Ty);
+ if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+ !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+ FixedSrcTy->getElementType()->isIntegerTy(8))
+ ScalableDstTy = llvm::ScalableVectorType::get(
+ ScalableDstTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableDstTy->getElementCount().getKnownMinValue()));
+ if (Result->getType() != ScalableDstTy)
+ Result = CGF.Builder.CreateBitCast(Result, ScalableDstTy);
+ if (Result->getType() != Ty)
+ Result = CGF.Builder.CreateExtractVector(Ty, Result, uint64_t(0));
return Result;
}
}
@@ -1476,8 +1486,14 @@ CoerceScalableToFixed(CodeGenFunction &CGF, llvm::FixedVectorType *ToTy,
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, first bitcast the source.
if (FromTy->getElementType()->isIntegerTy(1) &&
- FromTy->getElementCount().isKnownMultipleOf(8) &&
ToTy->getElementType() == CGF.Builder.getInt8Ty()) {
+ if (!FromTy->getElementCount().isKnownMultipleOf(8)) {
+ FromTy = llvm::ScalableVectorType::get(
+ FromTy->getElementType(),
+ llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue()));
+ llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy);
+ V = CGF.Builder.CreateInsertVector(FromTy, ZeroVec, V, uint64_t(0));
+ }
FromTy = llvm::ScalableVectorType::get(
ToTy->getElementType(),
FromTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index f639a87e3ad0b..7639b8518db6e 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -2492,18 +2492,28 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a fixed i8 vector to a scalable i1 predicate
// vector, use a vector insert and bitcast the result.
if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
- ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
FixedSrcTy->getElementType()->isIntegerTy(8)) {
ScalableDstTy = llvm::ScalableVectorType::get(
FixedSrcTy->getElementType(),
- ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+ llvm::divideCeil(
+ ScalableDstTy->getElementCount().getKnownMinValue(), 8));
}
if (FixedSrcTy->getElementType() == ScalableDstTy->getElementType()) {
llvm::Value *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
llvm::Value *Result = Builder.CreateInsertVector(
ScalableDstTy, PoisonVec, Src, uint64_t(0), "cast.scalable");
+ ScalableDstTy = cast<llvm::ScalableVectorType>(DstTy);
+ if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+ !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+ FixedSrcTy->getElementType()->isIntegerTy(8))
+ ScalableDstTy = llvm::ScalableVectorType::get(
+ ScalableDstTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableDstTy->getElementCount().getKnownMinValue()));
+ if (Result->getType() != ScalableDstTy)
+ Result = Builder.CreateBitCast(Result, ScalableDstTy);
if (Result->getType() != DstTy)
- Result = Builder.CreateBitCast(Result, DstTy);
+ Result = Builder.CreateExtractVector(DstTy, Result, uint64_t(0));
return Result;
}
}
@@ -2517,8 +2527,17 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
// If we are casting a scalable i1 predicate vector to a fixed i8
// vector, bitcast the source and use a vector extract.
if (ScalableSrcTy->getElementType()->isIntegerTy(1) &&
- ScalableSrcTy->getElementCount().isKnownMultipleOf(8) &&
FixedDstTy->getElementType()->isIntegerTy(8)) {
+ if (!ScalableSrcTy->getElementCount().isKnownMultipleOf(8)) {
+ ScalableSrcTy = llvm::ScalableVectorType::get(
+ ScalableSrcTy->getElementType(),
+ llvm::alignTo<8>(
+ ScalableSrcTy->getElementCount().getKnownMinValue()));
+ llvm::Value *ZeroVec = llvm::Constant::getNullValue(ScalableSrcTy);
+ Src = Builder.CreateInsertVector(ScalableSrcTy, ZeroVec, Src,
+ uint64_t(0));
+ }
+
ScalableSrcTy = llvm::ScalableVectorType::get(
FixedDstTy->getElementType(),
ScalableSrcTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
index e2f02dc64f766..3ab065d34bcfb 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
@@ -15,24 +15,12 @@ typedef vbool64_t fixed_bool64_t __attribute__((riscv_rvv_vector_bits(__riscv_v_
// CHECK-64-LABEL: @call_bool32_ff(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 2)
// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool32_ff(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 4)
// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
//
fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
@@ -41,24 +29,12 @@ fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
// CHECK-64-LABEL: @call_bool64_ff(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 1)
// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
// CHECK-128-LABEL: @call_bool64_ff(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT: [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 2)
// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
//
fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
@@ -71,25 +47,13 @@ fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
// CHECK-64-LABEL: @call_bool32_fs(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
+// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool32_fs(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
+// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP1]]
//
fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -97,25 +61,13 @@ fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
// CHECK-64-LABEL: @call_bool64_fs(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
+// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
// CHECK-128-LABEL: @call_bool64_fs(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT: [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
+// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP1]]
//
fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
@@ -127,25 +79,13 @@ fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
// CHECK-64-LABEL: @call_bool32_ss(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool32_ss(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT: store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT: ret <vscale x 2 x i1> [[TMP0]]
//
fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -153,25 +93,13 @@ fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
// CHECK-64-LABEL: @call_bool64_ss(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-64-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
// CHECK-128-LABEL: @call_bool64_ss(
// CHECK-128-NEXT: entry:
-// CHECK-128-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
// CHECK-128-NEXT: [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT: store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT: [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT: store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT: ret <vscale x 1 x i1> [[TMP0]]
//
fixed_bool64_t call_bool64_ss(vbool64_t op1, vbool64_t op2) {
return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
index f0fa7e8d07b4d..8407c065adb21 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
@@ -29,46 +29,22 @@ fixed_bool8_t from_vbool8_t(vbool8_t type) {
// CHECK-64-LABEL: @from_vbool16_t(
// CHECK-64-NEXT: entry:
-// CHECK-64-NEXT: [[SAVED_VALUE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT: store <vscale x 4 x i1> [[TYPE:%.*]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT: [[TMP0:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT: store <1 x i8> [[TMP0]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: [[TMP1:%.*]] = load <vscale x 4 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT: ret <vscale x 4 x i1> [[TMP1]]
+// CHECK-64-NEXT: ret <vscale x 4 x i1> [...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks broadly good to me.
FromTy = llvm::ScalableVectorType::get( | ||
FromTy->getElementType(), | ||
llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue())); | ||
llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be poison
instead of zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure of the semantics of bitcasting poison elements to a larger element type. Does it poison just the bits or the whole element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The LangRef for bitcast says "It is always a no-op cast because no bits change with this conversion.", which suggests any non-poison bits must be preserved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the original RFC for poison had this text https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html
- bitcast between types of different bitwidth can return poison if when
concatenating adjacent values one of these values is poison. For example:
%x = bitcast <3 x i2> <2, poison, 2> to <2 x i3>
->
%x = <poison, poison>
%x = bitcast <6 x i2> <2, poison, 2, 2, 2, 2> to <4 x i3>
->
%x = <poison, poison, 5, 2>
I'm not sure if that's the semantics we ended up with. If those are the semantics then, i1 poison elements that get added to the upper bits would poison the entire i8 element after the bitcast.
@nikic @nunoplopes what are the semantics of bitcasting poison elements to a larger element type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the semantics. A single poison bits taints the whole value.
So when you bitcast a vector to elements with different sizes, you need to account for the lanes that share the original lane.
In summary, you can't combine poison elements into a lane that is meaningful for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's a choice between the LangRef and an old mailing list post then I think the LangRef should win. At least the bitcast part of the LangRef suggests this is not true, which is backed up by https://godbolt.org/z/n67WTcf1v that shows effort is put in to preserving the known bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is just a missed optimization. Offering bit-level poison semantics is too complicated. No one managed to come up with a proposal for that. So, we have to go with value-wise poison semantics for the foreseeable future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say "a missed optimization", I say "it has implemented the LangRef" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nunoplopes is correct, there is no bitwise poison in LLVM. That particular bit of phrasing in the bitcast docs probably predates the introduction of poison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For i1 vectors, we used an i8 fixed vector as the storage type.
If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store from a fixed vector alloca. If known minimum number of elements is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes even if vscale is known and vscale*min_num_elements is less than or equal to 8. This means the load or store is outside the bounds of the fixed size alloca as far as DataLayout is concerned leading to undefined behavior.
This patch avoids this by widening the i1 scalable vector type with zero elements until it is divisible by 8. This allows it be bitcasted to/from an i8 scalable vector. We then insert or extract the i8 fixed vector into this type.
Hopefully this enables #130973 to be accepted.