[RISCV] Improve casting between i1 scalable vectors and i8 fixed vectors for -mrvv-vector-bits #139190


Merged
merged 2 commits into llvm:main from pr/i1-cast on May 14, 2025

Conversation

@topperc (Collaborator) commented May 9, 2025

For i1 vectors, we used an i8 fixed vector as the storage type.

If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store through a fixed vector alloca. When the minimum number of elements is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes, even if vscale is known and vscale*min_num_elements is less than or equal to 8. As far as DataLayout is concerned, the load or store is therefore out of bounds of the fixed-size alloca, leading to undefined behavior.
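
As a hedged illustration (not taken from the patch or its tests), the problematic through-memory pattern for a vbool32_t with -mrvv-vector-bits=64, whose fixed storage type is <1 x i8> and whose scalable type is <vscale x 2 x i1>, would look roughly like this:

define <vscale x 2 x i1> @fixed_to_scalable_via_memory(<1 x i8> %b) {
  %tmp = alloca <1 x i8>, align 1                  ; 1-byte fixed-size alloca
  store <1 x i8> %b, ptr %tmp, align 1
  ; DataLayout says this load reads vscale bytes, so for vscale > 1 it is out of
  ; bounds of the 1-byte alloca even when vscale*2 <= 8 and the bits fit in one byte.
  %s = load <vscale x 2 x i1>, ptr %tmp, align 1
  ret <vscale x 2 x i1> %s
}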

This patch avoids the problem by widening the i1 scalable vector with zero elements until its element count is divisible by 8. The widened value can then be bitcast to/from an i8 scalable vector, and the i8 fixed vector is inserted into or extracted from that type.
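
A minimal sketch of the fixed-to-scalable direction, for the same illustrative vbool32_t / -mrvv-vector-bits=64 case as above (the function and value names are hypothetical; the insert/bitcast/extract shape follows the CGCall.cpp and CGExprScalar.cpp changes below). The opposite direction widens the i1 vector with zeros, bitcasts to an i8 scalable vector, and extracts the fixed i8 vector.

define <vscale x 2 x i1> @fixed_to_scalable_via_insert(<1 x i8> %b) {
  ; insert the fixed i8 vector into a scalable i8 vector
  %cast.scalable = call <vscale x 1 x i8> @llvm.vector.insert.nxv1i8.v1i8(<vscale x 1 x i8> poison, <1 x i8> %b, i64 0)
  ; bitcast to the i1 type widened to a multiple of 8 elements
  %wide = bitcast <vscale x 1 x i8> %cast.scalable to <vscale x 8 x i1>
  ; extract the real <vscale x 2 x i1> result from the low elements
  %res = call <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1(<vscale x 8 x i1> %wide, i64 0)
  ret <vscale x 2 x i1> %res
}

declare <vscale x 1 x i8> @llvm.vector.insert.nxv1i8.v1i8(<vscale x 1 x i8>, <1 x i8>, i64)
declare <vscale x 2 x i1> @llvm.vector.extract.nxv2i1.nxv8i1(<vscale x 8 x i1>, i64)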

Hopefully this enables #130973 to be accepted.

@topperc topperc requested review from nikic and paulwalker-arm May 9, 2025 01:17
@llvmbot llvmbot added the clang, backend:RISC-V and clang:codegen labels May 9, 2025
@topperc topperc changed the title [RISCV] Improve casting between i1 scalable vectors and i8 fixed vect… [RISCV] Improve casting between i1 scalable vectors and i8 fixed vectors for -mrvv-vector-bits May 9, 2025
@llvmbot (Member) commented May 9, 2025

@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Craig Topper (topperc)

Changes

For i1 vectors, we used an i8 fixed vector as the storage type.

If the known minimum number of elements of the scalable vector type is less than 8, we were doing the cast through memory. This used a load or store through a fixed vector alloca. When the minimum number of elements is less than 8, DataLayout indicates that the load/store reads/writes vscale bytes, even if vscale is known and vscale*min_num_elements is less than or equal to 8. As far as DataLayout is concerned, the load or store is therefore out of bounds of the fixed-size alloca, leading to undefined behavior.

This patch avoids the problem by widening the i1 scalable vector with zero elements until its element count is divisible by 8. The widened value can then be bitcast to/from an i8 scalable vector, and the i8 fixed vector is inserted into or extracted from that type.

Hopefully this enables #130973 to be accepted.


Patch is 41.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139190.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+21-5)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+23-4)
  • (modified) clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c (+16-88)
  • (modified) clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c (+10-46)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-bitcast-less-8.c (+16-16)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-cast.c (+6-12)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-codegen.c (+19-18)
  • (modified) clang/test/CodeGen/RISCV/attr-rvv-vector-bits-globals.c (+8-8)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 9dfd25f9a8d43..81dfc3884f1af 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -1366,19 +1366,29 @@ static llvm::Value *CreateCoercedLoad(Address Src, llvm::Type *Ty,
       // If we are casting a fixed i8 vector to a scalable i1 predicate
       // vector, use a vector insert and bitcast the result.
       if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
-          ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
           FixedSrcTy->getElementType()->isIntegerTy(8)) {
         ScalableDstTy = llvm::ScalableVectorType::get(
             FixedSrcTy->getElementType(),
-            ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+            llvm::divideCeil(
+                ScalableDstTy->getElementCount().getKnownMinValue(), 8));
       }
       if (ScalableDstTy->getElementType() == FixedSrcTy->getElementType()) {
         auto *Load = CGF.Builder.CreateLoad(Src);
         auto *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
         llvm::Value *Result = CGF.Builder.CreateInsertVector(
             ScalableDstTy, PoisonVec, Load, uint64_t(0), "cast.scalable");
-        if (ScalableDstTy != Ty)
-          Result = CGF.Builder.CreateBitCast(Result, Ty);
+        ScalableDstTy = cast<llvm::ScalableVectorType>(Ty);
+        if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+            !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+            FixedSrcTy->getElementType()->isIntegerTy(8))
+          ScalableDstTy = llvm::ScalableVectorType::get(
+              ScalableDstTy->getElementType(),
+              llvm::alignTo<8>(
+                  ScalableDstTy->getElementCount().getKnownMinValue()));
+        if (Result->getType() != ScalableDstTy)
+          Result = CGF.Builder.CreateBitCast(Result, ScalableDstTy);
+        if (Result->getType() != Ty)
+          Result = CGF.Builder.CreateExtractVector(Ty, Result, uint64_t(0));
         return Result;
       }
     }
@@ -1476,8 +1486,14 @@ CoerceScalableToFixed(CodeGenFunction &CGF, llvm::FixedVectorType *ToTy,
   // If we are casting a scalable i1 predicate vector to a fixed i8
   // vector, first bitcast the source.
   if (FromTy->getElementType()->isIntegerTy(1) &&
-      FromTy->getElementCount().isKnownMultipleOf(8) &&
       ToTy->getElementType() == CGF.Builder.getInt8Ty()) {
+    if (!FromTy->getElementCount().isKnownMultipleOf(8)) {
+      FromTy = llvm::ScalableVectorType::get(
+          FromTy->getElementType(),
+          llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue()));
+      llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy);
+      V = CGF.Builder.CreateInsertVector(FromTy, ZeroVec, V, uint64_t(0));
+    }
     FromTy = llvm::ScalableVectorType::get(
         ToTy->getElementType(),
         FromTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index f639a87e3ad0b..7639b8518db6e 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -2492,18 +2492,28 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
         // If we are casting a fixed i8 vector to a scalable i1 predicate
         // vector, use a vector insert and bitcast the result.
         if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
-            ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
             FixedSrcTy->getElementType()->isIntegerTy(8)) {
           ScalableDstTy = llvm::ScalableVectorType::get(
               FixedSrcTy->getElementType(),
-              ScalableDstTy->getElementCount().getKnownMinValue() / 8);
+              llvm::divideCeil(
+                  ScalableDstTy->getElementCount().getKnownMinValue(), 8));
         }
         if (FixedSrcTy->getElementType() == ScalableDstTy->getElementType()) {
           llvm::Value *PoisonVec = llvm::PoisonValue::get(ScalableDstTy);
           llvm::Value *Result = Builder.CreateInsertVector(
               ScalableDstTy, PoisonVec, Src, uint64_t(0), "cast.scalable");
+          ScalableDstTy = cast<llvm::ScalableVectorType>(DstTy);
+          if (ScalableDstTy->getElementType()->isIntegerTy(1) &&
+              !ScalableDstTy->getElementCount().isKnownMultipleOf(8) &&
+              FixedSrcTy->getElementType()->isIntegerTy(8))
+            ScalableDstTy = llvm::ScalableVectorType::get(
+                ScalableDstTy->getElementType(),
+                llvm::alignTo<8>(
+                    ScalableDstTy->getElementCount().getKnownMinValue()));
+          if (Result->getType() != ScalableDstTy)
+            Result = Builder.CreateBitCast(Result, ScalableDstTy);
           if (Result->getType() != DstTy)
-            Result = Builder.CreateBitCast(Result, DstTy);
+            Result = Builder.CreateExtractVector(DstTy, Result, uint64_t(0));
           return Result;
         }
       }
@@ -2517,8 +2527,17 @@ Value *ScalarExprEmitter::VisitCastExpr(CastExpr *CE) {
         // If we are casting a scalable i1 predicate vector to a fixed i8
         // vector, bitcast the source and use a vector extract.
         if (ScalableSrcTy->getElementType()->isIntegerTy(1) &&
-            ScalableSrcTy->getElementCount().isKnownMultipleOf(8) &&
             FixedDstTy->getElementType()->isIntegerTy(8)) {
+          if (!ScalableSrcTy->getElementCount().isKnownMultipleOf(8)) {
+            ScalableSrcTy = llvm::ScalableVectorType::get(
+                ScalableSrcTy->getElementType(),
+                llvm::alignTo<8>(
+                    ScalableSrcTy->getElementCount().getKnownMinValue()));
+            llvm::Value *ZeroVec = llvm::Constant::getNullValue(ScalableSrcTy);
+            Src = Builder.CreateInsertVector(ScalableSrcTy, ZeroVec, Src,
+                                             uint64_t(0));
+          }
+
           ScalableSrcTy = llvm::ScalableVectorType::get(
               FixedDstTy->getElementType(),
               ScalableSrcTy->getElementCount().getKnownMinValue() / 8);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
index e2f02dc64f766..3ab065d34bcfb 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-call.c
@@ -15,24 +15,12 @@ typedef vbool64_t fixed_bool64_t __attribute__((riscv_rvv_vector_bits(__riscv_v_
 
 // CHECK-64-LABEL: @call_bool32_ff(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT:    [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 2)
 // CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
 //
 // CHECK-128-LABEL: @call_bool32_ff(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2_COERCE:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT:    [[TMP2:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[TMP1:%.*]], i64 4)
 // CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
 //
 fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
@@ -41,24 +29,12 @@ fixed_bool32_t call_bool32_ff(fixed_bool32_t op1, fixed_bool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_ff(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-64-NEXT:    [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 1)
 // CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
 //
 // CHECK-128-LABEL: @call_bool64_ff(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE4:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2_COERCE:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA11:![0-9]+]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE4]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
+// CHECK-128-NEXT:    [[TMP2:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[TMP1:%.*]], i64 2)
 // CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
 //
 fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
@@ -71,25 +47,13 @@ fixed_bool64_t call_bool64_ff(fixed_bool64_t op1, fixed_bool64_t op2) {
 
 // CHECK-64-LABEL: @call_bool32_fs(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT:    [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
+// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP1]]
 //
 // CHECK-128-LABEL: @call_bool32_fs(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1_COERCE:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[TMP0:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
+// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP1]]
 //
 fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -97,25 +61,13 @@ fixed_bool32_t call_bool32_fs(fixed_bool32_t op1, vbool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_fs(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT:    [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
+// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP1]]
 //
 // CHECK-128-LABEL: @call_bool64_fs(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE2:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1_COERCE:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE2]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT:    [[TMP1:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[TMP0:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
+// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP1]]
 //
 fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
@@ -127,25 +79,13 @@ fixed_bool64_t call_bool64_fs(fixed_bool64_t op1, vbool64_t op2) {
 
 // CHECK-64-LABEL: @call_bool32_ss(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
 // CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 2)
-// CHECK-64-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-64-NEXT:    ret <vscale x 2 x i1> [[TMP0]]
 //
 // CHECK-128-LABEL: @call_bool32_ss(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 2 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 2 x i1>, align 1
 // CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 2 x i1> @llvm.riscv.vmand.nxv2i1.i64(<vscale x 2 x i1> [[OP1:%.*]], <vscale x 2 x i1> [[OP2:%.*]], i64 4)
-// CHECK-128-NEXT:    store <vscale x 2 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 2 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP2]]
+// CHECK-128-NEXT:    ret <vscale x 2 x i1> [[TMP0]]
 //
 fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 32);
@@ -153,25 +93,13 @@ fixed_bool32_t call_bool32_ss(vbool32_t op1, vbool32_t op2) {
 
 // CHECK-64-LABEL: @call_bool64_ss(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
 // CHECK-64-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 1)
-// CHECK-64-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-64-NEXT:    ret <vscale x 1 x i1> [[TMP0]]
 //
 // CHECK-128-LABEL: @call_bool64_ss(
 // CHECK-128-NEXT:  entry:
-// CHECK-128-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 1 x i1>, align 1
-// CHECK-128-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 1 x i1>, align 1
 // CHECK-128-NEXT:    [[TMP0:%.*]] = tail call <vscale x 1 x i1> @llvm.riscv.vmand.nxv1i1.i64(<vscale x 1 x i1> [[OP1:%.*]], <vscale x 1 x i1> [[OP2:%.*]], i64 2)
-// CHECK-128-NEXT:    store <vscale x 1 x i1> [[TMP0]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA11]]
-// CHECK-128-NEXT:    [[TMP1:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10]]
-// CHECK-128-NEXT:    store <1 x i8> [[TMP1]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    [[TMP2:%.*]] = load <vscale x 1 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP2]]
+// CHECK-128-NEXT:    ret <vscale x 1 x i1> [[TMP0]]
 //
 fixed_bool64_t call_bool64_ss(vbool64_t op1, vbool64_t op2) {
   return __riscv_vmand(op1, op2, __riscv_v_fixed_vlen / 64);
diff --git a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
index f0fa7e8d07b4d..8407c065adb21 100644
--- a/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
+++ b/clang/test/CodeGen/RISCV/attr-riscv-rvv-vector-bits-less-8-cast.c
@@ -29,46 +29,22 @@ fixed_bool8_t from_vbool8_t(vbool8_t type) {
 
 // CHECK-64-LABEL: @from_vbool16_t(
 // CHECK-64-NEXT:  entry:
-// CHECK-64-NEXT:    [[SAVED_VALUE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT:    [[RETVAL_COERCE:%.*]] = alloca <vscale x 4 x i1>, align 1
-// CHECK-64-NEXT:    store <vscale x 4 x i1> [[TYPE:%.*]], ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA6:![0-9]+]]
-// CHECK-64-NEXT:    [[TMP0:%.*]] = load <1 x i8>, ptr [[SAVED_VALUE]], align 1, !tbaa [[TBAA10:![0-9]+]]
-// CHECK-64-NEXT:    store <1 x i8> [[TMP0]], ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    [[TMP1:%.*]] = load <vscale x 4 x i1>, ptr [[RETVAL_COERCE]], align 1
-// CHECK-64-NEXT:    ret <vscale x 4 x i1> [[TMP1]]
+// CHECK-64-NEXT:    ret <vscale x 4 x i1> [...
[truncated]

@llvmbot (Member) commented May 9, 2025

@llvm/pr-subscribers-backend-risc-v


@paulwalker-arm (Collaborator) left a comment

This looks broadly good to me.

FromTy = llvm::ScalableVectorType::get(
    FromTy->getElementType(),
    llvm::alignTo<8>(FromTy->getElementCount().getKnownMinValue()));
llvm::Value *ZeroVec = llvm::Constant::getNullValue(FromTy);
Collaborator

Should this be poison instead of zero?

Collaborator Author

I wasn't sure of the semantics of bitcasting poison elements to a larger element type. Does it poison just the bits or the whole element?

@paulwalker-arm (Collaborator) commented May 13, 2025

The LangRef for bitcast says "It is always a no-op cast because no bits change with this conversion", which suggests any non-poison bits must be preserved.

@topperc (Collaborator, Author) commented May 13, 2025

The original RFC for poison had this text: https://lists.llvm.org/pipermail/llvm-dev/2016-October/106182.html

 - bitcast between types of different bitwidth can return poison if when
concatenating adjacent values one of these values is poison. For example:
%x = bitcast <3 x i2>  <2, poison, 2> to <2 x i3>
  ->
%x = <poison, poison>

%x = bitcast <6 x i2>  <2, poison, 2, 2, 2, 2> to <4 x i3>
  ->
%x = <poison, poison, 5, 2>

I'm not sure if that's the semantics we ended up with. If those are the semantics, then i1 poison elements that get added to the upper bits would poison the entire i8 element after the bitcast.

@nikic @nunoplopes what are the semantics of bitcasting poison elements to a larger element type?

Member

Yes, that's the semantics. A single poison bit taints the whole value.
So when you bitcast a vector to elements with different sizes, you need to account for the lanes that share the original lane.
In summary, you can't combine poison elements into a lane that is meaningful for you.
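
A hypothetical IR sketch (not from the thread; types follow the vbool32_t example from the description) of what those value-wise semantics mean for this patch's scalable-to-fixed path: if the widening used poison instead of the zero vector, the lanes sharing each byte would taint it.

define <vscale x 1 x i8> @poison_would_taint_the_byte(<vscale x 2 x i1> %v) {
  ; widen with poison instead of the zero vector the patch uses
  %wide = call <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.nxv2i1(<vscale x 8 x i1> poison, <vscale x 2 x i1> %v, i64 0)
  ; any byte that mixes real and poison i1 lanes is poison after the bitcast,
  ; so for 2*vscale < 8 the byte holding the mask bits would be lost
  %bytes = bitcast <vscale x 8 x i1> %wide to <vscale x 1 x i8>
  ret <vscale x 1 x i8> %bytes
}

declare <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.nxv2i1(<vscale x 8 x i1>, <vscale x 2 x i1>, i64)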

Collaborator

If it's a choice between the LangRef and an old mailing list post, then I think the LangRef should win. At least the bitcast part of the LangRef suggests this is not true, which is backed up by https://godbolt.org/z/n67WTcf1v showing that effort is put into preserving the known bits?

Member

That is just a missed optimization. Offering bit-level poison semantics is too complicated. No one managed to come up with a proposal for that. So, we have to go with value-wise poison semantics for the foreseeable future.

Collaborator

You say "a missed optimization", I say "it has implemented the LangRef" :)

Contributor

@nunoplopes is correct, there is no bitwise poison in LLVM. That particular bit of phrasing in the bitcast docs probably predates the introduction of poison.

@paulwalker-arm (Collaborator) left a comment

This looks good to me. Thanks for helping to unblock #130973 @topperc.

@topperc topperc merged commit bf0655f into llvm:main May 14, 2025
10 of 12 checks passed
@topperc topperc deleted the pr/i1-cast branch May 14, 2025 17:27