[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

kasuga-fj · 2024-06-13T05:15:03Z

when scheduling

When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, SchedStart isn't taken into account. It may result generating incorrect code. This patch fixes the problem. Also, this patch merges SchedStart into EarlyStart (same for SchedEnd).

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:

-O3 \
-mcpu=neoverse-v1 \
-mllvm -aarch64-enable-pipeliner \
-mllvm -pipeliner-max-stages=1000 \
-mllvm -pipeliner-max-mii=1000 \
-mllvm -pipeliner-enable-copytophi=0 \
-mllvm -pipeliner-ii-search-range=100 \
-mllvm -enable-misched=0 \
-mllvm -enable-post-misched=0

Observed changes:

Test	Summary
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000	The order of some instructions was changed
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt	The order of some instructions was changed
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk	The order of some instructions was changed

close #93936

kasuga-fj · 2024-06-19T04:49:11Z

ping

bcahoon · 2024-06-19T22:52:00Z

llvm/lib/CodeGen/MachinePipeliner.cpp

+        // backwards when all scheduled predecessors are loop-carried
+        // output/order dependencies. Empirically, there are also cases where
+        // scheduling becomes possible with backward search.
+        if (Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this)) {


I'm not sure I see how this keeps the original behavior? I assume some test case fails without it?
If it's needed, can this be combined with the code at line 2455?

I commented out the code between lines 2446-2449, and noticed some test failures. I took a quick look at sw-bad-sched.ll, and I can see what is going on.

Prior to this patch:

Inst (3) %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0) es: 80000000 ls: 0 me: 7fffffff ms: ffffffff Trying to insert node between 0 and -1 II: 3

With this patch:

Inst (3) %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0) es: ffffffff ls: 0 Trying to insert node between 0 and -1 II: 3

In the old case, the pipeliner uses does a bottom up schedule because es == INT_MAX and ls == 0.
Because es is -1, with the patch, the pipeliner drops into the case when both es and ls have been defined. Previously, that used a top-down search (unless the SU is a Phi).

I think it's okay for the schedule processing to change after your patch. I feel it would be surprising if it didn't. For example, in the case of swp-bad-sched.ll, the generated schedules are equivalent when comparing the output from your patch to the output from removing the code in lines 2446-2449. I haven't looked at the other cases though.

The question is - does the bottom-up scheduling produces a better result for certain situations. The patch has a comment "Empirically, there are also cases where scheduling becomes possible with backward search". This would be a reason to add/keep the code that does a backward search. But, I would see those cases and I think it would be better to combine any changes to do a bottom up search with the isPHI() check

does the bottom-up scheduling produces a better result for certain situations.

Yes. As far as I have checked with llvm-test-suite, there were cases where scheduling failed when stopping the backward search (about 1% of all cases). Removing lines 2446-2449 sometimes generated better schedule, but these were about a tenth of the cases where scheduling failed. Therefore I believe that keeping the original behavior is valuable.

In any case, as you mentioned, merging this process with isPHI() checks looks good to me. I'll fix it.

bcahoon

Can you add a test code using the example from the issue?

llvmbot · 2024-06-21T08:40:46Z

@llvm/pr-subscribers-backend-aarch64

Author: Ryotaro KASUGA (kasuga-fj)

Changes

when scheduling

When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, SchedStart isn't taken into account. It may result generating incorrect code. This patch fixes the problem. Also, this patch merges SchedStart into EarlyStart (same for SchedEnd).

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:

-O3 \
-mcpu=neoverse-v1 \
-mllvm -aarch64-enable-pipeliner \
-mllvm -pipeliner-max-stages=1000 \
-mllvm -pipeliner-max-mii=1000 \
-mllvm -pipeliner-enable-copytophi=0 \
-mllvm -pipeliner-ii-search-range=100 \
-mllvm -enable-misched=0 \
-mllvm -enable-post-misched=0

Observed changes:

Test	Summary
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000	The order of some instructions was changed
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt	The order of some instructions was changed
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk	The order of some instructions was changed

close #93936

Patch is 22.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95356.diff

3 Files Affected:

(modified) llvm/include/llvm/CodeGen/MachinePipeliner.h (+5-2)
(modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+39-31)
(added) llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir (+335)

diff --git a/llvm/include/llvm/CodeGen/MachinePipeliner.h b/llvm/include/llvm/CodeGen/MachinePipeliner.h
index 8f0a17cf99967..50df76db2d98d 100644
--- a/llvm/include/llvm/CodeGen/MachinePipeliner.h
+++ b/llvm/include/llvm/CodeGen/MachinePipeliner.h
@@ -594,8 +594,8 @@ class SMSchedule {
   /// chain.
   int latestCycleInChain(const SDep &Dep);
 
-  void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
-                    int *MinEnd, int *MaxStart, int II, SwingSchedulerDAG *DAG);
+  void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart, int II,
+                    SwingSchedulerDAG *DAG);
   bool insert(SUnit *SU, int StartCycle, int EndCycle, int II);
 
   /// Iterators for the cycle to instruction map.
@@ -653,6 +653,9 @@ class SMSchedule {
   bool isLoopCarried(const SwingSchedulerDAG *SSD, MachineInstr &Phi) const;
   bool isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD, MachineInstr *Def,
                              MachineOperand &MO) const;
+
+  bool onlyHasLoopCarriedOutputOrOrderPreds(SUnit *SU,
+                                            SwingSchedulerDAG *DAG) const;
   void print(raw_ostream &os) const;
   void dump() const;
 };
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 6c24cfca793fc..f2ff8d23f276a 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -2420,47 +2420,43 @@ bool SwingSchedulerDAG::schedulePipeline(SMSchedule &Schedule) {
       // upon the scheduled time for any predecessors/successors.
       int EarlyStart = INT_MIN;
       int LateStart = INT_MAX;
-      // These values are set when the size of the schedule window is limited
-      // due to chain dependences.
-      int SchedEnd = INT_MAX;
-      int SchedStart = INT_MIN;
-      Schedule.computeStart(SU, &EarlyStart, &LateStart, &SchedEnd, &SchedStart,
-                            II, this);
+      Schedule.computeStart(SU, &EarlyStart, &LateStart, II, this);
       LLVM_DEBUG({
         dbgs() << "\n";
         dbgs() << "Inst (" << SU->NodeNum << ") ";
         SU->getInstr()->dump();
         dbgs() << "\n";
       });
-      LLVM_DEBUG({
-        dbgs() << format("\tes: %8x ls: %8x me: %8x ms: %8x\n", EarlyStart,
-                         LateStart, SchedEnd, SchedStart);
-      });
+      LLVM_DEBUG(
+          dbgs() << format("\tes: %8x ls: %8x\n", EarlyStart, LateStart));
 
-      if (EarlyStart > LateStart || SchedEnd < EarlyStart ||
-          SchedStart > LateStart)
+      if (EarlyStart > LateStart)
         scheduleFound = false;
-      else if (EarlyStart != INT_MIN && LateStart == INT_MAX) {
-        SchedEnd = std::min(SchedEnd, EarlyStart + (int)II - 1);
-        scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
-      } else if (EarlyStart == INT_MIN && LateStart != INT_MAX) {
-        SchedStart = std::max(SchedStart, LateStart - (int)II + 1);
-        scheduleFound = Schedule.insert(SU, LateStart, SchedStart, II);
-      } else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
-        SchedEnd =
-            std::min(SchedEnd, std::min(LateStart, EarlyStart + (int)II - 1));
-        // When scheduling a Phi it is better to start at the late cycle and go
-        // backwards. The default order may insert the Phi too far away from
-        // its first dependence.
-        if (SU->getInstr()->isPHI())
-          scheduleFound = Schedule.insert(SU, SchedEnd, EarlyStart, II);
+      else if (EarlyStart != INT_MIN && LateStart == INT_MAX)
+        scheduleFound =
+            Schedule.insert(SU, EarlyStart, EarlyStart + (int)II - 1, II);
+      else if (EarlyStart == INT_MIN && LateStart != INT_MAX)
+        scheduleFound =
+            Schedule.insert(SU, LateStart, LateStart - (int)II + 1, II);
+      else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
+        LateStart = std::min(LateStart, EarlyStart + (int)II - 1);
+        // When scheduling a Phi it is better to start at the late cycle and
+        // go backwards. The default order may insert the Phi too far away
+        // from its first dependence.
+        // Also, do backward search when all scheduled predecessors are
+        // loop-carried output/order dependencies. Empirically, there are also
+        // cases where scheduling becomes possible with backward search.
+        if (SU->getInstr()->isPHI() ||
+            Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this))
+          scheduleFound = Schedule.insert(SU, LateStart, EarlyStart, II);
         else
-          scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
+          scheduleFound = Schedule.insert(SU, EarlyStart, LateStart, II);
       } else {
         int FirstCycle = Schedule.getFirstCycle();
         scheduleFound = Schedule.insert(SU, FirstCycle + getASAP(SU),
                                         FirstCycle + getASAP(SU) + II - 1, II);
       }
+
       // Even if we find a schedule, make sure the schedule doesn't exceed the
       // allowable number of stages. We keep trying if this happens.
       if (scheduleFound)
@@ -2868,8 +2864,7 @@ static SUnit *multipleIterations(SUnit *SU, SwingSchedulerDAG *DAG) {
 /// Compute the scheduling start slot for the instruction.  The start slot
 /// depends on any predecessor or successor nodes scheduled already.
 void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
-                              int *MinEnd, int *MaxStart, int II,
-                              SwingSchedulerDAG *DAG) {
+                              int II, SwingSchedulerDAG *DAG) {
   // Iterate over each instruction that has been scheduled already.  The start
   // slot computation depends on whether the previously scheduled instruction
   // is a predecessor or successor of the specified instruction.
@@ -2888,7 +2883,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
             *MaxEarlyStart = std::max(*MaxEarlyStart, EarlyStart);
             if (DAG->isLoopCarriedDep(SU, Dep, false)) {
               int End = earliestCycleInChain(Dep) + (II - 1);
-              *MinEnd = std::min(*MinEnd, End);
+              *MinLateStart = std::min(*MinLateStart, End);
             }
           } else {
             int LateStart = cycle - Dep.getLatency() +
@@ -2912,7 +2907,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
             *MinLateStart = std::min(*MinLateStart, LateStart);
             if (DAG->isLoopCarriedDep(SU, Dep)) {
               int Start = latestCycleInChain(Dep) + 1 - II;
-              *MaxStart = std::max(*MaxStart, Start);
+              *MaxEarlyStart = std::max(*MaxEarlyStart, Start);
             }
           } else {
             int EarlyStart = cycle + Dep.getLatency() -
@@ -3105,6 +3100,19 @@ bool SMSchedule::isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD,
   return false;
 }
 
+/// Return true if all scheduled predecessors are loop-carried output/order
+/// dependencies.
+bool SMSchedule::onlyHasLoopCarriedOutputOrOrderPreds(
+    SUnit *SU, SwingSchedulerDAG *DAG) const {
+  for (const SDep &Pred : SU->Preds)
+    if (InstrToCycle.count(Pred.getSUnit()) && !DAG->isBackedge(SU, Pred))
+      return false;
+  for (const SDep &Succ : SU->Succs)
+    if (InstrToCycle.count(Succ.getSUnit()) && DAG->isBackedge(SU, Succ))
+      return false;
+  return true;
+}
+
 /// Determine transitive dependences of unpipelineable instructions
 SmallSet<SUnit *, 8> SMSchedule::computeUnpipelineableNodes(
     SwingSchedulerDAG *SSD, TargetInstrInfo::PipelinerLoopInfo *PLI) {
diff --git a/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
new file mode 100644
index 0000000000000..c1014b296cad3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
@@ -0,0 +1,335 @@
+# RUN: llc --verify-machineinstrs -mtriple=aarch64 -o - %s -run-pass pipeliner -aarch64-enable-pipeliner -debug-only=pipeliner -pipeliner-max-stages=50 -pipeliner-max-mii=50 -pipeliner-enable-copytophi=0 -pipeliner-ii-search-range=30 2>&1 | FileCheck %s
+# REQUIRES: asserts
+
+# Test that each instruction must be scheduled between the early cycle and the late cycle. Previously there were cases where an instruction is scheduled outside of the valid range. See issue #93936 for details.
+
+# CHECK: {{^ *}}Try to schedule with 47
+# CHECK: {{^ *}}Inst (11)   %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+# CHECK-EMPTY:
+# CHECK-NEXT: {{^ *}}es: ffffffe9 ls: ffffffe9
+# CHECK-NEXT: {{^ *}}Trying to insert node between -23 and -23 II: 47
+# CHECK-NEXT: {{^ *}}failed to insert at cycle -23   %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+
+--- |
+  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+  
+  define dso_local void @f(ptr nocapture noundef writeonly %a, ptr nocapture noundef readonly %b, ptr nocapture noundef readonly %c, ptr nocapture noundef readonly %d, ptr nocapture noundef readonly %e, float noundef %f, i32 noundef %N) local_unnamed_addr {
+  entry:
+    %cmp16 = icmp sgt i32 %N, 0
+    br i1 %cmp16, label %for.body.preheader, label %for.cond.cleanup
+  
+  for.body.preheader:                               ; preds = %entry
+    %wide.trip.count = zext nneg i32 %N to i64
+    %min.iters.check = icmp ult i32 %N, 8
+    br i1 %min.iters.check, label %for.body.preheader37, label %vector.memcheck
+  
+  vector.memcheck:                                  ; preds = %for.body.preheader
+    %0 = ptrtoint ptr %a to i64
+    %1 = ptrtoint ptr %b to i64
+    %2 = ptrtoint ptr %c to i64
+    %3 = ptrtoint ptr %d to i64
+    %4 = ptrtoint ptr %e to i64
+    %5 = sub i64 %0, %1
+    %diff.check = icmp ult i64 %5, 32
+    %6 = sub i64 %0, %2
+    %diff.check22 = icmp ult i64 %6, 32
+    %conflict.rdx = or i1 %diff.check, %diff.check22
+    %7 = sub i64 %0, %3
+    %diff.check24 = icmp ult i64 %7, 32
+    %conflict.rdx25 = or i1 %conflict.rdx, %diff.check24
+    %8 = sub i64 %0, %4
+    %diff.check27 = icmp ult i64 %8, 32
+    %conflict.rdx28 = or i1 %conflict.rdx25, %diff.check27
+    br i1 %conflict.rdx28, label %for.body.preheader37, label %vector.ph
+  
+  vector.ph:                                        ; preds = %vector.memcheck
+    %n.vec = and i64 %wide.trip.count, 2147483640
+    %broadcast.splatinsert = insertelement <4 x float> poison, float %f, i64 0
+    %broadcast.splat = shufflevector <4 x float> %broadcast.splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+    %scevgep54 = getelementptr i8, ptr %b, i64 16
+    %scevgep58 = getelementptr i8, ptr %a, i64 16
+    %scevgep62 = getelementptr i8, ptr %c, i64 16
+    %scevgep66 = getelementptr i8, ptr %e, i64 16
+    %scevgep70 = getelementptr i8, ptr %d, i64 16
+    br label %vector.body
+  
+  vector.body:                                      ; preds = %vector.body, %vector.ph
+    %lsr.iv71 = phi ptr [ %scevgep72, %vector.body ], [ %scevgep70, %vector.ph ]
+    %lsr.iv67 = phi ptr [ %scevgep68, %vector.body ], [ %scevgep66, %vector.ph ]
+    %lsr.iv63 = phi ptr [ %scevgep64, %vector.body ], [ %scevgep62, %vector.ph ]
+    %lsr.iv59 = phi ptr [ %scevgep60, %vector.body ], [ %scevgep58, %vector.ph ]
+    %lsr.iv55 = phi ptr [ %scevgep56, %vector.body ], [ %scevgep54, %vector.ph ]
+    %lsr.iv52 = phi i64 [ %lsr.iv.next53, %vector.body ], [ %n.vec, %vector.ph ]
+    %scevgep57 = getelementptr i8, ptr %lsr.iv55, i64 -16
+    %wide.load = load <4 x float>, ptr %scevgep57, align 4, !tbaa !6
+    %wide.load29 = load <4 x float>, ptr %lsr.iv55, align 4, !tbaa !6
+    %9 = fmul <4 x float> %wide.load, %broadcast.splat
+    %10 = fmul <4 x float> %wide.load29, %broadcast.splat
+    %scevgep65 = getelementptr i8, ptr %lsr.iv63, i64 -16
+    %wide.load30 = load <4 x float>, ptr %scevgep65, align 4, !tbaa !6
+    %wide.load31 = load <4 x float>, ptr %lsr.iv63, align 4, !tbaa !6
+    %scevgep73 = getelementptr i8, ptr %lsr.iv71, i64 -16
+    %wide.load32 = load <4 x float>, ptr %scevgep73, align 4, !tbaa !6
+    %wide.load33 = load <4 x float>, ptr %lsr.iv71, align 4, !tbaa !6
+    %11 = fsub <4 x float> %wide.load30, %wide.load32
+    %12 = fsub <4 x float> %wide.load31, %wide.load33
+    %13 = fmul <4 x float> %9, %11
+    %14 = fmul <4 x float> %10, %12
+    %scevgep69 = getelementptr i8, ptr %lsr.iv67, i64 -16
+    %wide.load34 = load <4 x float>, ptr %scevgep69, align 4, !tbaa !6
+    %wide.load35 = load <4 x float>, ptr %lsr.iv67, align 4, !tbaa !6
+    %15 = fdiv <4 x float> %13, %wide.load34
+    %16 = fdiv <4 x float> %14, %wide.load35
+    %scevgep61 = getelementptr i8, ptr %lsr.iv59, i64 -16
+    store <4 x float> %15, ptr %scevgep61, align 4, !tbaa !6
+    store <4 x float> %16, ptr %lsr.iv59, align 4, !tbaa !6
+    %lsr.iv.next53 = add nsw i64 %lsr.iv52, -8
+    %scevgep56 = getelementptr i8, ptr %lsr.iv55, i64 32
+    %scevgep60 = getelementptr i8, ptr %lsr.iv59, i64 32
+    %scevgep64 = getelementptr i8, ptr %lsr.iv63, i64 32
+    %scevgep68 = getelementptr i8, ptr %lsr.iv67, i64 32
+    %scevgep72 = getelementptr i8, ptr %lsr.iv71, i64 32
+    %17 = icmp eq i64 %lsr.iv.next53, 0
+    br i1 %17, label %middle.block, label %vector.body, !llvm.loop !10
+  
+  middle.block:                                     ; preds = %vector.body
+    %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+    br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader37
+  
+  for.body.preheader37:                             ; preds = %vector.memcheck, %for.body.preheader, %middle.block
+    %indvars.iv.ph = phi i64 [ %n.vec, %middle.block ], [ 0, %for.body.preheader ], [ 0, %vector.memcheck ]
+    %18 = shl nuw nsw i64 %indvars.iv.ph, 2
+    %scevgep = getelementptr i8, ptr %a, i64 %18
+    %scevgep39 = getelementptr i8, ptr %e, i64 %18
+    %scevgep42 = getelementptr i8, ptr %d, i64 %18
+    %scevgep45 = getelementptr i8, ptr %c, i64 %18
+    %scevgep48 = getelementptr i8, ptr %b, i64 %18
+    %19 = sub i64 %wide.trip.count, %indvars.iv.ph
+    br label %for.body
+  
+  for.cond.cleanup:                                 ; preds = %for.body, %middle.block, %entry
+    ret void
+  
+  for.body:                                         ; preds = %for.body.preheader37, %for.body
+    %lsr.iv51 = phi i64 [ %19, %for.body.preheader37 ], [ %lsr.iv.next, %for.body ]
+    %lsr.iv49 = phi ptr [ %scevgep48, %for.body.preheader37 ], [ %scevgep50, %for.body ]
+    %lsr.iv46 = phi ptr [ %scevgep45, %for.body.preheader37 ], [ %scevgep47, %for.body ]
+    %lsr.iv43 = phi ptr [ %scevgep42, %for.body.preheader37 ], [ %scevgep44, %for.body ]
+    %lsr.iv40 = phi ptr [ %scevgep39, %for.body.preheader37 ], [ %scevgep41, %for.body ]
+    %lsr.iv = phi ptr [ %scevgep, %for.body.preheader37 ], [ %scevgep38, %for.body ]
+    %20 = load float, ptr %lsr.iv49, align 4, !tbaa !6
+    %mul = fmul float %20, %f
+    %21 = load float, ptr %lsr.iv46, align 4, !tbaa !6
+    %22 = load float, ptr %lsr.iv43, align 4, !tbaa !6
+    %sub = fsub float %21, %22
+    %mul5 = fmul float %mul, %sub
+    %23 = load float, ptr %lsr.iv40, align 4, !tbaa !6
+    %div = fdiv float %mul5, %23
+    store float %div, ptr %lsr.iv, align 4, !tbaa !6
+    %scevgep38 = getelementptr i8, ptr %lsr.iv, i64 4
+    %scevgep41 = getelementptr i8, ptr %lsr.iv40, i64 4
+    %scevgep44 = getelementptr i8, ptr %lsr.iv43, i64 4
+    %scevgep47 = getelementptr i8, ptr %lsr.iv46, i64 4
+    %scevgep50 = getelementptr i8, ptr %lsr.iv49, i64 4
+    %lsr.iv.next = add i64 %lsr.iv51, -1
+    %exitcond.not = icmp eq i64 %lsr.iv.next, 0
+    br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+  }
+  
+  !6 = !{!7, !7, i64 0}
+  !7 = !{!"float", !8, i64 0}
+  !8 = !{!"omnipotent char", !9, i64 0}
+  !9 = !{!"Simple C/C++ TBAA"}
+  !10 = distinct !{!10, !11, !12, !13}
+  !11 = !{!"llvm.loop.mustprogress"}
+  !12 = !{!"llvm.loop.isvectorized", i32 1}
+  !13 = !{!"llvm.loop.unroll.runtime.disable"}
+  !14 = distinct !{!14, !11, !12}
+
+...
+---
+name:            f
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x0', virtual-reg: '%39' }
+  - { reg: '$x1', virtual-reg: '%40' }
+  - { reg: '$x2', virtual-reg: '%41' }
+  - { reg: '$x3', virtual-reg: '%42' }
+  - { reg: '$x4', virtual-reg: '%43' }
+  - { reg: '$s0', virtual-reg: '%44' }
+  - { reg: '$w5', virtual-reg: '%45' }
+body:             |
+  bb.0.entry:
+    successors: %bb.1, %bb.7
+    liveins: $x0, $x1, $x2, $x3, $x4, $s0, $w5
+  
+    %45:gpr32common = COPY $w5
+    %44:fpr32 = COPY $s0
+    %43:gpr64common = COPY $x4
+    %42:gpr64common = COPY $x3
+    %41:gpr64common = COPY $x2
+    %40:gpr64common = COPY $x1
+    %39:gpr64common = COPY $x0
+    dead $wzr = SUBSWri %45, 1, 0, implicit-def $nzcv
+    Bcc 11, %bb.7, implicit $nzcv
+    B %bb.1
+  
+  bb.1.for.body.preheader:
+    successors: %bb.12, %bb.2
+  
+    %48:gpr32 = ORRWrs $wzr, %45, 0
+    %0:gpr64 = SUBREG_TO_REG 0, killed %48, %subreg.sub_32
+    dead $wzr = SUBSWri %45, 8, 0, implicit-def $nzcv
+    Bcc 2, %bb.2, implicit $nzcv
+  
+  bb.12:
+    %49:gpr64all = COPY $xzr
+    %47:gpr64all = COPY %49
+    B %bb.6
+  
+  bb.2.vector.memcheck:
+    successors: %bb.6, %bb.11
+  
+    %55:gpr64common = SUBXrr %39, %40
+    %59:gpr64all = COPY $xzr
+    %51:gpr64all = COPY %59
+    dead $xzr = SUBSXri killed %55, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.11
+  
+  bb.11.vector.memcheck:
+    successors: %bb.6, %bb.10
+  
+    %56:gpr64common = SUBXrr %39, %41
+    dead $xzr = SUBSXri %56, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.10
+  
+  bb.10.vector.memcheck:
+    successors: %bb.6, %bb.9
+  
+    %57:gpr64common = SUBXrr %39, %42
+    dead $xzr = SUBSXri %57, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.9
+  
+  bb.9.vector.memcheck:
+    successors: %bb.6, %bb.3
+  
+    %58:gpr64common = SUBXrr %39, %43
+    dead $xzr = SUBSXri %58, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.3
+  
+  bb.3.vector.ph:
+    %64:gpr64common = ANDXri %0, 8027
+    %1:gpr64 = COPY %64
+    %66:fpr128 = IMPLICIT_DEF
+    %65:fpr128 = INSERT_SUBREG %66, %44, %subreg.ssub
+    %67:gpr64sp = ADDXri %40, 16, 0
+    %3:gpr64all = COPY %67
+    %68:gpr64sp = ADDXri %39, 16, 0
+    %4:gpr64all = COPY %68
+    %69:gpr64sp = ADDXri %41, 16, 0
+    %5:gpr64all = COPY %69
+    %70:gpr64sp = ADDXri %43, 16, 0
+    %6:gpr64all = COPY %70
+    %71:gpr64sp = ADDXri %42, 16, 0
+    %7:gpr64all = COPY %71
+  
+  bb.4.vector.body:
+    successors: %bb.5, %bb.4
+  
+    %8:gpr64sp = PHI %7, %bb.3, %19, %bb.4
+    %9:gpr64sp = PHI %6, %bb.3, %18, %bb.4
+    %10:gpr64sp = PHI %5, %bb.3, %17, %bb.4
+    %11:gpr64sp = PHI %4, %bb.3, %16, %bb.4
+    %12:gpr64sp = PHI %3, %bb.3, %15, %bb.4
+    %13:gpr64sp = PHI %1, %bb.3, %14, %bb.4
+    %72:fpr128 = LDURQi %12, -16 :: (load (s128) from %ir.scevgep57, align 4, !tbaa !6)
+    %73:fpr128 = LDRQui %12, 0 :: (load (s128) from %ir.lsr.iv55, align 4, !tbaa !6)
+    %74:fpr128 = nofpexcept FMULv4i32_indexed killed %72, %65, 0, implicit $fpcr
+    %75:fpr128 = nofpexcept FMULv4i32_indexed killed %73, %65, 0, implicit $fpcr
+    %76:fpr128 = LDURQi %10, -16 :: (load (s128) from %ir.scevgep65, align 4, !tbaa !6)
+    %77:fpr128 = LDRQui %10, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !6)
+    %78:fpr128 = LDURQi %8, -16 :: (load (s128) from %ir.scevgep73, align 4, !tbaa !6)
+    %79:fpr128 = LDRQui %8, 0 :: (load (s128) from %ir.lsr.iv71, align 4, !tbaa !6)
+    %80:fpr128 = nofpexcept FSUBv4f32 killed %76, killed %78, implicit $fpcr
+    %81:fpr128 = nofpexcept FSUBv4f32 killed %77, killed %79, implicit $fpcr
+    %82:fpr128 = nofpexcept FMULv4f32 killed %74, killed %80, implicit $fpcr
+    %83:fpr128 = nofpexcept FMULv4f32 killed %75, killed %81, implicit $fpcr
+    %84:fpr128 = LDURQi %9, -16 :: (load (s128) from %ir.scevgep69, align 4, !tbaa !6)
+    %85:fpr128 = LDRQui %9, 0 :: (load (s128) from %ir.lsr.iv67, align 4, !tbaa !6)
+    %86:fpr128 = nofpexcept FDIVv4f32 killed %82, killed %84, implicit $fpcr
+    %87:fpr128 = nofpexcept FDIVv4f32 killed %83, killed %85, implicit $fpcr
+    STURQi killed %86, %11, -16 :: (store (s128) into %ir.scevgep61, align 4, !tbaa !6)
+    STRQui killed %87, %11, 0 :: (store (s128) into %ir.lsr.iv59, ...
[truncated]

kasuga-fj · 2024-06-21T08:41:35Z

Added test.

bcahoon

Thanks for fixing. LGTM.

when scheduling When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result generating incorrect code. This patch fixes the problem.

llvm-ci · 2024-07-01T00:30:19Z

LLVM Buildbot has detected a new failure on builder clang-ppc64be-linux-test-suite running on ppc64be-clang-test-suite while building llvm at step 7 "test-build-unified-tree-check-runtimes".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/176/builds/576

Here is the relevant piece of the build log for the reference:

Step 7 (test-build-unified-tree-check-runtimes) failure: test (failure)
******************** TEST 'SanitizerCommon-lsan-powerpc64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -funwind-tables  -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -funwind-tables -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:52:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:53:13: note: possible intended match here
==131393==LeakSanitizer: soft rss limit unexhausted (220Mb vs 95Mb)
            ^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           47:  [256] 
           48:  [320] 
           49:  [384] 
           50:  [448] 
           51: Some of the malloc calls returned null: 256 
           52: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
           53: ==131393==LeakSanitizer: soft rss limit unexhausted (220Mb vs 95Mb) 
check:68'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:68'1                 ?                                                        possible intended match
>>>>>>

--

********************

…in cases" (#97246) Reverts #95356 Due to ppc64le test failures caught by the LLVM Buildbot. https://lab.llvm.org/buildbot/#/builders/176/builds/576

…llvm#95356) when scheduling When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result generating incorrect code. This patch fixes the problem. Also, this patch merges `SchedStart` into `EarlyStart` (same for `SchedEnd`). Fixes llvm#93936

…in cases" (llvm#97246) Reverts llvm#95356 Due to ppc64le test failures caught by the LLVM Buildbot. https://lab.llvm.org/buildbot/#/builders/176/builds/576

kasuga-fj requested a review from bcahoon June 13, 2024 05:16

bcahoon reviewed Jun 19, 2024

View reviewed changes

llvmbot added the backend:AArch64 label Jun 21, 2024

kasuga-fj requested a review from bcahoon June 21, 2024 08:42

bcahoon approved these changes Jun 27, 2024

View reviewed changes

kasuga-fj force-pushed the fix-swpl-schedstart branch from 35c5813 to 453f53a Compare June 28, 2024 01:28

kasuga-fj force-pushed the fix-swpl-schedstart branch from 453f53a to 09464c4 Compare June 28, 2024 09:37

kasuga-fj merged commit e19ac0d into llvm:main Jul 1, 2024
7 checks passed

kasuga-fj deleted the fix-swpl-schedstart branch July 1, 2024 00:07

kasuga-fj mentioned this pull request Jul 1, 2024

Revert "[MachinePipeliner] Fix constraints aren't considered in certain cases" #97246

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

Uh oh!

kasuga-fj commented Jun 13, 2024 •

edited

Loading

Uh oh!

kasuga-fj commented Jun 19, 2024

Uh oh!

bcahoon Jun 19, 2024

Uh oh!

bcahoon Jun 20, 2024

Uh oh!

kasuga-fj Jun 21, 2024 •

edited

Loading

Uh oh!

bcahoon left a comment

Uh oh!

llvmbot commented Jun 21, 2024

Uh oh!

kasuga-fj commented Jun 21, 2024

Uh oh!

bcahoon left a comment

Uh oh!

Uh oh!

llvm-ci commented Jul 1, 2024

Uh oh!

Uh oh!

[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

Uh oh!

Conversation

kasuga-fj commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kasuga-fj commented Jun 19, 2024

Uh oh!

bcahoon Jun 19, 2024

Choose a reason for hiding this comment

Uh oh!

bcahoon Jun 20, 2024

Choose a reason for hiding this comment

Uh oh!

kasuga-fj Jun 21, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bcahoon left a comment

Choose a reason for hiding this comment

Uh oh!

llvmbot commented Jun 21, 2024

Uh oh!

kasuga-fj commented Jun 21, 2024

Uh oh!

bcahoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

llvm-ci commented Jul 1, 2024

Uh oh!

Uh oh!

kasuga-fj commented Jun 13, 2024 •

edited

Loading

kasuga-fj Jun 21, 2024 •

edited

Loading