Skip to content

[MachinePipeliner] Fix constraints aren't considered in certain cases #95356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 1, 2024

Conversation

kasuga-fj
Copy link
Contributor

@kasuga-fj kasuga-fj commented Jun 13, 2024

when scheduling

When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, SchedStart isn't taken into account. It may result generating incorrect code. This patch fixes the problem. Also, this patch merges SchedStart into EarlyStart (same for SchedEnd).

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:

-O3 \
-mcpu=neoverse-v1 \
-mllvm -aarch64-enable-pipeliner \
-mllvm -pipeliner-max-stages=1000 \
-mllvm -pipeliner-max-mii=1000 \
-mllvm -pipeliner-enable-copytophi=0 \
-mllvm -pipeliner-ii-search-range=100 \
-mllvm -enable-misched=0 \
-mllvm -enable-post-misched=0

Observed changes:

Test Summary
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 The order of some instructions was changed
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt The order of some instructions was changed
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk The order of some instructions was changed

close #93936

@kasuga-fj kasuga-fj requested a review from bcahoon June 13, 2024 05:16
@kasuga-fj
Copy link
Contributor Author

ping

// backwards when all scheduled predecessors are loop-carried
// output/order dependencies. Empirically, there are also cases where
// scheduling becomes possible with backward search.
if (Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see how this keeps the original behavior? I assume some test case fails without it?
If it's needed, can this be combined with the code at line 2455?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented out the code between lines 2446-2449, and noticed some test failures. I took a quick look at sw-bad-sched.ll, and I can see what is going on.

Prior to this patch:

Inst (3)   %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0)
      es: 80000000 ls:        0 me: 7fffffff ms: ffffffff
Trying to insert node between 0 and -1 II: 3

With this patch:

Inst (3)   %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0)
      es: ffffffff ls:        0
Trying to insert node between 0 and -1 II: 3

In the old case, the pipeliner uses does a bottom up schedule because es == INT_MAX and ls == 0.
Because es is -1, with the patch, the pipeliner drops into the case when both es and ls have been defined. Previously, that used a top-down search (unless the SU is a Phi).

I think it's okay for the schedule processing to change after your patch. I feel it would be surprising if it didn't. For example, in the case of swp-bad-sched.ll, the generated schedules are equivalent when comparing the output from your patch to the output from removing the code in lines 2446-2449. I haven't looked at the other cases though.

The question is - does the bottom-up scheduling produces a better result for certain situations. The patch has a comment "Empirically, there are also cases where scheduling becomes possible with backward search". This would be a reason to add/keep the code that does a backward search. But, I would see those cases and I think it would be better to combine any changes to do a bottom up search with the isPHI() check

Copy link
Contributor Author

@kasuga-fj kasuga-fj Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the bottom-up scheduling produces a better result for certain situations.

Yes. As far as I have checked with llvm-test-suite, there were cases where scheduling failed when stopping the backward search (about 1% of all cases). Removing lines 2446-2449 sometimes generated better schedule, but these were about a tenth of the cases where scheduling failed. Therefore I believe that keeping the original behavior is valuable.

In any case, as you mentioned, merging this process with isPHI() checks looks good to me. I'll fix it.

Copy link
Contributor

@bcahoon bcahoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test code using the example from the issue?

@llvmbot
Copy link
Member

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Ryotaro KASUGA (kasuga-fj)

Changes

when scheduling

When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, SchedStart isn't taken into account. It may result generating incorrect code. This patch fixes the problem. Also, this patch merges SchedStart into EarlyStart (same for SchedEnd).

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:

-O3 \
-mcpu=neoverse-v1 \
-mllvm -aarch64-enable-pipeliner \
-mllvm -pipeliner-max-stages=1000 \
-mllvm -pipeliner-max-mii=1000 \
-mllvm -pipeliner-enable-copytophi=0 \
-mllvm -pipeliner-ii-search-range=100 \
-mllvm -enable-misched=0 \
-mllvm -enable-post-misched=0

Observed changes:

Test Summary
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 The order of some instructions was changed
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt The order of some instructions was changed
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk The order of some instructions was changed

close #93936


Patch is 22.61 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95356.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/MachinePipeliner.h (+5-2)
  • (modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+39-31)
  • (added) llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir (+335)
diff --git a/llvm/include/llvm/CodeGen/MachinePipeliner.h b/llvm/include/llvm/CodeGen/MachinePipeliner.h
index 8f0a17cf99967..50df76db2d98d 100644
--- a/llvm/include/llvm/CodeGen/MachinePipeliner.h
+++ b/llvm/include/llvm/CodeGen/MachinePipeliner.h
@@ -594,8 +594,8 @@ class SMSchedule {
   /// chain.
   int latestCycleInChain(const SDep &Dep);
 
-  void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
-                    int *MinEnd, int *MaxStart, int II, SwingSchedulerDAG *DAG);
+  void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart, int II,
+                    SwingSchedulerDAG *DAG);
   bool insert(SUnit *SU, int StartCycle, int EndCycle, int II);
 
   /// Iterators for the cycle to instruction map.
@@ -653,6 +653,9 @@ class SMSchedule {
   bool isLoopCarried(const SwingSchedulerDAG *SSD, MachineInstr &Phi) const;
   bool isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD, MachineInstr *Def,
                              MachineOperand &MO) const;
+
+  bool onlyHasLoopCarriedOutputOrOrderPreds(SUnit *SU,
+                                            SwingSchedulerDAG *DAG) const;
   void print(raw_ostream &os) const;
   void dump() const;
 };
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 6c24cfca793fc..f2ff8d23f276a 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -2420,47 +2420,43 @@ bool SwingSchedulerDAG::schedulePipeline(SMSchedule &Schedule) {
       // upon the scheduled time for any predecessors/successors.
       int EarlyStart = INT_MIN;
       int LateStart = INT_MAX;
-      // These values are set when the size of the schedule window is limited
-      // due to chain dependences.
-      int SchedEnd = INT_MAX;
-      int SchedStart = INT_MIN;
-      Schedule.computeStart(SU, &EarlyStart, &LateStart, &SchedEnd, &SchedStart,
-                            II, this);
+      Schedule.computeStart(SU, &EarlyStart, &LateStart, II, this);
       LLVM_DEBUG({
         dbgs() << "\n";
         dbgs() << "Inst (" << SU->NodeNum << ") ";
         SU->getInstr()->dump();
         dbgs() << "\n";
       });
-      LLVM_DEBUG({
-        dbgs() << format("\tes: %8x ls: %8x me: %8x ms: %8x\n", EarlyStart,
-                         LateStart, SchedEnd, SchedStart);
-      });
+      LLVM_DEBUG(
+          dbgs() << format("\tes: %8x ls: %8x\n", EarlyStart, LateStart));
 
-      if (EarlyStart > LateStart || SchedEnd < EarlyStart ||
-          SchedStart > LateStart)
+      if (EarlyStart > LateStart)
         scheduleFound = false;
-      else if (EarlyStart != INT_MIN && LateStart == INT_MAX) {
-        SchedEnd = std::min(SchedEnd, EarlyStart + (int)II - 1);
-        scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
-      } else if (EarlyStart == INT_MIN && LateStart != INT_MAX) {
-        SchedStart = std::max(SchedStart, LateStart - (int)II + 1);
-        scheduleFound = Schedule.insert(SU, LateStart, SchedStart, II);
-      } else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
-        SchedEnd =
-            std::min(SchedEnd, std::min(LateStart, EarlyStart + (int)II - 1));
-        // When scheduling a Phi it is better to start at the late cycle and go
-        // backwards. The default order may insert the Phi too far away from
-        // its first dependence.
-        if (SU->getInstr()->isPHI())
-          scheduleFound = Schedule.insert(SU, SchedEnd, EarlyStart, II);
+      else if (EarlyStart != INT_MIN && LateStart == INT_MAX)
+        scheduleFound =
+            Schedule.insert(SU, EarlyStart, EarlyStart + (int)II - 1, II);
+      else if (EarlyStart == INT_MIN && LateStart != INT_MAX)
+        scheduleFound =
+            Schedule.insert(SU, LateStart, LateStart - (int)II + 1, II);
+      else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
+        LateStart = std::min(LateStart, EarlyStart + (int)II - 1);
+        // When scheduling a Phi it is better to start at the late cycle and
+        // go backwards. The default order may insert the Phi too far away
+        // from its first dependence.
+        // Also, do backward search when all scheduled predecessors are
+        // loop-carried output/order dependencies. Empirically, there are also
+        // cases where scheduling becomes possible with backward search.
+        if (SU->getInstr()->isPHI() ||
+            Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this))
+          scheduleFound = Schedule.insert(SU, LateStart, EarlyStart, II);
         else
-          scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
+          scheduleFound = Schedule.insert(SU, EarlyStart, LateStart, II);
       } else {
         int FirstCycle = Schedule.getFirstCycle();
         scheduleFound = Schedule.insert(SU, FirstCycle + getASAP(SU),
                                         FirstCycle + getASAP(SU) + II - 1, II);
       }
+
       // Even if we find a schedule, make sure the schedule doesn't exceed the
       // allowable number of stages. We keep trying if this happens.
       if (scheduleFound)
@@ -2868,8 +2864,7 @@ static SUnit *multipleIterations(SUnit *SU, SwingSchedulerDAG *DAG) {
 /// Compute the scheduling start slot for the instruction.  The start slot
 /// depends on any predecessor or successor nodes scheduled already.
 void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
-                              int *MinEnd, int *MaxStart, int II,
-                              SwingSchedulerDAG *DAG) {
+                              int II, SwingSchedulerDAG *DAG) {
   // Iterate over each instruction that has been scheduled already.  The start
   // slot computation depends on whether the previously scheduled instruction
   // is a predecessor or successor of the specified instruction.
@@ -2888,7 +2883,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
             *MaxEarlyStart = std::max(*MaxEarlyStart, EarlyStart);
             if (DAG->isLoopCarriedDep(SU, Dep, false)) {
               int End = earliestCycleInChain(Dep) + (II - 1);
-              *MinEnd = std::min(*MinEnd, End);
+              *MinLateStart = std::min(*MinLateStart, End);
             }
           } else {
             int LateStart = cycle - Dep.getLatency() +
@@ -2912,7 +2907,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
             *MinLateStart = std::min(*MinLateStart, LateStart);
             if (DAG->isLoopCarriedDep(SU, Dep)) {
               int Start = latestCycleInChain(Dep) + 1 - II;
-              *MaxStart = std::max(*MaxStart, Start);
+              *MaxEarlyStart = std::max(*MaxEarlyStart, Start);
             }
           } else {
             int EarlyStart = cycle + Dep.getLatency() -
@@ -3105,6 +3100,19 @@ bool SMSchedule::isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD,
   return false;
 }
 
+/// Return true if all scheduled predecessors are loop-carried output/order
+/// dependencies.
+bool SMSchedule::onlyHasLoopCarriedOutputOrOrderPreds(
+    SUnit *SU, SwingSchedulerDAG *DAG) const {
+  for (const SDep &Pred : SU->Preds)
+    if (InstrToCycle.count(Pred.getSUnit()) && !DAG->isBackedge(SU, Pred))
+      return false;
+  for (const SDep &Succ : SU->Succs)
+    if (InstrToCycle.count(Succ.getSUnit()) && DAG->isBackedge(SU, Succ))
+      return false;
+  return true;
+}
+
 /// Determine transitive dependences of unpipelineable instructions
 SmallSet<SUnit *, 8> SMSchedule::computeUnpipelineableNodes(
     SwingSchedulerDAG *SSD, TargetInstrInfo::PipelinerLoopInfo *PLI) {
diff --git a/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
new file mode 100644
index 0000000000000..c1014b296cad3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
@@ -0,0 +1,335 @@
+# RUN: llc --verify-machineinstrs -mtriple=aarch64 -o - %s -run-pass pipeliner -aarch64-enable-pipeliner -debug-only=pipeliner -pipeliner-max-stages=50 -pipeliner-max-mii=50 -pipeliner-enable-copytophi=0 -pipeliner-ii-search-range=30 2>&1 | FileCheck %s
+# REQUIRES: asserts
+
+# Test that each instruction must be scheduled between the early cycle and the late cycle. Previously there were cases where an instruction is scheduled outside of the valid range. See issue #93936 for details.
+
+# CHECK: {{^ *}}Try to schedule with 47
+# CHECK: {{^ *}}Inst (11)   %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+# CHECK-EMPTY:
+# CHECK-NEXT: {{^ *}}es: ffffffe9 ls: ffffffe9
+# CHECK-NEXT: {{^ *}}Trying to insert node between -23 and -23 II: 47
+# CHECK-NEXT: {{^ *}}failed to insert at cycle -23   %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+
+--- |
+  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+  
+  define dso_local void @f(ptr nocapture noundef writeonly %a, ptr nocapture noundef readonly %b, ptr nocapture noundef readonly %c, ptr nocapture noundef readonly %d, ptr nocapture noundef readonly %e, float noundef %f, i32 noundef %N) local_unnamed_addr {
+  entry:
+    %cmp16 = icmp sgt i32 %N, 0
+    br i1 %cmp16, label %for.body.preheader, label %for.cond.cleanup
+  
+  for.body.preheader:                               ; preds = %entry
+    %wide.trip.count = zext nneg i32 %N to i64
+    %min.iters.check = icmp ult i32 %N, 8
+    br i1 %min.iters.check, label %for.body.preheader37, label %vector.memcheck
+  
+  vector.memcheck:                                  ; preds = %for.body.preheader
+    %0 = ptrtoint ptr %a to i64
+    %1 = ptrtoint ptr %b to i64
+    %2 = ptrtoint ptr %c to i64
+    %3 = ptrtoint ptr %d to i64
+    %4 = ptrtoint ptr %e to i64
+    %5 = sub i64 %0, %1
+    %diff.check = icmp ult i64 %5, 32
+    %6 = sub i64 %0, %2
+    %diff.check22 = icmp ult i64 %6, 32
+    %conflict.rdx = or i1 %diff.check, %diff.check22
+    %7 = sub i64 %0, %3
+    %diff.check24 = icmp ult i64 %7, 32
+    %conflict.rdx25 = or i1 %conflict.rdx, %diff.check24
+    %8 = sub i64 %0, %4
+    %diff.check27 = icmp ult i64 %8, 32
+    %conflict.rdx28 = or i1 %conflict.rdx25, %diff.check27
+    br i1 %conflict.rdx28, label %for.body.preheader37, label %vector.ph
+  
+  vector.ph:                                        ; preds = %vector.memcheck
+    %n.vec = and i64 %wide.trip.count, 2147483640
+    %broadcast.splatinsert = insertelement <4 x float> poison, float %f, i64 0
+    %broadcast.splat = shufflevector <4 x float> %broadcast.splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+    %scevgep54 = getelementptr i8, ptr %b, i64 16
+    %scevgep58 = getelementptr i8, ptr %a, i64 16
+    %scevgep62 = getelementptr i8, ptr %c, i64 16
+    %scevgep66 = getelementptr i8, ptr %e, i64 16
+    %scevgep70 = getelementptr i8, ptr %d, i64 16
+    br label %vector.body
+  
+  vector.body:                                      ; preds = %vector.body, %vector.ph
+    %lsr.iv71 = phi ptr [ %scevgep72, %vector.body ], [ %scevgep70, %vector.ph ]
+    %lsr.iv67 = phi ptr [ %scevgep68, %vector.body ], [ %scevgep66, %vector.ph ]
+    %lsr.iv63 = phi ptr [ %scevgep64, %vector.body ], [ %scevgep62, %vector.ph ]
+    %lsr.iv59 = phi ptr [ %scevgep60, %vector.body ], [ %scevgep58, %vector.ph ]
+    %lsr.iv55 = phi ptr [ %scevgep56, %vector.body ], [ %scevgep54, %vector.ph ]
+    %lsr.iv52 = phi i64 [ %lsr.iv.next53, %vector.body ], [ %n.vec, %vector.ph ]
+    %scevgep57 = getelementptr i8, ptr %lsr.iv55, i64 -16
+    %wide.load = load <4 x float>, ptr %scevgep57, align 4, !tbaa !6
+    %wide.load29 = load <4 x float>, ptr %lsr.iv55, align 4, !tbaa !6
+    %9 = fmul <4 x float> %wide.load, %broadcast.splat
+    %10 = fmul <4 x float> %wide.load29, %broadcast.splat
+    %scevgep65 = getelementptr i8, ptr %lsr.iv63, i64 -16
+    %wide.load30 = load <4 x float>, ptr %scevgep65, align 4, !tbaa !6
+    %wide.load31 = load <4 x float>, ptr %lsr.iv63, align 4, !tbaa !6
+    %scevgep73 = getelementptr i8, ptr %lsr.iv71, i64 -16
+    %wide.load32 = load <4 x float>, ptr %scevgep73, align 4, !tbaa !6
+    %wide.load33 = load <4 x float>, ptr %lsr.iv71, align 4, !tbaa !6
+    %11 = fsub <4 x float> %wide.load30, %wide.load32
+    %12 = fsub <4 x float> %wide.load31, %wide.load33
+    %13 = fmul <4 x float> %9, %11
+    %14 = fmul <4 x float> %10, %12
+    %scevgep69 = getelementptr i8, ptr %lsr.iv67, i64 -16
+    %wide.load34 = load <4 x float>, ptr %scevgep69, align 4, !tbaa !6
+    %wide.load35 = load <4 x float>, ptr %lsr.iv67, align 4, !tbaa !6
+    %15 = fdiv <4 x float> %13, %wide.load34
+    %16 = fdiv <4 x float> %14, %wide.load35
+    %scevgep61 = getelementptr i8, ptr %lsr.iv59, i64 -16
+    store <4 x float> %15, ptr %scevgep61, align 4, !tbaa !6
+    store <4 x float> %16, ptr %lsr.iv59, align 4, !tbaa !6
+    %lsr.iv.next53 = add nsw i64 %lsr.iv52, -8
+    %scevgep56 = getelementptr i8, ptr %lsr.iv55, i64 32
+    %scevgep60 = getelementptr i8, ptr %lsr.iv59, i64 32
+    %scevgep64 = getelementptr i8, ptr %lsr.iv63, i64 32
+    %scevgep68 = getelementptr i8, ptr %lsr.iv67, i64 32
+    %scevgep72 = getelementptr i8, ptr %lsr.iv71, i64 32
+    %17 = icmp eq i64 %lsr.iv.next53, 0
+    br i1 %17, label %middle.block, label %vector.body, !llvm.loop !10
+  
+  middle.block:                                     ; preds = %vector.body
+    %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+    br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader37
+  
+  for.body.preheader37:                             ; preds = %vector.memcheck, %for.body.preheader, %middle.block
+    %indvars.iv.ph = phi i64 [ %n.vec, %middle.block ], [ 0, %for.body.preheader ], [ 0, %vector.memcheck ]
+    %18 = shl nuw nsw i64 %indvars.iv.ph, 2
+    %scevgep = getelementptr i8, ptr %a, i64 %18
+    %scevgep39 = getelementptr i8, ptr %e, i64 %18
+    %scevgep42 = getelementptr i8, ptr %d, i64 %18
+    %scevgep45 = getelementptr i8, ptr %c, i64 %18
+    %scevgep48 = getelementptr i8, ptr %b, i64 %18
+    %19 = sub i64 %wide.trip.count, %indvars.iv.ph
+    br label %for.body
+  
+  for.cond.cleanup:                                 ; preds = %for.body, %middle.block, %entry
+    ret void
+  
+  for.body:                                         ; preds = %for.body.preheader37, %for.body
+    %lsr.iv51 = phi i64 [ %19, %for.body.preheader37 ], [ %lsr.iv.next, %for.body ]
+    %lsr.iv49 = phi ptr [ %scevgep48, %for.body.preheader37 ], [ %scevgep50, %for.body ]
+    %lsr.iv46 = phi ptr [ %scevgep45, %for.body.preheader37 ], [ %scevgep47, %for.body ]
+    %lsr.iv43 = phi ptr [ %scevgep42, %for.body.preheader37 ], [ %scevgep44, %for.body ]
+    %lsr.iv40 = phi ptr [ %scevgep39, %for.body.preheader37 ], [ %scevgep41, %for.body ]
+    %lsr.iv = phi ptr [ %scevgep, %for.body.preheader37 ], [ %scevgep38, %for.body ]
+    %20 = load float, ptr %lsr.iv49, align 4, !tbaa !6
+    %mul = fmul float %20, %f
+    %21 = load float, ptr %lsr.iv46, align 4, !tbaa !6
+    %22 = load float, ptr %lsr.iv43, align 4, !tbaa !6
+    %sub = fsub float %21, %22
+    %mul5 = fmul float %mul, %sub
+    %23 = load float, ptr %lsr.iv40, align 4, !tbaa !6
+    %div = fdiv float %mul5, %23
+    store float %div, ptr %lsr.iv, align 4, !tbaa !6
+    %scevgep38 = getelementptr i8, ptr %lsr.iv, i64 4
+    %scevgep41 = getelementptr i8, ptr %lsr.iv40, i64 4
+    %scevgep44 = getelementptr i8, ptr %lsr.iv43, i64 4
+    %scevgep47 = getelementptr i8, ptr %lsr.iv46, i64 4
+    %scevgep50 = getelementptr i8, ptr %lsr.iv49, i64 4
+    %lsr.iv.next = add i64 %lsr.iv51, -1
+    %exitcond.not = icmp eq i64 %lsr.iv.next, 0
+    br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+  }
+  
+  !6 = !{!7, !7, i64 0}
+  !7 = !{!"float", !8, i64 0}
+  !8 = !{!"omnipotent char", !9, i64 0}
+  !9 = !{!"Simple C/C++ TBAA"}
+  !10 = distinct !{!10, !11, !12, !13}
+  !11 = !{!"llvm.loop.mustprogress"}
+  !12 = !{!"llvm.loop.isvectorized", i32 1}
+  !13 = !{!"llvm.loop.unroll.runtime.disable"}
+  !14 = distinct !{!14, !11, !12}
+
+...
+---
+name:            f
+tracksRegLiveness: true
+liveins:
+  - { reg: '$x0', virtual-reg: '%39' }
+  - { reg: '$x1', virtual-reg: '%40' }
+  - { reg: '$x2', virtual-reg: '%41' }
+  - { reg: '$x3', virtual-reg: '%42' }
+  - { reg: '$x4', virtual-reg: '%43' }
+  - { reg: '$s0', virtual-reg: '%44' }
+  - { reg: '$w5', virtual-reg: '%45' }
+body:             |
+  bb.0.entry:
+    successors: %bb.1, %bb.7
+    liveins: $x0, $x1, $x2, $x3, $x4, $s0, $w5
+  
+    %45:gpr32common = COPY $w5
+    %44:fpr32 = COPY $s0
+    %43:gpr64common = COPY $x4
+    %42:gpr64common = COPY $x3
+    %41:gpr64common = COPY $x2
+    %40:gpr64common = COPY $x1
+    %39:gpr64common = COPY $x0
+    dead $wzr = SUBSWri %45, 1, 0, implicit-def $nzcv
+    Bcc 11, %bb.7, implicit $nzcv
+    B %bb.1
+  
+  bb.1.for.body.preheader:
+    successors: %bb.12, %bb.2
+  
+    %48:gpr32 = ORRWrs $wzr, %45, 0
+    %0:gpr64 = SUBREG_TO_REG 0, killed %48, %subreg.sub_32
+    dead $wzr = SUBSWri %45, 8, 0, implicit-def $nzcv
+    Bcc 2, %bb.2, implicit $nzcv
+  
+  bb.12:
+    %49:gpr64all = COPY $xzr
+    %47:gpr64all = COPY %49
+    B %bb.6
+  
+  bb.2.vector.memcheck:
+    successors: %bb.6, %bb.11
+  
+    %55:gpr64common = SUBXrr %39, %40
+    %59:gpr64all = COPY $xzr
+    %51:gpr64all = COPY %59
+    dead $xzr = SUBSXri killed %55, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.11
+  
+  bb.11.vector.memcheck:
+    successors: %bb.6, %bb.10
+  
+    %56:gpr64common = SUBXrr %39, %41
+    dead $xzr = SUBSXri %56, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.10
+  
+  bb.10.vector.memcheck:
+    successors: %bb.6, %bb.9
+  
+    %57:gpr64common = SUBXrr %39, %42
+    dead $xzr = SUBSXri %57, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.9
+  
+  bb.9.vector.memcheck:
+    successors: %bb.6, %bb.3
+  
+    %58:gpr64common = SUBXrr %39, %43
+    dead $xzr = SUBSXri %58, 32, 0, implicit-def $nzcv
+    Bcc 3, %bb.6, implicit $nzcv
+    B %bb.3
+  
+  bb.3.vector.ph:
+    %64:gpr64common = ANDXri %0, 8027
+    %1:gpr64 = COPY %64
+    %66:fpr128 = IMPLICIT_DEF
+    %65:fpr128 = INSERT_SUBREG %66, %44, %subreg.ssub
+    %67:gpr64sp = ADDXri %40, 16, 0
+    %3:gpr64all = COPY %67
+    %68:gpr64sp = ADDXri %39, 16, 0
+    %4:gpr64all = COPY %68
+    %69:gpr64sp = ADDXri %41, 16, 0
+    %5:gpr64all = COPY %69
+    %70:gpr64sp = ADDXri %43, 16, 0
+    %6:gpr64all = COPY %70
+    %71:gpr64sp = ADDXri %42, 16, 0
+    %7:gpr64all = COPY %71
+  
+  bb.4.vector.body:
+    successors: %bb.5, %bb.4
+  
+    %8:gpr64sp = PHI %7, %bb.3, %19, %bb.4
+    %9:gpr64sp = PHI %6, %bb.3, %18, %bb.4
+    %10:gpr64sp = PHI %5, %bb.3, %17, %bb.4
+    %11:gpr64sp = PHI %4, %bb.3, %16, %bb.4
+    %12:gpr64sp = PHI %3, %bb.3, %15, %bb.4
+    %13:gpr64sp = PHI %1, %bb.3, %14, %bb.4
+    %72:fpr128 = LDURQi %12, -16 :: (load (s128) from %ir.scevgep57, align 4, !tbaa !6)
+    %73:fpr128 = LDRQui %12, 0 :: (load (s128) from %ir.lsr.iv55, align 4, !tbaa !6)
+    %74:fpr128 = nofpexcept FMULv4i32_indexed killed %72, %65, 0, implicit $fpcr
+    %75:fpr128 = nofpexcept FMULv4i32_indexed killed %73, %65, 0, implicit $fpcr
+    %76:fpr128 = LDURQi %10, -16 :: (load (s128) from %ir.scevgep65, align 4, !tbaa !6)
+    %77:fpr128 = LDRQui %10, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !6)
+    %78:fpr128 = LDURQi %8, -16 :: (load (s128) from %ir.scevgep73, align 4, !tbaa !6)
+    %79:fpr128 = LDRQui %8, 0 :: (load (s128) from %ir.lsr.iv71, align 4, !tbaa !6)
+    %80:fpr128 = nofpexcept FSUBv4f32 killed %76, killed %78, implicit $fpcr
+    %81:fpr128 = nofpexcept FSUBv4f32 killed %77, killed %79, implicit $fpcr
+    %82:fpr128 = nofpexcept FMULv4f32 killed %74, killed %80, implicit $fpcr
+    %83:fpr128 = nofpexcept FMULv4f32 killed %75, killed %81, implicit $fpcr
+    %84:fpr128 = LDURQi %9, -16 :: (load (s128) from %ir.scevgep69, align 4, !tbaa !6)
+    %85:fpr128 = LDRQui %9, 0 :: (load (s128) from %ir.lsr.iv67, align 4, !tbaa !6)
+    %86:fpr128 = nofpexcept FDIVv4f32 killed %82, killed %84, implicit $fpcr
+    %87:fpr128 = nofpexcept FDIVv4f32 killed %83, killed %85, implicit $fpcr
+    STURQi killed %86, %11, -16 :: (store (s128) into %ir.scevgep61, align 4, !tbaa !6)
+    STRQui killed %87, %11, 0 :: (store (s128) into %ir.lsr.iv59, ...
[truncated]

@kasuga-fj
Copy link
Contributor Author

Added test.

@kasuga-fj kasuga-fj requested a review from bcahoon June 21, 2024 08:42
Copy link
Contributor

@bcahoon bcahoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing. LGTM.

@kasuga-fj kasuga-fj force-pushed the fix-swpl-schedstart branch from 35c5813 to 453f53a Compare June 28, 2024 01:28
when scheduling

When scheduling an instruction, if both any predecessors and any
successors of the instruction are already scheduled, `SchedStart` isn't
taken into account. It may result generating incorrect code. This patch
fixes the problem.
@kasuga-fj kasuga-fj force-pushed the fix-swpl-schedstart branch from 453f53a to 09464c4 Compare June 28, 2024 09:37
@kasuga-fj kasuga-fj merged commit e19ac0d into llvm:main Jul 1, 2024
7 checks passed
@kasuga-fj kasuga-fj deleted the fix-swpl-schedstart branch July 1, 2024 00:07
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jul 1, 2024

LLVM Buildbot has detected a new failure on builder clang-ppc64be-linux-test-suite running on ppc64be-clang-test-suite while building llvm at step 7 "test-build-unified-tree-check-runtimes".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/176/builds/576

Here is the relevant piece of the build log for the reference:

Step 7 (test-build-unified-tree-check-runtimes) failure: test (failure)
******************** TEST 'SanitizerCommon-lsan-powerpc64-Linux :: Linux/soft_rss_limit_mb_test.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 2: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang  --driver-mode=g++ -gline-tables-only -fsanitize=leak  -m64 -funwind-tables  -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/./bin/clang --driver-mode=g++ -gline-tables-only -fsanitize=leak -m64 -funwind-tables -I/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test -ldl -O2 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -o /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
RUN: at line 5: env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1      /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp 2>&1 | FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
+ env LSAN_OPTIONS=soft_rss_limit_mb=220:quarantine_size=1:allocator_may_return_null=1 /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/build/runtimes/runtimes-bins/compiler-rt/test/sanitizer_common/lsan-powerpc64-Linux/Linux/Output/soft_rss_limit_mb_test.cpp.tmp
+ FileCheck /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp -check-prefix=CHECK_MAY_RETURN_1
/home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp:68:24: error: CHECK_MAY_RETURN_1: expected string not found in input
// CHECK_MAY_RETURN_1: allocating 512 times
                       ^
<stdin>:52:44: note: scanning from here
Some of the malloc calls returned non-null: 256
                                           ^
<stdin>:53:13: note: possible intended match here
==131393==LeakSanitizer: soft rss limit unexhausted (220Mb vs 95Mb)
            ^

Input file: <stdin>
Check file: /home/buildbots/llvm-external-buildbots/workers/ppc64be-clang-test-suite/clang-ppc64be-test-suite/llvm-project/compiler-rt/test/sanitizer_common/TestCases/Linux/soft_rss_limit_mb_test.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            .
            .
            .
           47:  [256] 
           48:  [320] 
           49:  [384] 
           50:  [448] 
           51: Some of the malloc calls returned null: 256 
           52: Some of the malloc calls returned non-null: 256 
check:68'0                                                X~~~~ error: no match found
           53: ==131393==LeakSanitizer: soft rss limit unexhausted (220Mb vs 95Mb) 
check:68'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:68'1                 ?                                                        possible intended match
>>>>>>

--

********************


kasuga-fj added a commit that referenced this pull request Jul 1, 2024
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…llvm#95356)

when scheduling

When scheduling an instruction, if both any predecessors and any
successors of the instruction are already scheduled, `SchedStart` isn't
taken into account. It may result generating incorrect code. This patch
fixes the problem. Also, this patch merges `SchedStart` into
`EarlyStart` (same for `SchedEnd`).

Fixes llvm#93936
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MachinePipeliner] Consideration of SchedStart is omitted in some cases?
4 participants