[MachinePipeliner] Fix constraints aren't considered in certain cases #95356
Conversation
ping
// backwards when all scheduled predecessors are loop-carried
// output/order dependencies. Empirically, there are also cases where
// scheduling becomes possible with backward search.
if (Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this)) {
I'm not sure I see how this keeps the original behavior? I assume some test case fails without it?
If it's needed, can this be combined with the code at line 2455?
I commented out the code between lines 2446-2449 and noticed some test failures. I took a quick look at swp-bad-sched.ll, and I can see what is going on.
Prior to this patch:
Inst (3) %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0)
es: 80000000 ls: 0 me: 7fffffff ms: ffffffff
Trying to insert node between 0 and -1 II: 3
With this patch:
Inst (3) %63:intregs = L2_loadri_io %15:intregs, 0 :: (load (s32) from %ir.lsr.iv1, !tbaa !0)
es: ffffffff ls: 0
Trying to insert node between 0 and -1 II: 3
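As context for reading the debug lines above: the es/ls/me/ms fields are signed 32-bit cycle numbers printed with `%x`, so sentinels and negative cycles show up as large hex values. A minimal decoding sketch (the helper name is made up, not LLVM code):

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Reinterpret one "%8x"-printed debug field from the pipeliner output as
// the signed 32-bit cycle value it represents.
int decodeDebugField(uint32_t Hex) { return static_cast<int>(Hex); }

// 0x80000000 decodes to INT_MIN ("no scheduled predecessor constrains us"),
// 0x7fffffff to INT_MAX ("no scheduled successor constrains us"),
// and 0xffffffff is simply the ordinary cycle -1.
```

So `es: 80000000 ls: 0` above means EarlyStart is the INT_MIN sentinel, while `es: ffffffff` means EarlyStart is a real cycle, -1.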
In the old case, the pipeliner does a bottom-up schedule because es == INT_MIN and ls == 0.
With the patch, because es is -1, the pipeliner drops into the case where both es and ls have been defined. Previously, that case used a top-down search (unless the SU is a Phi).
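For reference, the post-patch window selection can be sketched as a standalone model of the branches in `schedulePipeline` (this is a simplification, not the LLVM source; `pickWindow` and `PreferBackward` are made-up names, and the final fallback assumes FirstCycle and ASAP are 0):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>

// Model of how the patched code picks an insertion window from the
// sentinel values computed by computeStart(). EarlyStart == INT_MIN means
// no scheduled predecessor constrains the node; LateStart == INT_MAX means
// no scheduled successor constrains it. The result is {firstCycleTried,
// lastCycleTried}; first > last means the search walks backwards
// (bottom-up), first < last means top-down.
std::pair<int, int> pickWindow(int EarlyStart, int LateStart, int II,
                               bool PreferBackward) {
  // The real code marks the schedule infeasible when this fails.
  assert(EarlyStart <= LateStart && "infeasible window");
  if (EarlyStart != INT_MIN && LateStart == INT_MAX)
    return {EarlyStart, EarlyStart + II - 1};  // only preds scheduled: top-down
  if (EarlyStart == INT_MIN && LateStart != INT_MAX)
    return {LateStart, LateStart - II + 1};    // only succs scheduled: bottom-up
  if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
    // Both sides constrained: clamp the window to one II.
    LateStart = std::min(LateStart, EarlyStart + II - 1);
    if (PreferBackward)            // PHI, or only loop-carried output/order preds
      return {LateStart, EarlyStart};          // bottom-up
    return {EarlyStart, LateStart};            // top-down
  }
  return {0, II - 1};  // nothing scheduled yet (simplified ASAP fallback)
}
```

With the patch, an es of -1 and ls of 0 take the both-constrained branch, whereas the old sentinel es of INT_MIN forced the bottom-up branch.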
I think it's okay for the schedule processing to change after your patch. I feel it would be surprising if it didn't. For example, in the case of swp-bad-sched.ll, the generated schedules are equivalent when comparing the output from your patch to the output from removing the code in lines 2446-2449. I haven't looked at the other cases though.
The question is: does the bottom-up scheduling produce a better result in certain situations? The patch has a comment, "Empirically, there are also cases where scheduling becomes possible with backward search". That would be a reason to add/keep the code that does a backward search. But I would like to see those cases, and I think it would be better to combine any changes that do a bottom-up search with the isPHI() check.
does the bottom-up scheduling produce a better result in certain situations?
Yes. As far as I have checked with llvm-test-suite, there were cases where scheduling failed when stopping the backward search (about 1% of all cases). Removing lines 2446-2449 sometimes generated a better schedule, but these were about a tenth of the cases where scheduling failed. Therefore I believe that keeping the original behavior is valuable.
In any case, as you mentioned, merging this process with isPHI() checks looks good to me. I'll fix it.
Can you add test code using the example from the issue?
@llvm/pr-subscribers-backend-aarch64

Author: Ryotaro KASUGA (kasuga-fj)

Changes: When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result in generating incorrect code. This patch fixes the problem.

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:
Observed changes:
close #93936

Patch is 22.61 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/95356.diff

3 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/MachinePipeliner.h b/llvm/include/llvm/CodeGen/MachinePipeliner.h
index 8f0a17cf99967..50df76db2d98d 100644
--- a/llvm/include/llvm/CodeGen/MachinePipeliner.h
+++ b/llvm/include/llvm/CodeGen/MachinePipeliner.h
@@ -594,8 +594,8 @@ class SMSchedule {
/// chain.
int latestCycleInChain(const SDep &Dep);
- void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
- int *MinEnd, int *MaxStart, int II, SwingSchedulerDAG *DAG);
+ void computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart, int II,
+ SwingSchedulerDAG *DAG);
bool insert(SUnit *SU, int StartCycle, int EndCycle, int II);
/// Iterators for the cycle to instruction map.
@@ -653,6 +653,9 @@ class SMSchedule {
bool isLoopCarried(const SwingSchedulerDAG *SSD, MachineInstr &Phi) const;
bool isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD, MachineInstr *Def,
MachineOperand &MO) const;
+
+ bool onlyHasLoopCarriedOutputOrOrderPreds(SUnit *SU,
+ SwingSchedulerDAG *DAG) const;
void print(raw_ostream &os) const;
void dump() const;
};
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 6c24cfca793fc..f2ff8d23f276a 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -2420,47 +2420,43 @@ bool SwingSchedulerDAG::schedulePipeline(SMSchedule &Schedule) {
// upon the scheduled time for any predecessors/successors.
int EarlyStart = INT_MIN;
int LateStart = INT_MAX;
- // These values are set when the size of the schedule window is limited
- // due to chain dependences.
- int SchedEnd = INT_MAX;
- int SchedStart = INT_MIN;
- Schedule.computeStart(SU, &EarlyStart, &LateStart, &SchedEnd, &SchedStart,
- II, this);
+ Schedule.computeStart(SU, &EarlyStart, &LateStart, II, this);
LLVM_DEBUG({
dbgs() << "\n";
dbgs() << "Inst (" << SU->NodeNum << ") ";
SU->getInstr()->dump();
dbgs() << "\n";
});
- LLVM_DEBUG({
- dbgs() << format("\tes: %8x ls: %8x me: %8x ms: %8x\n", EarlyStart,
- LateStart, SchedEnd, SchedStart);
- });
+ LLVM_DEBUG(
+ dbgs() << format("\tes: %8x ls: %8x\n", EarlyStart, LateStart));
- if (EarlyStart > LateStart || SchedEnd < EarlyStart ||
- SchedStart > LateStart)
+ if (EarlyStart > LateStart)
scheduleFound = false;
- else if (EarlyStart != INT_MIN && LateStart == INT_MAX) {
- SchedEnd = std::min(SchedEnd, EarlyStart + (int)II - 1);
- scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
- } else if (EarlyStart == INT_MIN && LateStart != INT_MAX) {
- SchedStart = std::max(SchedStart, LateStart - (int)II + 1);
- scheduleFound = Schedule.insert(SU, LateStart, SchedStart, II);
- } else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
- SchedEnd =
- std::min(SchedEnd, std::min(LateStart, EarlyStart + (int)II - 1));
- // When scheduling a Phi it is better to start at the late cycle and go
- // backwards. The default order may insert the Phi too far away from
- // its first dependence.
- if (SU->getInstr()->isPHI())
- scheduleFound = Schedule.insert(SU, SchedEnd, EarlyStart, II);
+ else if (EarlyStart != INT_MIN && LateStart == INT_MAX)
+ scheduleFound =
+ Schedule.insert(SU, EarlyStart, EarlyStart + (int)II - 1, II);
+ else if (EarlyStart == INT_MIN && LateStart != INT_MAX)
+ scheduleFound =
+ Schedule.insert(SU, LateStart, LateStart - (int)II + 1, II);
+ else if (EarlyStart != INT_MIN && LateStart != INT_MAX) {
+ LateStart = std::min(LateStart, EarlyStart + (int)II - 1);
+ // When scheduling a Phi it is better to start at the late cycle and
+ // go backwards. The default order may insert the Phi too far away
+ // from its first dependence.
+ // Also, do backward search when all scheduled predecessors are
+ // loop-carried output/order dependencies. Empirically, there are also
+ // cases where scheduling becomes possible with backward search.
+ if (SU->getInstr()->isPHI() ||
+ Schedule.onlyHasLoopCarriedOutputOrOrderPreds(SU, this))
+ scheduleFound = Schedule.insert(SU, LateStart, EarlyStart, II);
else
- scheduleFound = Schedule.insert(SU, EarlyStart, SchedEnd, II);
+ scheduleFound = Schedule.insert(SU, EarlyStart, LateStart, II);
} else {
int FirstCycle = Schedule.getFirstCycle();
scheduleFound = Schedule.insert(SU, FirstCycle + getASAP(SU),
FirstCycle + getASAP(SU) + II - 1, II);
}
+
// Even if we find a schedule, make sure the schedule doesn't exceed the
// allowable number of stages. We keep trying if this happens.
if (scheduleFound)
@@ -2868,8 +2864,7 @@ static SUnit *multipleIterations(SUnit *SU, SwingSchedulerDAG *DAG) {
/// Compute the scheduling start slot for the instruction. The start slot
/// depends on any predecessor or successor nodes scheduled already.
void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
- int *MinEnd, int *MaxStart, int II,
- SwingSchedulerDAG *DAG) {
+ int II, SwingSchedulerDAG *DAG) {
// Iterate over each instruction that has been scheduled already. The start
// slot computation depends on whether the previously scheduled instruction
// is a predecessor or successor of the specified instruction.
@@ -2888,7 +2883,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
*MaxEarlyStart = std::max(*MaxEarlyStart, EarlyStart);
if (DAG->isLoopCarriedDep(SU, Dep, false)) {
int End = earliestCycleInChain(Dep) + (II - 1);
- *MinEnd = std::min(*MinEnd, End);
+ *MinLateStart = std::min(*MinLateStart, End);
}
} else {
int LateStart = cycle - Dep.getLatency() +
@@ -2912,7 +2907,7 @@ void SMSchedule::computeStart(SUnit *SU, int *MaxEarlyStart, int *MinLateStart,
*MinLateStart = std::min(*MinLateStart, LateStart);
if (DAG->isLoopCarriedDep(SU, Dep)) {
int Start = latestCycleInChain(Dep) + 1 - II;
- *MaxStart = std::max(*MaxStart, Start);
+ *MaxEarlyStart = std::max(*MaxEarlyStart, Start);
}
} else {
int EarlyStart = cycle + Dep.getLatency() -
@@ -3105,6 +3100,19 @@ bool SMSchedule::isLoopCarriedDefOfUse(const SwingSchedulerDAG *SSD,
return false;
}
+/// Return true if all scheduled predecessors are loop-carried output/order
+/// dependencies.
+bool SMSchedule::onlyHasLoopCarriedOutputOrOrderPreds(
+ SUnit *SU, SwingSchedulerDAG *DAG) const {
+ for (const SDep &Pred : SU->Preds)
+ if (InstrToCycle.count(Pred.getSUnit()) && !DAG->isBackedge(SU, Pred))
+ return false;
+ for (const SDep &Succ : SU->Succs)
+ if (InstrToCycle.count(Succ.getSUnit()) && DAG->isBackedge(SU, Succ))
+ return false;
+ return true;
+}
+
/// Determine transitive dependences of unpipelineable instructions
SmallSet<SUnit *, 8> SMSchedule::computeUnpipelineableNodes(
SwingSchedulerDAG *SSD, TargetInstrInfo::PipelinerLoopInfo *PLI) {
diff --git a/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
new file mode 100644
index 0000000000000..c1014b296cad3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/sms-instruction-scheduled-at-correct-cycle.mir
@@ -0,0 +1,335 @@
+# RUN: llc --verify-machineinstrs -mtriple=aarch64 -o - %s -run-pass pipeliner -aarch64-enable-pipeliner -debug-only=pipeliner -pipeliner-max-stages=50 -pipeliner-max-mii=50 -pipeliner-enable-copytophi=0 -pipeliner-ii-search-range=30 2>&1 | FileCheck %s
+# REQUIRES: asserts
+
+# Test that each instruction must be scheduled between the early cycle and the late cycle. Previously there were cases where an instruction is scheduled outside of the valid range. See issue #93936 for details.
+
+# CHECK: {{^ *}}Try to schedule with 47
+# CHECK: {{^ *}}Inst (11) %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+# CHECK-EMPTY:
+# CHECK-NEXT: {{^ *}}es: ffffffe9 ls: ffffffe9
+# CHECK-NEXT: {{^ *}}Trying to insert node between -23 and -23 II: 47
+# CHECK-NEXT: {{^ *}}failed to insert at cycle -23 %48:fpr128 = LDRQui %35:gpr64sp, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !0)
+
+--- |
+ target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32"
+
+ define dso_local void @f(ptr nocapture noundef writeonly %a, ptr nocapture noundef readonly %b, ptr nocapture noundef readonly %c, ptr nocapture noundef readonly %d, ptr nocapture noundef readonly %e, float noundef %f, i32 noundef %N) local_unnamed_addr {
+ entry:
+ %cmp16 = icmp sgt i32 %N, 0
+ br i1 %cmp16, label %for.body.preheader, label %for.cond.cleanup
+
+ for.body.preheader: ; preds = %entry
+ %wide.trip.count = zext nneg i32 %N to i64
+ %min.iters.check = icmp ult i32 %N, 8
+ br i1 %min.iters.check, label %for.body.preheader37, label %vector.memcheck
+
+ vector.memcheck: ; preds = %for.body.preheader
+ %0 = ptrtoint ptr %a to i64
+ %1 = ptrtoint ptr %b to i64
+ %2 = ptrtoint ptr %c to i64
+ %3 = ptrtoint ptr %d to i64
+ %4 = ptrtoint ptr %e to i64
+ %5 = sub i64 %0, %1
+ %diff.check = icmp ult i64 %5, 32
+ %6 = sub i64 %0, %2
+ %diff.check22 = icmp ult i64 %6, 32
+ %conflict.rdx = or i1 %diff.check, %diff.check22
+ %7 = sub i64 %0, %3
+ %diff.check24 = icmp ult i64 %7, 32
+ %conflict.rdx25 = or i1 %conflict.rdx, %diff.check24
+ %8 = sub i64 %0, %4
+ %diff.check27 = icmp ult i64 %8, 32
+ %conflict.rdx28 = or i1 %conflict.rdx25, %diff.check27
+ br i1 %conflict.rdx28, label %for.body.preheader37, label %vector.ph
+
+ vector.ph: ; preds = %vector.memcheck
+ %n.vec = and i64 %wide.trip.count, 2147483640
+ %broadcast.splatinsert = insertelement <4 x float> poison, float %f, i64 0
+ %broadcast.splat = shufflevector <4 x float> %broadcast.splatinsert, <4 x float> poison, <4 x i32> zeroinitializer
+ %scevgep54 = getelementptr i8, ptr %b, i64 16
+ %scevgep58 = getelementptr i8, ptr %a, i64 16
+ %scevgep62 = getelementptr i8, ptr %c, i64 16
+ %scevgep66 = getelementptr i8, ptr %e, i64 16
+ %scevgep70 = getelementptr i8, ptr %d, i64 16
+ br label %vector.body
+
+ vector.body: ; preds = %vector.body, %vector.ph
+ %lsr.iv71 = phi ptr [ %scevgep72, %vector.body ], [ %scevgep70, %vector.ph ]
+ %lsr.iv67 = phi ptr [ %scevgep68, %vector.body ], [ %scevgep66, %vector.ph ]
+ %lsr.iv63 = phi ptr [ %scevgep64, %vector.body ], [ %scevgep62, %vector.ph ]
+ %lsr.iv59 = phi ptr [ %scevgep60, %vector.body ], [ %scevgep58, %vector.ph ]
+ %lsr.iv55 = phi ptr [ %scevgep56, %vector.body ], [ %scevgep54, %vector.ph ]
+ %lsr.iv52 = phi i64 [ %lsr.iv.next53, %vector.body ], [ %n.vec, %vector.ph ]
+ %scevgep57 = getelementptr i8, ptr %lsr.iv55, i64 -16
+ %wide.load = load <4 x float>, ptr %scevgep57, align 4, !tbaa !6
+ %wide.load29 = load <4 x float>, ptr %lsr.iv55, align 4, !tbaa !6
+ %9 = fmul <4 x float> %wide.load, %broadcast.splat
+ %10 = fmul <4 x float> %wide.load29, %broadcast.splat
+ %scevgep65 = getelementptr i8, ptr %lsr.iv63, i64 -16
+ %wide.load30 = load <4 x float>, ptr %scevgep65, align 4, !tbaa !6
+ %wide.load31 = load <4 x float>, ptr %lsr.iv63, align 4, !tbaa !6
+ %scevgep73 = getelementptr i8, ptr %lsr.iv71, i64 -16
+ %wide.load32 = load <4 x float>, ptr %scevgep73, align 4, !tbaa !6
+ %wide.load33 = load <4 x float>, ptr %lsr.iv71, align 4, !tbaa !6
+ %11 = fsub <4 x float> %wide.load30, %wide.load32
+ %12 = fsub <4 x float> %wide.load31, %wide.load33
+ %13 = fmul <4 x float> %9, %11
+ %14 = fmul <4 x float> %10, %12
+ %scevgep69 = getelementptr i8, ptr %lsr.iv67, i64 -16
+ %wide.load34 = load <4 x float>, ptr %scevgep69, align 4, !tbaa !6
+ %wide.load35 = load <4 x float>, ptr %lsr.iv67, align 4, !tbaa !6
+ %15 = fdiv <4 x float> %13, %wide.load34
+ %16 = fdiv <4 x float> %14, %wide.load35
+ %scevgep61 = getelementptr i8, ptr %lsr.iv59, i64 -16
+ store <4 x float> %15, ptr %scevgep61, align 4, !tbaa !6
+ store <4 x float> %16, ptr %lsr.iv59, align 4, !tbaa !6
+ %lsr.iv.next53 = add nsw i64 %lsr.iv52, -8
+ %scevgep56 = getelementptr i8, ptr %lsr.iv55, i64 32
+ %scevgep60 = getelementptr i8, ptr %lsr.iv59, i64 32
+ %scevgep64 = getelementptr i8, ptr %lsr.iv63, i64 32
+ %scevgep68 = getelementptr i8, ptr %lsr.iv67, i64 32
+ %scevgep72 = getelementptr i8, ptr %lsr.iv71, i64 32
+ %17 = icmp eq i64 %lsr.iv.next53, 0
+ br i1 %17, label %middle.block, label %vector.body, !llvm.loop !10
+
+ middle.block: ; preds = %vector.body
+ %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
+ br i1 %cmp.n, label %for.cond.cleanup, label %for.body.preheader37
+
+ for.body.preheader37: ; preds = %vector.memcheck, %for.body.preheader, %middle.block
+ %indvars.iv.ph = phi i64 [ %n.vec, %middle.block ], [ 0, %for.body.preheader ], [ 0, %vector.memcheck ]
+ %18 = shl nuw nsw i64 %indvars.iv.ph, 2
+ %scevgep = getelementptr i8, ptr %a, i64 %18
+ %scevgep39 = getelementptr i8, ptr %e, i64 %18
+ %scevgep42 = getelementptr i8, ptr %d, i64 %18
+ %scevgep45 = getelementptr i8, ptr %c, i64 %18
+ %scevgep48 = getelementptr i8, ptr %b, i64 %18
+ %19 = sub i64 %wide.trip.count, %indvars.iv.ph
+ br label %for.body
+
+ for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
+ ret void
+
+ for.body: ; preds = %for.body.preheader37, %for.body
+ %lsr.iv51 = phi i64 [ %19, %for.body.preheader37 ], [ %lsr.iv.next, %for.body ]
+ %lsr.iv49 = phi ptr [ %scevgep48, %for.body.preheader37 ], [ %scevgep50, %for.body ]
+ %lsr.iv46 = phi ptr [ %scevgep45, %for.body.preheader37 ], [ %scevgep47, %for.body ]
+ %lsr.iv43 = phi ptr [ %scevgep42, %for.body.preheader37 ], [ %scevgep44, %for.body ]
+ %lsr.iv40 = phi ptr [ %scevgep39, %for.body.preheader37 ], [ %scevgep41, %for.body ]
+ %lsr.iv = phi ptr [ %scevgep, %for.body.preheader37 ], [ %scevgep38, %for.body ]
+ %20 = load float, ptr %lsr.iv49, align 4, !tbaa !6
+ %mul = fmul float %20, %f
+ %21 = load float, ptr %lsr.iv46, align 4, !tbaa !6
+ %22 = load float, ptr %lsr.iv43, align 4, !tbaa !6
+ %sub = fsub float %21, %22
+ %mul5 = fmul float %mul, %sub
+ %23 = load float, ptr %lsr.iv40, align 4, !tbaa !6
+ %div = fdiv float %mul5, %23
+ store float %div, ptr %lsr.iv, align 4, !tbaa !6
+ %scevgep38 = getelementptr i8, ptr %lsr.iv, i64 4
+ %scevgep41 = getelementptr i8, ptr %lsr.iv40, i64 4
+ %scevgep44 = getelementptr i8, ptr %lsr.iv43, i64 4
+ %scevgep47 = getelementptr i8, ptr %lsr.iv46, i64 4
+ %scevgep50 = getelementptr i8, ptr %lsr.iv49, i64 4
+ %lsr.iv.next = add i64 %lsr.iv51, -1
+ %exitcond.not = icmp eq i64 %lsr.iv.next, 0
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+ }
+
+ !6 = !{!7, !7, i64 0}
+ !7 = !{!"float", !8, i64 0}
+ !8 = !{!"omnipotent char", !9, i64 0}
+ !9 = !{!"Simple C/C++ TBAA"}
+ !10 = distinct !{!10, !11, !12, !13}
+ !11 = !{!"llvm.loop.mustprogress"}
+ !12 = !{!"llvm.loop.isvectorized", i32 1}
+ !13 = !{!"llvm.loop.unroll.runtime.disable"}
+ !14 = distinct !{!14, !11, !12}
+
+...
+---
+name: f
+tracksRegLiveness: true
+liveins:
+ - { reg: '$x0', virtual-reg: '%39' }
+ - { reg: '$x1', virtual-reg: '%40' }
+ - { reg: '$x2', virtual-reg: '%41' }
+ - { reg: '$x3', virtual-reg: '%42' }
+ - { reg: '$x4', virtual-reg: '%43' }
+ - { reg: '$s0', virtual-reg: '%44' }
+ - { reg: '$w5', virtual-reg: '%45' }
+body: |
+ bb.0.entry:
+ successors: %bb.1, %bb.7
+ liveins: $x0, $x1, $x2, $x3, $x4, $s0, $w5
+
+ %45:gpr32common = COPY $w5
+ %44:fpr32 = COPY $s0
+ %43:gpr64common = COPY $x4
+ %42:gpr64common = COPY $x3
+ %41:gpr64common = COPY $x2
+ %40:gpr64common = COPY $x1
+ %39:gpr64common = COPY $x0
+ dead $wzr = SUBSWri %45, 1, 0, implicit-def $nzcv
+ Bcc 11, %bb.7, implicit $nzcv
+ B %bb.1
+
+ bb.1.for.body.preheader:
+ successors: %bb.12, %bb.2
+
+ %48:gpr32 = ORRWrs $wzr, %45, 0
+ %0:gpr64 = SUBREG_TO_REG 0, killed %48, %subreg.sub_32
+ dead $wzr = SUBSWri %45, 8, 0, implicit-def $nzcv
+ Bcc 2, %bb.2, implicit $nzcv
+
+ bb.12:
+ %49:gpr64all = COPY $xzr
+ %47:gpr64all = COPY %49
+ B %bb.6
+
+ bb.2.vector.memcheck:
+ successors: %bb.6, %bb.11
+
+ %55:gpr64common = SUBXrr %39, %40
+ %59:gpr64all = COPY $xzr
+ %51:gpr64all = COPY %59
+ dead $xzr = SUBSXri killed %55, 32, 0, implicit-def $nzcv
+ Bcc 3, %bb.6, implicit $nzcv
+ B %bb.11
+
+ bb.11.vector.memcheck:
+ successors: %bb.6, %bb.10
+
+ %56:gpr64common = SUBXrr %39, %41
+ dead $xzr = SUBSXri %56, 32, 0, implicit-def $nzcv
+ Bcc 3, %bb.6, implicit $nzcv
+ B %bb.10
+
+ bb.10.vector.memcheck:
+ successors: %bb.6, %bb.9
+
+ %57:gpr64common = SUBXrr %39, %42
+ dead $xzr = SUBSXri %57, 32, 0, implicit-def $nzcv
+ Bcc 3, %bb.6, implicit $nzcv
+ B %bb.9
+
+ bb.9.vector.memcheck:
+ successors: %bb.6, %bb.3
+
+ %58:gpr64common = SUBXrr %39, %43
+ dead $xzr = SUBSXri %58, 32, 0, implicit-def $nzcv
+ Bcc 3, %bb.6, implicit $nzcv
+ B %bb.3
+
+ bb.3.vector.ph:
+ %64:gpr64common = ANDXri %0, 8027
+ %1:gpr64 = COPY %64
+ %66:fpr128 = IMPLICIT_DEF
+ %65:fpr128 = INSERT_SUBREG %66, %44, %subreg.ssub
+ %67:gpr64sp = ADDXri %40, 16, 0
+ %3:gpr64all = COPY %67
+ %68:gpr64sp = ADDXri %39, 16, 0
+ %4:gpr64all = COPY %68
+ %69:gpr64sp = ADDXri %41, 16, 0
+ %5:gpr64all = COPY %69
+ %70:gpr64sp = ADDXri %43, 16, 0
+ %6:gpr64all = COPY %70
+ %71:gpr64sp = ADDXri %42, 16, 0
+ %7:gpr64all = COPY %71
+
+ bb.4.vector.body:
+ successors: %bb.5, %bb.4
+
+ %8:gpr64sp = PHI %7, %bb.3, %19, %bb.4
+ %9:gpr64sp = PHI %6, %bb.3, %18, %bb.4
+ %10:gpr64sp = PHI %5, %bb.3, %17, %bb.4
+ %11:gpr64sp = PHI %4, %bb.3, %16, %bb.4
+ %12:gpr64sp = PHI %3, %bb.3, %15, %bb.4
+ %13:gpr64sp = PHI %1, %bb.3, %14, %bb.4
+ %72:fpr128 = LDURQi %12, -16 :: (load (s128) from %ir.scevgep57, align 4, !tbaa !6)
+ %73:fpr128 = LDRQui %12, 0 :: (load (s128) from %ir.lsr.iv55, align 4, !tbaa !6)
+ %74:fpr128 = nofpexcept FMULv4i32_indexed killed %72, %65, 0, implicit $fpcr
+ %75:fpr128 = nofpexcept FMULv4i32_indexed killed %73, %65, 0, implicit $fpcr
+ %76:fpr128 = LDURQi %10, -16 :: (load (s128) from %ir.scevgep65, align 4, !tbaa !6)
+ %77:fpr128 = LDRQui %10, 0 :: (load (s128) from %ir.lsr.iv63, align 4, !tbaa !6)
+ %78:fpr128 = LDURQi %8, -16 :: (load (s128) from %ir.scevgep73, align 4, !tbaa !6)
+ %79:fpr128 = LDRQui %8, 0 :: (load (s128) from %ir.lsr.iv71, align 4, !tbaa !6)
+ %80:fpr128 = nofpexcept FSUBv4f32 killed %76, killed %78, implicit $fpcr
+ %81:fpr128 = nofpexcept FSUBv4f32 killed %77, killed %79, implicit $fpcr
+ %82:fpr128 = nofpexcept FMULv4f32 killed %74, killed %80, implicit $fpcr
+ %83:fpr128 = nofpexcept FMULv4f32 killed %75, killed %81, implicit $fpcr
+ %84:fpr128 = LDURQi %9, -16 :: (load (s128) from %ir.scevgep69, align 4, !tbaa !6)
+ %85:fpr128 = LDRQui %9, 0 :: (load (s128) from %ir.lsr.iv67, align 4, !tbaa !6)
+ %86:fpr128 = nofpexcept FDIVv4f32 killed %82, killed %84, implicit $fpcr
+ %87:fpr128 = nofpexcept FDIVv4f32 killed %83, killed %85, implicit $fpcr
+ STURQi killed %86, %11, -16 :: (store (s128) into %ir.scevgep61, align 4, !tbaa !6)
+ STRQui killed %87, %11, 0 :: (store (s128) into %ir.lsr.iv59, ...
[truncated]
Added test.
Thanks for fixing. LGTM.
Force-pushed from 35c5813 to 453f53a:
When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result in generating incorrect code. This patch fixes the problem.
Force-pushed from 453f53a to 09464c4.
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/176/builds/576

Here is the relevant piece of the build log for reference:
…in cases" (#97246) Reverts #95356 due to ppc64le test failures caught by the LLVM Buildbot. https://lab.llvm.org/buildbot/#/builders/176/builds/576
…llvm#95356) When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result in generating incorrect code. This patch fixes the problem. Also, this patch merges `SchedStart` into `EarlyStart` (same for `SchedEnd`). Fixes llvm#93936
…in cases" (llvm#97246) Reverts llvm#95356 due to ppc64le test failures caught by the LLVM Buildbot. https://lab.llvm.org/buildbot/#/builders/176/builds/576
When scheduling an instruction, if both any predecessors and any successors of the instruction are already scheduled, `SchedStart` isn't taken into account. It may result in generating incorrect code. This patch fixes the problem. Also, this patch merges `SchedStart` into `EarlyStart` (same for `SchedEnd`).

I checked the schedule changes due to this patch in llvm-test-suite. In conclusion, as far as I observed, there was no change in most cases. Just in case, details of the results are as follows.

Compile options:
Observed changes:
close #93936