[Inline] Propagate FMFs from calls to inlined instructions. #145537
Conversation
This patch updates inlining to propagate fast-math flags from the arguments of an inlined call to the instructions that replace the call.

I am not sure whether the behavior of fast-math flags at call sites is fully or clearly specified in LangRef at the moment, but intuitively it seems reasonable to interpret fast-math flags at a call site as meaning that the flags apply to the arguments and the result of the call. That interpretation would allow propagating the flags to the inlined instructions, but I might be missing something that prevents us from doing so.

The end-to-end motivation is better propagation of `#pragma clang fp` when inlining. Consider the code below: I would expect the add in the loop to carry the `reassoc` flag, but currently it does not. Note that this patch on its own is not enough; Clang also needs to add the fast-math flags to the call instructions.

float do_add(float A, float B) { return A + B; }

float red1(float *A, float *B, unsigned N) {
  float sum = 0;
  for (unsigned I = 0; I != N; ++I) {
#pragma clang fp reassociate(on)
    sum = do_add(sum, A[I]);
  }
  return sum;
}
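In IR terms, the intended end-to-end behavior is roughly the following hand-written sketch (function and value names are invented for illustration; this is not actual Clang or opt output). The caller1 test in the diff below exercises the same pattern.

define float @do_add(float %a, float %b) {
  %add = fadd float %a, %b
  ret float %add
}

; Before inlining: with the companion Clang change, the pragma results in a
; call that carries the reassoc flag.
define float @loop_body(float %sum, float %elt) {
  %r = call reassoc float @do_add(float %sum, float %elt)
  ret float %r
}

; After inlining with this patch, the flag is propagated to the cloned add:
;   %add.i = fadd reassoc float %sum, %elt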
@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Full diff: https://github.com/llvm/llvm-project/pull/145537.diff

4 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 6b56230a6e1d4..c8c15edb55c56 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -23,6 +23,7 @@
#include "llvm/Analysis/InlineCost.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/FMF.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Transforms/Utils/ValueMapper.h"
@@ -87,6 +88,8 @@ struct ClonedCodeInfo {
/// check whether the main VMap mapping involves simplification or not.
DenseMap<const Value *, const Value *> OrigVMap;
+ FastMathFlags FMFs;
+
ClonedCodeInfo() = default;
bool isSimplified(const Value *From, const Value *To) const {
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index fccb73a36b182..3442fd040735f 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -700,6 +700,51 @@ void PruningFunctionCloner::CloneBlock(
}
}
+/// Propagate fast-math flags from OldFunc's arguments to their users in the
+/// cloned function, if applicable.
+static void propagateFastMathFlags(const Function *OldFunc,
+ ValueToValueMapTy &VMap,
+ const FastMathFlags &FMFs) {
+ if (!FMFs.any())
+ return;
+
+ // Visit all instructions reachable from the arguments of OldFunc. This
+ // ensures we only visit instructions in the original function. The arguments
+ // have FMFs as fast-math flags. Set them for all applicable instructions in
+ // the new function (retrieved via VMap).
+
+ DenseSet<const Value *> Visited;
+ SmallVector<const Instruction *, 32> Worklist;
+ for (const Argument &Arg : OldFunc->args()) {
+ Visited.insert(&Arg);
+ for (const User *U : Arg.users()) {
+ if (Visited.insert(U).second)
+ Worklist.push_back(cast<Instruction>(U));
+ }
+ }
+
+ while (!Worklist.empty()) {
+ const Instruction *CurrentOld = Worklist.pop_back_val();
+ Instruction *Current = dyn_cast<Instruction>(VMap.lookup(CurrentOld));
+ if (!Current || !isa<FPMathOperator>(Current))
+ continue;
+
+ // TODO: Assumes all FP ops propagate the flags from args to the result, if
+ // all operands have the same flags.
+ if (!all_of(CurrentOld->operands(),
+ [&Visited](Value *V) { return Visited.contains(V); }))
+ continue;
+
+ Current->setFastMathFlags(Current->getFastMathFlags() | FMFs);
+
+ // Add all users of this instruction to the worklist
+ for (const User *U : CurrentOld->users()) {
+ if (Visited.insert(U).second)
+ Worklist.push_back(cast<Instruction>(U));
+ }
+ }
+}
+
/// This works like CloneAndPruneFunctionInto, except that it does not clone the
/// entire function. Instead it starts at an instruction provided by the caller
/// and copies (and prunes) only the code reachable from that instruction.
@@ -996,6 +1041,8 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
I != E; ++I)
if (ReturnInst *RI = dyn_cast<ReturnInst>(I->getTerminator()))
Returns.push_back(RI);
+
+ propagateFastMathFlags(OldFunc, VMap, CodeInfo->FMFs);
}
/// This works exactly like CloneFunctionInto,
diff --git a/llvm/lib/Transforms/Utils/InlineFunction.cpp b/llvm/lib/Transforms/Utils/InlineFunction.cpp
index 7df5e9958182c..a3af09d124e1f 100644
--- a/llvm/lib/Transforms/Utils/InlineFunction.cpp
+++ b/llvm/lib/Transforms/Utils/InlineFunction.cpp
@@ -2697,6 +2697,8 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
// have no dead or constant instructions leftover after inlining occurs
// (which can happen, e.g., because an argument was constant), but we'll be
// happy with whatever the cloner can do.
+ InlinedFunctionInfo.FMFs =
+ isa<FPMathOperator>(&CB) ? CB.getFastMathFlags() : FastMathFlags();
CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,
/*ModuleLevelChanges=*/false, Returns, ".i",
&InlinedFunctionInfo);
diff --git a/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll b/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll
new file mode 100644
index 0000000000000..1d5becea14014
--- /dev/null
+++ b/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll
@@ -0,0 +1,93 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p inline -S %s | FileCheck %s
+
+@g = external global float
+
+define float @add(float %a, float %b) {
+; CHECK-LABEL: define float @add(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[ADD:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD]]
+;
+ %add = fadd float %a, %b
+ ret float %add
+}
+
+define float @caller1(float %a, float %b) {
+; CHECK-LABEL: define float @caller1(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[ADD_I:%.*]] = fadd reassoc float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD_I]]
+;
+ %r = call reassoc float @add(float %a, float %b)
+ ret float %r
+}
+
+define float @add_with_unrelated_fp_math(float %a, float %b) {
+; CHECK-LABEL: define float @add_with_unrelated_fp_math(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[L:%.*]] = load float, ptr @g, align 4
+; CHECK-NEXT: [[RES:%.*]] = fmul float [[L]], [[A]]
+; CHECK-NEXT: store float [[RES]], ptr @g, align 4
+; CHECK-NEXT: [[ADD:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD]]
+;
+ %l = load float, ptr @g
+ %res = fmul float %l, %a
+ store float %res, ptr @g
+ %add = fadd float %a, %b
+ ret float %add
+}
+
+; Make sure the call-site fast-math flags are not added to instructions where
+; not all operands have the new fast-math flags.
+define float @caller2(float %a, float %b) {
+; CHECK-LABEL: define float @caller2(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[L_I:%.*]] = load float, ptr @g, align 4
+; CHECK-NEXT: [[RES_I:%.*]] = fmul float [[L_I]], [[A]]
+; CHECK-NEXT: store float [[RES_I]], ptr @g, align 4
+; CHECK-NEXT: [[ADD_I:%.*]] = fadd nnan float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD_I]]
+;
+ %r = call nnan float @add_with_unrelated_fp_math(float %a, float %b)
+ ret float %r
+}
+
+define float @add_with_nnan(float %a, float %b) {
+; CHECK-LABEL: define float @add_with_nnan(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[ADD:%.*]] = fadd nnan float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD]]
+;
+ %add = fadd nnan float %a, %b
+ ret float %add
+}
+
+; Make sure the fast-math flags on the original instruction are kept and the
+; call-site flags are added.
+define float @caller3(float %a, float %b) {
+; CHECK-LABEL: define float @caller3(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[ADD_I:%.*]] = fadd nnan ninf float [[A]], [[B]]
+; CHECK-NEXT: ret float [[ADD_I]]
+;
+ %r = call ninf float @add_with_nnan(float %a, float %b)
+ ret float %r
+}
+
+; Make sure the fast-math flags don't get accidentally propagated to
+; instructions in the caller, reachable via the passed arguments.
+define float @caller4(float %a, float %b) {
+; CHECK-LABEL: define float @caller4(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT: [[ADD_I:%.*]] = fadd ninf float [[A]], [[B]]
+; CHECK-NEXT: [[DIV:%.*]] = fdiv float [[A]], [[B]]
+; CHECK-NEXT: [[ADD:%.*]] = fadd float [[ADD_I]], [[DIV]]
+; CHECK-NEXT: ret float [[ADD]]
+;
+ %r = call ninf float @add(float %a, float %b)
+ %div = fdiv float %a, %b
+ %add = fadd float %r, %div
+ ret float %add
+}
I think this may work for the poison-generating flags by propagating that poison, but I don't think this is a valid thing to do for rewrite-based flags.
More generally, I would say that FMF on calls really only makes sense for intrinsics (or things that are "close enough", like libcalls), but not user-defined functions.
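For reference, a hand-written IR sketch of that distinction, based on LangRef's flag definitions (not code from this patch):

; nnan (and ninf) are poison-generating: if an argument or the result is a
; NaN (or infinity), the value becomes poison, so copying the flag into the
; inlined body mostly just propagates that poison condition.
; reassoc (like contract, arcp, afn) instead licenses rewrites: on the call
; it says the caller's context tolerates reassociation, which says nothing
; about whether the callee's body was written to allow it.
define float @example(float %x, float %y) {
  %a = call nnan float @add(float %x, float %y)
  %b = call reassoc float @add(float %x, float %y)
  %s = fadd float %a, %b
  ret float %s
}

declare float @add(float, float)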
I agree with @nikic on the limitations of applying fast-math flags to calls. In fact, I think it would be reasonable to state outright that fast-math flags can't be applied transitively to defined functions. Consider this situation:
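A hypothetical illustration of the kind of situation meant here (invented names, not the comment's original code):

// Hypothetical example: the callee explicitly disables contraction for its body.
float mul_add(float A, float B, float C) {
#pragma clang fp contract(off)
  return A * B + C; // intended to remain a separate fmul and fadd
}

// Caller compiled in a context where contraction is allowed; if the call
// were annotated with the contract fast-math flag, inlining could end up
// fusing the multiply and add in the cloned body into an fma.
float caller(float A, float B, float C) {
  return mul_add(A, B, C);
}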
Here a pragma has been used to disallow contraction inside the called function; if the call site carries the contract flag, propagating it onto the inlined instructions would re-enable exactly what the pragma turned off. We should probably update clang to stop setting the fast-math flags on non-intrinsic calls, but I don't think we should try to apply them if they are present.
Ah, thanks for the example. I guess the problem is that we cannot distinguish why flags are not present, i.e. they could have been explicitly disabled... Although I find it surprising that in the converse case we can end up with contracted ops?

Any thoughts on how we could model things in Clang to get something like the code below working, i.e. propagating the flags through inlining? Maybe another pragma that instead adds …
I can understand how that might surprise many users, but I think it is consistent with the scope of the pragma: the decision to inline or not shouldn't affect the result. Would something like this work for you? https://godbolt.org/z/oer4qGe8W I can't think of a way to do it without some modification of the called function.

We have a similar problem with target intrinsics, where the intrinsics are defined as functions that always get inlined, so there's no good way to use scope-local fast-math settings on them: https://godbolt.org/z/obehdMf4e I know this surprises many users, who expect these intrinsics to behave like built-ins, but the implementation doesn't work that way.

At some point I was involved in a discussion about creating some kind of function attribute that would indicate that fast-math settings should be applied transitively to a function from its call site, but I don't think that ever went anywhere. @jcranmer-intel, do you remember that discussion?