[Inline] Propagate FMFs from calls to inlined instructions. #145537

Open · wants to merge 1 commit into main

Conversation

@fhahn (Contributor) commented Jun 24, 2025

This patch updates inlining to propagate fast-math flags from the arguments of an inlined call to the instructions that replace the call.

I am not sure if the behavior of fast-math flags at call sites is fully/clearly specified in LangRef currently, but at least intuitively it seems reasonable to me to interpret fast-math flags at a call site to mean that we can assume that the flags apply to the arguments and result of the call. This should allow using that information to propagate the flags to the inlined instructions. But I might be missing something that prevents us from doing so.

The end-to-end motivation for this is to allow better propagation of `#pragma clang fp` when inlining. E.g. consider the code below.

I would expect that the add in the loop has the reassoc flag, but currently it does not. Note that the patch here on its own is not enough, as clang also needs to add the fast-math flags to the call instructions.

float do_add(float A, float B) { return A + B; }

float red1(float *A, float *B, unsigned N) {
  float sum = 0;
  for (unsigned I = 0; I != N; ++I) {
    #pragma clang fp reassociate(on)
    sum = do_add(sum, A[I]);
  }
  return sum;
}
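
At the IR level, the intended effect matches the first test case added by this patch; roughly:

define float @add(float %a, float %b) {
  %add = fadd float %a, %b
  ret float %add
}

define float @caller1(float %a, float %b) {
  ; before inlining, the flag only lives on the call site
  %r = call reassoc float @add(float %a, float %b)
  ret float %r
}

; After inlining with this patch, @caller1 becomes (roughly):
;   %add.i = fadd reassoc float %a, %b
;   ret float %add.i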

@llvmbot (Member) commented Jun 24, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/145537.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Transforms/Utils/Cloning.h (+3)
  • (modified) llvm/lib/Transforms/Utils/CloneFunction.cpp (+47)
  • (modified) llvm/lib/Transforms/Utils/InlineFunction.cpp (+2)
  • (added) llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll (+93)
diff --git a/llvm/include/llvm/Transforms/Utils/Cloning.h b/llvm/include/llvm/Transforms/Utils/Cloning.h
index 6b56230a6e1d4..c8c15edb55c56 100644
--- a/llvm/include/llvm/Transforms/Utils/Cloning.h
+++ b/llvm/include/llvm/Transforms/Utils/Cloning.h
@@ -23,6 +23,7 @@
 #include "llvm/Analysis/InlineCost.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/DebugLoc.h"
+#include "llvm/IR/FMF.h"
 #include "llvm/IR/ValueHandle.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"
@@ -87,6 +88,8 @@ struct ClonedCodeInfo {
   /// check whether the main VMap mapping involves simplification or not.
   DenseMap<const Value *, const Value *> OrigVMap;
 
+  FastMathFlags FMFs;
+
   ClonedCodeInfo() = default;
 
   bool isSimplified(const Value *From, const Value *To) const {
diff --git a/llvm/lib/Transforms/Utils/CloneFunction.cpp b/llvm/lib/Transforms/Utils/CloneFunction.cpp
index fccb73a36b182..3442fd040735f 100644
--- a/llvm/lib/Transforms/Utils/CloneFunction.cpp
+++ b/llvm/lib/Transforms/Utils/CloneFunction.cpp
@@ -700,6 +700,51 @@ void PruningFunctionCloner::CloneBlock(
   }
 }
 
+/// Propagate fast-math flags from OldFunc's new arguments to their users
+/// if applicable.
+static void propagateFastMathFlags(const Function *OldFunc,
+                                   ValueToValueMapTy &VMap,
+                                   const FastMathFlags &FMFs) {
+  if (!FMFs.any())
+    return;
+
+  // Visit all instructions reachable from the arguments of OldFunc. This
+  // ensures we only visit instructions in the original function. The arguments
+  // have FMFs as fast-math flags. Set them for all applicable instructions in
+  // the new function (retrieved via VMap).
+
+  DenseSet<const Value *> Visited;
+  SmallVector<const Instruction *, 32> Worklist;
+  for (const Argument &Arg : OldFunc->args()) {
+    Visited.insert(&Arg);
+    for (const User *U : Arg.users()) {
+      if (Visited.insert(U).second)
+        Worklist.push_back(cast<Instruction>(U));
+    }
+  }
+
+  while (!Worklist.empty()) {
+    const Instruction *CurrentOld = Worklist.pop_back_val();
+    Instruction *Current = dyn_cast<Instruction>(VMap.lookup(CurrentOld));
+    if (!Current || !isa<FPMathOperator>(Current))
+      continue;
+
+    // TODO: Assumes all FP ops propagate the flags from args to the result, if
+    // all operands have the same flags.
+    if (!all_of(CurrentOld->operands(),
+                [&Visited](Value *V) { return Visited.contains(V); }))
+      continue;
+
+    Current->setFastMathFlags(Current->getFastMathFlags() | FMFs);
+
+    // Add all users of this instruction to the worklist
+    for (const User *U : CurrentOld->users()) {
+      if (Visited.insert(U).second)
+        Worklist.push_back(cast<Instruction>(U));
+    }
+  }
+}
+
 /// This works like CloneAndPruneFunctionInto, except that it does not clone the
 /// entire function. Instead it starts at an instruction provided by the caller
 /// and copies (and prunes) only the code reachable from that instruction.
@@ -996,6 +1041,8 @@ void llvm::CloneAndPruneIntoFromInst(Function *NewFunc, const Function *OldFunc,
        I != E; ++I)
     if (ReturnInst *RI = dyn_cast<ReturnInst>(I->getTerminator()))
       Returns.push_back(RI);
+
+  propagateFastMathFlags(OldFunc, VMap, CodeInfo->FMFs);
 }
 
 /// This works exactly like CloneFunctionInto,
diff --git a/llvm/lib/Transforms/Utils/InlineFunction.cpp b/llvm/lib/Transforms/Utils/InlineFunction.cpp
index 7df5e9958182c..a3af09d124e1f 100644
--- a/llvm/lib/Transforms/Utils/InlineFunction.cpp
+++ b/llvm/lib/Transforms/Utils/InlineFunction.cpp
@@ -2697,6 +2697,8 @@ llvm::InlineResult llvm::InlineFunction(CallBase &CB, InlineFunctionInfo &IFI,
     // have no dead or constant instructions leftover after inlining occurs
     // (which can happen, e.g., because an argument was constant), but we'll be
     // happy with whatever the cloner can do.
+    InlinedFunctionInfo.FMFs =
+        isa<FPMathOperator>(&CB) ? CB.getFastMathFlags() : FastMathFlags();
     CloneAndPruneFunctionInto(Caller, CalledFunc, VMap,
                               /*ModuleLevelChanges=*/false, Returns, ".i",
                               &InlinedFunctionInfo);
diff --git a/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll b/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll
new file mode 100644
index 0000000000000..1d5becea14014
--- /dev/null
+++ b/llvm/test/Transforms/Inline/propagate-fast-math-flags-to-inlined-instructions.ll
@@ -0,0 +1,93 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 5
+; RUN: opt -p inline -S %s | FileCheck %s
+
+@g = external global float
+
+define float @add(float %a, float %b) {
+; CHECK-LABEL: define float @add(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
+  %add = fadd float %a, %b
+  ret float %add
+}
+
+define float @caller1(float %a, float %b) {
+; CHECK-LABEL: define float @caller1(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd reassoc float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD_I]]
+;
+  %r = call reassoc float @add(float %a, float %b)
+  ret float %r
+}
+
+define float @add_with_unrelated_fp_math(float %a, float %b) {
+; CHECK-LABEL: define float @add_with_unrelated_fp_math(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[L:%.*]] = load float, ptr @g, align 4
+; CHECK-NEXT:    [[RES:%.*]] = fmul float [[L]], [[A]]
+; CHECK-NEXT:    store float [[RES]], ptr @g, align 4
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
+  %l = load float, ptr @g
+  %res = fmul float %l, %a
+  store float %res, ptr @g
+  %add = fadd float %a, %b
+  ret float %add
+}
+
+; Make sure the call-site fast-math flags are not added to instructions where
+; not all operands have the new fast-math flags.
+define float @caller2(float %a, float %b) {
+; CHECK-LABEL: define float @caller2(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[L_I:%.*]] = load float, ptr @g, align 4
+; CHECK-NEXT:    [[RES_I:%.*]] = fmul float [[L_I]], [[A]]
+; CHECK-NEXT:    store float [[RES_I]], ptr @g, align 4
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd nnan float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD_I]]
+;
+  %r = call nnan float @add_with_unrelated_fp_math(float %a, float %b)
+  ret float %r
+}
+
+define float @add_with_nnan(float %a, float %b) {
+; CHECK-LABEL: define float @add_with_nnan(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[ADD:%.*]] = fadd nnan float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
+  %add = fadd nnan float %a, %b
+  ret float %add
+}
+
+; Make sure the fast-math flags on the original instruction are kept and the
+; call-site flags are added.
+define float @caller3(float %a, float %b) {
+; CHECK-LABEL: define float @caller3(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd nnan ninf float [[A]], [[B]]
+; CHECK-NEXT:    ret float [[ADD_I]]
+;
+  %r = call ninf float @add_with_nnan(float %a, float %b)
+  ret float %r
+}
+
+; Make sure the fast-math flags don't get accidentally propagated to
+; instructions in the caller, reachable via the passed arguments.
+define float @caller4(float %a, float %b) {
+; CHECK-LABEL: define float @caller4(
+; CHECK-SAME: float [[A:%.*]], float [[B:%.*]]) {
+; CHECK-NEXT:    [[ADD_I:%.*]] = fadd ninf float [[A]], [[B]]
+; CHECK-NEXT:    [[DIV:%.*]] = fdiv float [[A]], [[B]]
+; CHECK-NEXT:    [[ADD:%.*]] = fadd float [[ADD_I]], [[DIV]]
+; CHECK-NEXT:    ret float [[ADD]]
+;
+  %r = call ninf float @add(float %a, float %b)
+  %div = fdiv float %a, %b
+  %add = fadd float %r, %div
+  ret float %add
+}

@nikic (Contributor) left a comment


I think this may work for the poison-generating flags by propagating that poison, but I don't think this is a valid thing to do for rewrite-based flags.

More generally, I would say that FMF on calls really only makes sense for intrinsics (or things that are "close enough", like libcalls), but not user-defined functions.
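
As a rough illustration of the distinction, assuming the usual LangRef semantics for these flags:

; poison-generating: nnan on the call only says the result is poison if a
; NaN is involved, so copying it onto the inlined fadd expresses the same
; condition.
%r1 = call nnan float @add(float %x, float %y)

; rewrite-based: reassoc on the call licenses rewrites that the body of
; @add was written without, so copying it changes which transforms are
; allowed on the callee's own arithmetic.
%r2 = call reassoc float @add(float %x, float %y)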

@andykaylor (Contributor) commented

I agree with @nikic on the limitations of applying fast-math flags to calls. In fact, I think it would be reasonable to state outright that fast-math flags can't be applied transitively to defined functions. Consider this situation:

float foo(float a, float b, float c) {
  #pragma STDC FP_CONTRACT OFF
  return a * b + c;
}

float bar(float a, float b, float c) {
  return foo(a, b, c);
}

Here a pragma has been used to disallow contract on the definition of foo, but currently clang sets the contract flag on the call to foo in bar. This call is going to be inlined, but that should not re-enable contract on the inlined operations.
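
In IR terms the situation looks roughly like this (a sketch, not verbatim clang output):

define float @foo(float %a, float %b, float %c) {
  ; FP_CONTRACT OFF: the operations are deliberately left uncontractable
  %mul = fmul float %a, %b
  %add = fadd float %mul, %c
  ret float %add
}

define float @bar(float %a, float %b, float %c) {
  ; clang currently attaches 'contract' here based on bar's FP settings
  %r = call contract float @foo(float %a, float %b, float %c)
  ret float %r
}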

We should probably update clang to stop setting the fast-math flags on non-intrinsic calls, but I don't think we should try to apply them if they are present. The nofpclass attribute covers the effects of the poison-inducing flags, and those flags can be applied during inlining based on the nofpclass attribute settings, but the rewrite flags should not be applied.
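
A sketch of that alternative, assuming the no-NaNs fact is expressed with call-site nofpclass attributes rather than fast-math flags:

; instead of:  %r = call nnan float @add(float %a, float %b)
; roughly the same assumption, in a form that inlining can use soundly:
%r = call nofpclass(nan) float @add(float nofpclass(nan) %a,
                                    float nofpclass(nan) %b)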

@fhahn (Contributor, Author) commented Jun 24, 2025

> I agree with @nikic on the limitations of applying fast-math flags to calls. In fact, I think it would be reasonable to state outright that fast-math flags can't be applied transitively to defined functions. Consider this situation:
>
> float foo(float a, float b, float c) {
>   #pragma STDC FP_CONTRACT OFF
>   return a * b + c;
> }
>
> float bar(float a, float b, float c) {
>   return foo(a, b, c);
> }
>
> Here a pragma has been used to disallow contract on the definition of foo, but currently clang sets the contract flag on the call to foo in bar. This call is going to be inlined, but that should not re-enable contract on the inlined operations.
>
> We should probably update clang to stop setting the fast-math flags on non-intrinsic calls, but I don't think we should try to apply them if they are present. The nofpclass attribute covers the effects of the poison-inducing flags, and those flags can be applied during inlining based on the nofpclass attribute settings, but the rewrite flags should not be applied.

Ah, thanks for the example. I guess the problem is that we cannot distinguish why flags are not present, i.e. they could have been explicitly disabled...

Although I find it surprising that in the converse case, we can end up with contracted ops?

float foo(float a, float b, float c) {
  return a * b + c;
}

float bar(float a, float b, float c) {
  #pragma STDC FP_CONTRACT OFF
  return foo(a, b, c);
}

Any thoughts on how we could model things in Clang to get something like the code below working, i.e. propagating the flags through inlining? Maybe another pragma that instead adds nofpclass? Currently `#pragma clang fp reassociate(on)` doesn't compose nicely with things like C++ templated libraries.

float do_add(float A, float B) { return A + B; }

float red1(float *A, float *B, unsigned N) {
  float sum = 0;
  for (unsigned I = 0; I != N; ++I) {
    #pragma clang fp reassociate(on)
    sum = do_add(sum, A[I]);
  }
  return sum;
}

@andykaylor (Contributor) commented

> Although I find it surprising that in the converse case, we can end up with contracted ops?

I can understand how that might surprise many users, but I think it is consistent with the scope of the pragma. The decision to inline or not shouldn't affect the result.

Would something like this work for you?

https://godbolt.org/z/oer4qGe8W

I can't think of a way to do it without some modification of the called function.

We have a similar problem with target intrinsics, where the intrinsics are defined as functions that always get inlined, so there's no good way to use scope-local fast-math settings on the intrinsics.

https://godbolt.org/z/obehdMf4e

I know this surprises many users, who expect these intrinsics to behave like built-ins, but the implementation doesn't work that way. At some point I was involved in a discussion of creating some kind of function attribute that would indicate that fast-math settings should be transitively applied to a function from its call-site, but I don't think that ever went anywhere. @jcranmer-intel Do you remember that discussion?

@arsenm added the floating-point (Floating-point math) label on Jun 25, 2025