Introduce -fexperimental-loop-fuse to clang and flang #142686

Open: wants to merge 5 commits into main

1 change: 1 addition & 0 deletions clang/include/clang/Basic/CodeGenOptions.def
@@ -322,6 +322,7 @@ CODEGENOPT(TimeTrace , 1, 0, Benign) ///< Set when -ftime-trace is enabl
VALUE_CODEGENOPT(TimeTraceGranularity, 32, 500, Benign) ///< Minimum time granularity (in microseconds),
///< traced by time profiler
CODEGENOPT(InterchangeLoops , 1, 0, Benign) ///< Run loop-interchange.
CODEGENOPT(FuseLoops , 1, 0, Benign) ///< Run loop-fusion.
CODEGENOPT(UnrollLoops , 1, 0, Benign) ///< Control whether loops are unrolled.
CODEGENOPT(RerollLoops , 1, 0, Benign) ///< Control whether loops are rerolled.
CODEGENOPT(NoUseJumpTables , 1, 0, Benign) ///< Set when -fno-jump-tables is enabled.
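The CODEGENOPT(Name, Bits, Default, Compatibility) lines above form an X-macro list. In clang, the header that declares the options defines CODEGENOPT and then includes the .def file, so a single entry such as FuseLoops expands into a bitfield, a default value, and so on. Below is a minimal sketch of the same idea. It assumes a hypothetical inline list macro (SKETCH_CODEGEN_OPTIONS) in place of a separate .def file so that everything fits in one translation unit. The fourth argument is accepted and ignored.

```cpp
#include <cassert>

// Hypothetical stand-in for CodeGenOptions.def: each option is listed once,
// then expanded several times with a different CODEGENOPT body each time.
#define SKETCH_CODEGEN_OPTIONS(CODEGENOPT)     \
  CODEGENOPT(InterchangeLoops, 1, 0, Benign)   \
  CODEGENOPT(FuseLoops, 1, 0, Benign)          \
  CODEGENOPT(UnrollLoops, 1, 0, Benign)

struct SketchCodeGenOptions {
  // Expansion 1: one bitfield per option, Bits wide.
#define DECLARE_FIELD(Name, Bits, Default, Compat) unsigned Name : Bits;
  SKETCH_CODEGEN_OPTIONS(DECLARE_FIELD)
#undef DECLARE_FIELD

  // Expansion 2: the number of options ("0 + 1 + 1 + 1").
#define COUNT_ONE(Name, Bits, Default, Compat) + 1
  static constexpr int kNumOptions = 0 SKETCH_CODEGEN_OPTIONS(COUNT_ONE);
#undef COUNT_ONE

  // Expansion 3: initialise every option to its declared default.
  SketchCodeGenOptions() {
#define SET_DEFAULT(Name, Bits, Default, Compat) Name = Default;
    SKETCH_CODEGEN_OPTIONS(SET_DEFAULT)
#undef SET_DEFAULT
  }
};
```

With this pattern, adding FuseLoops takes only the one list entry. Every expansion picks it up automatically.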

4 changes: 4 additions & 0 deletions clang/include/clang/Driver/Options.td
@@ -4268,6 +4268,10 @@ def floop_interchange : Flag<["-"], "floop-interchange">, Group<f_Group>,
HelpText<"Enable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_loop_interchange: Flag<["-"], "fno-loop-interchange">, Group<f_Group>,
HelpText<"Disable the loop interchange pass">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
defm experimental_loop_fusion
: OptInCC1FFlag<"experimental-loop-fusion", "Enable", "Disable",
" the loop fusion pass",
[ClangOption, CC1Option, FlangOption, FC1Option]>;
def funroll_loops : Flag<["-"], "funroll-loops">, Group<f_Group>,
HelpText<"Turn on loop unroller">, Visibility<[ClangOption, CC1Option, FlangOption, FC1Option]>;
def fno_unroll_loops : Flag<["-"], "fno-unroll-loops">, Group<f_Group>,

2 changes: 2 additions & 0 deletions clang/lib/CodeGen/BackendUtil.cpp
@@ -897,6 +897,7 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
PipelineTuningOptions PTO;
PTO.LoopUnrolling = CodeGenOpts.UnrollLoops;
PTO.LoopInterchange = CodeGenOpts.InterchangeLoops;
PTO.LoopFusion = CodeGenOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
PTO.LoopInterleaving = CodeGenOpts.UnrollLoops;
@@ -1332,6 +1333,7 @@ runThinLTOBackend(CompilerInstance &CI, ModuleSummaryIndex *CombinedIndex,
Conf.SampleProfile = std::move(SampleProfile);
Conf.PTO.LoopUnrolling = CGOpts.UnrollLoops;
Conf.PTO.LoopInterchange = CGOpts.InterchangeLoops;
Conf.PTO.LoopFusion = CGOpts.FuseLoops;
// For historical reasons, loop interleaving is set to mirror setting for loop
// unrolling.
Conf.PTO.LoopInterleaving = CGOpts.UnrollLoops;

2 changes: 2 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
@@ -6870,6 +6870,8 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
options::OPT_fno_unroll_loops);
Args.AddLastArg(CmdArgs, options::OPT_floop_interchange,
options::OPT_fno_loop_interchange);
Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fusion,
options::OPT_fno_experimental_loop_fusion);

Args.AddLastArg(CmdArgs, options::OPT_fstrict_flex_arrays_EQ);
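Args.AddLastArg(CmdArgs, Pos, Neg) forwards at most one spelling of the pair to the -cc1 job. It picks whichever of -fexperimental-loop-fusion and -fno-experimental-loop-fusion appeared last on the driver command line, which is the behaviour the CHECK-FUSE-LOOPS and CHECK-NO-FUSE-LOOPS RUN lines in clang_f_opts.c exercise. Below is a minimal sketch of that rule. It uses plain strings and a hypothetical addLastOfPair helper instead of the real llvm::opt::ArgList API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: append to cc1Args only the last occurrence of either
// `pos` or `neg` in driverArgs, mimicking the "last one wins" forwarding rule.
void addLastOfPair(const std::vector<std::string> &driverArgs,
                   std::vector<std::string> &cc1Args,
                   const std::string &pos, const std::string &neg) {
  // Scan from the end: the first match found is the user's final choice.
  for (auto it = driverArgs.rbegin(); it != driverArgs.rend(); ++it) {
    if (*it == pos || *it == neg) {
      cc1Args.push_back(*it);
      return;
    }
  }
  // Neither spelling given: forward nothing, so cc1 falls back to its default.
}
```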


3 changes: 3 additions & 0 deletions clang/lib/Driver/ToolChains/Flang.cpp
@@ -151,6 +151,9 @@ void Flang::addCodegenOptions(const ArgList &Args,
!stackArrays->getOption().matches(options::OPT_fno_stack_arrays))
CmdArgs.push_back("-fstack-arrays");

Args.AddLastArg(CmdArgs, options::OPT_fexperimental_loop_fusion,
options::OPT_fno_experimental_loop_fusion);

handleInterchangeLoopsArgs(Args, CmdArgs);
handleVectorizeLoopsArgs(Args, CmdArgs);
handleVectorizeSLPArgs(Args, CmdArgs);

5 changes: 5 additions & 0 deletions clang/lib/Frontend/CompilerInvocation.cpp
@@ -1671,6 +1671,9 @@ void CompilerInvocationBase::GenerateCodeGenArgs(const CodeGenOptions &Opts,
else
GenerateArg(Consumer, OPT_fno_loop_interchange);

if (Opts.FuseLoops)
GenerateArg(Consumer, OPT_fexperimental_loop_fusion);

if (!Opts.BinutilsVersion.empty())
GenerateArg(Consumer, OPT_fbinutils_version_EQ, Opts.BinutilsVersion);

@@ -1991,6 +1994,8 @@ bool CompilerInvocation::ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args,
(Opts.OptimizationLevel > 1));
Opts.InterchangeLoops =
Args.hasFlag(OPT_floop_interchange, OPT_fno_loop_interchange, false);
Opts.FuseLoops = Args.hasFlag(OPT_fexperimental_loop_fusion,
OPT_fno_experimental_loop_fusion, false);
Opts.BinutilsVersion =
std::string(Args.getLastArgValue(OPT_fbinutils_version_EQ));
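The two CompilerInvocation.cpp hunks form a matched pair:

- ParseCodeGenArgs turns the command line into Opts.FuseLoops with hasFlag. The last spelling of the pair wins, and the default is false.
- GenerateCodeGenArgs turns Opts back into arguments. It emits only the positive flag, because "off" is already the default.

Clang relies on the generate side to reconstruct cc1 command lines, for example when checking that argument parsing round-trips. Below is a sketch of the pair. It uses hypothetical names (SketchOpts, hasFlag, parse, generate) and plain strings instead of the real ArgList and argument-consumer types.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SketchOpts {
  bool FuseLoops = false;
};

// Hypothetical mirror of Args.hasFlag(Pos, Neg, Default):
// the last of the pair wins; otherwise the default applies.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool dflt) {
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos) return true;
    if (*it == neg) return false;
  }
  return dflt;
}

// Parse side: command line -> options.
SketchOpts parse(const std::vector<std::string> &args) {
  SketchOpts o;
  o.FuseLoops = hasFlag(args, "-fexperimental-loop-fusion",
                        "-fno-experimental-loop-fusion", false);
  return o;
}

// Generate side: options -> command line. Only the non-default state is
// written, matching the `if (Opts.FuseLoops)` branch in the diff.
std::vector<std::string> generate(const SketchOpts &o) {
  std::vector<std::string> args;
  if (o.FuseLoops) args.push_back("-fexperimental-loop-fusion");
  return args;
}
```

Under these assumptions, parse(generate(o)) reproduces o for both states, which is the property a round-trip check relies on.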


13 changes: 13 additions & 0 deletions clang/test/Driver/clang_f_opts.c
@@ -52,6 +52,19 @@
// CHECK-INTERCHANGE-LOOPS: "-floop-interchange"
// CHECK-NO-INTERCHANGE-LOOPS: "-fno-loop-interchange"

// RUN: %clang -### -S -fexperimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
// RUN: %clang -### -S -fno-experimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
// RUN: %clang -### -S -fno-experimental-loop-fusion -fexperimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-FUSE-LOOPS %s
// RUN: %clang -### -S -fexperimental-loop-fusion -fno-experimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-NO-FUSE-LOOPS %s
// CHECK-FUSE-LOOPS: "-fexperimental-loop-fusion"
// CHECK-NO-FUSE-LOOPS: "-fno-experimental-loop-fusion"
//
// RUN: %clang -c -fexperimental-loop-fusion -mllvm -print-pipeline-passes -O3 %s 2>&1 | FileCheck --check-prefixes=LOOP-FUSION-ON %s
// RUN: %clang -c -mllvm -print-pipeline-passes -O3 %s 2>&1 | FileCheck --check-prefixes=LOOP-FUSION-OFF %s

// LOOP-FUSION-ON: loop-fusion
// LOOP-FUSION-OFF-NOT: loop-fusion

// RUN: %clang -### -S -fprofile-sample-accurate %s 2>&1 | FileCheck -check-prefix=CHECK-PROFILE-SAMPLE-ACCURATE %s
// CHECK-PROFILE-SAMPLE-ACCURATE: "-fprofile-sample-accurate"


2 changes: 2 additions & 0 deletions flang/docs/ReleaseNotes.md
@@ -32,6 +32,8 @@ page](https://llvm.org/releases/).

## New Compiler Flags

* -fexperimental-loop-fusion is now recognized by flang.

## Windows Support

## Fortran Language Changes in Flang

1 change: 1 addition & 0 deletions flang/include/flang/Frontend/CodeGenOptions.def
@@ -43,6 +43,7 @@ CODEGENOPT(StackArrays, 1, 0) ///< -fstack-arrays (enable the stack-arrays pass)
CODEGENOPT(VectorizeLoop, 1, 0) ///< Enable loop vectorization.
CODEGENOPT(VectorizeSLP, 1, 0) ///< Enable SLP vectorization.
CODEGENOPT(InterchangeLoops, 1, 0) ///< Enable loop interchange.
CODEGENOPT(FuseLoops, 1, 0) ///< Enable loop fusion.
CODEGENOPT(LoopVersioning, 1, 0) ///< Enable loop versioning.
CODEGENOPT(UnrollLoops, 1, 0) ///< Enable loop unrolling
CODEGENOPT(AliasAnalysis, 1, 0) ///< Enable alias analysis pass

3 changes: 3 additions & 0 deletions flang/lib/Frontend/CompilerInvocation.cpp
@@ -275,6 +275,9 @@ static void parseCodeGenArgs(Fortran::frontend::CodeGenOptions &opts,
if (args.getLastArg(clang::driver::options::OPT_floop_interchange))
opts.InterchangeLoops = 1;

if (args.getLastArg(clang::driver::options::OPT_fexperimental_loop_fusion))
opts.FuseLoops = 1;

if (args.getLastArg(clang::driver::options::OPT_vectorize_loops))
opts.VectorizeLoop = 1;
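Unlike clang's hasFlag call, the flang frontend checks only for the positive spelling with getLastArg, the same way it handles -floop-interchange a few lines above. The driver (Flang.cpp) forwards only the last spelling of the pair, so ordinary flang invocations behave the same either way. The two rules diverge only if both spellings reach flang -fc1 directly. Below is a minimal sketch contrasting the two rules, with hypothetical helper names and plain strings in place of the real ArgList API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using Args = std::vector<std::string>;

// Hypothetical model of the flang -fc1 rule in this diff: enabled whenever
// the positive spelling appears; a later -fno- spelling is not consulted.
bool positiveOnly(const Args &args, const std::string &pos) {
  return std::find(args.begin(), args.end(), pos) != args.end();
}

// Hypothetical model of the clang -cc1 rule (Args.hasFlag): last of the pair wins.
bool lastOfPair(const Args &args, const std::string &pos,
                const std::string &neg, bool dflt) {
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos) return true;
    if (*it == neg) return false;
  }
  return dflt;
}
```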


1 change: 1 addition & 0 deletions flang/lib/Frontend/FrontendActions.cpp
@@ -951,6 +951,7 @@ void CodeGenAction::runOptimizationPipeline(llvm::raw_pwrite_stream &os) {
si.getTimePasses().setOutStream(ci.getTimingStreamLLVM());
pto.LoopUnrolling = opts.UnrollLoops;
pto.LoopInterchange = opts.InterchangeLoops;
pto.LoopFusion = opts.FuseLoops;
pto.LoopInterleaving = opts.UnrollLoops;
pto.LoopVectorization = opts.VectorizeLoop;
pto.SLPVectorization = opts.VectorizeSLP;

17 changes: 17 additions & 0 deletions flang/test/Driver/loop-fuse.f90
@@ -0,0 +1,17 @@
! RUN: %flang -### -S -fexperimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE %s
! RUN: %flang -### -S -fno-experimental-loop-fusion %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -O0 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -O1 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -O2 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -O3 %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -Os %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! RUN: %flang -### -S -Oz %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE %s
! CHECK-LOOP-FUSE: "-fexperimental-loop-fusion"
! CHECK-NO-LOOP-FUSE-NOT: "-fexperimental-loop-fusion"
! RUN: %flang_fc1 -emit-llvm -O2 -fexperimental-loop-fusion -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-LOOP-FUSE-PASS %s
! RUN: %flang_fc1 -emit-llvm -O2 -fno-experimental-loop-fusion -mllvm -print-pipeline-passes -o /dev/null %s 2>&1 | FileCheck -check-prefix=CHECK-NO-LOOP-FUSE-PASS %s
! CHECK-LOOP-FUSE-PASS: loop-fusion
! CHECK-NO-LOOP-FUSE-PASS-NOT: loop-fusion

program test
end program

3 changes: 3 additions & 0 deletions llvm/include/llvm/Passes/PassBuilder.h
@@ -65,6 +65,9 @@ class PipelineTuningOptions {
/// false.
bool LoopInterchange;

/// Tuning option to enable/disable loop fusion. Its default value is false.
bool LoopFusion;

/// Tuning option to forget all SCEV loops in LoopUnroll. Its default value
/// is that of the flag: `-forget-scev-loop-unroll`.
bool ForgetAllSCEVInLoopUnroll;

13 changes: 12 additions & 1 deletion llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -104,6 +104,7 @@
#include "llvm/Transforms/Scalar/LoopDeletion.h"
#include "llvm/Transforms/Scalar/LoopDistribute.h"
#include "llvm/Transforms/Scalar/LoopFlatten.h"
#include "llvm/Transforms/Scalar/LoopFuse.h"
#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
#include "llvm/Transforms/Scalar/LoopInstSimplify.h"
#include "llvm/Transforms/Scalar/LoopInterchange.h"
@@ -204,6 +205,10 @@ static cl::opt<bool>
EnableLoopInterchange("enable-loopinterchange", cl::init(false), cl::Hidden,
cl::desc("Enable the LoopInterchange Pass"));

static cl::opt<bool> EnableLoopFusion("enable-loopfusion", cl::init(false),
cl::Hidden,
cl::desc("Enable the LoopFuse Pass"));

Review thread on the EnableLoopFusion declaration:

Contributor:
I still don't quite understand why you want to add both a clang flag and an opt flag. But if that's the intention, EnableLoopFusion should be defined in tools/opt/NewPMDriver.cpp, not here. The same applies to EnableLoopInterchange. IIUC, declaring it here allows something like clang -mllvm -enable-loopfusion, but the value will be overwritten after PipelineTuningOptions's ctor is invoked.

On the other hand, options defined in tools/opt/NewPMDriver.cpp aren't recognized by the clang driver, IIUC.

But again, I'm not sure why clang/flang flags are really needed. Why isn't the opt flag sufficient? IMHO, at the very least, you should clarify the reason.

Contributor:
As I said, I am taking over this patch from Sebastian. Your comment makes me think.

The primary purpose of this patch is to provide a user-facing option to turn fusion on or off. Not everyone is aware of the options provided by developer tools like opt.

I see precedent for this in the code. In flang/include/flang/Frontend/CodeGenOptions.def there are options for loop versioning, unrolling, etc.
I see clang and flang differ here: clang doesn't seem to have such options except for -floop-interchange.

-floop-interchange was introduced in #125830.

Adding -fexperimental-loop-fusion is mostly syntactic sugar for convenience, based on the same principles as #125830.

I can remove the change from PassBuilderPipelines.cpp (i.e. EnableLoopFusion), as it may be redundant.

Contributor:
> I see precedent for this in the code. In flang/include/flang/Frontend/CodeGenOptions.def there are options for loop versioning, unrolling, etc. I see clang and flang differ here: clang doesn't seem to have such options except for -floop-interchange.

I didn't really understand what you meant... There is a clang option -funroll-loops.

> I can remove the change from PassBuilderPipelines.cpp (i.e. EnableLoopFusion), as it may be redundant.

I didn't mean it is redundant. It might be useful if we want to turn loop fusion on or off via the opt command. What I meant is: if you want to add both clang/flang and opt options, then you should move EnableLoopFusion to the proper place (maybe tools/opt/NewPMDriver.cpp).

At the moment, I don’t have any strong objections to adding experimental flags to clang, although I'm not sure whether there's an existing policy on the clang side for such cases.

static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
cl::init(false), cl::Hidden,
cl::desc("Enable Unroll And Jam Pass"));
@@ -313,6 +318,7 @@ PipelineTuningOptions::PipelineTuningOptions() {
SLPVectorization = false;
LoopUnrolling = true;
LoopInterchange = EnableLoopInterchange;
LoopFusion = EnableLoopFusion;
ForgetAllSCEVInLoopUnroll = ForgetSCEVInLoopUnroll;
LicmMssaOptCap = SetLicmMssaOptCap;
LicmMssaNoAccForPromotionCap = SetLicmMssaNoAccForPromotionCap;
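The reviewer's remark that a -mllvm -enable-loopfusion value would be overwritten follows from the order of assignments. First, the PipelineTuningOptions constructor seeds LoopFusion from the EnableLoopFusion cl::opt. Then clang's BackendUtil.cpp assigns PTO.LoopFusion = CodeGenOpts.FuseLoops unconditionally. Below is a minimal model of that sequence. It uses hypothetical stand-in types and a plain global bool instead of the real LLVM classes and cl::opt.

```cpp
#include <cassert>

// Hypothetical stand-in for the hidden cl::opt -enable-loopfusion,
// which -mllvm would set before the pipeline is built.
static bool EnableLoopFusionFlag = false;

// Hypothetical stand-in for PipelineTuningOptions.
struct SketchPipelineTuningOptions {
  bool LoopFusion;
  // As in the diff: the constructor seeds the field from the cl::opt.
  SketchPipelineTuningOptions() : LoopFusion(EnableLoopFusionFlag) {}
};

// Hypothetical stand-in for clang's CodeGenOptions.
struct SketchCodeGenOpts {
  bool FuseLoops = false;
};

// As in BackendUtil.cpp: the frontend value is assigned after construction,
// replacing whatever the constructor read from the cl::opt.
SketchPipelineTuningOptions buildPTO(const SketchCodeGenOpts &cg) {
  SketchPipelineTuningOptions pto;
  pto.LoopFusion = cg.FuseLoops;
  return pto;
}
```

Under this model, setting the cl::opt alone does not enable fusion in clang's main pipeline. This is consistent with the reviewer's suggestion to define the cl::opt in tools/opt/NewPMDriver.cpp instead.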
@@ -1551,6 +1557,11 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
OptimizePM.addPass(createFunctionToLoopPassAdaptor(
std::move(LPM), /*UseMemorySSA=*/false, /*UseBlockFrequencyInfo=*/false));

// FIXME: This may not be the right place in the pipeline.
// We need to have the data to support the right place.
if (PTO.LoopFusion)
OptimizePM.addPass(LoopFusePass());

// Distribute loops to allow partial vectorization. I.e. isolate dependences
// into separate loop that would otherwise inhibit vectorization. This is
// currently only performed for loops marked with the metadata
@@ -2355,4 +2366,4 @@ AAManager PassBuilder::buildDefaultAAPipeline() {
bool PassBuilder::isInstrumentedPGOUse() const {
return (PGOOpt && PGOOpt->Action == PGOOptions::IRUse) ||
!UseCtxProfile.empty();
}
}
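For reference, PTO.LoopFusion gates LoopFusePass, which the diff adds to buildModuleOptimizationPipeline. The pass merges adjacent loops so the data is traversed once. It operates on LLVM IR and has to prove legality itself. The hand-written C++ before/after pair below only illustrates the source-level effect, using a case where fusion is legal. The function names are hypothetical, and both functions assume a and b hold at least x.size() elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before fusion: two adjacent loops with the same trip count.
void unfused(std::vector<double> &a, std::vector<double> &b,
             const std::vector<double> &x) {
  for (std::size_t i = 0; i < x.size(); ++i) a[i] = 2.0 * x[i];
  for (std::size_t i = 0; i < x.size(); ++i) b[i] = a[i] + 1.0;
}

// After fusion: one loop, one pass over the data. This is legal because
// iteration i of the second loop reads only a[i], which iteration i of the
// first loop has already written (dependence distance zero).
void fused(std::vector<double> &a, std::vector<double> &b,
           const std::vector<double> &x) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    a[i] = 2.0 * x[i];
    b[i] = a[i] + 1.0;
  }
}
```

If the second loop read a[i + 1] instead, the fused loop would read that element before the first statement had written it. Fusion would then change the result and must not be applied.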