[Transforms][IPO] Add func suffix in ArgumentPromotion and DeadArgume… #105742

Merged
merged 3 commits into from
Sep 19, 2024
1 change: 1 addition & 0 deletions llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -215,6 +215,7 @@ doPromotion(Function *F, FunctionAnalysisManager &FAM,

F->getParent()->getFunctionList().insert(F->getIterator(), NF);
NF->takeName(F);
Contributor

Is this needed?

Contributor Author

The following is the takeName() implementation:

void Value::takeName(Value *V) {
  assert(V != this && "Illegal call to this->takeName(this)!");
  ValueSymbolTable *ST = nullptr;
  // If this value has a name, drop it.
  if (hasName()) {
    // Get the symtab this is in.
    if (getSymTab(this, ST)) {
      // We can't set a name on this value, but we need to clear V's name if
      // it has one.
      if (V->hasName()) V->setName("");
      return;  // Cannot set a name on this value (e.g. constant).
    }

    // Remove old name.
    if (ST)
      ST->removeValueName(getValueName());
    destroyValueName();
  }

  // Now we know that this has no name.
  
  // If V has no name either, we're done.
  if (!V->hasName()) return;
  
  // Get this's symtab if we didn't before.
  if (!ST) {
    if (getSymTab(this, ST)) {
      // Clear V's name.
      V->setName("");
      return;  // Cannot set a name on this value (e.g. constant).
    } 
  } 

  // Get V's ST, this should always succeed, because V has a name.
  ValueSymbolTable *VST;
  bool Failure = getSymTab(V, VST);
  assert(!Failure && "V has a name, so it should have a ST!"); (void)Failure;
  
  // If these values are both in the same symtab, we can do this very fast.
  // This works even if both values have no symtab yet.
  if (ST == VST) {
    // Take the name!
    setValueName(V->getValueName());
    V->setValueName(nullptr);
    getValueName()->setValue(this);
    return;
  }
    
  // Otherwise, things are slightly more complex.  Remove V's name from VST and
  // then reinsert it into ST.
  
  if (VST)
    VST->removeValueName(V->getValueName());
  setValueName(V->getValueName());
  V->setValueName(nullptr);
  getValueName()->setValue(this);
  
  if (ST)
    ST->reinsertValue(this);
}

I think this is needed, since takeName() is somewhat involved (e.g. it handles duplicated symbols). NF->takeName(F) gives NF the original function name, and we then just need to add the suffix on top of that.
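
For illustration, a standalone sketch of the take-the-name-then-append-a-suffix pattern discussed above. It does not use LLVM: `ToyFunction`, `ToySymbolTable`, and `renameWithSuffix` are hypothetical stand-ins, and the toy table only tracks name ownership, which is an assumption that leaves out most of what the real symbol table does.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Toy stand-in for a named function; not llvm::Function.
struct ToyFunction {
  std::string Name;
};

// Toy stand-in for a symbol table; it only maps names to their owners.
class ToySymbolTable {
  std::unordered_map<std::string, ToyFunction *> Entries;

public:
  // Register F under Name; any previous name of F is dropped first.
  void setName(ToyFunction &F, const std::string &Name) {
    if (!F.Name.empty())
      Entries.erase(F.Name);
    F.Name = Name;
    if (!Name.empty())
      Entries[Name] = &F;
  }

  // Move the name of From onto To, leaving From unnamed. This mirrors the
  // visible effect of NF->takeName(F) in the patch.
  void takeName(ToyFunction &To, ToyFunction &From) {
    assert(&To != &From && "cannot take a name from itself");
    std::string Name = From.Name; // copy before From is cleared
    setName(From, "");
    setName(To, Name);
  }

  // Return the owner of Name, or nullptr if the name is unused.
  ToyFunction *lookup(const std::string &Name) const {
    auto It = Entries.find(Name);
    return It == Entries.end() ? nullptr : It->second;
  }
};

// The two-step rename: take the old name, then append a suffix to it.
inline void renameWithSuffix(ToySymbolTable &ST, ToyFunction &NewF,
                             ToyFunction &OldF, const std::string &Suffix) {
  ST.takeName(NewF, OldF);
  ST.setName(NewF, NewF.Name + Suffix);
}
```

With an old function named `callee`, calling `renameWithSuffix(ST, NF, F, ".argprom")` leaves the old function unnamed and registers the new one as `callee.argprom`.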

NF->setName(NF->getName() + ".argprom");

// Loop over all the callers of the function, transforming the call sites to
// pass in the loaded pointers.
4 changes: 4 additions & 0 deletions llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
@@ -889,6 +889,10 @@ bool DeadArgumentEliminationPass::removeDeadStuffFromFunction(Function *F) {
// it again.
F->getParent()->getFunctionList().insert(F->getIterator(), NF);
NF->takeName(F);
if (NumArgumentsEliminated)
Collaborator

What cases are we excluding with the "if" here? If we don't change the signature, we don't recreate the function in the first place.

Contributor Author

> What cases are we excluding with the "if" here? If we don't change the signature, we don't recreate the function in the first place.

IIUC, there is another metric, NumRetValsEliminated, and it is possible that NumRetValsEliminated > 0 while NumArgumentsEliminated = 0. I added the above check because we mainly care about function arguments in bpf tracing, and the function return value is less important. But I can remove the check so that the suffix is not limited to the bpf tracing case.

Collaborator

I can imagine wanting a different suffix for the case where we only eliminate a return value, but I think it makes sense to have some suffix.

Contributor Author

Okay, I will remove the check and use '.retelim' suffix for cases where only the return value is eliminated.

Contributor Author

> I can imagine wanting a different suffix for the case where we only eliminate a return value, but I think it makes sense to have some suffix.

Just pushed a new revision that introduces the '.retelim' suffix for cases where the only function signature change is the removal of return values.
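
As a rough sketch of the suffix choice agreed on in this thread: the helper name `nameAfterDeadArgElim` is hypothetical (the patch makes the choice inline), and treating `NumArgumentsEliminated` as a per-function count is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the patch. It is only meaningful for a
// function that was actually recreated with a changed signature.
inline std::string nameAfterDeadArgElim(const std::string &OldName,
                                        unsigned NumArgumentsEliminated) {
  // Arguments removed -> ".argelim"; otherwise only the return value was
  // removed -> ".retelim".
  const char *Suffix = NumArgumentsEliminated ? ".argelim" : ".retelim";
  return OldName + Suffix;
}
```

For example, `nameAfterDeadArgElim("test", 2)` yields `test.argelim`, and `nameAfterDeadArgElim("_Z3foov", 0)` yields `_Z3foov.retelim`, matching the names expected by the updated tests.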

NF->setName(NF->getName() + ".argelim");
else
NF->setName(NF->getName() + ".retelim");
NF->IsNewDbgInfoFormat = F->IsNewDbgInfoFormat;

// Loop over all the callers of the function, transforming the call sites to
@@ -9,15 +9,15 @@ define internal void @a() alwaysinline {
}

define internal void @b(ptr) noinline {
; CHECK-LABEL: @b(
; CHECK-LABEL: @b.argprom(
; CHECK-NEXT: ret void
;
ret void
}

define internal void @c() noinline {
; CHECK-LABEL: @c(
; CHECK-NEXT: call void @b()
; CHECK-NEXT: call void @b.argprom()
; CHECK-NEXT: ret void
;
call void @b(ptr @a)
2 changes: 1 addition & 1 deletion llvm/test/BugPoint/remove_arguments_test.ll
@@ -11,7 +11,7 @@

declare i32 @test2()

; CHECK: define void @test() {
; CHECK: define void @test.argelim() {
define i32 @test(i32 %A, ptr %B, float %C) {
call i32 @test2()
ret i32 %1
16 changes: 8 additions & 8 deletions llvm/test/CodeGen/AArch64/arg_promotion.ll
@@ -38,16 +38,16 @@ define dso_local void @caller_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define dso_local void @caller_4xi32(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
; CHECK-NEXT: call fastcc void @callee_4xi32(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: call fastcc void @callee_4xi32.argprom.argprom(<4 x i32> [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
call fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst)
call fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst)
ret void
}

define internal fastcc void @callee_4xi32(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define internal fastcc void @callee_4xi32(
define internal fastcc void @callee_4xi32.argprom(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define internal fastcc void @callee_4xi32.argprom.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: ret void
@@ -65,7 +65,7 @@ define dso_local void @caller_i256(ptr noalias %src, ptr noalias %dst) #0 {
; CHECK-LABEL: define dso_local void @caller_i256(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SRC_VAL:%.*]] = load i256, ptr [[SRC:%.*]], align 16
; CHECK-NEXT: call fastcc void @callee_i256(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: call fastcc void @callee_i256.argprom(i256 [[SRC_VAL]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
@@ -74,7 +74,7 @@ entry:
}

define internal fastcc void @callee_i256(ptr noalias %src, ptr noalias %dst) #0 {
; CHECK-LABEL: define internal fastcc void @callee_i256(
; CHECK-LABEL: define internal fastcc void @callee_i256.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store i256 [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: ret void
@@ -159,7 +159,7 @@ define dso_local void @caller_struct4xi32(ptr noalias %src, ptr noalias %dst) #1
; CHECK-NEXT: [[SRC_VAL:%.*]] = load <4 x i32>, ptr [[SRC:%.*]], align 16
; CHECK-NEXT: [[TMP0:%.*]] = getelementptr i8, ptr [[SRC]], i64 16
; CHECK-NEXT: [[SRC_VAL1:%.*]] = load <4 x i32>, ptr [[TMP0]], align 16
; CHECK-NEXT: call fastcc void @callee_struct4xi32(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: call fastcc void @callee_struct4xi32.argprom(<4 x i32> [[SRC_VAL]], <4 x i32> [[SRC_VAL1]], ptr noalias [[DST:%.*]])
; CHECK-NEXT: ret void
;
entry:
@@ -168,7 +168,7 @@ entry:
}

define internal fastcc void @callee_struct4xi32(ptr noalias %src, ptr noalias %dst) #1 {
; CHECK-LABEL: define internal fastcc void @callee_struct4xi32(
; CHECK-LABEL: define internal fastcc void @callee_struct4xi32.argprom(
; CHECK-NEXT: entry:
; CHECK-NEXT: store <4 x i32> [[SRC_0_VAL:%.*]], ptr [[DST:%.*]], align 16
; CHECK-NEXT: [[DST2:%.*]] = getelementptr inbounds [[STRUCT_4XI32:%.*]], ptr [[DST]], i64 0, i32 1
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/AMDGPU/internalize.ll
@@ -10,7 +10,7 @@
; ALL: gvar_used
@gvar_used = addrspace(1) global i32 undef, align 4

; OPT: define internal fastcc void @func_used_noinline(
; OPT: define internal fastcc void @func_used_noinline.argelim(
; OPT-NONE: define fastcc void @func_used_noinline(
define fastcc void @func_used_noinline(ptr addrspace(1) %out, i32 %tid) #1 {
entry:
24 changes: 12 additions & 12 deletions llvm/test/ThinLTO/X86/memprof-aliased-location1.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
;; The first call to foo does not allocate cold memory. It should call the
;; original functions, which ultimately call the original allocation decorated
;; with a "notcold" attribute.
; IR: call {{.*}} @_Z3foov()
; IR: call {{.*}} @_Z3foov.retelim()
;; The second call to foo allocates cold memory. It should call cloned functions
;; which ultimately call a cloned allocation decorated with a "cold" attribute.
; IR: call {{.*}} @_Z3foov.memprof.1()
; IR: define internal {{.*}} @_Z3barv()
; IR: call {{.*}} @_Z3foov.memprof.1.retelim()
; IR: define internal {{.*}} @_Z3barv.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IR: define internal {{.*}} @_Z3bazv()
; IR: call {{.*}} @_Z3barv()
; IR: define internal {{.*}} @_Z3foov()
; IR: call {{.*}} @_Z3bazv()
; IR: define internal {{.*}} @_Z3barv.memprof.1()
; IR: define internal {{.*}} @_Z3bazv.retelim()
; IR: call {{.*}} @_Z3barv.retelim()
; IR: define internal {{.*}} @_Z3foov.retelim()
; IR: call {{.*}} @_Z3bazv.retelim()
; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IR: define internal {{.*}} @_Z3bazv.memprof.1()
; IR: call {{.*}} @_Z3barv.memprof.1()
; IR: define internal {{.*}} @_Z3foov.memprof.1()
; IR: call {{.*}} @_Z3bazv.memprof.1()
; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: call {{.*}} @_Z3barv.memprof.1.retelim()
; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
; IR: call {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

24 changes: 12 additions & 12 deletions llvm/test/ThinLTO/X86/memprof-aliased-location2.ll
@@ -84,22 +84,22 @@ attributes #0 = { noinline optnone }
;; The first call to foo does not allocate cold memory. It should call the
;; original functions, which ultimately call the original allocation decorated
;; with a "notcold" attribute.
; IR: call {{.*}} @_Z3foov()
; IR: call {{.*}} @_Z3foov.retelim()
;; The second call to foo allocates cold memory. It should call cloned functions
;; which ultimately call a cloned allocation decorated with a "cold" attribute.
; IR: call {{.*}} @_Z3foov.memprof.1()
; IR: define internal {{.*}} @_Z3barv()
; IR: call {{.*}} @_Z3foov.memprof.1.retelim()
; IR: define internal {{.*}} @_Z3barv.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IR: define internal {{.*}} @_Z3bazv()
; IR: call {{.*}} @_Z3barv()
; IR: define internal {{.*}} @_Z3foov()
; IR: call {{.*}} @_Z3bazv()
; IR: define internal {{.*}} @_Z3barv.memprof.1()
; IR: define internal {{.*}} @_Z3bazv.retelim()
; IR: call {{.*}} @_Z3barv.retelim()
; IR: define internal {{.*}} @_Z3foov.retelim()
; IR: call {{.*}} @_Z3bazv.retelim()
; IR: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IR: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IR: define internal {{.*}} @_Z3bazv.memprof.1()
; IR: call {{.*}} @_Z3barv.memprof.1()
; IR: define internal {{.*}} @_Z3foov.memprof.1()
; IR: call {{.*}} @_Z3bazv.memprof.1()
; IR: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: call {{.*}} @_Z3barv.memprof.1.retelim()
; IR: define internal {{.*}} @_Z3foov.memprof.1.retelim()
; IR: call {{.*}} @_Z3bazv.memprof.1.retelim()
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

19 changes: 18 additions & 1 deletion llvm/test/ThinLTO/X86/memprof-basic.ll
@@ -53,7 +53,7 @@
;; We should have cloned bar, baz, and foo, for the cold memory allocation.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST


;; Try again but with distributed ThinLTO
@@ -303,6 +303,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define {{.*}} @main
; IRNODIST: call {{.*}} @_Z3foov.retelim()
; IRNODIST: call {{.*}} @_Z3foov.memprof.1.retelim()
; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z3bazv.retelim()
; IRNODIST: call {{.*}} @_Z3barv.retelim()
; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
; IRNODIST: call {{.*}} @_Z3bazv.retelim()
; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z3bazv.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Z3barv.memprof.1.retelim()
; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Z3bazv.memprof.1.retelim()
; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }

; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
14 changes: 13 additions & 1 deletion llvm/test/ThinLTO/X86/memprof-duplicate-context-ids.ll
@@ -68,7 +68,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST


;; Try again but with distributed ThinLTO
@@ -247,6 +247,18 @@ attributes #0 = { noinline optnone}
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define internal {{.*}} @_Z1Dv.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z1Fv.retelim()
; IRNODIST: call {{.*}} @_Z1Dv.retelim()
; IRNODIST: define internal {{.*}} @_Z1Bv.retelim()
; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
; IRNODIST: define internal {{.*}} @_Z1Ev.retelim()
; IRNODIST: call {{.*}} @_Z1Dv.memprof.1.retelim()
; IRNODIST: define internal {{.*}} @_Z1Dv.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }

; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
19 changes: 18 additions & 1 deletion llvm/test/ThinLTO/X86/memprof-funcassigncloning.ll
@@ -61,7 +61,7 @@
; RUN: -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN: --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST
Contributor

what is this change about?

Contributor Author

> what is this change about?

In the original memprof-funcassigncloning.ll, the IR check prefix is used in two places: the llvm-dis line after the first llvm-lto2 run, and the final opt run in the ThinLTO backend section.

; RUN: opt -thinlto-bc %s >%t.o
; RUN: llvm-lto2 run %t.o -enable-memprof-context-disambiguation \
; RUN:  -supports-hot-cold-new \
; RUN:  -r=%t.o,main,plx \
; RUN:  -r=%t.o,_ZdaPv, \
; RUN:  -r=%t.o,sleep, \
; RUN:  -r=%t.o,_Znam, \
; RUN:  -memprof-verify-ccg -memprof-verify-nodes -memprof-dump-ccg \
; RUN:  -stats -pass-remarks=memprof-context-disambiguation -save-temps \
; RUN:  -o %t.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN:  --check-prefix=STATS --check-prefix=STATS-BE --check-prefix=REMARKS

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR


;; Try again but with distributed ThinLTO
; RUN: llvm-lto2 run %t.o -enable-memprof-context-disambiguation \
; RUN:  -supports-hot-cold-new \
; RUN:  -thinlto-distributed-indexes \
; RUN:  -r=%t.o,main,plx \
; RUN:  -r=%t.o,_ZdaPv, \
; RUN:  -r=%t.o,sleep, \
; RUN:  -r=%t.o,_Znam, \
; RUN:  -memprof-verify-ccg -memprof-verify-nodes -memprof-dump-ccg \
; RUN:  -stats -pass-remarks=memprof-context-disambiguation \
; RUN:  -o %t2.out 2>&1 | FileCheck %s --check-prefix=DUMP \
; RUN:  --check-prefix=STATS

;; Run ThinLTO backend
; RUN: opt -passes=memprof-context-disambiguation \
; RUN:  -memprof-import-summary=%t.o.thinlto.bc \
; RUN:  -stats -pass-remarks=memprof-context-disambiguation \
; RUN:  %t.o -S 2>&1 | FileCheck %s --check-prefix=IR \
; RUN:  --check-prefix=STATS-BE --check-prefix=REMARKS

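With the suffix added, the checked function signatures (which now carry the suffix) change for one 'IR' check but not for the other. That is why I renamed one IR prefix to IRNODIST: the IRNODIST checks cover the run where the signatures change, and the existing IR checks stay as they are, which keeps the change minimal.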

Contributor

Why does the name change not happen with distributed ThinLTO?

Contributor Author

> Why does the name change not happen with distributed ThinLTO?

This is due to the following code in llvm/tools/llvm-lto2/llvm-lto2.cpp:

  if (ThinLTODistributedIndexes)
    Backend = createWriteIndexesThinBackend(/*OldPrefix=*/"",
                                            /*NewPrefix=*/"",
                                            /*NativeObjectPrefix=*/"",
                                            ThinLTOEmitImports,
                                            /*LinkedObjectsFile=*/nullptr,
                                            /*OnWrite=*/{});
  else
    Backend = createInProcessThinBackend(
        llvm::heavyweight_hardware_concurrency(Threads),
        /* OnWrite */ {}, ThinLTOEmitIndexes, ThinLTOEmitImports);

If ThinLTODistributedIndexes is true, createWriteIndexesThinBackend() is used, which does not trigger the DeadArgumentElimination pass.
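
To make the IR versus IRNODIST split concrete, here is a toy model of how the expected names differ between the two test flavors. It is only a sketch: `applySuffixes` and `suffixesFor` are hypothetical helpers, and the model assumes a cold memprof clone whose only signature change is a removed return value, as in the memprof tests touched by this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: each transformation that renames a function appends one suffix.
inline std::string applySuffixes(std::string Name,
                                 const std::vector<std::string> &Suffixes) {
  for (const std::string &S : Suffixes)
    Name += S;
  return Name;
}

// Suffixes picked up by a cold clone in each test flavor. The in-process
// backend also runs DeadArgumentElimination; the distributed flavor checked
// by the IR prefix does not.
inline std::vector<std::string> suffixesFor(bool DistributedIndexes) {
  std::vector<std::string> Suffixes = {".memprof.1"}; // memprof cloning
  if (!DistributedIndexes)
    Suffixes.push_back(".retelim"); // return value eliminated
  return Suffixes;
}
```

Under these assumptions, `_Z3foov` becomes `_Z3foov.memprof.1.retelim` in the in-process run (IRNODIST) and stays `_Z3foov.memprof.1` in the distributed run (IR).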



;; Try again but with distributed ThinLTO
@@ -283,6 +283,23 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.argelim(
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD:[0-9]+]]
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
; IRNODIST: define internal {{.*}} @_Z1BPPcS0_(
; IRNODIST: call {{.*}} @_Z1EPPcS0_.argelim(
; IRNODIST: define internal {{.*}} @_Z1CPPcS0_(
; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
; IRNODIST: define internal {{.*}} @_Z1DPPcS0_(
; IRNODIST: call {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.2.argelim(
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD:[0-9]+]]
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
; IRNODIST: define internal {{.*}} @_Z1EPPcS0_.memprof.3.argelim(
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[NOTCOLD]]
; IRNODIST: call {{.*}} @_Znam(i64 noundef 10) #[[COLD]]
; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }

; STATS: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 2 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
15 changes: 14 additions & 1 deletion llvm/test/ThinLTO/X86/memprof-indirectcall.ll
@@ -74,7 +74,7 @@
;; from main allocating cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST


;; Try again but with distributed ThinLTO
@@ -419,6 +419,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define {{.*}} @main(
; IRNODIST: call {{.*}} @_Z3foov.argelim()
; IRNODIST: call {{.*}} @_Z3foov.memprof.1.argelim()
; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
; IRNODIST: call {{.*}} @_Z3barP1A.argelim(
; IRNODIST: define internal {{.*}} @_Z3foov.argelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.argelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }

; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
15 changes: 14 additions & 1 deletion llvm/test/ThinLTO/X86/memprof-inlined.ll
@@ -63,7 +63,7 @@
;; cold memory.
; RUN: cat %t.ccg.cloned.dot | FileCheck %s --check-prefix=DOTCLONED

; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IR
; RUN: llvm-dis %t.out.1.4.opt.bc -o - | FileCheck %s --check-prefix=IRNODIST


;; Try again but with distributed ThinLTO
@@ -323,6 +323,19 @@ attributes #0 = { noinline optnone }
; IR: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IR: attributes #[[COLD]] = { "memprof"="cold" }

; IRNODIST: define internal {{.*}} @_Z3barv.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[NOTCOLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z3foov.retelim()
; IRNODIST: call {{.*}} @_Z3barv.retelim()
; IRNODIST: define {{.*}} @main()
; IRNODIST: call {{.*}} @_Z3foov.retelim()
; IRNODIST: call {{.*}} @_Z3foov.memprof.1.retelim()
; IRNODIST: define internal {{.*}} @_Z3barv.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Znam(i64 0) #[[COLD:[0-9]+]]
; IRNODIST: define internal {{.*}} @_Z3foov.memprof.1.retelim()
; IRNODIST: call {{.*}} @_Z3barv.memprof.1.retelim()
; IRNODIST: attributes #[[NOTCOLD]] = { "memprof"="notcold" }
; IRNODIST: attributes #[[COLD]] = { "memprof"="cold" }

; STATS: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned)
; STATS-BE: 1 memprof-context-disambiguation - Number of cold static allocations (possibly cloned) during ThinLTO backend
@@ -3,7 +3,7 @@
; RUN: cat %t | FileCheck -check-prefix=REMARK %s

define internal i32 @deref(ptr %x) nounwind {
; CHECK-LABEL: define {{[^@]+}}@deref
; CHECK-LABEL: define {{[^@]+}}@deref.argprom
; CHECK-SAME: (i32 [[X_0_VAL:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-NEXT: entry:
; CHECK-NEXT: ret i32 [[X_0_VAL]]
@@ -29,7 +29,7 @@ define i32 @f(i32 %x) {
; CHECK-NEXT: [[X_ADDR:%.*]] = alloca i32, align 4
; CHECK-NEXT: store i32 [[X]], ptr [[X_ADDR]], align 4
; CHECK-NEXT: [[X_ADDR_VAL:%.*]] = load i32, ptr [[X_ADDR]], align 4
; CHECK-NEXT: [[TEMP1:%.*]] = call i32 @deref(i32 [[X_ADDR_VAL]])
; CHECK-NEXT: [[TEMP1:%.*]] = call i32 @deref.argprom(i32 [[X_ADDR_VAL]])
; CHECK-NEXT: ret i32 [[TEMP1]]
;
entry:
2 changes: 1 addition & 1 deletion llvm/test/Transforms/ArgumentPromotion/BPF/argpromotion.ll
@@ -85,4 +85,4 @@ entry:
; Without number-of-argument constraint, argpromotion will create a function signature with 5 arguments, which equals
; the maximum number of argument permitted by bpf backend, so argpromotion result code does work.
;
; CHECK: i32 @foo2(i32 %p1.0.val, i32 %p1.4.val, i32 %p2.8.val, i32 %p2.16.val, i32 %p3.20.val)
; CHECK: i32 @foo2.argprom(i32 %p1.0.val, i32 %p1.4.val, i32 %p2.8.val, i32 %p2.16.val, i32 %p3.20.val)
4 changes: 2 additions & 2 deletions llvm/test/Transforms/ArgumentPromotion/X86/attributes.ll
@@ -42,7 +42,7 @@ bb:
}

define internal fastcc void @promote_avx2(ptr %arg, ptr readonly %arg1) #0 {
; CHECK-LABEL: define {{[^@]+}}@promote_avx2
; CHECK-LABEL: define {{[^@]+}}@promote_avx2.argprom
; CHECK-SAME: (ptr [[ARG:%.*]], <4 x i64> [[ARG1_VAL:%.*]])
; CHECK-NEXT: bb:
; CHECK-NEXT: store <4 x i64> [[ARG1_VAL]], ptr [[ARG]]
@@ -62,7 +62,7 @@ define void @promote(ptr %arg) #0 {
; CHECK-NEXT: [[TMP2:%.*]] = alloca <4 x i64>, align 32
; CHECK-NEXT: call void @llvm.memset.p0.i64(ptr align 32 [[TMP]], i8 0, i64 32, i1 false)
; CHECK-NEXT: [[TMP_VAL:%.*]] = load <4 x i64>, ptr [[TMP]]
; CHECK-NEXT: call fastcc void @promote_avx2(ptr [[TMP2]], <4 x i64> [[TMP_VAL]])
; CHECK-NEXT: call fastcc void @promote_avx2.argprom(ptr [[TMP2]], <4 x i64> [[TMP_VAL]])
; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i64>, ptr [[TMP2]], align 32
; CHECK-NEXT: store <4 x i64> [[TMP4]], ptr [[ARG]], align 2
; CHECK-NEXT: ret void