Skip to content

Commit abd8cd9

Browse files
committed
[CUDA][HIP] Fix linkage for -fgpu-rdc
Currently for explicit template function instantiation in CUDA/HIP device compilation clang emits instantiated kernel with external linkage and instantiated device function with internal linkage. This is fine for -fno-gpu-rdc since there is only one TU. However this causes duplicate symbols for kernels for -fgpu-rdc if the same instantiation happen in multiple TU. Or missing symbols if a device function calls an explicitly instantiated template function in a different TU. To make explicit template function instantiation work for -fgpu-rdc we need to follow the C++ linkage paradigm, i.e. use weak_odr linkage. Differential Revision: https://reviews.llvm.org/D90311
1 parent c009d11 commit abd8cd9

File tree

2 files changed

+26
-4
lines changed

2 files changed

+26
-4
lines changed

clang/lib/CodeGen/CodeGenModule.cpp

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4483,13 +4483,16 @@ llvm::GlobalValue::LinkageTypes CodeGenModule::getLLVMLinkageForDeclarator(
44834483
// and must all be equivalent. However, we are not allowed to
44844484
// throw away these explicit instantiations.
44854485
//
4486-
// We don't currently support CUDA device code spread out across multiple TUs,
4486+
// CUDA/HIP: For -fno-gpu-rdc case, device code is limited to one TU,
44874487
// so say that CUDA templates are either external (for kernels) or internal.
4488-
// This lets llvm perform aggressive inter-procedural optimizations.
4488+
// This lets llvm perform aggressive inter-procedural optimizations. For
4489+
// -fgpu-rdc case, device function calls across multiple TU's are allowed,
4490+
// therefore we need to follow the normal linkage paradigm.
44894491
if (Linkage == GVA_StrongODR) {
4490-
if (Context.getLangOpts().AppleKext)
4492+
if (getLangOpts().AppleKext)
44914493
return llvm::Function::ExternalLinkage;
4492-
if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice)
4494+
if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
4495+
!getLangOpts().GPURelocatableDeviceCode)
44934496
return D->hasAttr<CUDAGlobalAttr>() ? llvm::Function::ExternalLinkage
44944497
: llvm::Function::InternalLinkage;
44954498
return llvm::Function::WeakODRLinkage;
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
2+
// RUN: -emit-llvm -o - %s \
3+
// RUN: | FileCheck -check-prefix=NORDC %s
4+
// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
5+
// RUN: -fgpu-rdc -emit-llvm -o - %s \
6+
// RUN: | FileCheck -check-prefix=RDC %s
7+
8+
#include "Inputs/cuda.h"
9+
10+
// NORDC: define internal void @_Z4funcIiEvv()
11+
// NORDC: define void @_Z6kernelIiEvv()
12+
// RDC: define weak_odr void @_Z4funcIiEvv()
13+
// RDC: define weak_odr void @_Z6kernelIiEvv()
14+
15+
template <typename T> __device__ void func() {}
16+
template <typename T> __global__ void kernel() {}
17+
18+
template __device__ void func<int>();
19+
template __global__ void kernel<int>();

0 commit comments

Comments
 (0)