|
| 1 | +# SYCL Runtime Compilation |
| 2 | + |
| 3 | +SYCL-RTC means using the |
| 4 | +[`kernel_compiler`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc) |
| 5 | +extension to wrap a SYCL source string comprised of kernel definitions in the |
| 6 | +[free-function syntax](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_free_function_kernels.asciidoc) |
| 7 | +into a `kernel_bundle` in the `ext_oneapi_source` state, which is then compiled |
| 8 | +into `exectuable` state by the extension's `build(...)` function. The feature is |
| 9 | +backed by an implementation inside the `sycl-jit` library, which exposes the |
| 10 | +modular, LLVM-based compiler tech behind DPC++ to be called by the SYCL runtime. |
| 11 | +This document gives an overview of the design. |
| 12 | + |
| 13 | +```c++ |
| 14 | +#include <sycl/sycl.hpp> |
| 15 | +namespace syclexp = sycl::ext::oneapi::experimental; |
| 16 | + |
| 17 | +// ... |
| 18 | + |
| 19 | +std::string sycl_source = R"""( |
| 20 | + #include <sycl/sycl.hpp> |
| 21 | + |
| 22 | + extern "C" SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(( |
| 23 | + sycl::ext::oneapi::experimental::nd_range_kernel<1>)) |
| 24 | + void vec_add(float* in1, float* in2, float* out){ |
| 25 | + size_t id = sycl::ext::oneapi::this_work_item::get_nd_item<1>() |
| 26 | + .get_global_linear_id(); |
| 27 | + out[id] = in1[id] + in2[id]; |
| 28 | + } |
| 29 | +)"""; |
| 30 | +
|
| 31 | +sycl::queue q; |
| 32 | +
|
| 33 | +auto source_bundle = syclexp::create_kernel_bundle_from_source( |
| 34 | + q.get_context(), syclexp::source_language::sycl, sycl_source); |
| 35 | +
|
| 36 | +// This design document explains what happens on the next line. |
| 37 | +auto exec_bundle = syclexp::build(source_bundle); |
| 38 | +``` |
| 39 | +
|
| 40 | +## File-and-process-based prototype |
| 41 | +
|
| 42 | +The |
| 43 | +[first implementation](https://github.com/intel/llvm/blob/03cb2b25026f060149eb94c85b228e5b3a780588/sycl/source/detail/kernel_compiler/kernel_compiler_sycl.cpp#L254) |
| 44 | +of the `build(...)` function wrote the source string into a temporary file, |
| 45 | +invoked DPC++ on it with the `-fsycl-dump-device-code` flag to dump the device |
| 46 | +code to another file in SPIR-V format, and finally loaded that file back into |
| 47 | +the runtime, from where it was executed. |
| 48 | +
|
| 49 | +## The rationale for an in-memory compilation pipeline |
| 50 | +
|
| 51 | +Invoking the DPC++ executable as outlined in the previous section worked |
| 52 | +reasonably well to implement the basic `kernel_compiler` extension, but we |
| 53 | +observed several shortcomings: |
| 54 | +
|
| 55 | +- Functional completeness: Emitting a single SPIR-V file is sufficient for |
| 56 | + simple kernels, but more advanced device code may result in multiple *device |
| 57 | + images* comprised of SPIR-V binaries and accompanying metadata (*runtime |
| 58 | + properties*) that needs to be communicated to the runtime. |
| 59 | +- Robustness: Reading multiple dependent files from a temporary directory can be |
| 60 | + be fragile. |
| 61 | +- Performance: Multiple processes are launched by the compiler driver, and file |
| 62 | + I/O operations have a non-negligible overhead. The `-fsycl-dump-device-code` |
| 63 | + required the presence of a dummy `main()` to be added to the source string, |
| 64 | + and caused an unnecessary host compilation to be performed. |
| 65 | +- Security: Reading executable code from disk is a security concern, and users |
| 66 | + of an RTC-enabled application may be unaware that a compilation writing |
| 67 | + intermediate files is happening in the background. |
| 68 | +
|
| 69 | +These challenges ultimately motivated the design of the **in-memory compilation |
| 70 | +pipeline** based on the `sycl-jit` library which is now the default approach in |
| 71 | +DPC++ and the oneAPI product distribution since the 2025.2 release. This new |
| 72 | +approach leverages **modular compiler technology** to produce a faster, more |
| 73 | +feature-rich, more robust and safer implementation of the `kernel_compiler` |
| 74 | +extension. |
| 75 | +
|
| 76 | +The individual steps in the pipeline (frontend, device library linking, |
| 77 | +`sycl-post-link` and target format translation) are now invoked programmatically |
| 78 | +via an API inside the same process, and intermediate results are passed along as |
| 79 | +objects in memory. The code can be found in the |
| 80 | +[`compileSYCL(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/RTC.cpp#L57) |
| 81 | +function. |
| 82 | + |
| 83 | +## Using the LibTooling API to compile the source string to an `llvm::Module` |
| 84 | +
|
| 85 | +[LibTooling](https://clang.llvm.org/docs/LibTooling.html) is a high-level API to |
| 86 | +write standalone tools based on Clang, such as linters, refactoring tools or |
| 87 | +static analysers. To use it, one defines a *tool action* to run on a set of |
| 88 | +files in a *virtual filesystem overlay*, which the frontend then processes |
| 89 | +according to a *compilation command database*. |
| 90 | +
|
| 91 | +For SYCL-RTC, the filesystem overlay is populated with files containing the |
| 92 | +source string and any virtual `include_files` (defined via the homonymous |
| 93 | +property). The compilation command is static and puts the frontend into |
| 94 | +`-fsycl-device-only` mode. Any user-given options (from the `build_options` |
| 95 | +property) are appended. Lastly, the implementation defines a custom tool action |
| 96 | +which runs the frontend until LLVM codegen, and then obtains ownership of the |
| 97 | +LLVM module. |
| 98 | +
|
| 99 | +This might be a slightly unusual way to use of LibTooling, but we found it works |
| 100 | +great for SYCL-RTC. The next sections explain the |
| 101 | +[`jit_compiler::compileDeviceCode(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L418) |
| 102 | +function in more detail. |
| 103 | +
|
| 104 | +### Step 1: Determine the path of the compiler installation |
| 105 | +
|
| 106 | +To set up up working frontend invocation, we need to know where to find |
| 107 | +supplemental files such as the SYCL headers. Normally, these paths are |
| 108 | +determined relative to the compiler executable, however in our case, the |
| 109 | +executable is actually the RTC-enabled application, which can reside in an |
| 110 | +arbitrary location. Instead, we use OS-specific logic inside |
| 111 | +[`getDPCPPRoot()`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L112) |
| 112 | +to determine the location of the shared library `sycl-jit.so` (or `.dll` on |
| 113 | +Windows) which contains the SYCL-RTC implementation. From its location, we can |
| 114 | +derive the compiler installation's root directory. |
| 115 | +
|
| 116 | +### Step 2: Collect command-line arguments |
| 117 | +
|
| 118 | +The next step is to collect the command-line arguments for the frontend |
| 119 | +invocation. The |
| 120 | +[`adjustArgs(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L320) |
| 121 | +function relies on Clang's option handling infrastructure to set the required |
| 122 | +options to enter the device compilation mode (`-fsycl-device-only`), set up the |
| 123 | +compiler environment, and select the target. Finally, any user-specified |
| 124 | +arguments passed via the |
| 125 | +[`build_options`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#new-properties-for-the-build-and-compile-functions) |
| 126 | +property are appended to the list of command-line arguments. |
| 127 | +
|
| 128 | +### Step 3: Configure the `ClangTool` |
| 129 | +
|
| 130 | +Once we know the required command-line arguments, we can set up the compilation |
| 131 | +command database and an |
| 132 | +[instance](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L433) |
| 133 | +of the |
| 134 | +[`ClangTool`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/Tooling.h#L317) |
| 135 | +class, which provides the entry point to the LibTooling interface. As we'll be |
| 136 | +translating only a single file containing the source string, we construct a |
| 137 | +[`FixedCompilationDatabase`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/CompilationDatabase.h#L154) |
| 138 | +relative to the current working directory. |
| 139 | +
|
| 140 | +To implement the `kernel_compiler` extension cleanly, we need to capture all |
| 141 | +output (e.g. warnings and errors) from the frontend. The |
| 142 | +[`ClangDiagnosticsWrapper`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L274) |
| 143 | +class configures a |
| 144 | +[`TextDiagnosticsPrinter`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/TextDiagnosticPrinter.h#L27) |
| 145 | +to append all messages to a string maintained by our implementation to collect |
| 146 | +all output produced during the runtime compilation. |
| 147 | +
|
| 148 | +The configuration of the `ClangTool` instance continues in the |
| 149 | +[`setupTool`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L353) |
| 150 | +function. First, we redirect all output to our diagnostics wrapper. Then, we |
| 151 | +[set up the overlay |
| 152 | +filesystem](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L361-L364) |
| 153 | +with a file named `rtc_<n>.cpp` (*n* is incremented for each use of the |
| 154 | +`kernel_compiler` extension's `build(...)` function) in the current directory |
| 155 | +with the contents of the source string. Each of the virtual header files that |
| 156 | +the application defined via the |
| 157 | +[`include_files`](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_compiler.asciidoc#new-properties-for-the-create_kernel_bundle_from_source-function) |
| 158 | +property becomes also a file in the overlay filesystem, using the path specified |
| 159 | +in the property. |
| 160 | +
|
| 161 | +The `ClangTool` class exposes so-called argument adjusters, which are intended |
| 162 | +to modify the command-line arguments coming from the compilation command |
| 163 | +database. We have to |
| 164 | +[clear](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L368) |
| 165 | +the default adjusters defined by the class, because one of them injects the |
| 166 | +`-fsyntax-only` flag, which would conflict with the `-fsycl-device-only` flag we |
| 167 | +need for SYCL-RTC. Finally, we |
| 168 | +[add](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L371) |
| 169 | +an argument adjuster ourselves to overwrite the name of executable in the |
| 170 | +invocation. Again, this is to help the correct detection of the environment, by |
| 171 | +making the invocation as similar as possible to a normal use of DPC++. |
| 172 | +
|
| 173 | +### Step 4: Run an action |
| 174 | +
|
| 175 | +The last step is to define a |
| 176 | +[`ToolAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Tooling/Tooling.h#L80) |
| 177 | +to be executed on the source files. Clang conveniently provides the |
| 178 | +[`EmitLLVMAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/CodeGen/CodeGenAction.h#L103), |
| 179 | +which runs the frontend up until the LLVM IR code generation, which is exactly |
| 180 | +what we need. However, LibTooling does not provides a helper to wrap it in a |
| 181 | +`ToolAction`, so we need to define and run our own |
| 182 | +[`GetLLVMModuleAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L241). |
| 183 | +
|
| 184 | +We extracted common boilerplate code to configure a |
| 185 | +[`CompilerInstance`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/CompilerInstance.h#L81) |
| 186 | +in the |
| 187 | +[`RTCActionBase`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L176) |
| 188 | +class. Inside the `GetLLVMModuleAction`, we instantiate and execute the |
| 189 | +aforementioned `EmitLLVMAction`, and, in case the translation was successful, |
| 190 | +[obtains ownership](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L255) |
| 191 | +of the constructed `llvm::Module` from it. |
| 192 | +
|
| 193 | +Finally, the call to |
| 194 | +[`Action.takeModule()`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L442) |
| 195 | +transfers ownership again to the caller of `compileDeviceCode`. Note that this |
| 196 | +simple mechanism works because we know that there is only a single compilation |
| 197 | +happening for every instance of the `ClangTool` and hence our |
| 198 | +`GetLLVMModuleAction` class. |
| 199 | +
|
| 200 | +## Caching |
| 201 | +
|
| 202 | +The implementation optionally uses the runtime's |
| 203 | +[persistent cache](https://intel.github.io/llvm/design/KernelProgramCache.html#persistent-cache) |
| 204 | +to elide recurring invocations of the frontend, which we observed to be the most |
| 205 | +expensive (in terms of runtime overhead) phase of our compilation pipeline. |
| 206 | +
|
| 207 | +### Overall design |
| 208 | +
|
| 209 | +We cache only the frontend invocation, meaning that after a successful |
| 210 | +translation, we store the LLVM IR module obtained via LibTooling on disk in the |
| 211 | +Bitcode format using built-in utilities. In case of a cache hit in a later |
| 212 | +runtime compilation, we load the module from disk and feed it into the device |
| 213 | +linking phase. The rationale for this design was that were no utilities to save |
| 214 | +and restore the linked and post-processed device images to disk at the time (the |
| 215 | +[SYCLBIN](https://intel.github.io/llvm/design/SYCLBINDesign.html) infrastructure |
| 216 | +was added later), and caching these steps would have resulted only in marginal |
| 217 | +further runtime savings. |
| 218 | +
|
| 219 | +### Cache key considerations |
| 220 | +
|
| 221 | +The main challenge is to define a robust cache key. Because code compiled via |
| 222 | +SYCL-RTC can `#include` header files defined via the `include_files` property as |
| 223 | +well as from the filesystem, e.g. `sycl.hpp` from the DPC++ installation or user |
| 224 | +libraries, it is not sufficient to look only at the source string. In order to |
| 225 | +make the cache as conservative as possible (cache collisions are unlikely but |
| 226 | +mathematically possible), we decided to compute a hash value of the |
| 227 | +*preprocessed* source string, i.e. with all `#include` directives resolved. We |
| 228 | +additionally compute a hash value of the rendered command-line arguments, and |
| 229 | +append it to the hash of the preprocessed source to obtain the final cache key. |
| 230 | +
|
| 231 | +### Implementation notes |
| 232 | +
|
| 233 | +The cache key computation is implemented in the |
| 234 | +[`jit_compiler::calculateHash(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L381) |
| 235 | +function. We are again relying on LibTooling to invoke the preprocessor - |
| 236 | +handily, Clang provides a |
| 237 | +[`PreprocessorFrontendAction`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/include/clang/Frontend/FrontendAction.h#L294) |
| 238 | +that we extend to tailor to our use-case. We choose |
| 239 | +[BLAKE3](https://en.wikipedia.org/wiki/BLAKE_(hash_function)) as the hash |
| 240 | +algorithm because its proven in similar contexts (most notably, |
| 241 | +[ccache](https://ccache.dev)) and available as a utility in the LLVM ecosystem. |
| 242 | +As the output is a byte array, we apply Base64 encoding to obtain a character |
| 243 | +string for use with the persistent cache. |
| 244 | +
|
| 245 | +## Device library linking and SYCL-specific transformations |
| 246 | +
|
| 247 | +With an LLVM IR module in hand, obtained either from the frontend or the cache, |
| 248 | +the next steps in the compilation pipeline are simple. |
| 249 | +
|
| 250 | +The device library linking is done by the |
| 251 | +[`jit_compiler::linkDeviceLibraries(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L566) |
| 252 | +function. These libraries provide primitives for a variety of extra |
| 253 | +functionality, such as an extended set of math functions and support for |
| 254 | +`bfloat16` arithmetic, and are available as Bitcode files inside the DPC++ |
| 255 | +installation or the vendor toolchain, so we just use LLVM utilities to load them |
| 256 | +into memory and link them to the module representing the runtime-compiled |
| 257 | +kernels. The main challenge here is that the logic to select the device |
| 258 | +libraries is currently not reusable from its implementation in the driver, so |
| 259 | +our implementation is a simplified copy of the |
| 260 | +[`SYCL::getDeviceLibraries(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/clang/lib/Driver/ToolChains/SYCL.cpp#L553) |
| 261 | +method, which needs to be kept in sync with the driver code. |
| 262 | +
|
| 263 | +For the SYCL-specific post-processing, implemented in |
| 264 | +[`jit_compiler::performPostLink(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp#L750), |
| 265 | +we can reuse modular analysis and transformation passes in the |
| 266 | +[`SYCLLowerIR`](https://github.com/intel/llvm/tree/sycl/llvm/lib/SYCLLowerIR) |
| 267 | +component. The main tasks for the post-processing passes is to split the device |
| 268 | +code module into smaller units (either as requested by the user, or required by |
| 269 | +the ESIMD mode), and to compute the properties that need to be passed to the |
| 270 | +SYCL runtime when the device images are loaded. The logic to orchestrate the |
| 271 | +`SYCLLowerIR` passes is adapted from the `sycl-post-link` tool's |
| 272 | +[`processInputModule(...)`](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/llvm/tools/sycl-post-link/sycl-post-link.cpp#L606) |
| 273 | +function. This duplicated code should be removed as well once a suitable |
| 274 | +reusable implementation becomes available. |
| 275 | +
|
| 276 | +
|
| 277 | +## Translation to the target format |
| 278 | +
|
| 279 | +The |
| 280 | +[final phase](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/rtc/RTC.cpp#L139) |
| 281 | +in the pipeline is to translate the LLVM IR modules resulting from the previous |
| 282 | +phase into a device-specific target format that can be handled by the runtime. |
| 283 | +For Intel CPUs and GPUs, that's binary SPIR-V. For AMD and NVIDIA GPUs, we emit |
| 284 | +AMDGCN and PTX assembly, respectively. Over time, we created our own set of |
| 285 | +[utilities](https://github.com/intel/llvm/blob/cc966df07d29db75d07f969f044c0491819bd930/sycl-jit/jit-compiler/lib/translation/Translation.h) |
| 286 | +to facilitate the translation. Internally, we dispatch the task to either the |
| 287 | +SPIR-V translator (a copy of which is maintained inside the DPC++ repository), |
| 288 | +or use vendor-specific backends that are part of LLVM to generate the |
| 289 | +third-party GPU code. |
| 290 | +
|
| 291 | +## Third-party hardware support |
| 292 | +
|
| 293 | +SYCL-RTC works for AMD and NVIDIA GPUs, too. The usage of the `kernel_compiler` |
| 294 | +extension remains the same for SYCL devices representing such a third-party GPU. |
| 295 | +The concrete GPU architecture is queried via the environment variable |
| 296 | +`SYCL_JIT_AMDGCN_PTX_TARGET_CPU` when executing the RTC-enabled application. For |
| 297 | +AMD GPUs, it is **mandatory** to set it. For NVIDIA GPUs, it is highly |
| 298 | +recommended to change it from the conservative default architecture (`sm_50`). |
| 299 | +
|
| 300 | +```shell |
| 301 | +$ clang++ -fsycl myapp.cpp -o myapp |
| 302 | +$ SYCL_JIT_AMDGCN_PTX_TARGET_CPU=sm_90 ./myapp |
| 303 | +``` |
| 304 | +
|
| 305 | +A list of values that can be set as the target CPU can be found in the |
| 306 | +[documentation of the `-fsycl-targets=` |
| 307 | +option](https://intel.github.io/llvm/UsersManual.html#generic-options) (leave |
| 308 | +out the `amd_gpu_` and `nvidia_gpu_` prefixes). |
| 309 | +
|
| 310 | +## Further reading |
| 311 | +
|
| 312 | +- Technical presentation at IWOCL 2025: *Fast In-Memory Runtime Compilation of |
| 313 | + SYCL Code*: |
| 314 | + [Slides](https://www.iwocl.org/wp-content/uploads/iwocl-2025-julian-oppermann-fast-in-memory-runtime.pdf) |
| 315 | + [Video Recording](https://youtu.be/X9mS8xetZJY) |
| 316 | +- Blog post: |
| 317 | + [*SYCL Runtime Compilation: A New Way to Specialise Kernels Using C++ Metaprogramming*](https://codeplay.com/portal/blogs/2025/07/08/sycl-runtime-compilation) |
0 commit comments