diff --git a/design-documents/tlsdesc-resolvers.rst b/design-documents/tlsdesc-resolvers.rst new file mode 100644 index 0000000..b96d4f9 --- /dev/null +++ b/design-documents/tlsdesc-resolvers.rst @@ -0,0 +1,137 @@ +.. + Copyright (c) 2023-2025, Arm Limited and its affiliates. All rights reserved. + CC-BY-SA-4.0 AND Apache-Patent-License + See LICENSE file for details + +.. _SYSVABI64: https://github.com/ARM-software/abi-aa/tree/main/sysvabi64/sysvabi64.rst +.. _TLSDESC: http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt + +Thread Local Storage TLSDESC resolver functions +*********************************************** + +Preamble +======== + +Background +---------- + +The ``R_AARCH64_TLSDESC`` dynamic relocation is platform specific. The +dynamic loader is expected to choose an appropriate resolver function +for the context. This document provides some example resolver +functions. + +These examples are for illustrative purposes only. There is no +requirement for any of the following resolver functions to be +implemented. + +The ABI requirements for the calling convention of resolver functions +are described in `SYSVABI64`_. + +Example Resolver Functions +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Due to the restrictions on calling convention, the +resolver routines must be written in assembly language. + +Static TLS Specialization: + +When the TLS variable is in the static TLS block, the offset from the +thread pointer is fixed at runtime. The dynamic loader can calculate +the offset and place it in the TLS descriptor. All the static TLS +resolver function needs to do is extract the offset and return it. + +.. code-block:: asm + + _dl_tlsdesc_return: + // x0 contains pointer to struct tlsdesc. + // tlsdesc.argument.value contains offset of variable from TP + ldr x0, [x0, #8] + ret + +Dynamic TLS Specialization: + +When the TLS variable is defined in dynamic TLS, the address of the TLS +variable must be calculated by the resolver function using +``__tls_get_addr``. 
The resolver function returns the offset from the +thread pointer by subtracting the address of the thread pointer from +the address of the TLS variable. In practice an implementation of the +dynamic TLS resolver contains many platform specific details outside +of the scope of the ABI. An example of how a dynamic resolver might be +implemented can be found in the Dynamic Specialization section of +TLSDESC_. + +Undefined Weak Symbols: + +An undefined weak symbol has the value 0. As the resolver function +returns an offset from the Thread Pointer, the resolver function +returns the negated thread pointer value, which cancels to 0 when +added to the thread pointer. + +.. code-block:: asm + + __dl_tlsdesc_undefweak: + mrs x0, tpidr_el0 + neg x0, x0 + ret + +Lazy resolution of R_AARCH64_TLSDESC +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The TLSDESC_ paper describes an optional mechanism to resolve TLSDESC +calls lazily. Lazy resolution for TLSDESC resolver functions is not +recommended on AArch64. Additional synchronization is required for +each TLSDESC call, which has a significant effect on performance. The +text below describes the additional synchronization that is +needed. + +Instead of fully resolving the ``R_AARCH64_TLSDESC`` relocation at +module load time, a lazy resolver function runs on the first TLSDESC +call. The lazy resolver updates the TLS Descriptor with the actual +resolver function and the parameter to the actual resolver +function. In a multi-threaded program when lazy TLS is in use, the +resolver functions must ensure that the write to the parameter in the +TLS descriptor has completed before reading it. + +.. code-block:: asm + + // Code to obtain the offset of var from thread pointer. + // Loads the address of the resolver function into x1. + // Places the address of the TLS Descriptor into x0. 
+ adrp x0, :tlsdesc:var + ldr x1, [x0, #:tlsdesc_lo12:var] + add x0, x0, #:tlsdesc_lo12:var + .tlsdesccall var + blr x1 // _dl_tlsdesc_return + + // Resolver function + _dl_tlsdesc_return: + // Load the parameter from the TLS descriptor. Without + // synchronization this load can read an old value prior + // to the lazy resolver's update to the descriptor completing. + ldr x0, [x0, #8] + ret + +The recommended way to ensure synchronization between the lazy +resolver update of the TLS Descriptor and the actual resolver function +accessing the TLS Descriptor is: + +* The TLS lazy resolver function uses a store release when updating + the address of the resolver function in the TLS Descriptor. + +* The actual entry function uses a load acquire on the address of the + resolver function, with a destination register of xzr. + +Referring to the example above, the code for the resolver function +becomes: + +.. code-block:: asm + + // Resolver function + _dl_tlsdesc_return: + // Guaranteed to complete after the lazy resolver's store release + // of the address in [x0]. + ldar xzr, [x0] + // Access the parameter. + ldr x0, [x0, #8] + ret diff --git a/sysvabi64/sysvabi64-tls.svg b/sysvabi64/sysvabi64-tls.svg new file mode 100644 index 0000000..d96ff87 --- /dev/null +++ b/sysvabi64/sysvabi64-tls.svg @@ -0,0 +1,283 @@ +[SVG figure: SystemV AArch64 TLS addressing architecture. Labels: tp, TCB, dtv slots, TLS blocks 1 to N, component index N stored in GOT, offset in TLS of imported variable stored in GOT, PC-relative reference to GOT, DSO text and data segments.] \ No newline at end of file diff --git a/sysvabi64/sysvabi64.rst b/sysvabi64/sysvabi64.rst index 7e5da71..6fe9ef0 100644 --- a/sysvabi64/sysvabi64.rst +++ b/sysvabi64/sysvabi64.rst @@ -14,6 +14,7 @@ .. _AAELF64: https://github.com/ARM-software/abi-aa/releases .. _CPPABI64: https://developer.arm.com/docs/ihi0059/latest .. _GCABI: https://itanium-cxx-abi.github.io/cxx-abi/abi.html +.. 
_GCCML: https://gcc.gnu.org/legacy-ml/gcc/2018-10/msg00112.html .. _LINUX_ABI: https://github.com/hjl-tools/linux-abi/wiki .. _MemTagABIELF64: https://github.com/ARM-software/abi-aa/releases .. _PAuthABIELF64: https://github.com/ARM-software/abi-aa/releases @@ -22,7 +23,15 @@ .. _SCO-ELF: http://www.sco.com/developers/gabi .. _SYM-VER: http://www.akkadia.org/drepper/symbol-versioning .. _SYSVABI: https://github.com/ARM-software/abi-aa/releases -.. _TLSDESC: http://www.fsfla.org/~lxoliva/writeups/TLS/paper-lk2006.pdf +.. _ELFTLS: https://www.uclibc.org/docs/tls.pdf +.. _TLSDESC: http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt +.. _TLSDESCRES: https://github.com/ARM-software/abi-aa/tree/main/design-documents/tlsdesc-resolvers.txt + +.. role:: c(code) + :language: c + +.. role:: cpp(code) + :language: cpp System V ABI for the Arm® 64-bit Architecture (AArch64) ******************************************************* @@ -216,6 +225,7 @@ Change History | 2025Q2 | 20\ :sup:`th` June 2024 | Require that ``PT_GNU_PROPERTY`` program header be | | | | present in executables and shared-libraries if a | | | | .note.gnu.property section is present. | + | | | - Added chapter on Thread Local Storage (TLS) | +------------+------------------------------+-------------------------------------------------------+ References @@ -240,6 +250,8 @@ This document refers to, or is referred to by, the following documents. +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | GCABI_ | https://itanium-cxx-abi.github.io/cxx-abi/abi.html | Generic C++ ABI | +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | GCCML_ | https://gcc.gnu.org/legacy-ml/gcc/2018-10/msg00112.html | GCC Mailing list topic TLSDESC clobber ABI stability/futureproofness? 
| + +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | HWCAP_ | https://www.kernel.org/doc/html/latest/arm64/elf_hwcaps.html | Linux Kernel HWCAPs interface | +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | LINUX_ABI_ | https://github.com/hjl-tools/linux-abi/wiki | Linux Extensions to gABI | @@ -254,6 +266,8 @@ This document refers to, or is referred to by, the following documents. +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ | SYM-VER_ | http://people.redhat.com/drepper/symbol-versioning | GNU Symbol Versioning | +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ + | TLSDESCRES_ | design-documents/tlsdesc-resolvers | TLSDESC resolver function examples | + +-----------------+--------------------------------------------------------------+-----------------------------------------------------------------------------+ Terms and Abbreviations ----------------------- @@ -620,6 +634,8 @@ syntax is of the form ``#::`` +-----------------------+-------------+---------------------------------------+ | ``gottprel`` | ``adrp`` | R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21 | +-----------------------+-------------+---------------------------------------+ + | ``gottprel`` | ``ldr`` | R_AARCH64_TLSIE_LD_GOTTPREL_PREL19 | + +-----------------------+-------------+---------------------------------------+ | ``gottprel_lo12`` | ``ldr`` | R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC | +-----------------------+-------------+---------------------------------------+ | ``tprel`` | ``add`` | R_AARCH64_TLSLE_ADD_TPREL_LO12 | @@ -1890,6 +1906,577 @@ See 
`MemTagABIELF64`_ and `PAuthABIELF64`_ for details of reserved tags. PageBreak oneColumn +Thread Local Storage +==================== + +Introduction to thread local storage +------------------------------------ + +Thread Local Storage (TLS) is a class of own data (static storage) that – +like the stack – is instanced once for each thread of execution. It fits +into the abstract storage hierarchy as follows. + +* (Most global) Program-own data (static and extern variables, instanced + once per program/process). + +* Thread local storage (variables instanced once per thread, shared between + all accessing function activations). + +* (Most local) Automatic data (stack variables, instanced once per function + activation, per thread). + +Rules governing thread local storage on AArch64 +----------------------------------------------- + +* How to denote TLS in source programs: + + C++11 and C11 use :c:`thread_local T t...`; A GCC extension uses + :c:`__thread T t...`; this is Q-o-I. + +* How to represent the initializing images of TLS in object files, and how + to define symbols in TLS: + + The rules for ELF are well established (see ``SHF_TLS``, ``STT_TLS`` in + SCO-ELF_). + +* How a loader or run-time system creates instances of TLS per-thread at + execution time: + + This is part of ABI for the platform or execution environment. + +This document and AAELF64_ are concerned with: + +* How to relocate, statically and dynamically, with respect to symbols + defined in TLS (for details of relocations relevant to AArch64 Linux see + AAELF64_). + +* How code must address variables allocated in TLS (the subject of the + notes below). + +Introduction to TLS addressing +------------------------------ + +This section covers only the definitions required to understand the +AArch64 specific details. A more in-depth description to TLS +addressing in general can be found in ELFTLS_. 
+ +In the most general form, a program is constructed dynamically from an +executable and a number of shared libraries. Each component, +(executable or shared library) can be mapped into multiple +processes. Additionally a shared library can be loaded dynamically by +a program, rather than being part of the initial process image +constructed when the program is first loaded. + +For the purpose of addressing TLS, components of an application, +referred to as modules, are identified using indexes. The module index +for the executable is always 1, but the module indexes for shared +libraries are allocated at process start time, or when a shared +library is loaded dynamically via dlopen. A shared library may have a +different module index in two different processes so its per-thread +module index must be part of its process state (or be queried +dynamically). The run-time system is responsible for maintaining a +per-thread vector of pointers to allocated TLS regions indexed by +these module indexes. + +There is a system resource called the Thread Pointer (TP) that +points to a Thread Control Block (TCB) for the currently +executing thread which, in turn, points to the Dynamic Thread Vector +(DTV) for that thread. + +.. raw:: pdf + + PageBreak oneColumn + +SystemV AArch64 TLS addressing architecture +------------------------------------------- + +The figure below depicts the fundamental components of the TLS +addressing architecture used by SystemV for AArch64. + +.. _SystemV AArch64 TLS addressing architecture: + +.. figure:: sysvabi64-tls.svg + + SystemV AArch64 TLS addressing architecture + +The TLS data for a module is called the TLS Block. + +The thread pointer points directly to the Thread Control Block (TCB). + +The size of the TCB is 16-bytes, where the first 8 bytes contain the +pointer to the Dynamic Thread Vector (DTV), and the other 8 bytes are +reserved for the implementation. 
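As a minimal sketch of the TCB layout just described, the structure can be
modelled in C as shown below. The type and field names (``struct tcb``,
``dtv``, ``reserved``) are illustrative assumptions and are not defined by
this ABI; only the 16-byte size and the position of the DTV pointer come
from the text above. The checks assume an LP64 target with 8-byte pointers.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model of the 16-byte TCB. The field names are
       illustrative only; the ABI fixes the size and the layout. */
    struct tcb {
        void **dtv;      /* bytes 0..7: pointer to the Dynamic Thread Vector */
        void *reserved;  /* bytes 8..15: reserved for the implementation */
    };

    /* Compile-time checks of the layout on an LP64 target. */
    static_assert(sizeof(struct tcb) == 16, "TCB is 16 bytes");
    static_assert(offsetof(struct tcb, dtv) == 0, "DTV pointer is first");
    static_assert(offsetof(struct tcb, reserved) == 8, "reserved word follows");

With this model, the thread pointer can be viewed as a ``struct tcb *`` and
``TP[0]`` in later expressions corresponds to the ``dtv`` field.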
+ +Following the TCB and any required alignment padding (defined in +`SystemV AArch64 TLS addressing`_), the TLS Blocks of the modules +loaded at process start form the static TLS Block. The memory for the +static TLS Block is allocated at process start time. + +The TLS Blocks for modules loaded dynamically via dlopen are known as +dynamic TLS. + +Index N, where N > 0, of the Dynamic Thread Vector DTV[N] is a pointer +to the TLS block for module N. + +To calculate the address of a TLS variable in any given module, static +or dynamic, the expression ``&TP[0][Module id][offset in module]`` can +be used. The function ``__tls_get_addr(module_id, offset)`` returns +the result of this calculation. + +Index 0 of the Dynamic Thread Vector DTV[0] is reserved for use by the +platform. It is typically used to store the thread's generation +counter. In an implementation that supports deferred allocation of +TLS, a global generation number is incremented whenever the number of +dynamic modules changes due to ``dlopen`` or ``dlclose``. In the +``__tls_get_addr(module_id, offset)`` function, if the thread's +generation count is less than the global generation number, the +thread's DTV is updated, and the TLS for the ``module_id`` is +allocated if it is not present. + +In pseudocode: + +.. code-block:: c + + /* tls_get_addr with deferred allocation */ + void * __tls_get_addr(size_t module_id, size_t offset) + { + dtv = get_thread_dtv(); + + if (dtv[0].generation_counter != global_generation_number) + /* includes setting the thread's generation counter to + the global_generation_number */ + update_thread_dtv(); + + if (dtv[module_id] == unallocated) + allocate_tls(dtv, module_id); + + /* address of the variable, not its value */ + return &dtv[module_id][offset]; + } + +The calculation in ``__tls_get_addr`` is the most general and it can be +applied to both static and dynamic TLS. There are four defined models +of accessing TLS that trade off generality for performance. In order +of descending generality: + + 1. 
General Dynamic, also known as Global Dynamic, can be used anywhere. + + 2. Local Dynamic, can be used anywhere where the definition of the + TLS variable and the access are from the same module. + + 3. Initial Exec, can be used for TLS variables defined in the + static TLS block. + + 4. Local Exec, can be used in the executable for TLS variables + defined in the executable's static TLS block. + +SystemV AArch64 TLS addressing +------------------------------ + +AArch64 TLS SystemV design choices: + +* AArch64 uses variant 1 TLS as described in ELFTLS_. + +* There are two dialects of TLS supported by the relocations defined + in AAELF64_, the traditional dialect described by ELFTLS_ and the + descriptor dialect described by TLSDESC_. This document describes + only the descriptor dialect as this is the default dialect for GCC + and the only dialect supported by clang. + +* The thread pointer (TP) is always accessible via the ``TPIDR_EL0`` + system register. The compiler can read the thread pointer by + inlining an ``mrs`` instruction. + +* The compiler can generate code that supports a TLS block size of 4 + KiB, 16 MiB, 4 GiB or 16 EiB, depending on the addressing mode. The + default is 16 MiB for all addressing modes. + +* The TLS for an executable or shared-library is described by the + ``PT_TLS`` program header. + +Recall from the diagram in `SystemV AArch64 TLS addressing +architecture`_ that the Thread Pointer ``TP`` points to the start of +the ``TCB``, which is followed by 0 or more bytes of alignment +padding, then the executable's TLS block. + +The ``TP``, and hence the start of the ``TCB``, must be aligned to a +``PT_TLS.p_align`` boundary. This can be expressed as ``TP ≡ 0 (modulo +PT_TLS.p_align)`` where ``≡`` means congruent to. + +The static and dynamic linker must agree on the size of the padding +(``PADsize``) between the TCB and the executable's TLS Block. 
Using +``TCBsize`` as the size of the TCB (16 bytes), the following expression can be used to calculate ``PADsize`` from the ``PT_TLS`` program header. + +``PADsize = (PT_TLS.p_vaddr - TCBsize) mod PT_TLS.p_align``. + +A number of dynamic linkers use a different calculation that requires +``PT_TLS.p_vaddr ≡ 0 (modulo PT_TLS.p_align)`` to correctly align the +executable's TLS block. In this case the expression above simplifies to +``PADsize = Max(0, PT_TLS.p_align - TCBsize)``. For maximum +compatibility, static linkers and any linker scripts including TLS, +are recommended to align the TLS block so that ``PT_TLS.p_vaddr ≡ 0 +(modulo PT_TLS.p_align)``. This requires the start of the TLS to be aligned +to the maximum alignment of the ``.tdata`` and ``.tbss`` sections. + +The expression for ``PADsize`` above can be derived from the +requirement that ``PADsize`` must be the smallest non-negative integer +that satisfies the following congruence: + +``TP + TCBsize + PADsize ≡ PT_TLS.p_vaddr (modulo PT_TLS.p_align)``. + +Using integers modulo m, where m is ``PT_TLS.p_align``: + +TP\ :sub:`m` + TCBsize\ :sub:`m` + PADsize\ :sub:`m` = PT_TLS.p_vaddr\ :sub:`m` + +As TP\ :sub:`m` is 0, because ``TP ≡ 0 (modulo PT_TLS.p_align)``, rearranging +gives: + +PADsize\ :sub:`m` = PT_TLS.p_vaddr\ :sub:`m` - TCBsize\ :sub:`m` + +which is equivalent to + +PADsize\ :sub:`m` = (PT_TLS.p_vaddr - TCBsize)\ :sub:`m`. + +TLS Descriptors +--------------- + +AArch64 uses the TLS Descriptor dialect for the general dynamic model. +The TLS Descriptor dialect permits a dynamic linker to use the +location and properties of the TLS symbol to select an optimal +resolver function. + +The static relocations with a prefix of ``R_AARCH64_TLSDESC_`` +targeting TLS symbol ``var``, instruct the static linker to create a +TLS Descriptor for ``var``. The TLS Descriptor for a variable is +stored in a pair of consecutive GOT entries, N and N + 1. The GOT +entry for N has a dynamic ``R_AARCH64_TLSDESC`` relocation targeting +the TLS symbol for ``var``. 
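Returning briefly to the ``PADsize`` expression from `SystemV AArch64 TLS
addressing`_, the calculation can be sketched in C as follows. The function
and macro names (``tls_pad_size``, ``tls_block_offset``, ``TCB_SIZE``) are
hypothetical, and the sketch assumes that ``p_align`` is a non-zero power of
two, which lets the modulo operation be written as a mask.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    #define TCB_SIZE 16u  /* TCBsize: the TCB is 16 bytes */

    /* PADsize = (p_vaddr - TCBsize) mod p_align.
       Assumes p_align is a non-zero power of two. Unsigned wrap-around
       keeps the result correct when p_vaddr < TCBsize, because 2^64 is
       a multiple of every power-of-two alignment. */
    static uint64_t tls_pad_size(uint64_t p_vaddr, uint64_t p_align)
    {
        return (p_vaddr - TCB_SIZE) & (p_align - 1);
    }

    /* Offset of the executable's TLS block from the thread pointer. */
    static uint64_t tls_block_offset(uint64_t p_vaddr, uint64_t p_align)
    {
        return TCB_SIZE + tls_pad_size(p_vaddr, p_align);
    }

For example, with ``p_vaddr = 0x1000`` and ``p_align = 64`` the padding is 48
bytes, which matches ``Max(0, PT_TLS.p_align - TCBsize)`` for the aligned
case; with ``p_align = 8`` the padding is 0.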
+ +Code sequences for accessing TLS variables +------------------------------------------ + +The code sequences below assume the default TLS block size of 16 MiB. +This permits the Local Exec model to use a pair of add instructions +with a combined 24-bit immediate field. Larger TLS sizes can be +supported by using a ``movz`` and one or more ``movk`` instructions to +construct an offset from the thread pointer in a register. + +A code model may use a sequence from a less restrictive code model. + +In the code-sequences below: + +* ``tp`` is a core register containing the thread pointer. + +* ``gp`` is a core register containing the base of the GOT. + +* ``xn`` is an arbitrary core register. Numbered core registers such + as ``x0`` and ``x1`` refer to the specific core register. + +* ``.tlsdesccall`` is an assembler directive that adds a + ``R_AARCH64_TLSDESC_CALL`` relocation to the next instruction. + +* ``.tlsdescldr`` is an assembler directive that adds a + ``R_AARCH64_TLSDESC_LDR`` relocation to the next instruction. + +* ``.tlsdescadd`` is an assembler directive that adds a + ``R_AARCH64_TLSDESC_ADD`` relocation to the next instruction. + +Relaxation is a term used by the TLS literature such as ELFTLS_ to +represent an optimization. AAELF64_ has used optimization for similar +link-time instruction sequence optimizations. This document will use +relaxation to be consistent with existing references. + +The static linker can relax a more general TLS model to a more +constrained TLS model when the TLS variables meet the requirements for +using the constrained model. The section `Static link time TLS +Relaxations`_ describes the details of the permitted relaxations. + +General Dynamic +^^^^^^^^^^^^^^^ + +General Dynamic is the most general form of accessing TLS. It supports +static and dynamic TLS. + +
To permit static linker relaxation, the TLSDESC code sequences must be +emitted exactly as specified, with no other instruction breaking up +the sequence, with exactly the same registers used. + +The code sequences below return the offset of the TLS variable from +``tp`` in ``x0``. Getting the address of the TLS variable requires +additional code to add ``x0`` to ``tp``; this is not part of the ABI +required TLSDESC code sequence. + +Small Code Model; + +.. code-block:: asm + + adrp x0, :tlsdesc:var // R_AARCH64_TLSDESC_ADR_PAGE21 var + ldr x1, [x0, #:tlsdesc_lo12:var] // R_AARCH64_TLSDESC_LD64_LO12 var + add x0, x0, #:tlsdesc_lo12:var // R_AARCH64_TLSDESC_ADD_LO12 var + .tlsdesccall var + blr x1 // R_AARCH64_TLSDESC_CALL var + // offset of var from tp in x0 + +Tiny Code Model; + +.. code-block:: asm + + ldr x1, :tlsdesc:var // R_AARCH64_TLSDESC_LD_PREL19 var + adr x0, :tlsdesc:var // R_AARCH64_TLSDESC_ADR_PREL21 var + .tlsdesccall var + blr x1 // R_AARCH64_TLSDESC_CALL var + // offset of var from tp in x0 + +Large Code Model; + +.. code-block:: asm + + movz x0, #:tlsdesc_off_g1:var // R_AARCH64_TLSDESC_OFF_G1 var + movk x0, #:tlsdesc_off_g0_nc:var // R_AARCH64_TLSDESC_OFF_G0_NC var + .tlsdescldr var + ldr x1, [gp, x0] // R_AARCH64_TLSDESC_LDR var + .tlsdescadd var + add x0, gp, x0 // R_AARCH64_TLSDESC_ADD var + .tlsdesccall var + blr x1 // R_AARCH64_TLSDESC_CALL var + // offset of var from tp in x0 + +Local Dynamic +^^^^^^^^^^^^^ + +Local Dynamic is a special case of general dynamic where the compiler +knows that the TLS variable is defined in the same module as the code +that is accessing the variable. In this case the offset of the TLS +variable from the start of the module's TLS block is a static link +time constant. Instead of dynamically calculating the offset of the +TLS variable from the thread pointer, the offset of the module's TLS +block from the thread pointer is calculated, then the offset of the +TLS variable within that block is added. 
This is more efficient than +general dynamic when more than one TLS variable from the same module +is accessed from the same function, but less efficient when accessing +a single TLS variable. + +The code sequence for local dynamic is the same as global dynamic and, +like global dynamic, must be emitted exactly as specified. There are no +specific relocations for Local Dynamic using the descriptor dialect. A +special symbol ``_TLS_MODULE_BASE_`` is used to get a tlsdesccall to +return the offset of the module's TLS block from the thread pointer. + +Code-generators are not required to implement local dynamic and can +emit general dynamic in its place. + +Initial Exec +^^^^^^^^^^^^ + +Initial Exec can be used for static TLS. The location of the module's +TLS block and the offset of the TLS variable within that block are +run-time constants. The dynamic-loader computes the offset from the +thread pointer and places it in a GOT entry. The GOT entry is +relocated by dynamic relocation ``R_AARCH64_TLS_TPREL64``. + +A shared-library that contains Initial Exec TLS must have the +``DF_STATIC_TLS`` dynamic tag set. In the general case an attempt to +load a shared library with ``DF_STATIC_TLS`` via ``dlopen`` will be +rejected. Some dynamic loaders implement a surplus of DTV slots that +permit a fixed number of ``DF_STATIC_TLS`` modules to be dynamically +loaded. Whether a DTV surplus is available and how many slots are +available is implementation defined. + +Small Code model; + +The static linker is permitted to relax the instructions below to +Local Exec individually using the relocation directive. The +instructions do not have to be contiguous. + +.. code-block:: asm + + adrp xn, :gottprel:var // R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21 var + ldr xn, [xn, #:gottprel_lo12:var] // R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC var + // offset of var from tp in xn + +Tiny Code model; + +.. 
code-block:: asm + + ldr xn, :gottprel:var // R_AARCH64_TLSIE_LD_GOTTPREL_PREL19 var + // offset of var from tp in xn + +Large Code model; + +.. code-block:: asm + + movz xn, #:gottprel_g1:var // R_AARCH64_TLSIE_MOVW_GOTTPREL_G1 var + movk xn, #:gottprel_g0_nc:var // R_AARCH64_TLSIE_MOVW_GOTTPREL_G0_NC var + ldr xn, [gp, xn] + // offset of var from tp in xn + +Local Exec +^^^^^^^^^^ + +Local Exec is used for accesses to the executable's TLS block. The +executable always has the TLS module index of 1 so the offsets of the +TLS variables from the thread pointer are static link time +constants. The code sequences are the same for all code models. + +The instruction sequences below are not required by the ABI but using +the instructions and relocations below increases the chances of static +linkers applying the optimizations in AAELF64_ when the size of the +executable's TLS block is smaller than 16 KiB. + +.. code-block:: asm + + add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var + add xn, xn, :tprel_lo12_nc:var // R_AARCH64_TLSLE_ADD_TPREL_LO12_NC var + // address of var (tp + offset) in xn + +Optimization to load a 64-bit var directly into a core register. + +.. code-block:: asm + + add xn, tp, :tprel_hi12:var, lsl #12 // R_AARCH64_TLSLE_ADD_TPREL_HI12 var + ldr xn, [xn, #:tprel_lo12_nc:var] // R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC var + +Static link time TLS Relaxations +-------------------------------- + +The relaxations described below can be automatically applied to code +sequences in the executable. Relaxing from general dynamic in a shared +library prevents that library from being opened at runtime via dlopen, +so it should not be applied automatically to shared libraries. + +The static linker should use the relocation directives to distinguish +between code models. + +General Dynamic to Initial Exec +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This relaxation can be performed when the TLS variable is defined in a +module that is part of static TLS. + +Small Code Model; + +.. 
code-block:: asm + + adrp x0, :gottprel:var // R_AARCH64_TLSIE_ADR_GOTTPREL_PAGE21 var + ldr x0, [x0, #:gottprel_lo12:var] // R_AARCH64_TLSIE_LD64_GOTTPREL_LO12_NC var + nop + nop + // offset of var from tp in x0 + +Tiny Code Model; + +.. code-block:: asm + + ldr x0, :gottprel:var // R_AARCH64_TLSIE_LD_GOTTPREL_PREL19 var + nop + nop + // offset of var from tp in x0 + +Large Code Model; + +.. code-block:: asm + + movz x0, #:gottprel_g1:var // R_AARCH64_TLSIE_MOVW_GOTTPREL_G1 var + movk x0, #:gottprel_g0_nc:var // R_AARCH64_TLSIE_MOVW_GOTTPREL_G0_NC var + ldr x0, [gp, x0] + nop + nop + // offset of var from tp in x0 + +General Dynamic to Local Exec +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This relaxation can be performed when the TLS variable is defined in +the executable. + +Small Code Model; + +.. code-block:: asm + + movz x0, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var + movk x0, :tprel_g0_nc:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var + nop + nop + // offset of var from tp in x0 + +Tiny Code Model; + +.. code-block:: asm + + movz x0, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var + movk x0, :tprel_g0_nc:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var + nop + // offset of var from tp in x0 + +Large Code Model; + +.. code-block:: asm + + movz x0, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var + movk x0, :tprel_g0_nc:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var + nop + nop + nop + // offset of var from tp in x0 + +Initial Exec to Local Exec +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This relaxation is only defined for the Small Code model. It can be +performed when the TLS variable is defined in the executable. The +static linker is permitted to relax each instruction individually, +using the relocation directive to identify the instruction. The +destination register must be preserved. + +.. 
code-block:: asm + + movz xn, :tprel_g1:var // R_AARCH64_TLSLE_MOVW_TPREL_G1 var + movk xn, :tprel_g0_nc:var // R_AARCH64_TLSLE_MOVW_TPREL_G0_NC var + +TLS Descriptor resolver functions +--------------------------------- + +When resolving the ``R_AARCH64_TLSDESC`` relocation, the dynamic +loader places the address of the chosen resolver function in the first +GOT entry, and the argument for the chosen resolver function in the +second GOT entry. + +The AArch64 C and assembler examples are adapted from the AArch32 +`TLSDESC`_ paper. The C code below represents the TLS Descriptor. + +.. code-block:: c + + // Argument passed to TLS resolver functions. + struct tlsdesc + { + ptrdiff_t (*resolver)(struct tlsdesc *); + union + { + void *pointer; + long value; + } argument; + }; + +TLS Resolver Functions +---------------------- + +The TLS resolver functions are not standardized by this ABI as they +are internal to the dynamic linker. Programs must not directly refer +to TLS resolver functions. + +The `TLSDESCRES`_ document contains information on how a platform +might implement the resolver functions. + +Calling Convention +^^^^^^^^^^^^^^^^^^ + +TLS resolver functions have one argument, the address of the TLS +descriptor, passed in ``x0``. They return the offset of the variable +from the thread pointer in ``x0``. + +TLS resolver functions must save all general-purpose and SIMD&FP +registers that they modify with the exception of ``x0``, ``x1``, +``x30`` and the processor flags. + +TLS resolver functions are not required to save any register added by +an extension, such as the scalable vector registers or the SVE +predicate registers. See `GCCML`_ for details. + Libraries =========