Increased memory usage with free-threaded build #135898

Open
@nascheme

This issue collects the various ways that the free-threaded build might use more memory compared with the default build. Some of these could be considered bugs or at least areas of potential optimizations. Others are a consequence of deliberate design choices and are unlikely to change.

Short list

  • non-GC objects have a larger object header (e.g. on a 64-bit platform, None is 32 bytes vs 16 bytes).
  • mimalloc can use more memory compared with pymalloc and the default allocator (e.g. from libc).
  • biased reference counting can cause objects to live longer than they would in the default build.
  • interned strings are always immortal.
  • QSBR (used for lock-free data structures) can hold memory that would normally be freed with the default build.
  • objects that have "deferred reference counting" enabled are not freed immediately when the reference count reaches zero.

Larger object header for non-GC objects

The free-threaded build uses a different PyObject structure, which is two words larger (16 bytes on a 64-bit platform). However, the free-threaded build does not use the PyGC_Head structure (also two words in size) that the default build allocates before each object that supports cyclic GC. For GC objects the two effects cancel out, so only non-GC objects are larger.
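The size difference can be checked from Python itself. The following is a minimal sketch, assuming that the `Py_GIL_DISABLED` configuration variable is a reliable way to detect a free-threaded build; the helper names are made up for illustration. No expected sizes are shown because they depend on the platform and build.

```python
import sys
import sysconfig


def is_free_threaded_build() -> bool:
    # Py_GIL_DISABLED is 1 on free-threaded builds and 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def header_report() -> dict:
    # None and a bare object() are both non-GC objects, so their reported
    # size is essentially the size of the object header.
    return {
        "free_threaded": is_free_threaded_build(),
        "sizeof_None": sys.getsizeof(None),
        "sizeof_object": sys.getsizeof(object()),
    }


report = header_report()
print(report)
```

Running the same script under both builds on the same platform makes the per-object overhead for non-GC objects visible.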

mimalloc can use more memory

There is a separate issue for this. There are some additional reasons why mimalloc, combined with the free-threaded build, may use more memory. In order to support lock-free data structures, some mimalloc heaps are configured to use QSBR for reclaiming freed memory. This means that recently freed memory that was backing these data structures is not immediately available for new allocations and is not returned to the OS. Setting MIMALLOC_PURGE_DELAY=0 in the environment and running gc.collect() should cause the memory to be freed. Note that setting this environment variable will likely decrease the performance of mimalloc.
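Because the environment variable has to be in place before the interpreter starts, one way to experiment with it is to launch a child interpreter. The following is a minimal sketch; the child program and the helper name are placeholders for illustration, and on builds that do not use mimalloc the variable is simply ignored.

```python
import os
import subprocess
import sys

# Placeholder child program: a real test would run the workload first.
CHILD_CODE = "import gc; gc.collect(); print('collected')"


def run_with_purge_delay_zero() -> subprocess.CompletedProcess:
    # Copy the current environment and add the mimalloc setting so that it
    # is visible when the child interpreter initializes its allocator.
    env = dict(os.environ, MIMALLOC_PURGE_DELAY="0")
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


result = run_with_purge_delay_zero()
print(result.stdout.strip())
```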

The free-threaded build also uses multiple mimalloc "heaps". Memory recently freed by one of these heaps is not immediately available to other heaps and is not returned to the OS. This can cause the working memory size of the process to increase.

Biased reference counting can cause objects to live longer

In the default build, when an object's reference count reaches zero, it is normally deallocated immediately. The free-threaded build uses "biased reference counting", with a fast path for objects "owned" by the current thread and a slow path for other objects. See PEP 703 for more details. Any time an object ends up in a "queued" state, deallocation can be deferred. Typically these objects are deallocated from the "eval breaker" section of the bytecode evaluator.
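One way to observe this is to create an object in one thread and drop its last reference from another, then check a weak reference right away. The following is a minimal sketch with made-up names (`Payload`, `freed_immediately_after_cross_thread_decref`). It only reports what it sees; on a free-threaded build the result may vary from run to run, so no particular output is implied.

```python
import threading
import weakref


class Payload:
    """Plain object used only to watch deallocation timing."""


def freed_immediately_after_cross_thread_decref() -> bool:
    box = []
    ready = threading.Event()
    done = threading.Event()

    def owner():
        # The object is created by (and so "owned" by) this thread.
        box.append(Payload())
        ready.set()
        # Keep the owning thread alive while the other thread drops the ref.
        done.wait()

    worker = threading.Thread(target=owner)
    worker.start()
    ready.wait()

    ref = weakref.ref(box[0])
    box.clear()  # the main thread drops the last strong reference
    freed = ref() is None  # True if deallocation already happened

    done.set()
    worker.join()
    return freed


print(freed_immediately_after_cross_thread_decref())
```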

All interned strings are immortal

For modern Python versions (since 2.3), interning a string (e.g. with sys.intern()) does not cause it to become immortal. Instead, if the last reference to that string disappears, it is removed from the interned string table. This is not the case for the free-threaded build: there, any interned string becomes immortal and survives until interpreter shutdown.
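The difference can be probed with a string built at run time. The following is a minimal sketch, assuming the private helper `sys._is_immortal()` is available (it is looked up defensively, and the function returns `None` when it is missing); `interned_is_immortal` is a name made up for illustration.

```python
import sys


def interned_is_immortal(text: str):
    # Intern the string, then ask the interpreter whether the resulting
    # object is immortal. sys._is_immortal is a private helper that may
    # not exist on every version, hence the getattr() guard.
    interned = sys.intern(text)
    check = getattr(sys, "_is_immortal", None)
    if check is None:
        return None
    return bool(check(interned))


# Build the string at run time so it is not a constant in the code object.
name = "".join(["memory_", "demo_", "key"])
status = interned_is_immortal(name)
print(status)
```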

QSBR can hold freed memory

In order to safely implement lock-free data structures, a safe memory reclamation (SMR) scheme is used, known as quiescent state-based reclamation (QSBR). This means that the memory backing data structures allowing lock-free access will use QSBR (which defers the free operation) rather than immediately freeing the memory. Two examples of these data structures are the list object and the dictionary keys object. See InternalDocs/qsbr.md for more details on how QSBR is implemented in CPython. Running gc.collect() should cause all memory being held by QSBR to be actually freed. Note that even when QSBR frees the memory, mimalloc may not immediately return that memory to the OS and so the resident set size (RSS) of the process might not decrease.

QSBR as implemented in version 3.14 is suboptimal in that it only considers run-time performance and does not consider the total amount of memory being held. This will likely be fixed in 3.15 so that if QSBR is holding a lot of memory, it is processed more quickly and freed if it is safe to do so.

Objects that have "deferred reference counting" enabled are not freed immediately

In order to reduce the cost of reference counting, objects that are commonly accessed from multiple threads have "deferred reference counting" enabled for them. When enabled, these objects will only be freed when the cyclic GC runs. See PEP 703 for additional details. Deferred reference counting is enabled for the following object types:

  • module top-level functions
  • class methods defined in the class scope
  • module objects
  • descriptors
  • thread-local objects (created by _thread._local())
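The effect on top-level functions can be observed with a weak reference. The following is a minimal sketch: it defines a function at the top level of a throwaway namespace, removes the only reference, and checks whether the function is gone before and after a collection. The names `observe_function_lifetime` and `helper` are made up for illustration, and the first value is expected to depend on the build.

```python
import gc
import weakref


def observe_function_lifetime():
    namespace = {}
    # Define a function at the top level of a scratch module-like namespace.
    exec("def helper():\n    return 1\n", namespace)
    ref = weakref.ref(namespace["helper"])

    # Drop the only strong reference to the function.
    del namespace["helper"]
    freed_before_collect = ref() is None

    # Run the cyclic GC, which is where deferred objects get reclaimed.
    gc.collect()
    freed_after_collect = ref() is None
    return freed_before_collect, freed_after_collect


before, after = observe_function_lifetime()
print(before, after)
```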
