Data race on block->next in mi_block_set_nextx #129748

Description

@colesbury

Bug report

I've seen this in non-debug TSan builds. The TSan report looks like:

  Write of size 8 at 0x7fffc4043690 by thread T2692:
    #0 mi_block_set_nextx /raid/sgross/cpython/./Include/internal/mimalloc/mimalloc/internal.h:652:15 (python+0x2ce71e) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #1 _mi_free_block_mt /raid/sgross/cpython/Objects/mimalloc/alloc.c:467:9 (python+0x2ce71e)
    #2 _mi_free_block /raid/sgross/cpython/Objects/mimalloc/alloc.c:506:5 (python+0x2a8b9a) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #3 _mi_free_generic /raid/sgross/cpython/Objects/mimalloc/alloc.c:524:3 (python+0x2a8b9a)
    #4 mi_free /raid/sgross/cpython/Objects/mimalloc/alloc.c (python+0x2c765b) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #5 _PyObject_MiFree /raid/sgross/cpython/Objects/obmalloc.c:284:5 (python+0x2c765b)
...

  Previous atomic read of size 8 at 0x7fffc4043690 by thread T2690:
    #0 _Py_atomic_load_uintptr_relaxed /raid/sgross/cpython/./Include/cpython/pyatomic_gcc.h:375:10 (python+0x4d0341) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #1 _Py_IsOwnedByCurrentThread /raid/sgross/cpython/./Include/object.h:252:12 (python+0x4d0341)
    #2 _Py_TryIncrefFast /raid/sgross/cpython/./Include/internal/pycore_object.h:560:9 (python+0x4d0341)
    #3 _Py_TryIncrefCompare /raid/sgross/cpython/./Include/internal/pycore_object.h:599:9 (python+0x4d0341)
    #4 PyMember_GetOne /raid/sgross/cpython/Python/structmember.c:99:18 (python+0x4d0054) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
    #5 member_get /raid/sgross/cpython/Objects/descrobject.c:179:12 (python+0x2056aa) (BuildId: 2d15b5a5260b454c4f23bd5e53d32d43bfb806c4)
...

SUMMARY: ThreadSanitizer: data race /raid/sgross/cpython/./Include/internal/mimalloc/mimalloc/internal.h:652:15 in mi_block_set_nextx

This happens when we call _Py_TryIncrefCompare() or _Py_TryXGetRef or similar on an object that may be concurrently freed. Perhaps surprisingly, this is a supported operation. See https://peps.python.org/pep-0703/#mimalloc-changes-for-optimistic-list-and-dict-access.
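
As a rough, read-only sketch of that pattern (all names here are hypothetical placeholders; the real helpers such as _Py_TryIncrefCompare also manage the reference count and have fallback paths): a reader loads a pointer out of a shared slot, peeks at the object's header with atomic loads even though the object may already have been freed, and only trusts the result after revalidating that the slot still holds the same pointer.

  /* Read-only sketch of the optimistic-access idea from PEP 703, using the
     GCC/Clang __atomic builtins (the same builtins pyatomic_gcc.h wraps).
     hdr_t, shared_slot and peek_owned_by are made-up names for illustration. */
  #include <stdbool.h>
  #include <stdint.h>

  typedef struct {
      uintptr_t ob_tid;   /* first word, as in the free-threaded PyObject */
  } hdr_t;

  static hdr_t *shared_slot;   /* e.g. a list item or dict value */

  static bool peek_owned_by(uintptr_t my_tid) {
      hdr_t *op = __atomic_load_n(&shared_slot, __ATOMIC_ACQUIRE);
      if (op == NULL) {
          return false;
      }
      /* op may already have been freed; this relaxed load of ob_tid is the
         "previous atomic read" in the TSan report above. */
      uintptr_t tid = __atomic_load_n(&op->ob_tid, __ATOMIC_RELAXED);
      /* Revalidate: if the slot changed, discard the result. */
      if (__atomic_load_n(&shared_slot, __ATOMIC_ACQUIRE) != op) {
          return false;
      }
      return tid == my_tid;
  }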

The problem is that mi_block_set_nextx doesn't use a relaxed atomic store, so this is a data race: the mimalloc freelist pointer may overlap the ob_tid field, because the freelist pointer is stored at the beginning of the freed memory block and ob_tid is the first field of PyObject.
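
To make the conflicting accesses concrete, here is a minimal sketch (placeholder names, GCC/Clang __atomic builtins; not the actual mimalloc or CPython code): both threads touch the same first word of the block. The freeing thread writes the freelist pointer with a plain store, the reading thread does a relaxed atomic load of ob_tid, and the plain store is what TSan reports as the race.

  /* "block" stands for a freed allocation whose first word doubles as
     mimalloc's freelist `next` pointer and as the dead object's ob_tid. */
  #include <stdint.h>

  typedef struct block_s {
      uintptr_t first_word;   /* freelist next pointer == old ob_tid */
  } block_t;

  /* Freeing thread: roughly what mi_block_set_nextx does today -- a plain,
     non-atomic store of the (possibly encoded) next pointer. */
  static void set_freelist_next(block_t *block, block_t *next) {
      block->first_word = (uintptr_t)next;   /* plain store: the racy write */
  }

  /* Reading thread: roughly what _Py_IsOwnedByCurrentThread does -- a relaxed
     atomic load of ob_tid, possibly on memory that was just freed. */
  static uintptr_t load_ob_tid_relaxed(block_t *maybe_freed) {
      return __atomic_load_n(&maybe_freed->first_word, __ATOMIC_RELAXED);
  }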

You won't see this data race if:

  • The object uses Py_TPFLAGS_MANAGED_DICT. In that case the managed dictionary pointer sits at the beginning of the allocation, before ob_tid, so the freelist pointer overlaps it instead. That is fine because, unlike ob_tid, the managed dictionary pointer is never accessed concurrently with freeing the object.
  • CPython is built with --with-pydebug. The debug allocator sticks two extra words at the beginning of each allocation, so the freelist pointer overlaps those instead (this is also fine).

Here are two options:

  • Use relaxed stores in mimalloc, such as in mi_block_set_nextx (a rough sketch follows this list). There are about six of these assignments, so the change itself isn't terrible, but I don't love the idea of modifications to mimalloc that don't make sense to upstream, and these only make sense in the context of free-threaded CPython.
  • Reorder PyObject in the free-threaded build so that ob_type is the first field. This avoids any overlap between the freelist pointer and ob_tid. It's annoying to break the ABI or change the PyObject header, though.
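
For the first option, the change would be along these lines (a sketch only, written with the GCC/Clang __atomic builtins rather than mimalloc's own atomic wrappers; the real edit would go into mi_block_set_nextx and the handful of similar assignments):

  /* Sketch of option 1: write the freelist `next` word with a relaxed atomic
     store instead of a plain store, so it pairs with the relaxed atomic read
     of ob_tid.  Placeholder function name. */
  #include <stdint.h>

  static void set_freelist_next_relaxed(uintptr_t *first_word,
                                        uintptr_t encoded_next) {
      __atomic_store_n(first_word, encoded_next, __ATOMIC_RELAXED);
  }

Relaxed ordering should be enough here: there is no ordering requirement between the freelist update and the optimistic read; the only goal is that both accesses are atomic so they no longer constitute a data race.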

cc @mpage @Yhg1s
