See open question in https://github.com/NVIDIA/cuda-python/pull/1366. Comment: https://github.com/NVIDIA/cuda-python/pull/1366#discussion_r2620860178 Adjust flow in `Buffer.fill` to reduce intermediate object creation