Skip to content

Conversation

leofang
Copy link
Member

@leofang leofang commented Feb 1, 2021

Blocked by #4537. The new stuff starts at 0b840b1.

This PR enables the following canonical usage, similar to CuPy's other memory pools:

cupy.cuda.set_allocator(cupy.cuda.MemoryAsyncPool().malloc)

Similar to MemoryPool, this new MemoryAsyncPool supports multiple devices, which is not transparent if just using the bare malloc_async() from #4537.

Moreover, having a wrapper like MemoryAsyncPool allows more possibilities, such as

  • Setting which cudaMemPool_t handle to use for each device via the pool constructor
  • Enforce memory alignment
  • Error handling when OOM occurs
  • Matching the interface of other CuPy pools (despite most of them do not work as we are not doing any bookkeeping here)

In the future we could support MemoryAsyncPool.set_limit() by using cudaMemPoolTrimTo(). I do not add it in this PR because it's not yet clear to me how nicely it can be when working with other libraries that also use the async pool.

@leofang leofang changed the title [WIP] Add MemoryAsyncPool supporting stream-ordered memory allocator [WIP] CUDA 11.2: Add MemoryAsyncPool supporting stream-ordered memory allocator Feb 1, 2021
@kmaehashi kmaehashi added cat:feature New features/APIs prio:medium labels Feb 2, 2021
@leofang leofang changed the title [WIP] CUDA 11.2: Add MemoryAsyncPool supporting stream-ordered memory allocator CUDA 11.2: Add MemoryAsyncPool supporting malloc_async Feb 5, 2021
@leofang leofang marked this pull request as ready for review February 5, 2021 03:06
@leofang leofang changed the title CUDA 11.2: Add MemoryAsyncPool supporting malloc_async CUDA 11.2: Add MemoryAsyncPool to support malloc_async Feb 5, 2021
@leofang
Copy link
Member Author

leofang commented Feb 9, 2021

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit b05f5a8, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Feb 9, 2021

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 5b5ed3a, target branch master) succeeded!

@chainer-ci
Copy link
Member

Jenkins CI test (for commit e6538b1, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Mar 19, 2021

@emcastillo Looks like the CI could randomly hit OOM in the added tests. What should I do? Make the tests less memory-hungry? Mark them slow?

@leofang
Copy link
Member Author

leofang commented Mar 19, 2021

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit a5e5255, target branch master) failed with status FAILURE.

@leofang
Copy link
Member Author

leofang commented Mar 20, 2021

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit a5e5255, target branch master) succeeded!

@leofang
Copy link
Member Author

leofang commented Mar 20, 2021

@emcastillo Tests added in this PR passed twice in a row. Should be safe now?

@chainer-ci
Copy link
Member

@emcastillo This pull-request is marked as st:test-and-merge, but there were no activities for the last 3 days. Could you check?

1 similar comment
@chainer-ci
Copy link
Member

@emcastillo This pull-request is marked as st:test-and-merge, but there were no activities for the last 3 days. Could you check?

@emcastillo emcastillo removed the st:test-and-merge (deprecated) Ready to merge after test pass. label Mar 28, 2021
@emcastillo
Copy link
Member

lets rebase

@leofang
Copy link
Member Author

leofang commented Mar 28, 2021

lets rebase

Done!

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit 9e1d636, target branch master) succeeded!

@leofang
Copy link
Member Author

leofang commented Apr 4, 2021

Jenkins, test this please

@chainer-ci
Copy link
Member

Jenkins CI test (for commit c49e805, target branch master) succeeded!

@emcastillo emcastillo added this to the v10.0.0a1 milestone Apr 6, 2021
@emcastillo emcastillo added st:test-and-merge (deprecated) Ready to merge after test pass. to-be-backported Pull-requests to be backported to stable branch labels Apr 6, 2021
@mergify mergify bot merged commit 5880b98 into cupy:master Apr 6, 2021
chainer-ci pushed a commit to chainer-ci/cupy that referenced this pull request Apr 6, 2021
CUDA 11.2: Add `MemoryAsyncPool` to support `malloc_async`
@leofang leofang deleted the mempool_impl branch April 6, 2021 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:feature New features/APIs prio:medium st:test-and-merge (deprecated) Ready to merge after test pass. to-be-backported Pull-requests to be backported to stable branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants