Conversation

Azusachan
Contributor

ROCm 6 introduced various changes to its API. In particular:

  • Removal of `gcnArch` from the `hipDeviceProp_t` structure
  • Renaming of `memoryType` in the `hipPointerAttribute_t` structure to `type`

This patch provides support for ROCm 6.0.0 and above.
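For context, the `memoryType` → `type` rename means any code touching `hipPointerAttribute_t` has to branch on the runtime major version. A minimal Python sketch of that pattern (illustrative only, not CuPy's actual code; the attribute objects here are mocked stand-ins for the HIP struct):

```python
from types import SimpleNamespace


def pointer_memory_type(attr, hip_major: int) -> int:
    """Read the memory-type field across the ROCm 5 -> 6 rename.

    `attr` stands in for a hipPointerAttribute_t; `hip_major` is the
    HIP runtime major version (e.g. from hipRuntimeGetVersion).
    """
    if hip_major >= 6:
        return attr.type        # ROCm 6.x: field renamed to 'type'
    return attr.memoryType     # ROCm 5.x and earlier

# Mocked attribute objects for each layout.
attr_rocm6 = SimpleNamespace(type=2)        # e.g. hipMemoryTypeDevice
attr_rocm5 = SimpleNamespace(memoryType=2)
```

The same guard works for `gcnArch`: on ROCm 6 the field is gone and code must fall back to `gcnArchName` or skip the lookup entirely.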

@Azusachan Azusachan closed this May 2, 2024
@Azusachan Azusachan reopened this May 2, 2024
@kmaehashi kmaehashi self-assigned this May 7, 2024
@kmaehashi kmaehashi added the cat:enhancement (Improvements to existing features), to-be-backported (Pull-requests to be backported to stable branch), and prio:medium labels May 7, 2024
@littlewu2508
Contributor

littlewu2508 commented May 23, 2024

cupy_backends/cuda/libs/_cnvrtc.pxi also needs to be updated to avoid an `nvrtc.getVersion` error, thanks to @Berrysoft

From 05233251a78e86bd269f79272561de22991843a1 Mon Sep 17 00:00:00 2001
From: Yiyang Wu <[email protected]>
Date: Thu, 23 May 2024 20:41:14 +0800
Subject: [PATCH] Add ROCm 6 in runtime_version

---
 cupy_backends/cuda/libs/_cnvrtc.pxi | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/cupy_backends/cuda/libs/_cnvrtc.pxi b/cupy_backends/cuda/libs/_cnvrtc.pxi
index 9f02b5522..b2b06aa4f 100644
--- a/cupy_backends/cuda/libs/_cnvrtc.pxi
+++ b/cupy_backends/cuda/libs/_cnvrtc.pxi
@@ -139,5 +139,8 @@ cdef SoftLink _get_softlink():
         elif runtime_version < 6_00_00000:
             # ROCm 5.x
             libname = 'libamdhip64.so.5'
+        elif runtime_version < 7_00_00000:
+            # ROCm 6.x
+            libname = 'libamdhip64.so.6'
 
     return SoftLink(libname, prefix, mandatory=True)
-- 
2.44.0
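The boundaries in the hunk above follow HIP's runtime version encoding, `major * 10_000_000 + minor * 100_000 + patch`, which is why `6_00_00000` and `7_00_00000` bracket ROCm 6.x. A small Python sketch of the selection logic (helper names are illustrative, not CuPy's):

```python
def hip_version(major: int, minor: int, patch: int) -> int:
    """Encode a ROCm/HIP version the way hipRuntimeGetVersion reports it."""
    return major * 10_000_000 + minor * 100_000 + patch


def pick_soname(runtime_version: int) -> str:
    """Pick the libamdhip64 soname for a given encoded runtime version."""
    if runtime_version < hip_version(6, 0, 0):
        return 'libamdhip64.so.5'   # ROCm 5.x
    elif runtime_version < hip_version(7, 0, 0):
        return 'libamdhip64.so.6'   # ROCm 6.x (the branch this patch adds)
    raise RuntimeError('unsupported ROCm version')
```

So a ROCm 6.2.0 runtime reports `60_200_000`, falls into the second branch, and loads `libamdhip64.so.6`.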

@Berrysoft
Contributor

@littlewu2508 the comment should be # ROCm 6.x :)

@littlewu2508
Contributor

the comment should be # ROCm 6.x :)

Thanks, I've edited the patch.

ROCm 6 introduced various changes to its API. In particular:
* Removal of `gcnArch` from the `hipDeviceProp_t` structure
* Renaming of `memoryType` in the `hipPointerAttribute_t` structure to `type`

This patch allows CuPy to be built against this version.
@Azusachan
Contributor Author

Rebased on 13.2.0

@littlewu2508 littlewu2508 mentioned this pull request Aug 6, 2024

@ax3l ax3l left a comment


Thank you!

I am trying to install cupy on a ROCm 6.1 machine (early access for El Capitan at LLNL), and this patch addresses the configure and compile errors I encountered. Can this patch be merged? :)

@ax3l

ax3l commented Aug 24, 2024

cc @takagi @jglaser

@kmaehashi
Member

Hi @Azusachan, thank you so much for the contribution, and sorry for keeping you waiting! I have verified the build succeeds with this PR, and of course, happy to merge this one to support ROCm 6.x in CuPy.

A roadblock I faced when testing this PR was that I couldn't launch kernels in my environment with ROCm 6.2. Has anyone experienced or resolved this kind of issue?

>>> cupy.arange(10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/maehashi/Development/cupy/cupy/_creation/ranges.py", line 60, in arange
    _arange_ufunc(typ(start), typ(step), ret, dtype=dtype)
  File "cupy/_core/_kernel.pyx", line 1375, in cupy._core._kernel.ufunc.__call__
    kern = self._get_ufunc_kernel(dev_id, op, arginfos, has_where)
  File "cupy/_core/_kernel.pyx", line 1402, in cupy._core._kernel.ufunc._get_ufunc_kernel
    kern = _get_ufunc_kernel(
  File "cupy/_core/_kernel.pyx", line 1082, in cupy._core._kernel._get_ufunc_kernel
    return _get_simple_elementwise_kernel(
  File "cupy/_core/_kernel.pyx", line 94, in cupy._core._kernel._get_simple_elementwise_kernel
    return _get_simple_elementwise_kernel_from_code(name, code, options)
  File "cupy/_core/_kernel.pyx", line 82, in cupy._core._kernel._get_simple_elementwise_kernel_from_code
    module = compile_with_cache(code, options)
  File "cupy/_core/core.pyx", line 2258, in cupy._core.core.compile_with_cache
    return cuda.compiler._compile_module_with_cache(
  File "/home/maehashi/Development/cupy/cupy/cuda/compiler.py", line 480, in _compile_module_with_cache
    return _compile_with_cache_hip(
  File "/home/maehashi/Development/cupy/cupy/cuda/compiler.py", line 930, in _compile_with_cache_hip
    mod.load(binary)
  File "cupy/cuda/function.pyx", line 263, in cupy.cuda.function.Module.load
    cpdef load(self, bytes cubin):
  File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
    runtime._ensure_context()
  File "cupy_backends/cuda/api/runtime.pyx", line 1022, in cupy_backends.cuda.api.runtime._ensure_context
    memGetInfo()
  File "cupy_backends/cuda/api/runtime.pyx", line 593, in cupy_backends.cuda.api.runtime.memGetInfo
    check_status(status)
  File "cupy_backends/cuda/api/runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
    raise CUDARuntimeError(status)
cupy_backends.cuda.api.runtime.CUDARuntimeError: hipErrorInvalidValue: invalid argument

Also cc-ing AMD people: @AdrianAbeyta @pnunna93 @lcskrishna @bmedishe @shbiswas834

@kmaehashi
Member

A roadblock I faced when testing this PR was that I couldn't launch kernels in my environment with ROCm 6.2. Has anyone experienced or resolved this kind of issue?

OK, my GPU was too old to run ROCm 6.0... The problem disappeared with gfx908.

@kmaehashi
Member

/test mini

kmaehashi
kmaehashi previously approved these changes Sep 17, 2024
@kmaehashi
Member

/test mini

@kmaehashi kmaehashi merged commit 392d941 into cupy:main Sep 18, 2024
60 checks passed
chainer-ci pushed a commit to chainer-ci/cupy that referenced this pull request Sep 18, 2024