
Conversation

@mertalev
Member

@mertalev mertalev commented Mar 5, 2025

Description

This PR introduces support for AMD GPUs through ROCm. It's a rebased version of #11063 with updated dependencies.

It also once again removes algo caching, as the concurrency issue with caching seems to be more subtle than originally thought. While disabling caching is wasteful (it essentially runs a benchmark every time instead of only once), it's still better than the current alternatives of either lowering concurrency to 1 or not having ROCm support.

@mertalev mertalev requested a review from bo0tzz as a code owner March 5, 2025 14:46
@github-actions github-actions bot added the documentation and 🧠machine-learning labels Mar 5, 2025
@mertalev mertalev added the changelog:feature label and removed the documentation label Mar 5, 2025
@github-actions github-actions bot added the documentation label Mar 5, 2025
suffix: ["", "-cuda", "-openvino", "-armnn"]
suffix: ['', '-cuda', '-rocm', '-openvino', '-armnn']
steps:
- name: Login to GitHub Container Registry
Collaborator

@NicholasFlamy NicholasFlamy Mar 5, 2025


There are some changes in indentation as well as changes from double quotes to single quotes. Was this intended? I know it's from the first commit of the original PR, but I don't think that was ever addressed.

Member Author


VS Code did this when I saved. I'm not sure why it's different.

Collaborator

@NicholasFlamy NicholasFlamy Mar 5, 2025


Is there a PR check that runs prettier on the workflow files? I would think the inconsistency exists because there likely isn't.

Member

@zackpollard zackpollard left a comment


Nice! The Docker cache appears to be working with no changes. Would you mind changing something within ML itself that requires a source code change and rebuild, just so we can see the cache working in those cases before we merge?

@satmandu

satmandu commented Mar 7, 2025

FYI, there's a set of ROCm builds available that support a wider range of AMD hardware, which might be useful:

lamikr/rocm_sdk_builder#216

@NicholasFlamy
Collaborator

FYI, there's a set of ROCm builds available that support a wider range of AMD hardware, which might be useful:

lamikr/rocm_sdk_builder#216

"ROCM SDK Builder 6.1.2 is based on to ROCM 6.1.2"
That's a little older but probably okay. I'm not sure what's the point of using it though. It doesn't support a wider range of hardware from what I can tell. It's the same support as ROCm normally has.

@SharkWipf

Sadly, no, not quite. Official ROCm does not support, for instance, gfx1103 (RX 780M and similar iGPUs, 7940HS and similar APUs).
That said, I don't know if Immich can make use of it, since applications using ROCm need to be built against it, I believe; i.e., prebuilt PyTorch builds won't work.
I'm not sure what Immich uses, but I'm chiming in because I would love to run Immich on those iGPUs, and they are common in current-gen mini PCs.

@NicholasFlamy
Collaborator

NicholasFlamy commented Mar 7, 2025

Sadly, no, not quite. Official ROCm does not support, for instance, gfx1103 (RX 780M and similar iGPUs, 7940HS and similar APUs). That said, I don't know if Immich can make use of it, since applications using ROCm need to be built against it, I believe; i.e., prebuilt PyTorch builds won't work. I'm not sure what Immich uses, but I'm chiming in because I would love to run Immich on those iGPUs, and they are common in current-gen mini PCs.

The officially listed support in the docs is mostly just gfx103X and gfx110X, plus a few others. They're inconsistent: "supported" means their team will help you on GitHub with certain things, while anything not on the list may still work (e.g., Vega GPUs work fine), but they won't help you.

Edit: So my question would be, how does one check what's supported by the build they are running?
Also, we are building onnxruntime from source, so if you want broader support, let us know what command-line flags are needed.
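
(A hedged sketch of what such flags could look like: the flag names follow onnxruntime's build.sh, but the arch list is only an example, not what Immich ships.)

# Example source build of onnxruntime targeting extra AMD archs (list illustrative).
$ ./build.sh --config Release --use_rocm --rocm_home /opt/rocm \
    --cmake_extra_defines CMAKE_HIP_ARCHITECTURES="gfx900;gfx1030;gfx1102"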

@SharkWipf

SharkWipf commented Mar 7, 2025

They're inconsistent: "supported" means their team will help you on GitHub with certain things, while anything not on the list may still work (e.g., Vega GPUs work fine), but they won't help you.

Yeah, but the official ROCm build will not work with gfx1103 at all, applications built against it (i.e., prebuilt PyTorch) will not work with gfx1103, and building against it for gfx1103 will not work either.
I'm not sure what the exact steps are to get gfx1103 into ROCm, but I do know it requires a custom build/version of ROCm. And while, as you said, AMD's stance is "it may work but we won't help you out", that does not mean it will work without this custom ROCm build.

Edit: So my question would be, how does one check what's supported by the build they are running?

I'm not quite sure. On Fedora, the gfx1103 build is provided as a separate package and listed as a separate folder, but the officially supported gfx1102 falls under gfx1100 here, so it's not a reliable check:

$ ls /usr/lib64/rocm/
gfx10  gfx11  gfx1100  gfx1103  gfx8  gfx9  gfx90a  gfx942
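
One hedged way to check the other half of the question, i.e. which gfx target your own GPU reports, is rocminfo (assuming the ROCm runtime is installed):

# Print the gfx target(s) the ROCm runtime detects on this machine.
$ rocminfo | grep -oE 'gfx[0-9a-f]+' | sort -u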

@satmandu

satmandu commented Mar 7, 2025

Maybe it would be useful to have two ROCm-flavored options: one with the current mainline ROCm version, and one with the community build that supports a wider variety of GPUs?

@NicholasFlamy
Collaborator

provided as a separate package and listed as a separate folder

Nice, they split them up by GPU architecture. Eventually we want to do that to cut down the 30 GB image size; Frigate also splits theirs up. The current image we build has multiple architectures all built into one image.

@SharkWipf

SharkWipf commented Mar 7, 2025

Doing that would also resolve the "official or unofficial build?" question, I suppose, since you can just provide the official builds for the supported GPUs and the unofficial builds for the unsupported GPUs. But you'd need to provide a lot of images that way.

Edit: FYI:

$ du -hs /usr/lib64/rocm/* | sort -h
0       /usr/lib64/rocm/gfx8
452M    /usr/lib64/rocm/gfx1100
467M    /usr/lib64/rocm/gfx942
1.2G    /usr/lib64/rocm/gfx1103
2.0G    /usr/lib64/rocm/gfx90a
2.3G    /usr/lib64/rocm/gfx10
2.3G    /usr/lib64/rocm/gfx11
5.5G    /usr/lib64/rocm/gfx9


WORKDIR /code

RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
Collaborator

@NicholasFlamy NicholasFlamy Mar 7, 2025


Suggested change:
-RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx migraphx-dev half
+RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv migraphx-dev

Only migraphx-dev is needed, as the other two are dependencies of it.

Edit: don't change it now, though, because it's already building.

/opt/ann/build.sh \
/opt/armnn/

FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS prod-rocm
Collaborator


I know there were already comments on this, but I think copying the deps manually may result in a smaller, yet still working image. It might be worth re-investigating.


# Warning: 25GiB+ disk space required to pull this image
# TODO: find a way to reduce the image size
FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS builder-rocm

This comment was marked as resolved.

Collaborator


Nope. Not it.

@przemekbialek

They're inconsistent: "supported" means their team will help you on GitHub with certain things, while anything not on the list may still work (e.g., Vega GPUs work fine), but they won't help you.

Yeah, but the official ROCm build will not work with gfx1103 at all, applications built against it (i.e., prebuilt PyTorch) will not work with gfx1103, and building against it for gfx1103 will not work either. I'm not sure what the exact steps are to get gfx1103 into ROCm, but I do know it requires a custom build/version of ROCm. And while, as you said, AMD's stance is "it may work but we won't help you out", that does not mean it will work without this custom ROCm build.

Edit: So my question would be, how does one check what's supported by the build they are running?

I'm not quite sure. On Fedora, the gfx1103 build is provided as a separate package and listed as a separate folder, but the officially supported gfx1102 falls under gfx1100 here, so it's not a reliable check:

$ ls /usr/lib64/rocm/
gfx10  gfx11  gfx1100  gfx1103  gfx8  gfx9  gfx90a  gfx942

The Fedora rocBLAS patch for gfx1103 support looks like a copy of gfx1102 (navi33); only the names and ISA versions differ. I diffed a few of the files and think these are the only differences:

-- phoenix
-- gfx1103
-- [Device 1586]
+- navi33
+- gfx1102
+- [Device 73f0]
 - AllowNoFreeDims: false
   AssignedDerivedParameters: true
   Batched: true
@@ -112,7 +112,7 @@
     GroupLoadStore: false
     GuaranteeNoPartialA: false
     GuaranteeNoPartialB: false
-    ISA: [11, 0, 3]
+    ISA: [11, 0, 2]

I'm interested in additional GPU support because I have a mini PC with a Ryzen 8845HS (Radeon 780M) for testing, and a second one with a Ryzen 5825U.
I tried running the ghcr.io/immich-app/immich-machine-learning:pr-16613-rocm version with HSA_OVERRIDE_GFX_VERSION=11.0.0, but this setup crashes my card under heavy load (only the default models from Immich work, and only when I run one type of job in a single thread). I read that for the 780M the best choice is gfx1102, but when I set HSA_OVERRIDE_GFX_VERSION=11.0.2 I get errors. I think that's because onnxruntime isn't compiled with support for this arch. I'm now building machine-learning with ROCm onnxruntime support using a small patch that I think enables gfx900 and gfx1102 support in onnxruntime; if and when the build completes, I will try it.

diff --git a/cmake/CMakeLists.txt b/cmake/CMakeLists.txt
index d90a2a355..bb1a7de12 100644
--- a/cmake/CMakeLists.txt
+++ b/cmake/CMakeLists.txt
@@ -295,7 +295,7 @@ if (onnxruntime_USE_ROCM)
   endif()

   if (NOT CMAKE_HIP_ARCHITECTURES)
-    set(CMAKE_HIP_ARCHITECTURES "gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942;gfx1200;gfx1201")
+    set(CMAKE_HIP_ARCHITECTURES "gfx900;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx1102;gfx940;gfx941;gfx942;gfx1200;gfx1201")
   endif()

   file(GLOB rocm_cmake_components ${onnxruntime_ROCM_HOME}/lib/cmake/*)
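
For reference, a hedged way to confirm which archs actually landed in a finished build (the provider library path is an assumption; it varies with the Python version and install layout):

# Dump the gfx targets embedded in the onnxruntime ROCm provider library.
$ strings /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/libonnxruntime_providers_rocm.so \
    | grep -oE 'gfx[0-9a-f]+' | sort -u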

@SharkWipf

but this setup crashes my card under heavy load

My 780M locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any gfx override (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich, that does not bode well...

@NicholasFlamy
Collaborator

HSA_OVERRIDE_GFX_VERSION=11.0.2

This is not a valid version from what I've observed. So far, there are only three valid options:

HSA_OVERRIDE_GFX_VERSION=11.0.0
HSA_OVERRIDE_GFX_VERSION=10.3.0
HSA_OVERRIDE_GFX_VERSION=9.0.0
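
For anyone trying these, a minimal sketch of passing the override to the ML container; the image tag and device mappings are assumptions based on the usual ROCm container setup, not taken from this PR:

# Run the ROCm ML image with a gfx override; /dev/kfd and /dev/dri expose the GPU.
$ docker run --rm \
    --device=/dev/kfd --device=/dev/dri \
    -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
    ghcr.io/immich-app/immich-machine-learning:release-rocm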

@przemekbialek

but this setup crashes my card under heavy load

My 780M locks up my desktop roughly 50% of the time when using ROCm llama.cpp/whisper.cpp with any gfx override (1100, 1102, 1103). I'd hoped it would be less of an issue headless or with different applications, but if you have the same issue with Immich, that does not bode well...

Unfortunately, adding support for gfx1102 doesn't solve the crashing problems on the Radeon 780M, but I'm happy because I succeeded in getting it to work on the Ryzen 5825U's GPU.

@NicholasFlamy
Collaborator

Radeon 780M

They also specifically say certain iGPUs crash. I would bet that they're just bleeding edge.

Ryzen 5825U GPU

That model or similar is known to work.

@przemekbialek

przemekbialek commented Mar 9, 2025

HSA_OVERRIDE_GFX_VERSION=11.0.2

This is not a valid version from what I've observed. So far, there are only three valid options:

HSA_OVERRIDE_GFX_VERSION=11.0.0
HSA_OVERRIDE_GFX_VERSION=10.3.0
HSA_OVERRIDE_GFX_VERSION=9.0.0

The ROCm in the image created in this PR is compiled for the arches below, so 11.0.2 is a valid option because it means gfx1102. Below is a directory listing from the image.

-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1010.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1012.dat
-rw-r--r-- 1 root root     23026 Dec 11 10:06 TensileLibrary_lazy_gfx1030.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1100.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1101.dat
-rw-r--r-- 1 root root     24186 Dec 11 10:06 TensileLibrary_lazy_gfx1102.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1151.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1200.dat
-rw-r--r-- 1 root root     17653 Dec 11 10:06 TensileLibrary_lazy_gfx1201.dat
-rw-r--r-- 1 root root     26537 Dec 11 10:06 TensileLibrary_lazy_gfx900.dat
-rw-r--r-- 1 root root     31798 Dec 11 10:06 TensileLibrary_lazy_gfx906.dat
-rw-r--r-- 1 root root     34732 Dec 11 10:06 TensileLibrary_lazy_gfx908.dat
-rw-r--r-- 1 root root     62265 Dec 11 10:06 TensileLibrary_lazy_gfx90a.dat
-rw-r--r-- 1 root root     58949 Dec 11 10:06 TensileLibrary_lazy_gfx942.dat

Without the patch to onnxruntime, HSA_OVERRIDE_GFX_VERSION=9.0.0 isn't a valid option in immich-machine-learning, because that arch isn't compiled in by default.
By default, onnxruntime builds for these arches:

set(CMAKE_HIP_ARCHITECTURES "gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942;gfx1200;gfx1201")
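
A hedged way to reproduce a listing like the one above inside the image (the rocBLAS library path is an assumption for Ubuntu ROCm packages and differs across packagings):

# List the gfx targets rocBLAS was built for.
$ ls /opt/rocm/lib/rocblas/library/ | grep -oE 'gfx[0-9a-f]+' | sort -u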

@mertalev mertalev enabled auto-merge (squash) March 17, 2025 18:02
@mertalev mertalev merged commit 2b37cab into main Mar 17, 2025
50 checks passed
@mertalev mertalev deleted the feat/rocm-ep branch March 17, 2025 21:08
1. If you do not already have it, download the latest [`hwaccel.ml.yml`][hw-file] file and ensure it's in the same folder as the `docker-compose.yml`.
2. In the `docker-compose.yml` under `immich-machine-learning`, uncomment the `extends` section and change `cpu` to the appropriate backend.
-3. Still in `immich-machine-learning`, add one of [armnn, cuda, openvino] to the `image` section's tag at the end of the line.
+3. Still in `immich-machine-learning`, add one of [armnn, cuda, rocm, openvino] to the `image` section's tag at the end of the line.
Contributor


did we forget to add rknn here? oops

savely-krasovsky pushed a commit to savely-krasovsky/immich that referenced this pull request Jun 8, 2025
* feat(ml): introduce support of onnxruntime-rocm for AMD GPU

* try mutex for algo cache

use OrtMutex

* bump versions, run on mich

use 3.12

use 1.19.2

* acquire lock before any changes can be made

guard algo benchmark results

mark mutex as mutable

re-add /bin/sh (?)

use 3.10

use 6.1.2

* use composite cache key

1.19.2

fix variable name

fix variable reference

aaaaaaaaaaaaaaaaaaaa

* bump deps

* disable algo caching

* fix gha

* try ubuntu runner

* actually fix the gha

* update patch

* skip mimalloc preload for rocm

* increase build threads

* increase timeout for rocm

* Revert "increase timeout for rocm"

This reverts commit 2c4452f.

* attempt migraphx

* set migraphx_home

* Revert "set migraphx_home"

This reverts commit c121d3e.

* Revert "attempt migraphx"

This reverts commit 521f9fb.

* migraphx, take two

* bump rocm

* allow cpu

* try only targeting migraphx

* skip tests

* migraph ❌

* known issues

* target gfx900 and gfx1102

* mention `HSA_USE_SVM`

* update lock

* set device id for rocm

---------

Co-authored-by: Mehdi GHESH <[email protected]>
@niklasfink

Hi @przemekbialek, you wrote

Configuration above was tested on hardware listed below:

  • Ryzen 8845HS with Radeon 780M (gfx1103) - I used HSA_OVERRIDE_GFX_VERSION=11.0.2 and HSA_OVERRIDE_GFX_VERSION=11.0.0 environment variables. To workaround crashes I must set HSA_USE_SVM=0.

I'm using the Ryzen PRO 8845HS with Radeon 780M (gfx1103). Unfortunately, it doesn't work for me using immich-machine-learning:v1.141.1-rocm and the process fails with: HW Exception by GPU node-1 (Agent handle: 0x7f4df9b457a0) reason :GPU Hang when using HSA_OVERRIDE_GFX_VERSION=11.0.0 or 11.0.2 with or without HSA_USE_SVM=0. Not sure if the configuration has been merged into the immich codebase or if the Radeon 780M still needs a custom configuration as you had it.

Without the override, I get HIP failure 100: no ROCm-capable device is detected ; GPU=-1 ; hostname=x ; file=/code/onnxruntime/onnxruntime/core/providers/rocm/rocm_common.h

@NicholasFlamy would it be possible to upgrade the base image to rocm/dev-ubuntu-22.04:6.4.3-complete@sha256:6cda50e312f3aac068cea9ec06c560ca1f522ad546bc8b3d2cf06da0fe8e8a76 which would be the most recent version and I thought it had gfx1103 support?
https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile#L25

Regarding ROCm update:

ROCm 6.4.0 released today with runtime support for gfx11-generic, which is an ISA that is the lowest-common denominator of gfx1100, gfx1101, gfx1102, gfx1103, gfx1150, gfx1151, gfx1152, and gfx1153. That is, code that is compiled for gfx11-generic will run on all those GPUs.
https://lists.debian.org/debian-ai/2025/04/msg00081.html
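
If that pans out, a build could presumably target the generic ISA directly; a hedged sketch (kernel.hip is a placeholder file name, and gfx11-generic needs a sufficiently new ROCm/LLVM toolchain):

# Compile a HIP kernel once for the generic gfx11 ISA so one code object
# covers gfx1100 through gfx1153.
$ hipcc --offload-arch=gfx11-generic -c kernel.hip -o kernel.o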

@NicholasFlamy
Collaborator

NicholasFlamy commented Sep 13, 2025

@NicholasFlamy would it be possible to upgrade the base image to rocm/dev-ubuntu-22.04:6.4.3-complete@sha256:6cda50e312f3aac068cea9ec06c560ca1f522ad546bc8b3d2cf06da0fe8e8a76 which would be the most recent version and I thought it had gfx1103 support?
https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile#L25

It's absolutely possible.

Edit: I made a PR, it'll build an image that you can try out to see if it works. #21924 I'll have to test it before this can be merged.

@NicholasFlamy
Collaborator

@NicholasFlamy would it be possible to upgrade the base image to rocm/dev-ubuntu-22.04:6.4.3-complete@sha256:6cda50e312f3aac068cea9ec06c560ca1f522ad546bc8b3d2cf06da0fe8e8a76 which would be the most recent version and I thought it had gfx1103 support?
https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile#L25

It's absolutely possible.

Edit: I made a PR, it'll build an image that you can try out to see if it works. #21924 I'll have to test it before this can be merged.

So, to try the image, you can replace the ML image with this one: ghcr.io/immich-app/immich-machine-learning:pr-22034-rocm
#22034 is the new PR that will incorporate this.

@niklasfink

@NicholasFlamy would it be possible to upgrade the base image to rocm/dev-ubuntu-22.04:6.4.3-complete@sha256:6cda50e312f3aac068cea9ec06c560ca1f522ad546bc8b3d2cf06da0fe8e8a76 which would be the most recent version and I thought it had gfx1103 support?
https://github.com/immich-app/immich/blob/main/machine-learning/Dockerfile#L25

It's absolutely possible.
Edit: I made a PR, it'll build an image that you can try out to see if it works. #21924 I'll have to test it before this can be merged.

So, to try the image, you can replace the ML image with this one: ghcr.io/immich-app/immich-machine-learning:pr-22034-rocm #22034 is the new PR that will incorporate this.

Thanks a lot! Unfortunately, it doesn't make a difference for me. No matter what override value I use, I keep getting a HIP failure 100: no ROCm-capable device is detected. (No more GPU hang error, though, so something has changed.)

Within the container, the situation is:
[screenshot omitted]

@NicholasFlamy
Collaborator

I keep getting a HIP failure 100: no ROCm-capable device is detected.

Are you doing the GFX override? If not, make sure to try it. If it still doesn't work, make a new issue and @ me in it.
