feat(ml): rocm #16613
Merged
Changes from all commits (33 commits)
- `f47cac4` feat(ml): introduce support of onnxruntime-rocm for AMD GPU (Zelnes)
- `cb0b137` try mutex for algo cache (mertalev)
- `e045b5b` bump versions, run on mich (mertalev)
- `092cae8` acquire lock before any changes can be made (mertalev)
- `c2c9fdc` use composite cache key (mertalev)
- `821b6d3` bump deps (mertalev)
- `f4158a3` disable algo caching (mertalev)
- `d9bc297` fix gha (mertalev)
- `805fcb4` try ubuntu runner (mertalev)
- `b759ea1` actually fix the gha (mertalev)
- `6c9daae` update patch (mertalev)
- `99f1f64` skip mimalloc preload for rocm (mertalev)
- `4a122b2` increase build threads (mertalev)
- `275b490` increase timeout for rocm (mertalev)
- `ef6ac0d` Revert "increase timeout for rocm" (mertalev)
- `a4edace` attempt migraphx (mertalev)
- `76ce444` set migraphx_home (mertalev)
- `9884140` Revert "set migraphx_home" (mertalev)
- `1723518` Revert "attempt migraphx" (mertalev)
- `8404f56` migraphx, take two (mertalev)
- `cfaa393` bump rocm (mertalev)
- `1bdf60b` allow cpu (mertalev)
- `5820749` try only targeting migraphx (mertalev)
- `97c64bf` skip tests (mertalev)
- `0472258` migraph ❌ (mertalev)
- `3bd7700` known issues (mertalev)
- `0bb53e2` target gfx900 and gfx1102 (mertalev)
- `c7ed1ad` mention `HSA_USE_SVM` (mertalev)
- `23a8dab` update lock (mertalev)
- `d97ce76` set device id for rocm (mertalev)
- `0445b62` Merge branch 'main' into feat/rocm-ep (mertalev)
- `730736a` fix indent (mertalev)
- `11dc006` add rknn back (mertalev)
```diff
@@ -11,6 +11,7 @@ You do not need to redo any machine learning jobs after enabling hardware accele
 - ARM NN (Mali)
 - CUDA (NVIDIA GPUs with [compute capability](https://developer.nvidia.com/cuda-gpus) 5.2 or higher)
+- ROCm (AMD GPUs)
 - OpenVINO (Intel GPUs such as Iris Xe and Arc)
 - RKNN (Rockchip)
@@ -44,6 +45,12 @@ You do not need to redo any machine learning jobs after enabling hardware accele
 - The installed driver must be >= 535 (it must support CUDA 12.2).
 - On Linux (except for WSL2), you also need to have [NVIDIA Container Toolkit][nvct] installed.
 
+#### ROCm
+
+- The GPU must be supported by ROCm. If it isn't officially supported, you can attempt to use the `HSA_OVERRIDE_GFX_VERSION` environmental variable: `HSA_OVERRIDE_GFX_VERSION=<a supported version, e.g. 10.3.0>`. If this doesn't work, you might need to also set `HSA_USE_SVM=0`.
+- The ROCm image is quite large and requires at least 35GiB of free disk space. However, pulling later updates to the service through Docker will generally only amount to a few hundred megabytes as the rest will be cached.
+- This backend is new and may experience some issues. For example, GPU power consumption can be higher than usual after running inference, even if the machine learning service is idle. In this case, it will only go back to normal after being idle for 5 minutes (configurable with the [MACHINE_LEARNING_MODEL_TTL](/docs/install/environment-variables) setting).
+
 #### OpenVINO
 
 - Integrated GPUs are more likely to experience issues than discrete GPUs, especially for older processors or servers with low RAM.
```
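For an unsupported GPU, the `HSA_OVERRIDE_GFX_VERSION` workaround described in the prerequisites can be wired into `docker-compose.yml`. The fragment below is a hedged sketch: the `rocm` service name in `hwaccel.ml.yml` and the `-rocm` image tag follow the conventions in these docs, and the `10.3.0` value is only an example, not a recommendation for any particular card.

```yaml
# Sketch of a docker-compose.yml override for the ROCm backend (assumptions:
# service name "rocm" in hwaccel.ml.yml, "-rocm" image tag suffix).
services:
  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-rocm
    extends:
      file: hwaccel.ml.yml
      service: rocm
    environment:
      # Example value only; choose a supported version close to your GPU.
      HSA_OVERRIDE_GFX_VERSION: "10.3.0"
      # Uncomment if the override alone doesn't work:
      # HSA_USE_SVM: "0"
```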
```diff
@@ -64,12 +71,12 @@ You do not need to redo any machine learning jobs after enabling hardware accele
 1. If you do not already have it, download the latest [`hwaccel.ml.yml`][hw-file] file and ensure it's in the same folder as the `docker-compose.yml`.
 2. In the `docker-compose.yml` under `immich-machine-learning`, uncomment the `extends` section and change `cpu` to the appropriate backend.
-3. Still in `immich-machine-learning`, add one of -[armnn, cuda, openvino] to the `image` section's tag at the end of the line.
+3. Still in `immich-machine-learning`, add one of -[armnn, cuda, rocm, openvino] to the `image` section's tag at the end of the line.
 4. Redeploy the `immich-machine-learning` container with these updated settings.
 
 ### Confirming Device Usage
 
-You can confirm the device is being recognized and used by checking its utilization. There are many tools to display this, such as `nvtop` for NVIDIA or Intel and `intel_gpu_top` for Intel.
+You can confirm the device is being recognized and used by checking its utilization. There are many tools to display this, such as `nvtop` for NVIDIA or Intel, `intel_gpu_top` for Intel, and `radeontop` for AMD.
 
 You can also check the logs of the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, or when you search with text in Immich, you should either see a log for `Available ORT providers` containing the relevant provider (e.g. `CUDAExecutionProvider` in the case of CUDA), or a `Loaded ANN model` log entry without errors in the case of ARM NN.
```

Contributor comment on step 3: "did we forget to add rknn here? oops"
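To make the log check concrete, the snippet below greps a sample of the provider log line for the ROCm provider name. The real check would be `docker compose logs immich-machine-learning | grep "Available ORT providers"` against the running stack; the exact log wording here is an illustrative assumption, though `ROCMExecutionProvider` is the name ONNX Runtime uses for its ROCm execution provider.

```shell
# Real-world check (requires the running stack):
#   docker compose logs immich-machine-learning | grep "Available ORT providers"
# Below, a sample log line stands in for the container output; the wording
# is an assumption for illustration.
sample_log='Available ORT providers: ROCMExecutionProvider, CPUExecutionProvider'
if echo "$sample_log" | grep -q 'ROCMExecutionProvider'; then
  echo 'ROCm provider listed'
fi
```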
```diff
@@ -17,6 +17,34 @@ RUN mkdir /opt/armnn && \
 
 FROM builder-cpu AS builder-rknn
 
+# Warning: 25GiB+ disk space required to pull this image
+# TODO: find a way to reduce the image size
+FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS builder-rocm
+
+WORKDIR /code
+
+RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv
+RUN wget -nv https://github.com/Kitware/CMake/releases/download/v3.30.1/cmake-3.30.1-linux-x86_64.sh && \
+    chmod +x cmake-3.30.1-linux-x86_64.sh && \
+    mkdir -p /code/cmake-3.30.1-linux-x86_64 && \
+    ./cmake-3.30.1-linux-x86_64.sh --skip-license --prefix=/code/cmake-3.30.1-linux-x86_64 && \
+    rm cmake-3.30.1-linux-x86_64.sh
+
+ENV PATH=/code/cmake-3.30.1-linux-x86_64/bin:${PATH}
+
+RUN git clone --single-branch --branch v1.20.1 --recursive "https://github.com/Microsoft/onnxruntime" onnxruntime
+
+WORKDIR /code/onnxruntime
+# Fix for multi-threading based on comments in https://github.com/microsoft/onnxruntime/pull/19567
+# TODO: find a way to fix this without disabling algo caching
+COPY ./patches/* /tmp/
+RUN git apply /tmp/*.patch
+
+RUN /bin/sh ./dockerfiles/scripts/install_common_deps.sh
+# Note: the `parallel` setting uses a substantial amount of RAM
+RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 17 --cmake_extra_defines \
+    ONNXRUNTIME_VERSION=1.20.1 --skip_tests --use_rocm --rocm_home=/opt/rocm
+RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/
+
 FROM builder-${DEVICE} AS builder
 
 ARG DEVICE
```

(The review thread on the `git clone` line was marked as resolved.)
```diff
@@ -32,17 +60,20 @@ RUN --mount=type=cache,target=/root/.cache/uv \
     --mount=type=bind,source=uv.lock,target=uv.lock \
     --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
     uv sync --frozen --extra ${DEVICE} --no-dev --no-editable --no-install-project --compile-bytecode --no-progress --active --link-mode copy
+RUN if [ "$DEVICE" = "rocm" ]; then \
+        uv pip install /opt/onnxruntime_rocm-*.whl; \
+    fi
 
 FROM python:3.11-slim-bookworm@sha256:614c8691ab74150465ec9123378cd4dde7a6e57be9e558c3108df40664667a4c AS prod-cpu
 
 FROM prod-cpu AS prod-openvino
 
 RUN apt-get update && \
     apt-get install --no-install-recommends -yqq ocl-icd-libopencl1 wget && \
-    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-core_1.0.17384.11_amd64.deb && \
-    wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-opencl_1.0.17384.11_amd64.deb && \
-    wget https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/intel-opencl-icd_24.31.30508.7_amd64.deb && \
-    wget https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/libigdgmm12_22.4.1_amd64.deb && \
+    wget -nv https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-core_1.0.17384.11_amd64.deb && \
+    wget -nv https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.17384.11/intel-igc-opencl_1.0.17384.11_amd64.deb && \
+    wget -nv https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/intel-opencl-icd_24.31.30508.7_amd64.deb && \
+    wget -nv https://github.com/intel/compute-runtime/releases/download/24.31.30508.7/libigdgmm12_22.4.1_amd64.deb && \
     dpkg -i *.deb && \
     rm *.deb && \
     apt-get remove wget -yqq && \
```
```diff
@@ -59,6 +90,8 @@ COPY --from=builder-cuda /usr/local/bin/python3 /usr/local/bin/python3
 COPY --from=builder-cuda /usr/local/lib/python3.11 /usr/local/lib/python3.11
 COPY --from=builder-cuda /usr/local/lib/libpython3.11.so /usr/local/lib/libpython3.11.so
 
+FROM rocm/dev-ubuntu-22.04:6.3.4-complete AS prod-rocm
+
 FROM prod-cpu AS prod-armnn
 
 ENV LD_LIBRARY_PATH=/opt/armnn
@@ -81,13 +114,12 @@ COPY --from=builder-armnn \
 
 FROM prod-cpu AS prod-rknn
 
 ADD --checksum=sha256:73993ed4b440460825f21611731564503cc1d5a0c123746477da6cd574f34885 https://github.com/airockchip/rknn-toolkit2/raw/refs/tags/v2.3.0/rknpu2/runtime/Linux/librknn_api/aarch64/librknnrt.so /usr/lib/
 
 FROM prod-${DEVICE} AS prod
 
 ARG DEVICE
 
 RUN apt-get update && \
-    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ]; then echo "libmimalloc2.0"; fi) && \
+    apt-get install -y --no-install-recommends tini $(if ! [ "$DEVICE" = "openvino" ] && ! [ "$DEVICE" = "rocm" ]; then echo "libmimalloc2.0"; fi) && \
     apt-get autoremove -yqq && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
```
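The mimalloc change above hinges on a shell command-substitution trick: the package list for `apt-get install` is built inline, and `libmimalloc2.0` is emitted only for devices that tolerate the preload. A standalone sketch of that logic (the `pkgs_for` helper is hypothetical, added here just to exercise the condition):

```shell
# Reproduces the Dockerfile's conditional package list outside Docker.
# libmimalloc2.0 is skipped for openvino and (with this PR) rocm.
pkgs_for() {
  DEVICE="$1"
  echo "tini $(if ! [ "$DEVICE" = "openvino" ] && ! [ "$DEVICE" = "rocm" ]; then echo "libmimalloc2.0"; fi)"
}
pkgs_for cuda   # prints "tini libmimalloc2.0"
pkgs_for rocm   # prints "tini " (mimalloc omitted)
```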
Review discussion:

- "There are some changes in indentation, as well as changes from double quotes to single quotes. Was this intended? I know it's from the first commit of the original PR, but I don't think that was addressed."
- Reply: "VS Code did this when I saved. I'm not sure why it's different."
- "Is there a PR check that runs prettier on the workflow files? I would think the inconsistency exists because there likely isn't."