
Conversation

@yoni13 (Contributor) commented Jan 11, 2025

Goals: ML on Rockchip NPUs.
Testing on board: #13243 (reply in thread)

TODO:

  • It works on my Orange Pi 3B (RK3566).
  • Build Docker images.
  • Build models for more SoCs.
  • Allow setting the thread count per model type (e.g. visual -> 2 threads) via environment variables (see the sketch after this list).
  • NPU core masks for RK3576/RK3588.
  • Decide on the model path (immich-app/ViT-B-32__openai/textual/rk3566/model.rknn).
  • Export script that accepts CLI arguments for the Immich model name and SoC, then exports the model.
  • Maximize NPU usage by using rknnpool.py.
  • Documentation.
  • Write tests.
  • Model on Hugging Face.
  • Make Docker images work out of the box (without downloading models manually).
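For the thread-count item above, a minimal sketch of reading per-model-type thread counts from environment variables (the variable names, e.g. RKNN_VISUAL_THREADS, are hypothetical and not settled yet):

```python
import os

DEFAULT_RKNN_THREADS = 1

def rknn_threads_for(model_type: str) -> int:
    """Look up the thread count for a model type, e.g. RKNN_VISUAL_THREADS=2."""
    # Hypothetical variable naming scheme, for illustration only.
    value = os.environ.get(f"RKNN_{model_type.upper()}_THREADS")
    return int(value) if value else DEFAULT_RKNN_THREADS

# Usage: rknn_threads_for("visual") -> 2 when RKNN_VISUAL_THREADS=2 is set.
```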

Nice to have:

  • Test on PC and other ARM-based boards to ensure it doesn't break anything.
  • Rebase my commits (sorry for the ugly commit messages).
  • Test whether it works on RK3588 (I don't have one).
  • Support more models.

#13243

@yoni13 (Contributor, Author) commented Jan 11, 2025

Docker launch command:

```
docker run -d --name rknnimmich_name \
  --security-opt systempaths=unconfined \
  --security-opt apparmor=unconfined \
  --device /dev/dri \
  --device /dev/dma_heap \
  --device /dev/rga \
  --device /dev/mpp_service \
  -v /cache:/cache:ro \
  -v /sys/kernel/debug/:/sys/kernel/debug/:ro \
  -p 3004:3003 \
  rknnimmich
```

and it works (provided you download the model to the cache first, of course).

ViT-B-32 and buffalo_l loaded with two threads each while rerunning jobs: 2.7 GB RAM, peaking at 3.5 GB (it's like running four models at the same time).
Update: these figures are from before the change to load ONNX only when required; I'll update the memory usage when I have time.

@mertalev (Member)

I've uploaded the facial recognition models as well as some new SigLIP2 models to HF. I can upload the rest after we confirm everything works as expected with these models.

@yoni13 (Contributor, Author) commented Mar 14, 2025

I was having issues with GitHub Codespaces while trying to fix the tests yesterday; can you take a look? 😞

@NicholasFlamy (Collaborator)

FWIW it only uses 600 MiB if rknn_threads is set to 1, or 970 MiB if set to 2.

Is it loading the model once per thread or something?
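If each thread owns its own runtime instance (the rknnpool-style pattern from the TODO list), that would explain it. A minimal sketch, assuming the rknn-toolkit-lite2 RKNNLite API:

```python
from rknnlite.api import RKNNLite

def _load_instance(model_path: str) -> RKNNLite:
    rknn = RKNNLite()
    rknn.load_rknn(model_path)  # each instance maps its own copy of the weights
    rknn.init_runtime()
    return rknn

def make_pool(model_path: str, threads: int) -> list[RKNNLite]:
    # Memory grows with each additional copy of the model, which would fit
    # the jump from ~600 MiB at 1 thread to ~970 MiB at 2.
    return [_load_instance(model_path) for _ in range(threads)]
```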

@todorangrg commented Mar 15, 2025

So I'm running the build against my library on my RK3588, and I'm honestly sold on rknn > armnn:

  • I can use the latest models (cool that you added them, @mertalev!): ViT-SO400M-16-SigLIP2-512__webli -- that's a 1.43 GB textual, 0.89 GB visual model
  • both textual and visual take about 2x their size in RAM (~2.8 GB textual, 1.8 GB visual -- single-threaded) -- here I was hoping scaling to larger models would be "better", i.e. less than 2x RAM 🤷‍♂️
  • NPU_CORE_AUTO does what it says: each new inference is executed on only one core, and a free one is chosen (see the sketch further down)
  • NPU_CORE_AUTO: around 5.5 sec/image
  • NPU_CORE_0_1_2 uses all cores, but only one at max as was reported above. For this (big) model: 99%, 25%, 25%; around 4 sec/image
  • CPU usage is a joke for this large model: <5%
  • I do get this error on loading, but things still work (visual indexing happens, I can run textual queries, and results are sensible):
```
I RKNN: [00:39:45.216] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [00:39:46.193] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
[03/15/25 00:39:46] I Loaded RKNN model from /cache/clip/ViT-SO400M-16-SigLIP2-512__webli/visual/rknpu/rk3588/model.rknn with 1 threads.
```
  • this warning spams continuously during inference:

```
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
```

Some more remarks:

  • the fact that this only uses the NPU and RAM, and all the other parts of the SoC are basically free (even the CPU), is great. You can really run these workloads while your machine does business as usual
  • based on the above, I'm fine if processing my entire library takes a few days (>15k photos/day). I only do it anyway when I change models. This means I can use really big models; RAM is the limit.
  • I checked the exporter doc a bit; some things that weren't mentioned would still be interesting to check:
    • rknn_batch_size: if the model size does not increase much and the performance improvement is significant, you could ship only batch>1 visual models (e.g. 2 or 3)
    • single_core_mode: how much does this shrink the model? Given that NPU_CORE_0_1_2 isn't a significant improvement (and it's also not configurable now), it might be good to make it the standard
    • remove_weight: the way I read it, it seems more instances could be loaded that reference weights from the "master" model. I didn't find any example on the internet of how this actually works
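For reference, a minimal sketch (assuming the rknn-toolkit-lite2 RKNNLite API and its core-mask constants) of how the core mask is selected at init time:

```python
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn(
    "/cache/clip/ViT-SO400M-16-SigLIP2-512__webli/visual/rknpu/rk3588/model.rknn"
)

# NPU_CORE_AUTO runs each inference on one free core (~5.5 sec/image above);
# NPU_CORE_0_1_2 lets a single inference span all three cores (~4 sec/image).
ret = rknn.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
if ret != 0:
    raise RuntimeError(f"init_runtime failed with code {ret}")
```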

Anyway -- this is great!

@yoni13 (Contributor, Author) commented Mar 15, 2025

I was thinking of changing the data format before sending it to RKNN to quiet the "data format will be changed" warning.
But for the "dynamic range" one, I have no idea.

@todorangrg commented Mar 15, 2025

> I was thinking of changing the data format before sending it to RKNN to quiet the "data format will be changed" warning. But for the "dynamic range" one, I have no idea.

I wonder if anyone knows whether the conversion actually works as expected, beyond the "seems reasonable" empirical observation -- e.g. comparing outputs between ONNX and RKNN for the same inputs, or something similar.

The exporter has an option for dynamic inputs; I guess exporting with that would silence the dynamic-range warning? Also, maybe the exporter has an option for changing the format?
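For example, a minimal sketch of such a check (assuming onnxruntime and the rknn-toolkit-lite2 API; the model paths are placeholders):

```python
import numpy as np
import onnxruntime as ort
from rknnlite.api import RKNNLite

x = np.random.rand(1, 3, 512, 512).astype(np.float32)  # dummy NCHW input

# Run the original ONNX model.
sess = ort.InferenceSession("model.onnx")
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# Run the converted RKNN model on the same input.
rknn = RKNNLite()
rknn.load_rknn("model.rknn")
rknn.init_runtime()
rknn_out = rknn.inference(inputs=[x])[0]

# A cosine similarity close to 1.0 would indicate the conversion preserved
# the outputs up to fp16 rounding.
a, b = onnx_out.ravel(), rknn_out.ravel()
print("cosine:", float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```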

@mertalev (Member)

I haven't compared the raw model outputs between ONNX and RKNN. There's a small inherent difference in fp16 vs fp32, but since the results look good and there's no quantization involved, I didn't probe deeper.

Setting dynamic inputs wouldn't fix that warning because the warning isn't related to the batch dimension but the layout (NCHW vs NHWC).

Sending NHWC inputs would be one way of getting rid of the warnings. It might be a bit more efficient than how that conversion is handled now, because it currently allocates new arrays to make the inputs contiguous, then the engine internally reorders those arrays. If you permute the arrays before making them contiguous, you'd be fusing those two operations together.
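In NumPy terms, a minimal sketch of that fusion (illustrative, not the actual code in the PR):

```python
import numpy as np

nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Today (two passes): copy to contiguous NCHW, then the engine internally
# reorders the buffer to NHWC.
step1 = np.ascontiguousarray(nchw)

# Fused (one pass): the transpose is just a view, so the single contiguous
# copy lands directly in NHWC layout.
nhwc = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))
```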

@mertalev (Member)

So as it turns out, RKNN will permute the input array regardless, so if I permute it beforehand then it'll just mess it up by doing it a second time. Setting the log level to error does make those warning logs go away though.

@mertalev merged commit 14c3b99 into immich-app:main on Mar 17, 2025 (38 checks passed).

@yoni13 (Contributor, Author) commented Mar 25, 2025

XLM-Roberta-Base-ViT-B-32__laion5b_s13b_b90k

```
E RKNN: [00:39:42.701] Unsupport CPU op: CumSum in this librknnrt.so, please try to register custom op by calling rknn_register_custom_ops or If using rknn, update to the latest toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn). If using rknn-llm, update from: https://github.com/airockchip/rknn-llm
```

Custom OP Example
I forgot to add the unsupported-custom-ops check back after the refactor 🫠
Or we can try adding the missing "CumSum" op.
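A minimal sketch of such a check (assuming the onnx package; RKNN_SUPPORTED_OPS is a hypothetical set that would be maintained from the rknn-toolkit2 op support list):

```python
import onnx

RKNN_SUPPORTED_OPS = {"Conv", "MatMul", "Softmax", "Gelu"}  # illustrative subset

def unsupported_ops(model_path: str) -> set[str]:
    model = onnx.load(model_path)
    return {node.op_type for node in model.graph.node} - RKNN_SUPPORTED_OPS

# For this model, the result would include "CumSum".
print(unsupported_ops("XLM-Roberta-Base-ViT-B-32__laion5b_s13b_b90k.onnx"))
```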

By the way, should I open a new issue, or can we just continue from this thread?

@mertalev (Member)

It'd be better to open a new issue.

@ITCJ commented Jul 18, 2025

Nice work, and thanks a lot for your contribution. May I ask whether there is any documentation about converting models to run on Rockchip?

@yoni13 (Contributor, Author) commented Jul 18, 2025 via email
