
Conversation

@yoni13 (Contributor) commented Jan 11, 2025

Goals: ML on Rockchip NPUs.
Testing on board: #13243 (reply in thread)

TODO:

  • It works on my Orange Pi 3B (RK3566).
  • Build Docker images.
  • Build models for more SoCs.
  • Allow setting the thread count per model type (e.g. visual -> 2 threads) via environment variables (see the sketch after this list).
  • NPU core masks for RK3576/RK3588.
  • Decide on the model path (immich-app/ViT-B-32__openai/textual/rk3566/model.rknn).
  • Export script that accepts CLI arguments for the Immich model name and SoC, then exports the model.
  • Maximize NPU usage by using rknnpool.py.
  • Documentation.
  • Write tests.
  • Model on Hugging Face.
  • Make Docker images work out of the box (without downloading models manually).
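For the thread-count item above, a minimal sketch of reading per-model-type thread counts from environment variables (the variable names, e.g. RKNN_VISUAL_THREADS, are hypothetical and not settled yet):

```python
import os

DEFAULT_RKNN_THREADS = 1

def rknn_threads_for(model_type: str) -> int:
    """Look up the thread count for a model type, e.g. RKNN_VISUAL_THREADS=2."""
    # Hypothetical variable naming scheme, for illustration only.
    value = os.environ.get(f"RKNN_{model_type.upper()}_THREADS")
    return int(value) if value else DEFAULT_RKNN_THREADS

# Usage: rknn_threads_for("visual") -> 2 when RKNN_VISUAL_THREADS=2 is set.
```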

Nice to have:

  • Test on PC and other ARM-based boards to ensure it doesn't break anything.
  • Rebase my commits (sorry for the ugly commit messages).
  • Test whether it works on RK3588 (I don't have one).
  • Support more models.

#13243

@yoni13 (Contributor, Author) commented Jan 11, 2025

Docker launch command:

```
docker run -d --name rknnimmich_name \
  --security-opt systempaths=unconfined \
  --security-opt apparmor=unconfined \
  --device /dev/dri \
  --device /dev/dma_heap \
  --device /dev/rga \
  --device /dev/mpp_service \
  -v /cache:/cache:ro \
  -v /sys/kernel/debug/:/sys/kernel/debug/:ro \
  -p 3004:3003 \
  rknnimmich
```

and it works (provided you download the model to the cache first, of course).

ViT-B-32 and buffalo_l loaded with two threads each while rerunning jobs: 2.7 GB RAM, peaking at 3.5 GB (it's like running four models at the same time).
Update: these figures are from before the change to load ONNX only when required; I'll update the memory usage when I have time.

@mertalev (Member)

I've uploaded the facial recognition models as well as some new SigLIP2 models to HF. I can upload the rest after we confirm everything works as expected with these models.

@yoni13 (Contributor, Author) commented Mar 14, 2025

I was having issues with GitHub Codespaces while trying to fix the tests yesterday; can you take a look? 😞

@NicholasFlamy (Collaborator)

FWIW it only uses 600 MiB if rknn_threads is set to 1, or 970 MiB if set to 2.

Is it loading the model once per thread or something?
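If each thread owns its own runtime instance (the rknnpool-style pattern from the TODO list), that would explain it. A minimal sketch, assuming the rknn-toolkit-lite2 RKNNLite API:

```python
from rknnlite.api import RKNNLite

def _load_instance(model_path: str) -> RKNNLite:
    rknn = RKNNLite()
    rknn.load_rknn(model_path)  # each instance maps its own copy of the weights
    rknn.init_runtime()
    return rknn

def make_pool(model_path: str, threads: int) -> list[RKNNLite]:
    # Memory grows with each additional copy of the model, which would fit
    # the jump from ~600 MiB at 1 thread to ~970 MiB at 2.
    return [_load_instance(model_path) for _ in range(threads)]
```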

@todorangrg commented Mar 15, 2025

So I'm running the build against my library on my RK3588, and I'm honestly sold on rknn > armnn:

  • I can use the latest models (cool that you added them, @mertalev!): ViT-SO400M-16-SigLIP2-512__webli -- that's a 1.43 GB textual, 0.89 GB visual model
  • both textual and visual take about 2x their size in RAM (~2.8 GB textual, 1.8 GB visual -- single-threaded) -- here I was hoping scaling to larger models would be "better", i.e. less than 2x RAM 🤷‍♂️
  • NPU_CORE_AUTO does what it says: each new inference is executed on only one core, and a free one is chosen (see the sketch further down)
  • NPU_CORE_AUTO: around 5.5 sec/image
  • NPU_CORE_0_1_2 uses all cores, but only one at max as was reported above. For this (big) model: 99%, 25%, 25%; around 4 sec/image
  • CPU usage is a joke for this large model: <5%
  • I do get this error on loading, but things still work (visual indexing happens, I can run textual queries, and results are sensible):
```
I RKNN: [00:39:45.216] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [00:39:46.193] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
[03/15/25 00:39:46] I Loaded RKNN model from /cache/clip/ViT-SO400M-16-SigLIP2-512__webli/visual/rknpu/rk3588/model.rknn with 1 threads.
```
  • this warning spams continuously during inference:

```
W The input[0] need NHWC data format, but NCHW set, the data format and data buffer will be changed to NHWC.
```

Some more remarks:

  • the fact that this only uses the NPU and RAM, and all the other parts of the SoC are basically free (even the CPU), is great. You can really run these workloads while your machine does business as usual
  • based on the above, I'm fine if processing my entire library takes a few days (>15k photos/day). I only do it anyway when I change models. This means I can use really big models; RAM is the limit.
  • I checked the exporter doc a bit; some things that weren't mentioned would still be interesting to check:
    • rknn_batch_size: if the model size does not increase much and the performance improvement is significant, you could ship only batch>1 visual models (e.g. 2 or 3)
    • single_core_mode: how much does this shrink the model? Given that NPU_CORE_0_1_2 isn't a significant improvement (and it's also not configurable now), it might be good to make it the standard
    • remove_weight: the way I read it, it seems more instances could be loaded that reference weights from the "master" model. I didn't find any example on the internet of how this actually works
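For reference, a minimal sketch (assuming the rknn-toolkit-lite2 RKNNLite API and its core-mask constants) of how the core mask is selected at init time:

```python
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn(
    "/cache/clip/ViT-SO400M-16-SigLIP2-512__webli/visual/rknpu/rk3588/model.rknn"
)

# NPU_CORE_AUTO runs each inference on one free core (~5.5 sec/image above);
# NPU_CORE_0_1_2 lets a single inference span all three cores (~4 sec/image).
ret = rknn.init_runtime(core_mask=RKNNLite.NPU_CORE_0_1_2)
if ret != 0:
    raise RuntimeError(f"init_runtime failed with code {ret}")
```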

Anyway -- this is great!

@yoni13 (Contributor, Author) commented Mar 15, 2025

I was thinking of changing the data format before sending it to RKNN to quiet the "data format will be changed" warning.
But for the "dynamic range" one, I have no idea.

@todorangrg commented Mar 15, 2025

> I was thinking of changing the data format before sending it to RKNN to quiet the "data format will be changed" warning. But for the "dynamic range" one, I have no idea.

I wonder if anyone knows whether the conversion actually works as expected, beyond the "seems reasonable" empirical observation -- e.g. comparing outputs between ONNX and RKNN for the same inputs, or something similar.

The exporter has an option for dynamic inputs; I guess exporting with that would silence the dynamic-range warning? Also, maybe the exporter has an option for changing the format?
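For example, a minimal sketch of such a check (assuming onnxruntime and the rknn-toolkit-lite2 API; the model paths are placeholders):

```python
import numpy as np
import onnxruntime as ort
from rknnlite.api import RKNNLite

x = np.random.rand(1, 3, 512, 512).astype(np.float32)  # dummy NCHW input

# Run the original ONNX model.
sess = ort.InferenceSession("model.onnx")
onnx_out = sess.run(None, {sess.get_inputs()[0].name: x})[0]

# Run the converted RKNN model on the same input.
rknn = RKNNLite()
rknn.load_rknn("model.rknn")
rknn.init_runtime()
rknn_out = rknn.inference(inputs=[x])[0]

# A cosine similarity close to 1.0 would indicate the conversion preserved
# the outputs up to fp16 rounding.
a, b = onnx_out.ravel(), rknn_out.ravel()
print("cosine:", float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
```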

@mertalev (Member)

I haven't compared the raw model outputs between ONNX and RKNN. There's a small inherent difference in fp16 vs fp32, but since the results look good and there's no quantization involved, I didn't probe deeper.

Setting dynamic inputs wouldn't fix that warning because the warning isn't related to the batch dimension but the layout (NCHW vs NHWC).

Sending NHWC inputs would be one way of getting rid of the warnings. It might be a bit more efficient than how that conversion is handled now, because it currently allocates new arrays to make the inputs contiguous, then the engine internally reorders those arrays. If you permute the arrays before making them contiguous, you'd be fusing those two operations together.
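In NumPy terms, a minimal sketch of that fusion (illustrative, not the actual code in the PR):

```python
import numpy as np

nchw = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Today (two passes): copy to contiguous NCHW, then the engine internally
# reorders the buffer to NHWC.
step1 = np.ascontiguousarray(nchw)

# Fused (one pass): the transpose is just a view, so the single contiguous
# copy lands directly in NHWC layout.
nhwc = np.ascontiguousarray(nchw.transpose(0, 2, 3, 1))
```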

@mertalev (Member)

So as it turns out, RKNN will permute the input array regardless, so if I permute it beforehand then it'll just mess it up by doing it a second time. Setting the log level to error does make those warning logs go away though.

@mertalev merged commit 14c3b99 into immich-app:main on Mar 17, 2025 (38 checks passed).

@yoni13 (Contributor, Author) commented Mar 25, 2025

XLM-Roberta-Base-ViT-B-32__laion5b_s13b_b90k

```
E RKNN: [00:39:42.701] Unsupport CPU op: CumSum in this librknnrt.so, please try to register custom op by calling rknn_register_custom_ops or If using rknn, update to the latest toolkit2 and runtime from: https://console.zbox.filez.com/l/I00fc3 (PWD: rknn). If using rknn-llm, update from: https://github.com/airockchip/rknn-llm
```

Custom OP Example
I forgot to add the unsupported-custom-ops check back after the refactor 🫠
Or we can try adding the missing "CumSum" op.
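A minimal sketch of such a check (assuming the onnx package; RKNN_SUPPORTED_OPS is a hypothetical set that would be maintained from the rknn-toolkit2 op support list):

```python
import onnx

RKNN_SUPPORTED_OPS = {"Conv", "MatMul", "Softmax", "Gelu"}  # illustrative subset

def unsupported_ops(model_path: str) -> set[str]:
    model = onnx.load(model_path)
    return {node.op_type for node in model.graph.node} - RKNN_SUPPORTED_OPS

# For this model, the result would include "CumSum".
print(unsupported_ops("XLM-Roberta-Base-ViT-B-32__laion5b_s13b_b90k.onnx"))
```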

By the way, should I open a new issue, or can we just continue from this thread?

@mertalev (Member)

It'd be better to open a new issue.

@ITCJ commented Jul 18, 2025

Nice work, and thanks a lot for your contribution. May I ask whether there is any documentation about converting models to run on Rockchip?

@yoni13 (Contributor, Author) commented Jul 18, 2025 via email
