Replies: 1 comment
-
Turns out, it is a problem with HAMi, which is not constraining the memory here properly.
-
Hi, I am deploying a non-quantized Qwen3 4B with LMDeploy. When I load it with transformers on a GPU in float16, it takes around 9100 MB of VRAM.
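For reference, a minimal sketch of how that footprint can be measured with transformers (the Hugging Face model id Qwen/Qwen3-4B is an assumption; 4B parameters in float16 is roughly 8 GB of weights, which lands near 9100 MB once the CUDA context and buffers are included):

```python
# Minimal sketch (model id assumed): measure what PyTorch itself allocates
# when loading Qwen3 4B in float16 on a single GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",            # assumed HF id; substitute a local path if needed
    torch_dtype=torch.float16,
).to("cuda:0")
torch.cuda.synchronize()

print(f"allocated: {torch.cuda.memory_allocated(0) / 1024**2:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1024**2:.0f} MiB")
```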
I was a bit shocked to discover that the same model, after the online TurboMind conversion, has a tiny footprint. I ran the method below in the lmdeploy container; the odd "HAMI" log lines are coming from HAMi.
That means this model is using only around 2663 MB of VRAM. How is that possible? What's going on here?
I don't know if it matters, but I'm using a V100 here (CUDA compute capability sm_70).
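For concreteness, here is one way such a memory check could look with LMDeploy's Python pipeline API (the model id and the cache_max_entry_count value are assumptions, not necessarily the settings used here):

```python
# Sketch (assumed model id and settings): start the model on the TurboMind
# backend, then ask NVML how much device memory is in use afterwards.
# Requires the nvidia-ml-py package for pynvml.
from lmdeploy import pipeline, TurbomindEngineConfig
import pynvml

pipe = pipeline(
    "Qwen/Qwen3-4B",
    backend_config=TurbomindEngineConfig(
        cache_max_entry_count=0.2,  # fraction of free VRAM reserved for the KV cache
    ),
)

pynvml.nvmlInit()
info = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(0))
print(f"device memory used: {info.used / 1024**2:.0f} MiB")
```

Note that if a GPU-sharing layer such as HAMi intercepts NVML/CUDA calls inside the container, the number reported there may not match what nvidia-smi shows on the host, which fits the resolution in the reply above.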