First, follow S2-S7 to deploy LMDeploy on Jetson.
Activate your conda environment.
conda activate lmdeploy
Enter the lmdeploy/benchmark directory.
cd ~/lmdeploy/benchmark
Run the benchmark.
python profile_generation.py \
<path/to/your/model>/internlm2-chat-1_8b-turbomind \
--concurrency 1 \
--prompt-tokens 128 \
--completion-tokens 128
Replace internlm2-chat-1_8b-turbomind with your model path.
Record the speed reported by the benchmark.
During inference, you can monitor unified memory usage with the htop command.
# Install htop (skip this step if it is already installed)
apt-get install htop
# Run htop and check the Mem usage.
htop
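If you prefer a number you can write down over watching htop interactively, the following is a minimal sketch of a non-interactive alternative. It assumes a Linux system that exposes /proc/meminfo, and it treats "used" memory as MemTotal minus MemAvailable, which may differ slightly from the figure htop displays. The helper names mem_used_mib and log_peak_mem are hypothetical and are not part of LMDeploy.

```shell
# Print used memory in MiB, computed as MemTotal - MemAvailable.
# Reads /proc/meminfo by default; another file can be passed as $1 (useful for testing).
mem_used_mib() {
    awk '/^MemTotal:/ {total=$2}
         /^MemAvailable:/ {avail=$2}
         END {printf "%d\n", (total - avail) / 1024}' "${1:-/proc/meminfo}"
}

# Run the given command in the background, sample used memory once per second
# while it is alive, then print the highest value seen.
# Hypothetical helper for illustration only.
log_peak_mem() {
    "$@" &
    pid=$!
    peak=0
    while kill -0 "$pid" 2>/dev/null; do
        used=$(mem_used_mib)
        if [ "$used" -gt "$peak" ]; then
            peak=$used
        fi
        sleep 1
    done
    wait "$pid"
    echo "Peak used memory: ${peak} MiB"
}
```

To use it, paste the functions into your shell and prefix the benchmark command with log_peak_mem. Because the reading covers the whole system, compare it against a sample taken before the benchmark starts (mem_used_mib on its own) to estimate what the model itself adds.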