## Step 2. Execute the service startup command.
## Here we use "-b vllm" to specify the vllm backend, which performs bf16 inference by default.
## Note: adjust gpu_memory_utilization according to the model size to avoid out-of-memory errors
## (e.g., gpu_memory_utilization=0.81 is the default for a 7B model; here it is set to 0.85 via "-r 0.85").
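As a hedged illustration, the full startup command might look like the sketch below. The excerpt does not show the launcher script's actual name, so `server.py` and the model path are placeholders; only the `-b vllm` and `-r 0.85` flags come from the text above.

```shell
# Hypothetical invocation; "server.py" and the model path are placeholders.
# -b selects the inference backend (vllm, bf16 by default);
# -r sets vllm's gpu_memory_utilization (0.85 here, vs. the 0.81 default for 7B models).
python server.py -m /path/to/model -b vllm -r 0.85
```

If the service runs out of GPU memory on a larger model, lower the `-r` value so vllm reserves a smaller fraction of GPU memory for its KV cache and weights.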