## Step 2. Execute the service startup command.
## Here we use "-b vllm" to specify the vllm backend, which performs bf16 inference by default.
## Note: adjust gpu_memory_utilization according to the model size to avoid out-of-memory errors
## (e.g., gpu_memory_utilization=0.81 is the default for a 7B model; here it is set to 0.85 via "-r 0.85").
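As a hedged illustration, the full startup command might look like the sketch below. The excerpt does not show the launcher script's actual name, so `server.py` and the model path are placeholders; only the `-b vllm` and `-r 0.85` flags come from the text above.

```shell
# Hypothetical invocation; "server.py" and the model path are placeholders.
# -b selects the inference backend (vllm, bf16 by default);
# -r sets vllm's gpu_memory_utilization (0.85 here, vs. the 0.81 default for 7B models).
python server.py -m /path/to/model -b vllm -r 0.85
```

If the service runs out of GPU memory on a larger model, lower the `-r` value so vllm reserves a smaller fraction of GPU memory for its KV cache and weights.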