Commit 6291e58
docs(book-flight-ai-agent): explain qwq default and add fallback options (#1099)
The default LLM_MODEL=qwq was chosen for multimodal support: the chat protocol carries an optional image (`bytes bin` in proto/chat.proto) and the front-end allows uploading a picture together with the message. qwq was the smallest Ollama-distributed model combining ReAct-quality reasoning with multimodal input at the time the sample was written. Without that context in the README, users assumed qwq was an arbitrary heavyweight choice and tried to swap in much smaller models like qwen3:4b, which cannot follow the ReAct prompt and route every request to TaskUnrelated, making it look like the search tool is broken.

Update both README.md and README_zh.md (no code or .env change):

- Add a "Model Recommendations" section that explains why qwq is the default and that its Ollama footprint is modest in practice.
- Provide a tiered table of text-only alternatives (qwen2.5:14b / qwen2.5:7b) for users who don't need multimodal.
- Document the unit-test fallback (`go test ./go-server/tools/bookingflight/...`) for users who can't run any LLM locally.
- Add a troubleshooting tip pointing out that models <~7B tend to misclassify the task, so the right first step is to try a larger model rather than digging into the code.

Closes #1085
1 parent 5eb5746 commit 6291e58
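
For context, the sketch below shows how an agent server might branch on that optional image payload. Only the `bin` field name comes from `proto/chat.proto`; the `ChatRequest` type, the `buildPrompt` helper, and everything else are hypothetical illustrations, not the sample's actual API.

```go
// Hypothetical sketch: how an agent server might branch on the optional
// image bytes carried by the chat protocol. Only the `bin` field name is
// taken from proto/chat.proto; every other identifier is illustrative.
package main

import "fmt"

// ChatRequest mirrors the shape implied by proto/chat.proto: a text
// message plus an optional image payload (`bytes bin`).
type ChatRequest struct {
	Message string
	Bin     []byte // optional image; empty when the user sends text only
}

// buildPrompt decides whether the request needs a multimodal model.
// A text-only model (e.g. qwen2.5:7b) can serve the first branch, but
// the second branch requires a model with image input, such as qwq.
func buildPrompt(req ChatRequest) string {
	if len(req.Bin) == 0 {
		return req.Message // plain ReAct text prompt
	}
	return fmt.Sprintf("%s\n[attached image: %d bytes]", req.Message, len(req.Bin))
}

func main() {
	fmt.Println(buildPrompt(ChatRequest{Message: "Book a flight to Beijing"}))
}
```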

2 files changed: 58 additions & 2 deletions

book-flight-ai-agent/README.md

Lines changed: 29 additions & 1 deletion
````diff
@@ -12,7 +12,7 @@ Modify the configuration file and copy `book-flight-ai-agent/.env.example` to `b
 
 ```ini
 # LLM Settings
-LLM_MODEL = qwq # Ollama model name
+LLM_MODEL = qwq # Ollama model name (see "Model Recommendations" below)
 LLM_URL = "http://127.0.0.1:11434" # Ollama URL, fill in Ollama service address
 LLM_API_KEY = "sk-..." # API key
 
@@ -27,6 +27,34 @@ TIMEOUT_SECOND = 300 # Timeout
 
 **Note**: Currently only models deployed in Ollama mode
 
+#### Model Recommendations
+
+The default model is **`qwq`**. This sample is designed to demonstrate a multimodal agent — the chat protocol carries an optional image (`bytes bin` in `proto/chat.proto`), and the front-end allows uploading a picture together with the message. `qwq` was chosen as the default because, at the time the sample was written, it was the smallest Ollama-distributed model that combined ReAct-quality reasoning with multimodal input. In practice, its actual VRAM/RAM usage in Ollama is modest for many consumer machines.
+
+If you do **not** need the multimodal capability and just want to exercise the agent on text-only flight queries, you can swap to a smaller text-only model. Be aware that the agent's tool selection still requires solid instruction-following, so going too small will make the model misclassify the request as `TaskUnrelated` and never invoke the search tool.
+
+| Use case | Suggested model | Approx. footprint |
+|----------|------------------|-------------------|
+| **Multimodal** (default, recommended) | `qwq` | Modest in Ollama, runs fine on consumer GPUs/CPU |
+| Text-only, multi-turn quality | `qwen2.5:14b` | ~8 GB |
+| Text-only, lighter machine | `qwen2.5:7b` | ~4 GB |
+
+Pull the model before starting the server:
+
+```shell
+$ ollama pull qwq # default
+# or, for text-only testing:
+$ ollama pull qwen2.5:7b
+```
+
+If you can't run any LLM locally and just want to verify the booking tools, run the unit tests directly:
+
+```shell
+$ go test ./go-server/tools/bookingflight/... -v
+```
+
+> **Tip**: Models smaller than ~7B (e.g. `qwen3:4b`) tend to misclassify the user's intent and route to `TaskUnrelated`, so the agent never invokes the search tool. The code itself is fine — the issue is purely the model's instruction-following ability. If you see the agent refuse to call any tool, try a larger model first before debugging the code.
+
 ### 3. Run the example
 
 First, enter the `book-flight-ai-agent` directory.
````
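
To make the troubleshooting tip concrete, here is a minimal sketch of the ReAct-style routing step where an undersized model goes wrong. Only the `TaskUnrelated` label comes from the sample; the `Action:` parsing, the `routeAction` helper, and the tool names are assumptions for illustration, not the sample's actual code.

```go
// Hypothetical sketch of ReAct-style action routing. Only the
// TaskUnrelated label is taken from the sample; the parsing logic and
// the tool names below are illustrative assumptions.
package main

import (
	"fmt"
	"strings"
)

// routeAction picks a tool from the model's ReAct "Action:" line.
// A strong model (qwq, qwen2.5:14b) emits e.g. "Action: SearchFlight";
// models under ~7B often emit "Action: TaskUnrelated" even for valid
// booking requests, so no tool ever runs.
func routeAction(modelOutput string) string {
	for _, line := range strings.Split(modelOutput, "\n") {
		if after, ok := strings.CutPrefix(line, "Action: "); ok {
			return strings.TrimSpace(after)
		}
	}
	return "TaskUnrelated" // no parsable action: treat as unrelated
}

func main() {
	good := "Thought: the user wants a flight\nAction: SearchFlight"
	bad := "Action: TaskUnrelated" // typical output from a too-small model
	fmt.Println(routeAction(good), routeAction(bad))
}
```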

book-flight-ai-agent/README_zh.md

Lines changed: 29 additions & 1 deletion
````diff
@@ -12,7 +12,7 @@
 
 ```ini
 # LLM Settings
-LLM_MODEL = qwq # Ollama model name
+LLM_MODEL = qwq # Ollama model name (see "Model Recommendations" below)
 LLM_URL = "http://127.0.0.1:11434" # Ollama URL, fill in the Ollama service address
 LLM_API_KEY = "sk-..." # API key
 
@@ -27,6 +27,34 @@ TIMEOUT_SECONDS = 300 # Timeout
 
 **Note**: Currently only models deployed via Ollama are supported
 
+#### Model Recommendations
+
+The default model is **`qwq`**. This sample is designed to demonstrate a **multimodal agent**: the chat protocol carries an optional image field (`bytes bin` in `proto/chat.proto`), and the front-end also allows attaching a picture to a message. `qwq` was the smallest model available on Ollama at the time that **combined ReAct-quality reasoning with multimodal input**; its actual VRAM/RAM footprint in Ollama is not heavy for most consumer machines.
+
+If you do **not** need the multimodal capability and only want text-only flight queries, you can switch to a smaller text-only model. Note, however, that the agent's tool selection demands solid instruction-following from the model; too small a model will cause requests to be misclassified as `TaskUnrelated`, and the agent will never call the search tool.
+
+| Use case | Recommended model | Approx. footprint |
+|----------|-------------------|-------------------|
+| **Multimodal** (default, recommended) | `qwq` | Not heavy under Ollama; runs on ordinary GPUs/CPU |
+| Text-only, multi-turn reasoning quality first | `qwen2.5:14b` | ~8 GB |
+| Text-only, lower-spec machine | `qwen2.5:7b` | ~4 GB |
+
+Pull the model before starting the server:
+
+```shell
+$ ollama pull qwq # default
+# or, for text-only testing:
+$ ollama pull qwen2.5:7b
+```
+
+If you cannot run any LLM locally but still want to verify the search/booking tools, run the unit tests directly:
+
+```shell
+$ go test ./go-server/tools/bookingflight/... -v
+```
+
+> **Tip**: Models below ~7B (e.g. `qwen3:4b`) often classify the user's intent as `TaskUnrelated`, so the agent never calls the search tool. This is a limitation of the model's instruction-following ability, not of the code. If the agent refuses to call any tool, try a larger model first before digging into the code.
+
 ### 3. Run the example
 
 First, enter the `book-flight-ai-agent` directory.
````
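
Finally, since both READMEs point text-only users at `go test ./go-server/tools/bookingflight/...`, here is a hedged sketch of the kind of LLM-free unit test that command implies. The package layout, the `SearchFlight` function, and its signature are illustrative assumptions; only the directory path comes from the diff.

```go
// Hypothetical sketch of an LLM-free unit test for the booking tools.
// Only the go-server/tools/bookingflight path comes from the commit;
// the package name, SearchFlight, and its signature are illustrative.
package bookingflight

import "testing"

// SearchFlight stands in for the sample's flight-search tool: a plain
// Go function, so it can be tested without any model running.
func SearchFlight(from, to string) ([]string, error) {
	return []string{from + "-" + to + "-001"}, nil
}

func TestSearchFlight(t *testing.T) {
	flights, err := SearchFlight("Shanghai", "Beijing")
	if err != nil {
		t.Fatalf("SearchFlight returned error: %v", err)
	}
	if len(flights) == 0 {
		t.Fatal("expected at least one flight, got none")
	}
}
```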
