-**OffloadingConnector**: enable offloading of KV data to CPU memory, customizing the CPU block size (in tokens) and number of blocks to allocate (per worker):
docs/features/tool_calling.md
Lines changed: 9 additions & 0 deletions
@@ -319,6 +319,15 @@ Supported models:
Flags: `--tool-call-parser glm45`

+### Qwen3-Coder Models (`qwen3_xml`)
+
+Supported models:
+
+* `Qwen/Qwen3-480B-A35B-Instruct`
+* `Qwen/Qwen3-Coder-30B-A3B-Instruct`
+
+Flags: `--tool-call-parser qwen3_xml`
+
### Models with Pythonic Tool Calls (`pythonic`)
A growing number of models output a python list to represent tool calls instead of using JSON. This has the advantage of inherently supporting parallel tool calls and removing ambiguity around the JSON schema required for tool calls. The `pythonic` tool parser can support such models.
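
To make the pythonic format concrete, here is a minimal sketch of how a Python list of calls could be turned into structured tool calls. It is not vLLM's `pythonic` parser. The function name `parse_pythonic_tool_calls`, the tool names, and the sample model output are all hypothetical, and the sketch assumes keyword-only arguments with literal values.

```python
import ast


def parse_pythonic_tool_calls(text):
    """Parse '[f(a=1), g(b="x")]' into a list of {"name", "arguments"} dicts.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of calls")

    calls = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("each list element must be a simple function call")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("this sketch supports keyword arguments only")
        # literal_eval only accepts literals, so no model-generated code runs.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


# Hypothetical model output containing two parallel tool calls.
output = '[get_weather(city="San Francisco", metric="celsius"), get_time(timezone="UTC")]'
for call in parse_pythonic_tool_calls(output):
    print(call["name"], call["arguments"])
```

Because the list can hold any number of call expressions, parallel tool calls need no extra schema: each element becomes one call.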
docs/serving/expert_parallel_deployment.md
Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ For production deployments requiring strict SLA guarantees for time-to-first-tok
1. **Install gdrcopy/ucx/nixl**: For maximum performance, run the [install_gdrcopy.sh](gh-file:tools/install_gdrcopy.sh) script to install `gdrcopy` (e.g., `install_gdrcopy.sh "${GDRCOPY_OS_VERSION}" "12.8" "x64"`). You can find available OS versions [here](https://developer.download.nvidia.com/compute/redist/gdrcopy/CUDA%2012.8/). If `gdrcopy` is not installed, things will still work with a plain `pip install nixl`, just with lower performance. `nixl` and `ucx` are installed as dependencies via pip.

-2. **Configure Both Instances**: Add this flag to both prefill and decode instances `--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}`
+2. **Configure Both Instances**: Add this flag to both prefill and decode instances `--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both"}`. Noted, you may also specify one or multiple NIXL_Backend. Such as: `--kv-transfer-config '{"kv_connector":"NixlConnector","kv_role":"kv_both", "kv_connector_extra_config":{"backend":["UCX", "GDS"]}'`

3. **Client Orchestration**: Use the client-side script below to coordinate prefill/decode operations. We are actively working on routing solutions.
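
The multi-backend example added in step 2 opens two JSON objects but closes only one, so it does not parse as JSON when copied as written. One way to avoid hand-balancing braces is to generate the value. The sketch below uses only the keys quoted in step 2; the helper names `kv_transfer_flag` and `is_valid_json` are hypothetical and not part of vLLM.

```python
import json
import shlex


def kv_transfer_flag(backends=None):
    """Return the argv pair for --kv-transfer-config with well-formed JSON.

    Hypothetical helper; the keys mirror the example in step 2.
    """
    config = {"kv_connector": "NixlConnector", "kv_role": "kv_both"}
    if backends:
        config["kv_connector_extra_config"] = {"backend": list(backends)}
    return ["--kv-transfer-config", json.dumps(config, separators=(",", ":"))]


def is_valid_json(text):
    """Report whether text parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Build the flag with two NIXL backends and print it shell-quoted.
flag = kv_transfer_flag(["UCX", "GDS"])
print(shlex.join(flag))

# The JSON value exactly as it appears in the added line of the diff.
as_written = (
    '{"kv_connector":"NixlConnector","kv_role":"kv_both", '
    '"kv_connector_extra_config":{"backend":["UCX", "GDS"]}'
)
print(is_valid_json(as_written))        # False: the outer object is never closed
print(is_valid_json(as_written + "}"))  # True once the missing brace is added
```

The printed flag can be appended to the launch command of both the prefill and decode instances.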