-
Notifications
You must be signed in to change notification settings - Fork 2k
[TRTLLM-7182][test] add multi-nodes test for disagg-serving #7470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TRTLLM-7182][test] add multi-nodes test for disagg-serving #7470
Conversation
📝 WalkthroughWalkthroughAdds SLURM/MPI-aware multi-node disaggregated-serving tests and supporting test utilities: an integration trigger, a new unit test module coordinating context/gen/disagg servers across nodes, extensions to test server launchers (host/env/rank/extra-config and RemoteDisaggOpenAIServer), and low-level networking/SLURM nodelist helpers; plus a tiny formatting cleanup. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant PyTest as PyTest (rank 0 / pytest node)
participant CtxSrv as Context Server (8001)
participant GenSrv as Generation Server (8002)
participant DisaggSrv as Disagg Server (8000)
rect rgba(220,235,255,0.6)
note right of PyTest: Launch & config assembly (rank 0)
PyTest->>CtxSrv: start RemoteOpenAIServer(ctx, host/env/rank, extra_config)
PyTest->>GenSrv: start RemoteOpenAIServer(gen, host/env/rank, extra_config)
PyTest->>DisaggSrv: start RemoteDisaggOpenAIServer(config with ctx/gen URLs)
DisaggSrv-->>PyTest: health check → ready
end
rect rgba(235,255,220,0.6)
note right of DisaggSrv: Disaggregated request flow
PyTest->>DisaggSrv: completions.create(model, prompt)
DisaggSrv->>CtxSrv: request context / prefill
CtxSrv-->>DisaggSrv: context response
DisaggSrv->>GenSrv: generation request (with context)
GenSrv-->>DisaggSrv: generated tokens/text
DisaggSrv-->>PyTest: aggregated completion response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (6)
tests/unittest/llmapi/apps/openai_server.py (3)
52-72: Non-zero ranks sleep MAX_SERVER_START_WAIT_S seconds unnecessarily.Else branch sleeps for the full timeout (600s) then breaks, slowing tests. Consider a short sleep (e.g., 1–2s) or immediate break for non-zero ranks.
- else: - time.sleep(timeout) - break + else: + time.sleep(1.0) + break
25-27: Cast SLURM_PROCID to int to fix rank detection.Comparing a string env var to int 0 makes rank-0 logic skip the health check.
- self.rank = os.environ.get("SLURM_PROCID", 0) + self.rank = int(os.environ.get("SLURM_PROCID", "0"))
1-1: Missing NVIDIA header.Per repository guidelines, prepend the NVIDIA copyright header.
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (3)
9-14: SLURM_NODELIST parsing is simplistic and HOSTNAME debug prints are broken.
- NODE_LIST from SLURM often uses bracket notation; naive split can mislead.
- Debug print references HOSTNAME which doesn’t exist.
At minimum, fix the print:
- print(f"start worker NODE_LIST: {NODE_LIST} {RANK} {HOSTNAME}") + print(f"start worker NODE_LIST: {NODE_LIST} RANK={RANK} HOST={socket.gethostname()}")For robustness, consider expanding SLURM_NODELIST (e.g., via scontrol show hostnames) if available; otherwise document the expected format.
93-116: Overly strict output assertion; likely to flake.With max_tokens=1, most models return "2" (or lowercase "two"). Assert a tolerant match.
- assert message.content == 'Two' + assert message.content.strip().lower() in {'2', 'two'}
43-56: Avoid hard-coded ports to reduce flakes.8000/8001/8002 may collide in CI. Consider using find_free_port() and pass ports through fixtures to compose URLs.
If you want, I can draft a small helper fixture to allocate and share ports.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
tests/integration/defs/test_e2e.py(1 hunks)tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Filenames compiled into a target must be case-insensitively unique
Files:
tests/unittest/llmapi/apps/openai_server.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use spaces, not tabs; indent 4 spaces
Files:
tests/unittest/llmapi/apps/openai_server.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in init
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else
Files:
tests/unittest/llmapi/apps/openai_server.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/unittest/llmapi/apps/openai_server.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/defs/test_e2e.py
🧬 Code graph analysis (3)
tests/unittest/llmapi/apps/openai_server.py (1)
tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (3)
llm_root(180-181)llm_venv(707-723)unittest_path(90-91)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
tests/unittest/llmapi/apps/openai_server.py (4)
RemoteDisaggOpenAIServer(92-132)RemoteOpenAIServer(15-89)get_client(80-84)get_async_client(86-89)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/openai_server.py
112-112: Undefined name tempfile
(F821)
121-121: Undefined name yaml
(F821)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
44-44: Undefined name HOSTNAME
(F821)
115-115: Undefined name time
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
fc0d763 to
4d7259d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (5)
tests/unittest/llmapi/apps/openai_server.py (1)
92-118: Fix broken extra-config handling: undefined helper, wrong temp file, missing imports.current code can’t start the server; it references self.extra_config_file (undefined), writes YAML to a different temp file, and lacks yaml/tempfile imports.
Apply:
@@ -class RemoteDisaggOpenAIServer(RemoteOpenAIServer): +class RemoteDisaggOpenAIServer(RemoteOpenAIServer): @@ - self.extra_config_file = self._get_extra_config_file() + # Write a durable temp config file and keep its path for the child process. + self.extra_config_path = self._write_extra_config_file() @@ - launch_cmd = [ + launch_cmd = [ "trtllm-serve", "disaggregated", "--host", f"{self.host}", "--port", - f"{self.port}", "--extra-config", self.extra_config_file.name + f"{self.port}", "--extra-config", self.extra_config_path ] @@ - with tempfile.NamedTemporaryFile() as f: - f.write(self._get_extra_config()) - self.proc = subprocess.Popen(launch_cmd, - stdout=sys.stdout, - stderr=sys.stderr) + self.proc = subprocess.Popen(launch_cmd, + stdout=sys.stdout, + stderr=sys.stderr) @@ def _get_extra_config(self): return yaml.dump({ @@ "hostname": self.host, }) + + def _write_extra_config_file(self) -> str: + # Create a durable temp file so the spawned process can read it. + f = tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) + try: + f.write(self._get_extra_config()) + f.flush() + return f.name + finally: + f.close() + + def __exit__(self, exc_type, exc_value, traceback): + try: + os.remove(getattr(self, "extra_config_path", "")) + except Exception: + pass + return super().__exit__(exc_type, exc_value, traceback)And add missing imports near the top:
import os import subprocess import sys import time +import tempfile +import yamltests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (4)
1-4: Add NVIDIA header and missing import.Header is required; time is used later.
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 import os + import time
42-56: Bind workers to 0.0.0.0 so other nodes can reach them.Current workers inherit "localhost" from RemoteOpenAIServer, breaking cross-node connectivity.
- args = ["--tp_size", str(tp_size), "--pp_size", str(pp_size)] + args = ["--host", "0.0.0.0", "--tp_size", str(tp_size), "--pp_size", str(pp_size)] with RemoteOpenAIServer(model_path, port=8001, cli_args=args) as server: yield server @@ - args = ["--tp_size", str(tp_size), "--pp_size", str(pp_size)] + args = ["--host", "0.0.0.0", "--tp_size", str(tp_size), "--pp_size", str(pp_size)] with RemoteOpenAIServer(model_path, port=8002, cli_args=args) as server: yield server
59-67: Fix NODE_LIST logic and selection of ctx/gen hosts.Using worker.host ("localhost") and list.remove() (returns None) breaks host selection.
- if NODE_RANK == 0 and RANK == 0: - ctx_host = worker.host - assert len(NODE_LIST) == 2 - assert ctx_host in NODE_LIST - print(f"ctx_host: {ctx_host} {NODE_LIST}") - gen_host = NODE_LIST.remove(ctx_host)[0] + if NODE_RANK == 0 and RANK == 0: + nodes = [n for n in NODE_LIST if n] + assert len(nodes) >= 2, f"Expected at least 2 nodes, got: {nodes}" + ctx_host, gen_host = nodes[0], nodes[1] + print(f"ctx_host={ctx_host}, gen_host={gen_host}") ctx_url = f"http://{ctx_host}:8001" gen_url = f"http://{gen_host}:8002"
76-89: Gate client fixtures by both NODE_RANK and RANK; guard None.Prevents AttributeError when disagg_server is None.
def client(disagg_server: RemoteDisaggOpenAIServer): - if NODE_RANK == 0: - return disagg_server.get_client() + if NODE_RANK == 0 and RANK == 0 and disagg_server is not None: + return disagg_server.get_client() else: return None @@ def async_client(disagg_server: RemoteDisaggOpenAIServer): - if NODE_RANK == 0: - return disagg_server.get_async_client() + if NODE_RANK == 0 and RANK == 0 and disagg_server is not None: + return disagg_server.get_async_client() else: return None
🧹 Nitpick comments (2)
tests/unittest/llmapi/apps/openai_server.py (1)
101-107: Allow binding to 0.0.0.0 for cross-node access.Binding to "localhost" limits reachability across nodes. Make host configurable or default to "0.0.0.0" in multi-node tests. Example:
- self.host = "localhost" + self.host = os.environ.get("TLLM_BIND_HOST", "0.0.0.0")tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
92-116: Make the assertion tolerant; model may return "2" vs "Two".Avoid flakiness while still validating behavior.
- assert message.content == 'Two' + out = (message.content or "").strip().lower() + assert out in ('two', '2')
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
tests/integration/defs/test_e2e.py(1 hunks)tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/defs/test_e2e.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Filenames compiled into a target must be case-insensitively unique
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use spaces, not tabs; indent 4 spaces
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in init
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
🧬 Code graph analysis (2)
tests/unittest/llmapi/apps/openai_server.py (1)
tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
tests/unittest/llmapi/apps/openai_server.py (4)
RemoteDisaggOpenAIServer(92-132)RemoteOpenAIServer(15-89)get_client(80-84)get_async_client(86-89)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/openai_server.py
112-112: Undefined name tempfile
(F821)
121-121: Undefined name yaml
(F821)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
114-114: Undefined name time
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
4d7259d to
367a78c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unittest/llmapi/apps/openai_server.py (1)
20-31: Rank handling bug and slow non-zero ranks; add request timeout.Cast SLURM_PROCID to int; don’t sleep 600s on non-zero ranks; set a short HTTP timeout.
- self.rank = os.environ.get("SLURM_PROCID", 0) + self.rank = int(os.environ.get("SLURM_PROCID", 0)) @@ - if self.rank == 0: - if requests.get(url).status_code == 200: + if self.rank == 0: + if requests.get(url, timeout=1).status_code == 200: break else: - time.sleep(timeout) - break + breakAlso applies to: 54-65
♻️ Duplicate comments (7)
tests/unittest/llmapi/apps/openai_server.py (2)
1-2: Add required NVIDIA SPDX header.Prepend the standard NVIDIA header at the very top.
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 # Adapted from # https://github.com/vllm-project/vllm/blob/baaedfdb2d3f1d70b7dbcde08b083abfe6017a92/tests/utils.py
96-121: Broken extra-config lifecycle and undefined helper (server won’t start)._get_extra_config_file is undefined; writing to a different temp file; default binary mode write will raise; ensure durable temp file and cleanup.
+import yaml @@ - self.extra_config_file = self._get_extra_config_file() + self.extra_config_path = self._write_extra_config_file() @@ - launch_cmd = [ + launch_cmd = [ "trtllm-serve", "disaggregated", "--host", f"{self.host}", "--port", - f"{self.port}", "--extra-config", self.extra_config_file.name + f"{self.port}", "--extra-config", self.extra_config_path ] @@ - with tempfile.NamedTemporaryFile() as f: - f.write(self._get_extra_config()) - self.proc = subprocess.Popen(launch_cmd, - stdout=sys.stdout, - stderr=sys.stderr) + self.proc = subprocess.Popen(launch_cmd, + stdout=sys.stdout, + stderr=sys.stderr) @@ self._wait_for_server(url=self.url_for("health"), timeout=self.MAX_SERVER_START_WAIT_S) + + def __exit__(self, exc_type, exc_value, traceback): + try: + os.remove(getattr(self, "extra_config_path", "")) + except Exception: + pass + return super().__exit__(exc_type, exc_value, traceback) + + def _write_extra_config_file(self) -> str: + f = tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) + try: + f.write(self._get_extra_config()) + f.flush() + return f.name + finally: + f.close()tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (5)
1-1: Add required NVIDIA SPDX header.Per project guidelines, prepend the header.
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 import os import socket
1-6: Missing import for time.time.sleep is used later.
import os import socket +import timeAlso applies to: 122-122
43-63: Workers bind to localhost/host; cross-node access will fail.Bind servers to 0.0.0.0 so the aggregator can reach them across nodes.
- with RemoteOpenAIServer(model_path, - port=8001, - cli_args=args, - host="localhost") as server: + with RemoteOpenAIServer(model_path, + port=8001, + cli_args=args, + host="0.0.0.0") as server: yield server @@ - with RemoteOpenAIServer(model_path, port=8002, cli_args=args, - host=host) as server: + with RemoteOpenAIServer(model_path, + port=8002, + cli_args=args, + host="0.0.0.0") as server: yield server
66-79: Host selection and NODE_LIST misuse (logic errors).remove() returns None; pick nodes explicitly and construct URLs.
- if NODE_RANK == 0 and RANK == 0: - ctx_host = socket.gethostname() # start ctx worker on NODE_RANK 0 - assert len(NODE_LIST) == 2 - assert ctx_host in NODE_LIST - print(f"ctx_host: {ctx_host} {NODE_LIST}") - gen_host = NODE_LIST.remove(ctx_host)[0] + if NODE_RANK == 0 and RANK == 0: + nodes = [n for n in NODE_LIST if n] + assert len(nodes) >= 2, f"Expected 2 nodes, got: {nodes}" + ctx_host, gen_host = nodes[0], nodes[1] + print(f"ctx_host={ctx_host}, gen_host={gen_host}") ctx_url = f"http://{ctx_host}:8001" gen_url = f"http://{gen_host}:8002"
84-97: Fixture gating: avoid AttributeError on non-zero ranks.Only call get_client on rank 0 of node 0.
def client(disagg_server: RemoteDisaggOpenAIServer): - if NODE_RANK == 0: + if NODE_RANK == 0 and RANK == 0 and disagg_server is not None: return disagg_server.get_client() else: return None @@ def async_client(disagg_server: RemoteDisaggOpenAIServer): - if NODE_RANK == 0: + if NODE_RANK == 0 and RANK == 0 and disagg_server is not None: return disagg_server.get_async_client() else: return None
🧹 Nitpick comments (2)
tests/unittest/llmapi/apps/openai_server.py (1)
96-103: Make host configurable for the disaggregated server.Allow binding to 0.0.0.0 when needed.
- def __init__(self, + def __init__(self, ctx_servers: List[str], gen_servers: List[str], port: int = None, - llmapi_launch: bool = False) -> None: + host: str = "localhost", + llmapi_launch: bool = False) -> None: @@ - self.host = "localhost" + self.host = hosttests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
100-121: Reduce assertion flakiness for model output.Accept common one-token answers case-insensitively.
- assert message.content == 'Two' + assert message.content.strip().lower() in ('two', '2')
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
tests/integration/defs/test_e2e.py(1 hunks)tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/integration/defs/test_e2e.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Filenames compiled into a target must be case-insensitively unique
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{h,hpp,hh,hxx,cc,cpp,cxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use spaces, not tabs; indent 4 spaces
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Code must target Python 3.8+
Indent with 4 spaces; do not use tabs (Python)
Maintain module namespace on import: prefer from package.subpackage import foo; use foo.Symbol()
Python filenames use snake_case
Python class names use PascalCase
Python functions and methods use snake_case
Python local variables use snake_case; if starting with a number concept, prefix with k (e.g., k_99th_percentile)
Python global variables use G_ prefix with UPPER_SNAKE_CASE
Python constants use UPPER_SNAKE_CASE
Avoid shadowing variables from outer scopes
Initialize all externally visible class members in init
For public interfaces, prefer docstrings over comments; comments should be for in-function or file-local interfaces
Use Google-style docstrings for classes and functions (Sphinx-parsable)
Document attributes and variables inline with docstrings immediately after assignment
Avoid reflection when a non-reflective approach suffices
Limit except clauses to specific exceptions where possible
When using try/except for duck-typing, keep try body minimal and move logic to else
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
**/*.{cpp,cc,cxx,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
🧬 Code graph analysis (2)
tests/unittest/llmapi/apps/openai_server.py (1)
tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
tests/unittest/llmapi/apps/openai_server.py (4)
RemoteDisaggOpenAIServer(94-134)RemoteOpenAIServer(16-91)get_client(82-86)get_async_client(88-91)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/openai_server.py
123-123: Undefined name yaml
(F821)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
122-122: Undefined name time
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
05391bd to
f5f90e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unittest/llmapi/apps/openai_server.py (1)
67-89: Bound health-check request to avoid hanging.Add a small timeout to the GET to respect the overall wait window.
- if requests.get(url).status_code == 200: + if requests.get(url, timeout=1.0).status_code == 200: break
♻️ Duplicate comments (2)
tests/unittest/llmapi/apps/openai_server.py (2)
1-3: Add required NVIDIA SPDX header at top.File is missing the mandated header.
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 + # Adapted from # https://github.com/vllm-project/vllm/blob/baaedfdb2d3f1d70b7dbcde08b083abfe6017a92/tests/utils.py
59-66: Clean up temporary config files on exit.Delete any temp YAML created for
extra_configor disagg config.def __exit__(self, exc_type, exc_value, traceback): self.proc.terminate() try: self.proc.wait(timeout=30) except subprocess.TimeoutExpired as e: self.proc.kill() self.proc.wait(timeout=30) + # Remove temp files if created + for tmp in (getattr(self, "extra_config_file", None), + getattr(self, "config_file", None)): + if tmp: + try: + os.remove(tmp) + except Exception: + pass
🧹 Nitpick comments (9)
tests/unittest/llmapi/apps/openai_server.py (4)
21-29: Type hints: avoid implicit Optional.Annotate
cli_argsandextra_configproperly for readability and static linting.- def __init__(self, - model: str, - cli_args: List[str] = None, + def __init__(self, + model: str, + cli_args: Optional[List[str]] = None, llmapi_launch: bool = False, port: int = None, host: str = "localhost", - env: Optional[dict] = None, - rank: int = -1, - extra_config: Optional[dict] = None) -> None: + env: Optional[dict] = None, + rank: int = -1, + extra_config: Optional[dict] = None) -> None:
47-51: Only default env when None.
if not envreplaces an explicitly passed empty env. Useis None.- if not env: - env = os.environ.copy() + if env is None: + env = os.environ.copy()
43-47: Style nit: avoid list concatenation when building commands.Minor readability tweak per Ruff (RUF005).
- launch_cmd = ["trtllm-serve"] + [model] + args + launch_cmd = ["trtllm-serve", model, *args] @@ - launch_cmd = ["trtllm-llmapi-launch"] + launch_cmd + launch_cmd = ["trtllm-llmapi-launch", *launch_cmd]
128-132: Style nit: command build.- launch_cmd = ["trtllm-serve", "disaggregated", "-c", self.config_file] + launch_cmd = ["trtllm-serve", "disaggregated", "-c", self.config_file] if llmapi_launch: - # start server with llmapi-launch on multi nodes - launch_cmd = ["trtllm-llmapi-launch"] + launch_cmd + launch_cmd = ["trtllm-llmapi-launch", *launch_cmd]tests/integration/defs/test_e2e.py (1)
1625-1641: Silence unused arg and make the invocation robust.
llm_rootisn’t used; prefix with_to satisfy linters. The test node path looks good.-def test_openai_disaggregated_serving_multi_nodes(llm_root, llm_venv, +def test_openai_disaggregated_serving_multi_nodes(_llm_root, llm_venv, ctx_config, gen_config):tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (4)
1-7: Import missing modules used in fixes below.You’ll need re for parsing and typing for hints in this test module.
Apply:
import os import socket import subprocess import time +import re +from typing import Optional
167-174: Gate clients on ctx node’s local rank 0 to match disagg_server.Prevents AttributeError and accidental client use on the gen node.
Apply:
def client(disagg_server: RemoteDisaggOpenAIServer): - if RANK == 0: + if NODE_RANK == 0 and LOCAL_RANK == 0 and disagg_server is not None: return disagg_server.get_client() else: print(f"skipping client for rank {RANK} node rank {NODE_RANK}") return None
199-214: Relax overly strict assertion on completion text.Servers often return “2” or “2.”; normalize whitespace and assert prefix “2”.
Apply:
- message = completion.choices[0].text - assert message.startswith('2.') + message = completion.choices[0].text.strip() + assert message.startswith('2')
25-27: Optional: avoid hard-coded ports to reduce flakiness.If multiple tests run on shared runners, 8000–8002 can collide. Consider dynamic ports and pass them through the config/URLs.
Example:
-CTX_SERVER_PORT = 8001 -GEN_SERVER_PORT = 8002 -DISAGG_SERVER_PORT = 8000 +CTX_SERVER_PORT = int(os.environ.get("CTX_SERVER_PORT", "8001")) +GEN_SERVER_PORT = int(os.environ.get("GEN_SERVER_PORT", "8002")) +DISAGG_SERVER_PORT = int(os.environ.get("DISAGG_SERVER_PORT", "8000"))Then export these in CI or compute free ports on NODE_RANK==0 and broadcast via a simple file or env.
Also applies to: 149-165
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
tensorrt_llm/serve/openai_disagg_server.py(0 hunks)tests/integration/defs/test_e2e.py(1 hunks)tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(4 hunks)tests/unittest/llmapi/apps/utils.py(2 hunks)
💤 Files with no reviewable changes (1)
- tensorrt_llm/serve/openai_disagg_server.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tests/unittest/llmapi/apps/utils.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/openai_server.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tests/unittest/llmapi/apps/utils.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/openai_server.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tests/unittest/llmapi/apps/utils.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/integration/defs/test_e2e.pytests/unittest/llmapi/apps/openai_server.py
🧠 Learnings (1)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tests/integration/defs/test_e2e.py
🧬 Code graph analysis (3)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
tests/unittest/llmapi/apps/openai_server.py (3)
RemoteDisaggOpenAIServer(109-153)RemoteOpenAIServer(17-106)get_client(97-101)tests/unittest/llmapi/apps/utils.py (3)
expand_slurm_nodelist(221-280)get_local_interfaces(190-218)get_local_ip(183-186)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (4)
llm_root(180-181)llm_venv(707-723)test_root(2189-2190)unittest_path(90-91)
tests/unittest/llmapi/apps/openai_server.py (2)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
env(63-76)tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/utils.py
204-204: Local variable bytes_out is assigned to but never used
Remove assignment to unused variable bytes_out
(F841)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
44-44: subprocess call: check for execution of untrusted input
(S603)
48-48: Consider moving this statement to an else block
(TRY300)
49-49: Do not catch blind exception: Exception
(BLE001)
128-128: Possible binding to all interfaces
(S104)
150-150: Unused function argument: worker
(ARG001)
153-153: f-string without any placeholders
Remove extraneous f prefix
(F541)
181-181: Probable use of requests call without timeout
(S113)
184-185: try-except-pass detected, consider logging the exception
(S110)
184-184: Do not catch blind exception: Exception
(BLE001)
192-192: Probable use of requests call without timeout
(S113)
194-194: Do not catch blind exception: Exception
(BLE001)
tests/integration/defs/test_e2e.py
1631-1631: Unused function argument: llm_root
(ARG001)
tests/unittest/llmapi/apps/openai_server.py
25-25: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
43-43: Consider ["trtllm-serve", model, *args] instead of concatenation
Replace with ["trtllm-serve", model, *args]
(RUF005)
46-46: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
49-49: subprocess call: check for execution of untrusted input
(S603)
131-131: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
134-134: subprocess call: check for execution of untrusted input
(S603)
🔇 Additional comments (3)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (3)
71-76: Set UCX interface only when discovered; document rationale.Good call setting TRTLLM_UCX_INTERFACE. Consider also logging chosen NIC for debug.
149-165: Note: fixture arg ‘worker’ is intentionally unused (side-effect fixture).Ruff ARG001 can be ignored here; the server lifecycle is the point.
If Ruff fails CI on ARG001, add “# noqa: ARG001” to the parameter.
1-1: Add required NVIDIA Apache-2.0 header.Per repo guidelines, prepend the SPDX header.
Apply:
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0 import os⛔ Skipped due to learnings
Learnt from: galagam PR: NVIDIA/TensorRT-LLM#6487 File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12 Timestamp: 2025-08-06T13:58:07.506Z Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
816558d to
8d7b94b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (10)
tests/unittest/llmapi/apps/utils.py (2)
183-186: LGTM!The socket resource leak issue has been addressed by using a context manager.
191-219: Fix ioctl packing for 64-bit platforms and remove unused variable.The function has platform compatibility issues and an unused variable that should be addressed.
The ioctl packing uses
'iL'which can break on 64-bit platforms where pointers are 8 bytes. Also,bytes_outis assigned but never used.Apply this diff to fix the issues:
def get_local_interfaces(): """ Returns a dictionary of name:ip key value pairs. """ MAX_BYTES = 4096 FILL_CHAR = b'\0' SIOCGIFCONF = 0x8912 sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) names = array.array('B', MAX_BYTES * FILL_CHAR) names_address, names_length = names.buffer_info() - mutable_byte_buffer = struct.pack('iL', MAX_BYTES, names_address) + mutable_byte_buffer = struct.pack('iP', MAX_BYTES, names_address) mutated_byte_buffer = fcntl.ioctl(sock.fileno(), SIOCGIFCONF, mutable_byte_buffer) - max_bytes_out, names_address_out = struct.unpack('iL', mutated_byte_buffer) + max_bytes_out, names_address_out = struct.unpack('iP', mutated_byte_buffer) namestr = names.tobytes() - namestr[:max_bytes_out] - bytes_out = namestr[:max_bytes_out] ip_dict = {}Also, the socket should be closed properly. Consider using a context manager:
- sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) - names = array.array('B', MAX_BYTES * FILL_CHAR) - names_address, names_length = names.buffer_info() - mutable_byte_buffer = struct.pack('iP', MAX_BYTES, names_address) - mutated_byte_buffer = fcntl.ioctl(sock.fileno(), SIOCGIFCONF, - mutable_byte_buffer) + with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock: + names = array.array('B', MAX_BYTES * FILL_CHAR) + names_address, names_length = names.buffer_info() + mutable_byte_buffer = struct.pack('iP', MAX_BYTES, names_address) + mutated_byte_buffer = fcntl.ioctl(sock.fileno(), SIOCGIFCONF, + mutable_byte_buffer)tests/unittest/llmapi/apps/openai_server.py (4)
25-25: Fix type casting issue with SLURM_PROCID environment variable.
os.environ.get("SLURM_PROCID", 0)returns a string when the env var exists, but an integer (0) when it doesn't, which can cause type comparison issues later.Apply this fix to ensure consistent typing:
- self.rank = rank if rank != -1 else os.environ.get("SLURM_PROCID", 0) + self.rank = rank if rank != -1 else int(os.environ.get("SLURM_PROCID", "0"))Also applies to: 32-32
38-43: Invalid NamedTemporaryFile parameter causes AttributeError.
delete_on_closeis not a valid parameter forNamedTemporaryFileand will cause anAttributeError. This parameter doesn't exist in the Python tempfile module.Remove the invalid parameter:
if extra_config: - with tempfile.NamedTemporaryFile(mode="w+", + with tempfile.NamedTemporaryFile(mode="w", delete=False, - delete_on_close=False) as f: + suffix=".yaml") as f: f.write(yaml.dump(extra_config)) self.extra_config_file = f.name
126-132: Handle negative port values properly to avoid binding to invalid ports.When
port=-1is passed, the server will try to bind to port -1 which is invalid.Apply this fix to treat negative values as unset:
- self.port = port if port is not None else find_free_port() + self.port = find_free_port() if (port is None or port < 0) else port
134-139: Invalid NamedTemporaryFile parameter in RemoteDisaggOpenAIServer.Same issue as in the parent class -
delete_on_closeis not a valid parameter.Apply this fix:
with tempfile.NamedTemporaryFile(mode="w+", delete=False, - delete_on_close=False) as f: + suffix=".yaml") as f: f.write(self._get_extra_config()) f.flush() self.extra_config_file = f.nametests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (4)
14-17: Consider using SLURM_LOCALID for per-node rank.Using
SLURM_NODEIDforNODE_RANKis correct, but for per-node local rank operations,SLURM_LOCALIDwould be more appropriate than calculatingRANK % SLURM_NTASKS_PER_NODE.Consider adding:
LOCAL_RANK = int(os.environ.get("SLURM_LOCALID", "0"))Then use
LOCAL_RANKinstead ofRANK % SLURM_NTASKS_PER_NODEin lines 153 and 166.
30-35: Host matching logic is brittle with FQDN vs shortname mismatches.The assertion
socket.gethostname() in NODE_LISTcan fail if SLURM uses FQDNs whilesocket.gethostname()returns a shortname or vice versa.Make the host selection more robust:
def get_the_other_host(idx=0): - assert len(NODE_LIST) >= 2 - assert socket.gethostname() in NODE_LIST - node_list = NODE_LIST.copy() - node_list.remove(socket.gethostname()) - return node_list[idx] + nodes = [n for n in NODE_LIST if n] + if len(nodes) < 2: + pytest.skip(f"disagg multi-node test requires at least 2 nodes; got {nodes}") + # Use NODE_RANK to identify our position instead of hostname matching + assert NODE_RANK in (0, 1), f"Unexpected NODE_RANK={NODE_RANK}" + # Return the other node + return nodes[1 - NODE_RANK] if idx == 0 else nodes[(NODE_RANK + 1 + idx) % len(nodes)]
178-178: Add timeouts to requests.get calls to prevent indefinite hangs.Network requests without timeouts can hang indefinitely, causing CI/test failures.
Add timeouts to all requests:
- if requests.get(url).status_code == 200: + if requests.get(url, timeout=5).status_code == 200:- if requests.get(url).status_code >= 100: + if requests.get(url, timeout=5).status_code >= 100: print( - f"endpoint {url} returned status code {requests.get(url).status_code}" + f"endpoint {url} is still up" )Note: Line 191 makes a redundant second request just to get the status code for logging. Consider storing the response:
resp = requests.get(url, timeout=5) if resp.status_code >= 100: print(f"endpoint {url} returned status code {resp.status_code}")Also applies to: 189-191
1-13: Add missing NVIDIA copyright header.Per the coding guidelines, all source files must include the NVIDIA Apache-2.0 copyright header.
Add the required header at the top of the file:
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + import os import socket
🧹 Nitpick comments (4)
tests/unittest/llmapi/apps/utils.py (1)
251-256: Redundant bracket parsing logic should be removed.Lines 251-256 parse bracket notation but the results are unused. This appears to be leftover code from an earlier implementation.
Remove the redundant loop that doesn't contribute to the function:
if buf: groups.append(''.join(buf).strip()) - for group in groups: - bracket_match = re.match(r'^([^\[]+)\[(.+?)\]$', group) - if bracket_match: - prefix = bracket_match.group(1) - range_part = bracket_match.group(2) - expanded_nodes = []tests/unittest/llmapi/apps/openai_server.py (1)
69-71: Unused exception variable should be removed.The variable
eis assigned but never used in the exception handler.Remove the unused variable:
- except subprocess.TimeoutExpired as e: + except subprocess.TimeoutExpired: self.proc.kill() self.proc.wait(timeout=30)tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
98-98: Remove extraneous f-string prefix.The string doesn't contain any placeholders, so the f-prefix is unnecessary.
- print(f"Failed to find NIC, will use default UCX interface") + print("Failed to find NIC, will use default UCX interface")
244-244: Avoid calling terminate() on disagg_server in test.The server's lifecycle is managed by the fixture's context manager. Calling
terminate()manually can interfere with proper cleanup.Remove the manual termination since the fixture handles cleanup:
message = completion.choices[0].text assert message.startswith('2.') - disagg_server.terminate()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(5 hunks)tests/unittest/llmapi/apps/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
🧬 Code graph analysis (2)
tests/unittest/llmapi/apps/openai_server.py (2)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
env(83-99)tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
tests/unittest/llmapi/apps/openai_server.py (4)
RemoteDisaggOpenAIServer(121-167)RemoteOpenAIServer(17-118)get_client(109-113)terminate(63-77)tests/unittest/llmapi/apps/utils.py (3)
expand_slurm_nodelist(222-302)get_local_interfaces(191-219)get_local_ip(183-186)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/openai_server.py
25-25: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
44-44: Consider ["trtllm-serve", model, *args] instead of concatenation
Replace with ["trtllm-serve", model, *args]
(RUF005)
47-47: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
50-50: subprocess call: check for execution of untrusted input
(S603)
69-69: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
75-75: Do not catch blind exception: Exception
(BLE001)
145-145: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
148-148: subprocess call: check for execution of untrusted input
(S603)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
60-60: subprocess call with shell=True identified, security issue
(S602)
67-67: Consider moving this statement to an else block
(TRY300)
68-68: Do not catch blind exception: Exception
(BLE001)
98-98: f-string without any placeholders
Remove extraneous f prefix
(F541)
150-150: Possible binding to all interfaces
(S104)
164-164: Possible binding to all interfaces
(S104)
178-178: Probable use of requests call without timeout
(S113)
181-181: Do not catch blind exception: Exception
(BLE001)
189-189: Probable use of requests call without timeout
(S113)
191-191: Probable use of requests call without timeout
(S113)
194-194: Do not catch blind exception: Exception
(BLE001)
200-200: Unused function argument: worker
(ARG001)
203-203: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/unittest/llmapi/apps/utils.py
205-205: Local variable bytes_out is assigned to but never used
Remove assignment to unused variable bytes_out
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unittest/llmapi/apps/openai_server.py (1)
79-101: Harden health check: timeout and monotonic clock.Prevent indefinite hangs and clock skew issues; only rank 0 polls with a short per-request timeout.
Apply:
- start = time.time() + start = time.monotonic() while True: try: if self.rank == 0: - if requests.get(url).status_code == 200: + if requests.get(url, timeout=5).status_code == 200: break else: time.sleep(0.5) else: time.sleep(timeout) break except Exception as err: result = self.proc.poll() if result is not None and result != 0: raise RuntimeError("Server exited unexpectedly.") from err time.sleep(0.5) - if time.time() - start > timeout: + if time.monotonic() - start > timeout: raise RuntimeError( "Server failed to start in time.") from errtests/unittest/llmapi/apps/utils.py (1)
21-25: Add typing imports for 3.8 compatibility.You use built-in generics (list[str]) later; target is Python 3.8+. Import and use typing.List/Dict.
Apply:
-from typing import Any, Callable +from typing import Any, Callable, Dict, List
♻️ Duplicate comments (17)
tests/unittest/llmapi/apps/openai_server.py (4)
1-1: Add required NVIDIA SPDX header.Per repo guidelines, prepend the NVIDIA Apache-2.0 header to all Python sources.
Apply:
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0
31-33: Rank type bug and port sentinel handling.
- SLURM_PROCID is str; comparing to int 0 fails, breaking health-wait logic.
- Treat negative ports as “unset” to avoid binding to -1.
Apply:
- self.port = port if port is not None else find_free_port() - self.rank = rank if rank != -1 else os.environ.get("SLURM_PROCID", 0) + self.port = find_free_port() if (port is None or port < 0) else port + self.rank = rank if rank != -1 else int(os.environ.get("SLURM_PROCID", "0"))
37-43: Broken temp-file usage: invalid argument and persistence.NamedTemporaryFile has no delete_on_close; this raises TypeError and prevents server start. Also use text mode and flush.
Apply:
- if extra_config: - with tempfile.NamedTemporaryFile(mode="w+", - delete=False, - delete_on_close=False) as f: - f.write(yaml.dump(extra_config)) - self.extra_config_file = f.name - args += ["--extra_llm_api_options", self.extra_config_file] + if extra_config: + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + f.write(yaml.dump(extra_config)) + f.flush() + self.extra_config_file = f.name + args += ["--extra_llm_api_options", self.extra_config_file]
134-141: Same temp-file bug in disagg server init.delete_on_close is invalid. Use text mode, suffix, and flush.
Apply:
- with tempfile.NamedTemporaryFile(mode="w+", - delete=False, - delete_on_close=False) as f: - f.write(self._get_extra_config()) - f.flush() - self.extra_config_file = f.name + with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f: + f.write(self._get_extra_config()) + f.flush() + self.extra_config_file = f.nametests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (12)
224-232: Guard client fixture on both node and local rank.Prevents AttributeError when disagg_server is None on non-ctx ranks.
Apply:
def client(disagg_server: RemoteDisaggOpenAIServer): - if is_pytest_node(): + if is_pytest_node() and disagg_server is not None: return disagg_server.get_client() else: print(f"skipping client for rank {RANK} node rank {NODE_RANK}") return None
14-18: Use SLURM_LOCALID and avoid fragile parsing of SLURM_NTASKS_PER_NODE.LOCAL_RANK is the canonical per-node rank; SLURM_NTASKS_PER_NODE may be "2(x2)" and int() will fail.
Apply:
RANK = int(os.environ.get("SLURM_PROCID", 0)) NODE_RANK = int(os.environ.get("SLURM_NODEID", 0)) NODE_LIST = expand_slurm_nodelist(os.environ.get("SLURM_NODELIST", "")) -SLURM_NTASKS_PER_NODE = int(os.environ.get("SLURM_NTASKS_PER_NODE", 1)) +LOCAL_RANK = int(os.environ.get("SLURM_LOCALID", "0"))
29-36: Brittle host resolution (can crash on FQDN vs shortname).Do not assert membership; select peers by NODE_RANK and skip if <2 nodes.
Apply:
-def get_the_other_host(idx=0): - assert len(NODE_LIST) >= 2 - assert socket.gethostname() in NODE_LIST - node_list = NODE_LIST.copy() - node_list.remove(socket.gethostname()) - return node_list[idx] +def get_the_other_host(idx: int = 0) -> str: + nodes = [n for n in NODE_LIST if n] + if len(nodes) < 2: + pytest.skip(f"disagg multi-node test requires 2 nodes; got {nodes}") + assert NODE_RANK in (0, 1), f"Unexpected NODE_RANK={NODE_RANK}" + # For now: two-node scenario; pick the non-local node. + return nodes[1 - NODE_RANK]
55-66: Shell misuse and potential injection; also TypeError with list+shell=True.You pass a list with shell=True; this raises TypeError. Remove shell=True, add timeout, and parse robustly.
Apply:
- try: - # iproute2 may not be installed - proc = subprocess.run(["ip", "route", "get", test_ip], - capture_output=True, - text=True, - shell=True, - check=True) + try: + # iproute2 may not be installed + proc = subprocess.run(["ip", "route", "get", test_ip], + capture_output=True, + text=True, + check=True, + timeout=5)
151-160: Use LOCAL_RANK for rank gating; bind to all interfaces already OK.Avoid RANK % SLURM_NTASKS_PER_NODE; pass LOCAL_RANK.
Apply:
with RemoteOpenAIServer(model_path, port=CTX_SERVER_PORT, cli_args=args, host="0.0.0.0", env=env(), llmapi_launch=True, - rank=RANK % SLURM_NTASKS_PER_NODE, + rank=LOCAL_RANK, extra_config=extra_config) as server:
166-173: Same: use LOCAL_RANK for gen server.Mirror ctx fix.
Apply:
with RemoteOpenAIServer(model_path, port=GEN_SERVER_PORT, cli_args=args, host="0.0.0.0", env=env(), - rank=RANK % SLURM_NTASKS_PER_NODE, + rank=LOCAL_RANK, extra_config=extra_config) as server:
178-188: Add request timeout and keep a single GET per loop.Prevents CI hangs and reduces load.
Apply:
- while time.monotonic() - start < timeout: + while time.monotonic() - start < timeout: try: time.sleep(1) - if requests.get(url).status_code == 200: + resp = requests.get(url, timeout=5) + if resp.status_code == 200: print(f"endpoint {url} is ready") return - except Exception as err: + except requests.exceptions.RequestException as err: print(f"endpoint {url} is not ready, with exception: {err}")
190-201: Likewise for “down” wait: timeout + narrower exception + single GET.Apply:
- while time.monotonic() - start < timeout: + while time.monotonic() - start < timeout: try: - if requests.get(url).status_code >= 100: - print( - f"endpoint {url} returned status code {requests.get(url).status_code}" - ) + resp = requests.get(url, timeout=5) + if resp.status_code >= 100: + print(f"endpoint {url} returned status code {resp.status_code}") time.sleep(1) - except Exception as err: + except requests.exceptions.RequestException as err: print(f"endpoint {url} is down, with exception: {err}") return
205-217: Fix URLs to include scheme and silence unused fixture arg.
- Disagg server expects full URLs (http://...).
- The worker fixture is intentionally unused; rename to _worker to satisfy linters.
Apply:
-@pytest.fixture(scope="module") -def disagg_server(worker: RemoteOpenAIServer): +@pytest.fixture(scope="module") +def disagg_server(_worker: RemoteOpenAIServer): if is_disagg_node(): print(f"starting disagg_server for rank {RANK} node rank {NODE_RANK}") - ctx_url = f"localhost:8001" # Use localhost since the ctx server is on the same node + ctx_url = f"http://localhost:{CTX_SERVER_PORT}" # ctx on same node # TODO: Hopefully the NODE_LIST is ordered by NODE_RANK, this test is only expected to run with 2 nodes now # We need to test with 4 nodes or more in the future, which should be easier with service discovery - gen_url = f"{get_the_other_host(0)}:8002" + gen_url = f"http://{get_the_other_host(0)}:{GEN_SERVER_PORT}" with RemoteDisaggOpenAIServer(ctx_servers=[ctx_url], gen_servers=[gen_url], port=DISAGG_SERVER_PORT, llmapi_launch=True, env=env()) as server: yield server
135-137: Avoid fragile node membership assert; prefer skip.Tests should skip gracefully if SLURM env is not as expected.
Apply:
- host = socket.gethostname() - assert host in NODE_LIST + host = socket.gethostname() + if host not in NODE_LIST or len(NODE_LIST) < 2: + pytest.skip(f"disagg multi-node test requires 2 nodes; NODE_LIST={NODE_LIST}, host={host}")
1-1: Add required NVIDIA SPDX header.Tests are Python sources too; add the NVIDIA Apache-2.0 header.
Apply:
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + # SPDX-License-Identifier: Apache-2.0
65-75: Narrow exception type and minor print fix.Catch subprocess errors explicitly and avoid bare Exception; also remove unnecessary f-prefix elsewhere.
Apply:
- else: - raise ValueError("Could not find 'dev' in ip route output") + else: + raise RuntimeError("NIC not found in `ip route get` output") print(f"get NIC name from ip route, result: {nic_name}") return nic_name - except Exception as e: + except (subprocess.SubprocessError, FileNotFoundError, RuntimeError) as e: print(f"Failed to find NIC from ip route: {e}") @@ - except OSError as e: - print(f"Failed to find NIC from local interfaces: {e}") + except OSError as e: + print(f"Failed to find NIC from local interfaces: {e}")Also change line 103 to remove an f-string without placeholders:
- print(f"Failed to find NIC, will use default UCX interface") + print("Failed to find NIC, will use default UCX interface")tests/unittest/llmapi/apps/utils.py (1)
189-220: Fix resource leak and 64-bit ioctl packing; simplify IP parsing.
- Close the socket via context manager.
- Use 'iP' for pointer-sized packing/unpacking.
- Remove dead code and build IP strings directly.
Apply:
-def get_local_interfaces(): - """ Returns a dictionary of name:ip key value pairs. """ - MAX_BYTES = 4096 - FILL_CHAR = b'\0' - SIOCGIFCONF = 0x8912 - sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) - names = array.array('B', MAX_BYTES * FILL_CHAR) - names_address, names_length = names.buffer_info() - mutable_byte_buffer = struct.pack('iL', MAX_BYTES, names_address) - mutated_byte_buffer = fcntl.ioctl(sock.fileno(), SIOCGIFCONF, - mutable_byte_buffer) - max_bytes_out, names_address_out = struct.unpack('iL', mutated_byte_buffer) - namestr = names.tobytes() - namestr[:max_bytes_out] - bytes_out = namestr[:max_bytes_out] - ip_dict = {} +def get_local_interfaces() -> Dict[str, str]: + """Returns a dictionary of name: ip key-value pairs.""" + MAX_BYTES = 4096 + FILL_CHAR = b'\0' + SIOCGIFCONF = 0x8912 + with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock: + names = array.array('B', FILL_CHAR * MAX_BYTES) + names_address, _ = names.buffer_info() + mutable_byte_buffer = struct.pack('iP', MAX_BYTES, names_address) + mutated_byte_buffer = fcntl.ioctl(sock.fileno(), SIOCGIFCONF, mutable_byte_buffer) + max_bytes_out, _ = struct.unpack('iP', mutated_byte_buffer) + namestr = names.tobytes() + ip_dict: Dict[str, str] = {} for i in range(0, max_bytes_out, 40): name = namestr[i:i + 16].split(FILL_CHAR, 1)[0] name = name.decode('utf-8') ip_bytes = namestr[i + 20:i + 24] - full_addr = [] - for netaddr in ip_bytes: - if isinstance(netaddr, int): - full_addr.append(str(netaddr)) - elif isinstance(netaddr, str): - full_addr.append(str(ord(netaddr))) - ip_dict[name] = '.'.join(full_addr) + ip_dict[name] = '.'.join(str(b) for b in ip_bytes) return ip_dict
🧹 Nitpick comments (6)
tests/unittest/llmapi/apps/openai_server.py (4)
21-29: Fix type hints: make cli_args Optional.PEP 484: a parameter with default None must be Optional[List[str]].
Apply:
- cli_args: List[str] = None, + cli_args: Optional[List[str]] = None,
44-47: Minor: build launch_cmd with unpacking.Cleaner and avoids unnecessary concatenations.
Apply:
- launch_cmd = ["trtllm-serve"] + [model] + args + launch_cmd = ["trtllm-serve", model, *args] if llmapi_launch: - # start server with llmapi-launch on multi nodes - launch_cmd = ["trtllm-llmapi-launch"] + launch_cmd + # start server with llmapi-launch on multi nodes + launch_cmd = ["trtllm-llmapi-launch", *launch_cmd]
63-77: Tighten exception handling in terminate().
- Remove unused variable e.
- Avoid blanket Exception (BLE001). Narrow to OSError for cleanup.
Apply:
- except subprocess.TimeoutExpired as e: + except subprocess.TimeoutExpired: self.proc.kill() self.proc.wait(timeout=30) - try: + try: if self.extra_config_file: os.remove(self.extra_config_file) - except Exception as e: - print(f"Error removing extra config file: {e}") + except OSError as err: + print(f"Error removing extra config file: {err}")
141-147: Minor: consistent command building.Use unpacking consistently.
Apply:
- launch_cmd = [ - "trtllm-serve", "disaggregated", "-c", self.extra_config_file - ] + launch_cmd = ["trtllm-serve", "disaggregated", "-c", self.extra_config_file] if llmapi_launch: - # start server with llmapi-launch on multi nodes - launch_cmd = ["trtllm-llmapi-launch"] + launch_cmd + # start server with llmapi-launch on multi nodes + launch_cmd = ["trtllm-llmapi-launch", *launch_cmd]tests/unittest/llmapi/apps/utils.py (2)
183-187: Annotate get_local_ip signature (minor).Add type hints for clarity.
Apply:
-def get_local_ip(test_ip): +def get_local_ip(test_ip: str) -> str:
250-257: Remove dead code block.This preliminary loop does nothing and can be deleted.
Apply:
- for group in groups: - bracket_match = re.match(r'^([^\[]+)\[(.+?)\]$', group) - if bracket_match: - prefix = bracket_match.group(1) - range_part = bracket_match.group(2) - - expanded_nodes = [] + expanded_nodes: List[str] = []
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py(1 hunks)tests/unittest/llmapi/apps/openai_server.py(5 hunks)tests/unittest/llmapi/apps/utils.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Use only spaces, no tabs; indent with 4 spaces.
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).
Files:
tests/unittest/llmapi/apps/openai_server.pytests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.pytests/unittest/llmapi/apps/utils.py
🧬 Code graph analysis (2)
tests/unittest/llmapi/apps/openai_server.py (2)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (1)
env(88-104)tensorrt_llm/llmapi/mpi_session.py (1)
find_free_port(451-454)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py (2)
tests/unittest/llmapi/apps/openai_server.py (4)
RemoteDisaggOpenAIServer(121-167)RemoteOpenAIServer(17-118)get_client(109-113)terminate(63-77)tests/unittest/llmapi/apps/utils.py (3)
expand_slurm_nodelist(222-302)get_local_interfaces(191-219)get_local_ip(183-186)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/openai_server.py
25-25: PEP 484 prohibits implicit Optional
Convert to Optional[T]
(RUF013)
44-44: Consider ["trtllm-serve", model, *args] instead of concatenation
Replace with ["trtllm-serve", model, *args]
(RUF005)
47-47: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
50-50: subprocess call: check for execution of untrusted input
(S603)
69-69: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
75-75: Do not catch blind exception: Exception
(BLE001)
145-145: Consider ["trtllm-llmapi-launch", *launch_cmd] instead of concatenation
Replace with ["trtllm-llmapi-launch", *launch_cmd]
(RUF005)
148-148: subprocess call: check for execution of untrusted input
(S603)
tests/unittest/llmapi/apps/_test_disagg_serving_multi_nodes.py
60-60: subprocess call with shell=True identified, security issue
(S602)
60-60: Starting a process with a partial executable path
(S607)
70-70: Abstract raise to an inner function
(TRY301)
70-70: Avoid specifying long messages outside the exception class
(TRY003)
72-72: Consider moving this statement to an else block
(TRY300)
73-73: Do not catch blind exception: Exception
(BLE001)
103-103: f-string without any placeholders
Remove extraneous f prefix
(F541)
155-155: Possible binding to all interfaces
(S104)
169-169: Possible binding to all interfaces
(S104)
183-183: Probable use of requests call without timeout
(S113)
186-186: Do not catch blind exception: Exception
(BLE001)
194-194: Probable use of requests call without timeout
(S113)
196-196: Probable use of requests call without timeout
(S113)
199-199: Do not catch blind exception: Exception
(BLE001)
205-205: Unused function argument: worker
(ARG001)
208-208: f-string without any placeholders
Remove extraneous f prefix
(F541)
tests/unittest/llmapi/apps/utils.py
205-205: Local variable bytes_out is assigned to but never used
Remove assignment to unused variable bytes_out
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/unittest/llmapi/apps/utils.py (1)
302-303: Return type consistency (minor).Ensure return type matches annotation (already fine after prior change). No code change needed.
|
/bot run |
|
PR_Github #18573 [ run ] triggered by Bot |
|
PR_Github #18573 [ run ] completed with state |
6e3dc2c to
a9b0b8f
Compare
|
/bot run |
|
PR_Github #18594 [ run ] triggered by Bot |
|
PR_Github #18594 [ run ] completed with state |
a9b0b8f to
a71d12a
Compare
a71d12a to
68b5fd5
Compare
|
/bot run |
|
PR_Github #19237 [ run ] triggered by Bot |
ceeeeb1 to
bac699b
Compare
|
/bot run |
bac699b to
02229d8
Compare
|
/bot run |
|
PR_Github #19307 [ run ] triggered by Bot |
|
PR_Github #19308 [ ] completed with state |
|
PR_Github #19307 [ run ] completed with state |
53d869d to
a51f2c4
Compare
|
/bot run |
|
PR_Github #19344 [ run ] triggered by Bot |
|
/bot run |
|
PR_Github #19462 [ run ] triggered by Bot |
Signed-off-by: Lizhi Zhou <[email protected]>
Signed-off-by: Lizhi Zhou <[email protected]>
Signed-off-by: Lizhi Zhou <[email protected]>
Signed-off-by: Lizhi Zhou <[email protected]>
Signed-off-by: Lizhi Zhou <[email protected]>
Signed-off-by: Lizhi Zhou <[email protected]>
a51f2c4 to
ba2b86e
Compare
|
/bot run |
|
PR_Github #19558 [ run ] triggered by Bot |
|
PR_Github #19462 [ run ] completed with state |
|
/bot run |
|
PR_Github #19633 [ run ] triggered by Bot |
|
PR_Github #19558 [ run ] completed with state |
|
PR_Github #19633 [ run ] completed with state |
Summary by CodeRabbit
Tests
Utilities
Test servers
Style
Description
Test Coverage
My test cmd is
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...Provide a user friendly way for developers to interact with a Jenkins server.
Run
/bot [-h|--help]to print this help message.See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]Launch build/test pipelines. All previously running jobs will be killed.
--reuse-test (optional)pipeline-id(OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.--disable-reuse-test(OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.--disable-fail-fast(OPTIONAL) : Disable fail fast on build/tests/infra failures.--skip-test(OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.--stage-list "A10-PyTorch-1, xxx"(OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.--gpu-type "A30, H100_PCIe"(OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.--test-backend "pytorch, cpp"(OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.--only-multi-gpu-test(OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.--disable-multi-gpu-test(OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.--add-multi-gpu-test(OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.--post-merge(OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"(OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".--detailed-log(OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.--debug(OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in thestage-listparameter to access the appropriate container environment. Note: Does NOT update GitHub check status.For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.mdand the
scripts/test_to_stage_mapping.pyhelper.kill
killKill all running builds associated with pull request.
skip
skip --comment COMMENTSkip testing for latest commit on pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipelineReuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.