
Commit 1204133

Superjomn and rmccorm4 authored
[https://nvbugs/5416501][doc] add known issues to llmapi doc (#7560)
Signed-off-by: Yan Chunwei <[email protected]>
Co-authored-by: Ryan McCormick <[email protected]>
1 parent 88d1bde commit 1204133

File tree

1 file changed: +21 -4 lines changed


docs/source/llm-api/index.md

Lines changed: 21 additions & 4 deletions
@@ -53,26 +53,43 @@ llm = LLM(model=<local_path_to_model>)

The following tips typically assist new LLM API users who are familiar with other APIs that are part of TensorRT-LLM:

- - RuntimeError: only rank 0 can start multi-node session, got 1
+ ### RuntimeError: only rank 0 can start multi-node session, got 1

There is no need to add an `mpirun` prefix when launching single-node multi-GPU inference with the LLM API.

For example, you can run `python llm_inference_distributed.py` to perform multi-GPU inference on a single node.
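
As a rough illustration of this pattern (not the actual contents of `llm_inference_distributed.py`), here is a minimal sketch; it assumes the `tensor_parallel_size` argument, the `SamplingParams` helper, and a placeholder model name, and it is launched directly with `python`, not via `mpirun`:

```python
# Hypothetical single-node multi-GPU sketch; model name, argument values,
# and output fields are assumptions based on the public LLM API.
from tensorrt_llm import LLM, SamplingParams


def main():
    # No `mpirun` prefix: the LLM API manages the worker ranks itself.
    llm = LLM(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
        tensor_parallel_size=2,                      # use 2 GPUs on one node
    )
    for output in llm.generate(
        ["Hello, my name is"], SamplingParams(max_tokens=32)
    ):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```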

- - Hang issue on Slurm Node
+ ### Hang issue on Slurm Node

If you experience a hang or other issue on a node managed with Slurm, add the prefix `mpirun -n 1 --oversubscribe --allow-run-as-root` to your launch script.

For example, try `mpirun -n 1 --oversubscribe --allow-run-as-root python llm_inference_distributed.py`.

- - MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.
+ ### MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.

Because the LLM API relies on the `mpi4py` library, put the LLM class in a function and protect the main entrypoint of the program under the `__main__` namespace to avoid a [recursive spawn](https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html#mpipoolexecutor) process in `mpi4py`.

This limitation applies to multi-GPU inference only.
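
As an illustration, a minimal sketch of the recommended structure is shown below; the model name, prompt, and GPU count are placeholders:

```python
# Sketch of the `__main__` guard that avoids a recursive spawn in mpi4py;
# model name and prompt are placeholders.
from tensorrt_llm import LLM


def run_inference():
    # The LLM object lives inside a function rather than at module level.
    llm = LLM(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        tensor_parallel_size=2,
    )
    for output in llm.generate(["The capital of France is"]):
        print(output.outputs[0].text)


if __name__ == "__main__":
    # Only the entrypoint process runs this; spawned ranks will not re-run
    # the inference code when they import this module.
    run_inference()
```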

- - Cannot quit after generation
+ ### Cannot quit after generation

The LLM instance manages threads and processes, which may prevent its reference count from reaching zero. To address this issue, there are two common solutions, both sketched below:
1. Wrap the LLM instance in a function, as demonstrated in the quickstart guide. This reduces the reference count and triggers the shutdown process.
2. Use the LLM as a context manager: `with LLM(...) as llm: ...`. The shutdown method is invoked automatically once execution leaves the `with`-statement block.
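
A short sketch of both options, using a placeholder model name:

```python
# Two ways to let the process exit cleanly after generation; the model name
# is a placeholder.
from tensorrt_llm import LLM


def generate_in_function():
    # Option 1: keep the LLM local to a function so its reference count
    # drops to zero when the function returns, triggering shutdown.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    print(llm.generate(["Hello"])[0].outputs[0].text)


def generate_with_context_manager():
    # Option 2: the context manager invokes shutdown when the block exits.
    with LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0") as llm:
        print(llm.generate(["Hello"])[0].outputs[0].text)


if __name__ == "__main__":
    generate_in_function()
    generate_with_context_manager()
```
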
+
+ ### Single node hanging when using `docker run --net=host`
+
+ The root cause may be related to `mpi4py`. There is a [workaround](https://github.com/mpi4py/mpi4py/discussions/491#discussioncomment-12660609) suggesting a change from `--net=host` to `--ipc=host`, or setting the following environment variables:
+
+ ```bash
+ export OMPI_MCA_btl_tcp_if_include=lo
+ export OMPI_MCA_oob_tcp_if_include=lo
+ ```
+
+ Another option to improve compatibility with `mpi4py` is to launch the task using:
+
+ ```bash
+ mpirun -n 1 --oversubscribe --allow-run-as-root python my_llm_task.py
+ ```
+
+ This command can help avoid related runtime issues.
