The following tips typically help new LLM API users who are already familiar with other TensorRT-LLM APIs:
### RuntimeError: only rank 0 can start multi-node session, got 1
There is no need to add an `mpirun` prefix when launching single-node multi-GPU inference with the LLM API.

For example, you can run `python llm_inference_distributed.py` to perform multi-GPU inference on a single node.
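As a minimal sketch of what such a script looks like (the model name, `tensor_parallel_size` value, and sampling arguments here are illustrative and may vary across TensorRT-LLM versions):

```python
from tensorrt_llm import LLM, SamplingParams


def main():
    # tensor_parallel_size=2 shards the model across two GPUs on this node.
    # The LLM API spawns the required MPI workers itself, so the script is
    # launched with plain `python`, without an `mpirun` prefix.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
              tensor_parallel_size=2)
    outputs = llm.generate(["Hello, my name is"],
                           SamplingParams(max_tokens=32))
    for output in outputs:
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Wrapping the work in `main()` behind the `__main__` guard also matters for the `MPI_ABORT` issue discussed below.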
### Hang issue on Slurm Node
If you experience a hang or other issue on a node managed by Slurm, prefix your launch command with `mpirun -n 1 --oversubscribe --allow-run-as-root`.

For example, try `mpirun -n 1 --oversubscribe --allow-run-as-root python llm_inference_distributed.py`.
### MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD with errorcode 1.
Because the LLM API relies on the `mpi4py` library, wrap your use of the LLM class in a function and protect the program's main entry point with an `if __name__ == "__main__":` guard to avoid a [recursive spawn](https://mpi4py.readthedocs.io/en/stable/mpi4py.futures.html#mpipoolexecutor) of processes in `mpi4py`.

This limitation applies to multi-GPU inference only.
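A sketch of the recommended structure (the model path and `tensor_parallel_size` value are placeholders):

```python
from tensorrt_llm import LLM


def run_inference():
    # Constructing the LLM inside a function keeps it out of module-level
    # code, which mpi4py's spawned worker processes re-import on startup.
    llm = LLM(model="<path-to-model>", tensor_parallel_size=2)
    for output in llm.generate(["Hello"]):
        print(output.outputs[0].text)


# Without this guard, each MPI-spawned worker would re-execute the module's
# top-level code and construct another LLM instance, triggering the
# recursive spawn and the MPI_ABORT error above.
if __name__ == "__main__":
    run_inference()
```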
### Cannot quit after generation
The LLM instance manages threads and processes, which may prevent its reference count from reaching zero. To address this issue, there are two common solutions:

1. Wrap the LLM instance in a function, as demonstrated in the quickstart guide. When the function returns, the local reference is dropped, the reference count can reach zero, and the shutdown process is triggered.

2. Use the LLM instance as a context manager: `with LLM(...) as llm: ...`. The shutdown method is invoked automatically when execution leaves the `with` block, as sketched below.
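A minimal context-manager sketch (the model path is a placeholder):

```python
from tensorrt_llm import LLM

if __name__ == "__main__":
    # Exiting the `with` block invokes the shutdown method, stopping the
    # managed threads and processes so the interpreter can exit cleanly.
    with LLM(model="<path-to-model>") as llm:
        for output in llm.generate(["Hello"]):
            print(output.outputs[0].text)
```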
### Single node hanging when using `docker run --net=host`
The root cause may be related to `mpi4py`. A [workaround](https://github.com/mpi4py/mpi4py/discussions/491#discussioncomment-12660609) suggests changing `--net=host` to `--ipc=host`, or setting the following environment variables:
```bash
export OMPI_MCA_btl_tcp_if_include=lo
export OMPI_MCA_oob_tcp_if_include=lo
```
Another option to improve compatibility with `mpi4py` is to launch the task using: