Commit 57402e70 authored by Ryan McCormick, committed by GitHub

docs: Add slurm env var workaround for MPI spawn errors (#992)

parent 02145479
......@@ -197,6 +197,14 @@ Notes:
cd /workspace/examples/tensorrt_llm
dynamo serve components.worker:TensorRTLLMWorker -f ./configs/disagg.yaml --service-name TensorRTLLMWorker &
```
- If you see an error about MPI Spawn failing during TRTLLM Worker initialization on a Slurm-based cluster,
try unsetting the following environment variables before launching the TRTLLM worker. If you intend to
run other Slurm-based commands or processes on the same node after deploying the TRTLLM worker, you may
want to save these values into temporary variables first and restore them afterwards.
```bash
# Workaround for error: `mpi4py.MPI.Exception: MPI_ERR_SPAWN: could not spawn processes`
unset SLURM_JOBID SLURM_JOB_ID SLURM_NODELIST
```
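If you need the Slurm variables again later in the same shell, the save-and-restore idea from the note above could look roughly like the sketch below. The function names and the `SAVED_*` variables are hypothetical and are not part of Dynamo or Slurm. Only the three `SLURM_*` names come from the workaround itself.

```bash
# Sketch only: helper names and SAVED_* variables are illustrative assumptions.

# Remember each variable that is currently set, then unset all three.
save_and_unset_slurm_vars() {
  if [ -n "${SLURM_JOBID+x}" ]; then SAVED_SLURM_JOBID="$SLURM_JOBID"; fi
  if [ -n "${SLURM_JOB_ID+x}" ]; then SAVED_SLURM_JOB_ID="$SLURM_JOB_ID"; fi
  if [ -n "${SLURM_NODELIST+x}" ]; then SAVED_SLURM_NODELIST="$SLURM_NODELIST"; fi
  unset SLURM_JOBID SLURM_JOB_ID SLURM_NODELIST
}

# Re-export only the variables that were saved, then drop the temporaries.
restore_slurm_vars() {
  if [ -n "${SAVED_SLURM_JOBID+x}" ]; then export SLURM_JOBID="$SAVED_SLURM_JOBID"; fi
  if [ -n "${SAVED_SLURM_JOB_ID+x}" ]; then export SLURM_JOB_ID="$SAVED_SLURM_JOB_ID"; fi
  if [ -n "${SAVED_SLURM_NODELIST+x}" ]; then export SLURM_NODELIST="$SAVED_SLURM_NODELIST"; fi
  unset SAVED_SLURM_JOBID SAVED_SLURM_JOB_ID SAVED_SLURM_NODELIST
}

# Intended order of use:
#   save_and_unset_slurm_vars
#   <launch the TRTLLM worker in the background, as shown earlier>
#   restore_slurm_vars
```

A background process receives a copy of the environment at the moment it is started, so restoring the variables in the parent shell afterwards does not change what the already-launched worker sees. Variables that were not set beforehand stay unset after the restore.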
### Client
......