Unverified commit fe0673f1, authored by Yineng Zhang, committed by GitHub

set NCCL_IB_GID_INDEX=3 for multi node NVIDIA InfiniBand if needed (#3698)

parent 99c1b9d2
......@@ -84,6 +84,8 @@ For example, there are two H20 nodes, each with 8 GPUs. The first node's IP is `
If the command fails, try setting the `GLOO_SOCKET_IFNAME` parameter. For more information, see [Common Environment Variables](https://pytorch.org/docs/stable/distributed.html#common-environment-variables).
If the nodes are connected over NVIDIA InfiniBand and the server hangs during startup, consider running `export NCCL_IB_GID_INDEX=3` on each node before launching. For more information, see [this issue comment](https://github.com/sgl-project/sglang/issues/3516#issuecomment-2668493307).
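As a rough sketch, the two environment variables mentioned above could be set on each node before running the launch command. The interface name `eth0` is a placeholder assumption (check `ip addr` for the real one), and `NCCL_DEBUG=INFO` is an optional extra not mentioned in the original text; it only makes NCCL print more detail, which can help when diagnosing a hang.

```bash
# Only needed if the launch command fails: pin Gloo to a specific network interface.
# "eth0" is a hypothetical name; replace it with the interface that carries the node IP.
export GLOO_SOCKET_IFNAME=eth0

# Only needed if InfiniBand nodes hang during startup.
export NCCL_IB_GID_INDEX=3

# Optional (assumption, not part of the original instructions): verbose NCCL logging.
export NCCL_DEBUG=INFO
```

Exported variables apply only to the current shell session, so they would need to be set on every node in the same shell that starts the server.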
```bash
# node 1
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 0 --trust-remote-code
......