Commit 72c068b8 authored by TJian, committed by GitHub

[CI] [Bugfix] Fix unbounded variable in `run-multi-node-test.sh` (#31967)


Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
parent 7645bc52
@@ -7,7 +7,7 @@ set -euox pipefail
 if [ -e /dev/kfd ] || \
 [ -d /opt/rocm ] || \
 command -v rocm-smi &> /dev/null || \
-[ -n "$ROCM_HOME" ]; then
+[ -n "${ROCM_HOME:-}" ]; then
 IS_ROCM=1
 else
 IS_ROCM=0
@@ -1104,6 +1104,7 @@ steps:
 - vllm/model_executor/models/
 - tests/distributed/
 - tests/examples/offline_inference/data_parallel.py
+- .buildkite/scripts/run-multi-node-test.sh
 commands:
 - # the following commands are for the first node, with ip 192.168.10.10 (ray environment already set up)
 - VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep 'Same node test passed'
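Note on the `ROCM_HOME` change: the script runs under `set -euox pipefail`, and the `-u` (nounset) option makes bash abort when it expands a variable that was never set. The snippet below is a minimal standalone sketch of that behavior, not part of the commit. The names `bare_ok` and `rocm_home_set` are hypothetical and exist only for the demonstration, and it assumes bash as the interpreter.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Make sure the variable is really unset for this demonstration.
unset ROCM_HOME

# Bare expansion: under `set -u` the subshell aborts with
# "ROCM_HOME: unbound variable" and returns a non-zero status.
# Running it inside an `if` condition keeps the parent script alive.
if (: "$ROCM_HOME") 2>/dev/null; then
    bare_ok=1
else
    bare_ok=0
fi

# Guarded expansion: "${ROCM_HOME:-}" falls back to an empty string,
# so `[ -n ... ]` is simply false instead of aborting the script.
if [ -n "${ROCM_HOME:-}" ]; then
    rocm_home_set=1
else
    rocm_home_set=0
fi

echo "bare_ok=${bare_ok} rocm_home_set=${rocm_home_set}"
```

On a machine where `ROCM_HOME` is not exported, the bare form fails while the guarded form evaluates to false, so the remaining `||` alternatives in the detection chain decide the result.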