Commit cebe9219 authored by Ryan McCormick, committed by GitHub

feat: Add vars to multi-node trtllm slurm scripts to support xP yD deployments (#2429)

parent dcfa87be
@@ -186,6 +186,10 @@ deployment across 8 nodes:
./srun_disaggregated.sh
```
+> [!Tip]
+> To launch multiple replicas of the configured prefill/decode workers, you can set
+> NUM_PREFILL_WORKERS and NUM_DECODE_WORKERS respectively (default: 1).
## Understanding the Output
1. The `srun_aggregated.sh` launches two `srun` jobs. The first launches
@@ -16,9 +16,11 @@ MOUNTS="${MOUNTS:-${DEFAULT_MOUNT}}"
NUM_GPUS_PER_NODE=${NUM_GPUS_PER_NODE:-4}
NUM_PREFILL_NODES=${NUM_PREFILL_NODES:-4}
+NUM_PREFILL_WORKERS=${NUM_PREFILL_WORKERS:-1}
PREFILL_ENGINE_CONFIG="${PREFILL_ENGINE_CONFIG:-/mnt/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml}"
NUM_DECODE_NODES=${NUM_DECODE_NODES:-4}
+NUM_DECODE_WORKERS=${NUM_DECODE_WORKERS:-1}
DECODE_ENGINE_CONFIG="${DECODE_ENGINE_CONFIG:-/mnt/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml}"
DISAGGREGATION_STRATEGY=${DISAGGREGATION_STRATEGY:-"decode_first"}
@@ -59,10 +61,11 @@ srun \
# NOTE: Output streamed to stdout for ease of understanding the example, but
# in practice you would probably set `srun --output ... --error ...` to pipe
# the stdout/stderr to files.
-echo "Launching multi-node prefill worker in background."
-DISAGGREGATION_MODE=prefill \
-ENGINE_CONFIG=${PREFILL_ENGINE_CONFIG} \
-srun \
+for ((i=1; i<=${NUM_PREFILL_WORKERS}; i++)); do
+echo "Launching multi-node prefill worker in background."
+DISAGGREGATION_MODE=prefill \
+ENGINE_CONFIG=${PREFILL_ENGINE_CONFIG} \
+srun \
--mpi pmix \
--oversubscribe \
--container-image "${IMAGE}" \
@@ -76,11 +79,13 @@ srun \
--ntasks-per-node "${NUM_GPUS_PER_NODE}" \
--jobid "${SLURM_JOB_ID}" \
/mnt/multinode/start_trtllm_worker.sh &
+done
-echo "Launching multi-node decode worker in background."
-DISAGGREGATION_MODE=decode \
-ENGINE_CONFIG=${DECODE_ENGINE_CONFIG} \
-srun \
+for ((i=1; i<=${NUM_DECODE_WORKERS}; i++)); do
+echo "Launching multi-node decode worker in background."
+DISAGGREGATION_MODE=decode \
+ENGINE_CONFIG=${DECODE_ENGINE_CONFIG} \
+srun \
--mpi pmix \
--oversubscribe \
--container-image "${IMAGE}" \
@@ -94,3 +99,4 @@ srun \
--ntasks-per-node "${NUM_GPUS_PER_NODE}" \
--jobid "${SLURM_JOB_ID}" \
/mnt/multinode/start_trtllm_worker.sh &
+done
\ No newline at end of file
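
A rough way to size the allocation for an xP yD layout, offered as a sketch rather than as part of this commit. The helper below assumes each worker replica occupies its own `NUM_PREFILL_NODES` or `NUM_DECODE_NODES` nodes. The diff does not confirm that: the `srun` calls pass `--oversubscribe`, and the elided lines may change how nodes are shared. `required_nodes` is a hypothetical name; the defaults mirror the ones in the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): estimate the node budget for an
# xP yD deployment, ASSUMING every worker replica needs its own set of nodes.
required_nodes() {
  # Same defaults as the variables defined at the top of the slurm script.
  local prefill_nodes="${NUM_PREFILL_NODES:-4}"
  local prefill_workers="${NUM_PREFILL_WORKERS:-1}"
  local decode_nodes="${NUM_DECODE_NODES:-4}"
  local decode_workers="${NUM_DECODE_WORKERS:-1}"
  echo $(( prefill_nodes * prefill_workers + decode_nodes * decode_workers ))
}

# Default layout: one prefill worker and one decode worker.
echo "1P 1D: $(required_nodes) nodes"

# Two prefill replicas and one decode replica; the assignments apply only to
# this single call of the function.
echo "2P 1D: $(NUM_PREFILL_WORKERS=2 NUM_DECODE_WORKERS=1 required_nodes) nodes"
```

Under that assumption, a 2P 1D run would be started with `NUM_PREFILL_WORKERS=2 NUM_DECODE_WORKERS=1 ./srun_disaggregated.sh` inside an allocation that has at least that many nodes.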