Unverified commit 2a24e4aa, authored by ishandhanani, committed by GitHub

fix: fix the ooms in gb200 default instructions (#3768)

parent 9a723734
@@ -59,7 +59,6 @@ MC_TE_METRIC=true \
  SGLANG_DISAGGREGATION_HEARTBEAT_MAX_FAILURE=100000 \
  SGLANG_DISAGGREGATION_BOOTSTRAP_TIMEOUT=100000 \
  SGLANG_DISAGGREGATION_WAITING_TIMEOUT=100000 \
- SGLANG_MOONCAKE_CUSTOM_MEM_POOL=True \
  MC_FORCE_MNNVL=1 \
  NCCL_MNNVL_ENABLE=1 \
  NCCL_CUMEM_ENABLE=1 \
@@ -98,7 +97,7 @@ python3 -m dynamo.sglang \
  --disable-cuda-graph \
  --chunked-prefill-size 16384 \
  --max-total-tokens 32768 \
- --mem-fraction-static 0.8 \
+ --mem-fraction-static 0.82 \
  --log-level debug
  ```
@@ -128,10 +127,10 @@ python3 -m dynamo.sglang \
  --disaggregation-mode decode \
  --dist-init-addr ${HEAD_DECODE_NODE_IP}:29500 \
  --disaggregation-bootstrap-port 30001 \
- --nnodes 12 \
+ --nnodes 2 \
  --node-rank 0 \
- --tp-size 48 \
- --dp-size 48 \
+ --tp-size 8 \
+ --dp-size 8 \
  --enable-dp-attention \
  --host 0.0.0.0 \
  --decode-log-interval 1 \
@@ -155,4 +154,4 @@ python3 -m dynamo.sglang \
  --log-level debug
  ```
- On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1
\ No newline at end of file
+ On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1.
\ No newline at end of file
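The decode hunk changes `--nnodes`, `--tp-size`, and `--dp-size` together (12/48/48 to 2/8/8), and both sets of values fit a ratio of 4 GPUs per node. The sketch below checks that relationship before launching. The function names are hypothetical, and the default of 4 GPUs per node is an assumption inferred from the numbers in the diff, not something the commit states. Requiring `--dp-size` to equal `--tp-size` only mirrors the values shown here and is not a general SGLang rule.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: not part of the commit, dynamo, or SGLang.
# The default of 4 GPUs per node is an assumption inferred from the diff
# (12 nodes with tp-size 48, 2 nodes with tp-size 8).

# Print the tensor-parallel size implied by a node count.
expected_tp_size() {
  local nnodes="$1" gpus_per_node="${2:-4}"
  echo $(( nnodes * gpus_per_node ))
}

# Return 0 when tp-size and dp-size both match nnodes * gpus_per_node.
check_decode_topology() {
  local nnodes="$1" tp_size="$2" dp_size="$3" gpus_per_node="${4:-4}"
  local expected
  expected="$(expected_tp_size "$nnodes" "$gpus_per_node")"
  if [ "$tp_size" -ne "$expected" ] || [ "$dp_size" -ne "$expected" ]; then
    echo "mismatch: nnodes=$nnodes implies size $expected, got tp=$tp_size dp=$dp_size" >&2
    return 1
  fi
  echo "ok: nnodes=$nnodes tp=$tp_size dp=$dp_size"
}

# Values from the updated decode command in this commit.
check_decode_topology 2 8 8
```

Mixing the old parallel sizes with the new node count (for example `check_decode_topology 2 48 48`) fails the check, which is the kind of mismatch this commit removes from the instructions.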
@@ -39,6 +39,7 @@ python3 -m dynamo.sglang \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
+ --load-balance-method round_robin \
  --host 0.0.0.0 \
  --mem-fraction-static 0.82
  ```
@@ -60,6 +61,7 @@ python3 -m dynamo.sglang \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
+ --load-balance-method round_robin \
  --mem-fraction-static 0.82
  ```
@@ -80,6 +82,7 @@ python3 -m dynamo.sglang \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
+ --prefill-round-robin-balance \
  --mem-fraction-static 0.82
  ```
@@ -100,6 +103,7 @@ python3 -m dynamo.sglang \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --host 0.0.0.0 \
+ --prefill-round-robin-balance \
  --mem-fraction-static 0.82
  ```
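Each of the four launch commands in this file gains one of two round-robin flags: `--load-balance-method round_robin` or `--prefill-round-robin-balance`. A small check like the one below could confirm that a given command text carries one of them. The function name and the sample command are hypothetical, and the sketch makes no assumption about which flag belongs to which worker role, since the diff shows the role (`--disaggregation-mode prefill`) for the first command only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of the commit, dynamo, or SGLang.

# Return 0 when the command text contains either round-robin flag
# added by this commit.
has_round_robin_flag() {
  local cmd="$1"
  printf '%s\n' "$cmd" | grep -qF \
    -e '--load-balance-method round_robin' \
    -e '--prefill-round-robin-balance'
}

# Sample text assembled from the lines visible in the first hunk; the
# real command has more flags that the diff does not show.
sample_cmd='python3 -m dynamo.sglang \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --disaggregation-bootstrap-port 30001 \
  --load-balance-method round_robin \
  --host 0.0.0.0 \
  --mem-fraction-static 0.82'

if has_round_robin_flag "$sample_cmd"; then
  echo "round-robin flag present"
else
  echo "round-robin flag missing" >&2
fi
```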
......