Commit 87ea346d authored by ishandhanani, committed by GitHub

fix: gb200 nixl instructions and max cuda graph bs on h100 instructions (#3807)

parent 656b4c44
@@ -98,7 +98,8 @@ python3 -m dynamo.sglang \
 --chunked-prefill-size 16384 \
 --max-total-tokens 32768 \
 --mem-fraction-static 0.82 \
---log-level debug
+--log-level debug \
+--disaggregation-transfer-backend nixl
 ```
 On the other prefill nodes (this example has 2 total prefill nodes), run the same command but change `--node-rank` to 1
@@ -151,7 +152,8 @@ python3 -m dynamo.sglang \
 --watchdog-timeout 1000000 \
 --chunked-prefill-size 36864 \
 --mem-fraction-static 0.82 \
---log-level debug
+--log-level debug \
+--disaggregation-transfer-backend nixl
 ```
 On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1.
\ No newline at end of file
@@ -83,7 +83,8 @@ python3 -m dynamo.sglang \
 --disaggregation-bootstrap-port 30001 \
 --host 0.0.0.0 \
 --prefill-round-robin-balance \
---mem-fraction-static 0.82
+--mem-fraction-static 0.82 \
+--cuda-graph-max-bs 8
 ```
 Node 4: Run the remaining 8 shards of the decode worker
@@ -104,7 +105,8 @@ python3 -m dynamo.sglang \
 --disaggregation-bootstrap-port 30001 \
 --host 0.0.0.0 \
 --prefill-round-robin-balance \
---mem-fraction-static 0.82
+--mem-fraction-static 0.82 \
+--cuda-graph-max-bs 8
 ```
 **Step 2**: Run inference