OpenDAS / dynamo
Commit 87ea346d (Unverified)
Authored Oct 21, 2025 by ishandhanani
Committed by GitHub, Oct 22, 2025
fix: gb200 nixl instructions and max cuda graph bs on h100 instructions (#3807)
parent 656b4c44
Showing 2 changed files with 8 additions and 4 deletions
docs/backends/sglang/dsr1-wideep-gb200.md (+4 -2)
docs/backends/sglang/multinode-examples.md (+4 -2)
docs/backends/sglang/dsr1-wideep-gb200.md
@@ -98,7 +98,8 @@ python3 -m dynamo.sglang \
   --chunked-prefill-size 16384 \
   --max-total-tokens 32768 \
   --mem-fraction-static 0.82 \
-  --log-level debug
+  --log-level debug \
+  --disaggregation-transfer-backend nixl
 ```

 On the other prefill nodes (this example has 2 total prefill nodes), run the same command but change `--node-rank` to 1
@@ -151,7 +152,8 @@ python3 -m dynamo.sglang \
   --watchdog-timeout 1000000 \
   --chunked-prefill-size 36864 \
   --mem-fraction-static 0.82 \
-  --log-level debug
+  --log-level debug \
+  --disaggregation-transfer-backend nixl
 ```

 On the other decode nodes (this example has 2 total decode nodes), run the same command but change `--node-rank` to 1.
\ No newline at end of file
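The instruction to "run the same command but change `--node-rank`" can be illustrated with a small helper that derives the per-node command from one base argument list. This is a minimal sketch, not part of the commit: `BASE_ARGS`, `with_node_rank`, and `render` are hypothetical names, and the argument list contains only `--node-rank` plus the flags visible in this hunk. The real command in the documentation has additional flags that are not shown here.

```python
import shlex

# Hypothetical base command: only --node-rank and the flags visible in the
# hunk are listed. The documented command has more flags, omitted here.
BASE_ARGS = [
    "python3", "-m", "dynamo.sglang",
    "--node-rank", "0",
    "--mem-fraction-static", "0.82",
    "--log-level", "debug",
    "--disaggregation-transfer-backend", "nixl",
]


def with_node_rank(args, rank):
    """Return a copy of args with the value following --node-rank replaced."""
    out = list(args)
    idx = out.index("--node-rank")  # raises ValueError if the flag is absent
    out[idx + 1] = str(rank)
    return out


def render(args):
    """Join args into a single shell-quoted command line."""
    return " ".join(shlex.quote(a) for a in args)


# Two decode nodes in this example, so ranks 0 and 1.
for rank in range(2):
    print(render(with_node_rank(BASE_ARGS, rank)))
```

Keeping one base list and changing only the rank means a new flag, such as the transfer backend added in this commit, is applied to every node rather than to one copy of the command.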
docs/backends/sglang/multinode-examples.md
@@ -83,7 +83,8 @@ python3 -m dynamo.sglang \
   --disaggregation-bootstrap-port 30001 \
   --host 0.0.0.0 \
   --prefill-round-robin-balance \
-  --mem-fraction-static 0.82
+  --mem-fraction-static 0.82 \
+  --cuda-graph-max-bs 8
 ```

 Node 4: Run the remaining 8 shards of the decode worker
@@ -104,7 +105,8 @@ python3 -m dynamo.sglang \
   --disaggregation-bootstrap-port 30001 \
   --host 0.0.0.0 \
   --prefill-round-robin-balance \
-  --mem-fraction-static 0.82
+  --mem-fraction-static 0.82 \
+  --cuda-graph-max-bs 8
 ```

 **Step 2**: Run inference