OpenDAS / dynamo
Commit bc3456b4 (Unverified)
Authored Aug 05, 2025 by Neal Vaidya. Committed by GitHub, Aug 05, 2025.
docs: fix issues in gpt-oss guide (#2304)
Parent: 20d284e8
Changes: 2 changed files with 8 additions and 8 deletions
components/backends/trtllm/gpt-oss.md (+7 -7)
components/backends/trtllm/launch/gpt_oss_disagg.sh (+1 -1)
components/backends/trtllm/gpt-oss.md
...
...
@@ -70,7 +70,7 @@ docker build -f container/Dockerfile.tensorrt_llm_prebuilt . \
 ```bash
 export MODEL_PATH=<LOCAL_MODEL_DIRECTORY>
-huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir $MODEL_PATH
+huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --exclude "metal/*" --local-dir $MODEL_PATH
 ```
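To see why the filter change matters, the sketch below mimics the two commands with shell `case` globs over a made-up file listing. It assumes the CLI's `--include` and `--exclude` patterns behave like simple glob matches against repo-relative paths; the file names are hypothetical and are not taken from the actual model repository.

```shell
# Hypothetical repo-relative paths, for illustration only.
files="config.json
model-00001.safetensors
original/model.bin
metal/model.bin"

# New command: keep paths that match neither exclude pattern
# (assuming glob-style matching).
kept_with_exclude=""
for f in $files; do
  case "$f" in
    original/*|metal/*) ;;                          # skipped by --exclude
    *) kept_with_exclude="$kept_with_exclude $f" ;;
  esac
done

# Old command: keep only paths that match the include pattern.
kept_with_include=""
for f in $files; do
  case "$f" in
    original/*) kept_with_include="$kept_with_include $f" ;;
  esac
done

echo "exclude filters keep:$kept_with_exclude"
echo "include filter keeps:$kept_with_include"
```

Under these assumptions the old `--include "original/*"` filter would fetch only the `original/` subtree, while the new pair of `--exclude` filters fetches everything except `original/` and `metal/`.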
### 3. Run the Container
...
...
@@ -84,7 +84,7 @@ docker run \
   --rm \
   --network host \
   --volume $MODEL_PATH:/model \
-  --volume $PWD:/workspace/dynamo \
+  --volume $PWD:/workspace \
   --shm-size=10G \
   --ulimit memlock=-1 \
   --ulimit stack=67108864 \
...
...
@@ -149,7 +149,7 @@ You can use the provided launch script or run the components manually:
 #### Option A: Using the Launch Script
 ```bash
-cd /workspace/dynamo/components/backends/trtllm
+cd /workspace/components/backends/trtllm
 ./launch/gpt_oss_disagg.sh
 ```
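The new `cd` target follows from the bind mount in the `docker run` hunk: with `$PWD:/workspace` the repository root lands directly at `/workspace`. A minimal sketch of that mapping, assuming the container is started from the repository root; the `container_path` helper and the host checkout location are hypothetical and exist only for illustration.

```shell
# Map a host path under the mounted directory to its path inside the container.
# Arguments: host mount root, container mount point, host path.
container_path() {
  host_root="$1"; mount_point="$2"; host_path="$3"
  # Strip the host root prefix and re-root the remainder at the mount point.
  printf '%s%s\n' "$mount_point" "${host_path#"$host_root"}"
}

# Hypothetical host checkout location.
repo_root="/home/user/dynamo"

# Mount target after this commit, then the target before it.
new_path="$(container_path "$repo_root" /workspace "$repo_root/components/backends/trtllm")"
old_path="$(container_path "$repo_root" /workspace/dynamo "$repo_root/components/backends/trtllm")"

echo "$new_path"
echo "$old_path"
```

The two results correspond to the added and removed `cd` lines, which is why the mount and the `cd` path have to change together.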
...
...
@@ -170,7 +170,7 @@ python3 -m dynamo.frontend --router-mode round-robin --http-port 8000 &
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
   --model-path /model \
-  --served-model-name gpt-oss-120b \
+  --served-model-name openai/gpt-oss-120b \
   --extra-engine-args engine_configs/gpt_oss/prefill.yaml \
   --disaggregation-mode prefill \
   --disaggregation-strategy prefill_first \
...
...
@@ -185,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
 ```bash
 CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
   --model-path /model \
-  --served-model-name gpt-oss-120b \
+  --served-model-name openai/gpt-oss-120b \
   --extra-engine-args engine_configs/gpt_oss/decode.yaml \
   --disaggregation-mode decode \
   --disaggregation-strategy prefill_first \
...
...
@@ -204,7 +204,7 @@ Send a test request to verify the deployment:
 curl -X POST http://localhost:8000/v1/responses \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "gpt-oss-120b",
+    "model": "openai/gpt-oss-120b",
     "input": "Explain the concept of disaggregated serving in LLM inference in 3 sentences.",
     "max_output_tokens": 200,
     "stream": false
...
...
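The `model` field in the request has to agree with the name given to `--served-model-name`, which is what this hunk brings back in line. One way to keep the two from drifting apart is to build the request body from a single variable. The following is a sketch under that assumption; the shortened `input` text is a placeholder, not the prompt from the guide.

```shell
# Assumed to match the value passed to --served-model-name.
SERVED_MODEL_NAME="openai/gpt-oss-120b"

# Build the request body from the variable so the names stay in sync.
payload=$(printf '{"model": "%s", "input": "%s", "max_output_tokens": %d, "stream": false}' \
  "$SERVED_MODEL_NAME" "Say hello." 200)

echo "$payload"

# To send it (requires the frontend from this guide to be running):
# curl -X POST http://localhost:8000/v1/responses \
#   -H "Content-Type: application/json" -d "$payload"
```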
@@ -227,7 +227,7 @@ mkdir -p /tmp/benchmark-results
 # Run the benchmark - this command tests the deployment with high-concurrency synthetic workload
 genai-perf profile \
-  --model gpt-oss-120b \
+  --model openai/gpt-oss-120b \
   --tokenizer /model \
   --endpoint-type chat \
   --endpoint /v1/chat/completions \
...
...
components/backends/trtllm/launch/gpt_oss_disagg.sh
...
...
@@ -4,7 +4,7 @@
 # Environment variables with defaults
 export MODEL_PATH=${MODEL_PATH:-"/model"}
-export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"gpt-oss-120b"}
+export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"openai/gpt-oss-120b"}
 export DISAGGREGATION_STRATEGY=${DISAGGREGATION_STRATEGY:-"prefill_first"}
 export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"engine_configs/gpt_oss/prefill.yaml"}
 export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"engine_configs/gpt_oss/decode.yaml"}
...
...
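The launch script relies on the `${VAR:-default}` parameter expansion, so the new default only applies when `SERVED_MODEL_NAME` is unset or empty. A small sketch of that behaviour; the `resolve_served_model_name` helper and the `my-alias` value are hypothetical and are not part of the script.

```shell
# Mirrors the ${VAR:-default} pattern used in the launch script.
resolve_served_model_name() {
  # Falls back to the default when SERVED_MODEL_NAME is unset or empty.
  echo "${SERVED_MODEL_NAME:-openai/gpt-oss-120b}"
}

# Case 1: variable unset, so the default from this commit is used.
unset SERVED_MODEL_NAME
default_name="$(resolve_served_model_name)"

# Case 2: variable set by the caller, so the caller's value wins.
override_name="$(SERVED_MODEL_NAME=my-alias; resolve_served_model_name)"

echo "$default_name"
echo "$override_name"
```

In practice this means a caller who exports `SERVED_MODEL_NAME` before running the script keeps their own name, and must then use that same name in the request and benchmark commands.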