Commit bc3456b4 (Unverified)
Authored Aug 05, 2025 by Neal Vaidya
Committed by GitHub, Aug 05, 2025

docs: fix issues in gpt-oss guide (#2304)

parent 20d284e8

Showing 2 changed files with 8 additions and 8 deletions:
components/backends/trtllm/gpt-oss.md (+7 -7)
components/backends/trtllm/launch/gpt_oss_disagg.sh (+1 -1)

components/backends/trtllm/gpt-oss.md
@@ -70,7 +70,7 @@ docker build -f container/Dockerfile.tensorrt_llm_prebuilt . \
 ```bash
 export MODEL_PATH=<LOCAL_MODEL_DIRECTORY>
-huggingface-cli download openai/gpt-oss-120b --include "original/*" --local-dir $MODEL_PATH
+huggingface-cli download openai/gpt-oss-120b --exclude "original/*" --exclude "metal/*" --local-dir $MODEL_PATH
 ```
 ### 3. Run the Container
@@ -84,7 +84,7 @@ docker run \
   --rm \
   --network host \
   --volume $MODEL_PATH:/model \
-  --volume $PWD:/workspace/dynamo \
+  --volume $PWD:/workspace \
   --shm-size=10G \
   --ulimit memlock=-1 \
   --ulimit stack=67108864 \
@@ -149,7 +149,7 @@ You can use the provided launch script or run the components manually:
 #### Option A: Using the Launch Script
 ```bash
-cd /workspace/dynamo/components/backends/trtllm
+cd /workspace/components/backends/trtllm
 ./launch/gpt_oss_disagg.sh
 ```
@@ -170,7 +170,7 @@ python3 -m dynamo.frontend --router-mode round-robin --http-port 8000 &
 ```bash
 CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
   --model-path /model \
-  --served-model-name gpt-oss-120b \
+  --served-model-name openai/gpt-oss-120b \
   --extra-engine-args engine_configs/gpt_oss/prefill.yaml \
   --disaggregation-mode prefill \
   --disaggregation-strategy prefill_first \
@@ -185,7 +185,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m dynamo.trtllm \
 ```bash
 CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
   --model-path /model \
-  --served-model-name gpt-oss-120b \
+  --served-model-name openai/gpt-oss-120b \
   --extra-engine-args engine_configs/gpt_oss/decode.yaml \
   --disaggregation-mode decode \
   --disaggregation-strategy prefill_first \
@@ -204,7 +204,7 @@ Send a test request to verify the deployment:
 curl -X POST http://localhost:8000/v1/responses \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "gpt-oss-120b",
+    "model": "openai/gpt-oss-120b",
     "input": "Explain the concept of disaggregated serving in LLM inference in 3 sentences.",
     "max_output_tokens": 200,
     "stream": false
@@ -227,7 +227,7 @@ mkdir -p /tmp/benchmark-results
 # Run the benchmark - this command tests the deployment with high-concurrency synthetic workload
 genai-perf profile \
-  --model gpt-oss-120b \
+  --model openai/gpt-oss-120b \
   --tokenizer /model \
   --endpoint-type chat \
   --endpoint /v1/chat/completions \
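
Review note: this file's hunks change the model name in `--served-model-name`, in the test request's `model` field, and in `genai-perf --model` together. That suggests clients are expected to send the same name the workers register, although the diff does not say so explicitly. A minimal sketch under that assumption builds the request body from a single variable so the names cannot drift apart. The `payload` variable and the commented-out `curl` invocation are illustrative and not part of the guide.

```shell
# Single source of truth for the model name; the default mirrors the value set in this commit.
SERVED_MODEL_NAME="${SERVED_MODEL_NAME:-openai/gpt-oss-120b}"

# Build the JSON body with the same fields as the guide's test request.
payload=$(printf '{"model": "%s", "input": "%s", "max_output_tokens": 200, "stream": false}' \
  "$SERVED_MODEL_NAME" \
  "Explain the concept of disaggregated serving in LLM inference in 3 sentences.")

echo "$payload"

# With the deployment running, the request from the guide would then be:
# curl -X POST http://localhost:8000/v1/responses \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Exporting `SERVED_MODEL_NAME` before running the snippet swaps in a different name without editing the payload.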
components/backends/trtllm/launch/gpt_oss_disagg.sh
@@ -4,7 +4,7 @@
 # Environment variables with defaults
 export MODEL_PATH=${MODEL_PATH:-"/model"}
-export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"gpt-oss-120b"}
+export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"openai/gpt-oss-120b"}
 export DISAGGREGATION_STRATEGY=${DISAGGREGATION_STRATEGY:-"prefill_first"}
 export PREFILL_ENGINE_ARGS=${PREFILL_ENGINE_ARGS:-"engine_configs/gpt_oss/prefill.yaml"}
 export DECODE_ENGINE_ARGS=${DECODE_ENGINE_ARGS:-"engine_configs/gpt_oss/decode.yaml"}
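
Review note: the script sets its defaults with the shell's `${VAR:-default}` expansion, so the new `openai/gpt-oss-120b` value applies only when `SERVED_MODEL_NAME` is unset or empty in the caller's environment. The sketch below is a standalone demonstration of that behavior. The names `default_name`, `override_name`, and the alias `my-alias` are hypothetical and do not appear in the launch script.

```shell
# Case 1: variable unset, so the default from the script is used.
unset SERVED_MODEL_NAME
export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"openai/gpt-oss-120b"}
default_name="$SERVED_MODEL_NAME"

# Case 2: variable already set by the caller, so the default is ignored.
SERVED_MODEL_NAME="my-alias"   # hypothetical override
export SERVED_MODEL_NAME=${SERVED_MODEL_NAME:-"openai/gpt-oss-120b"}
override_name="$SERVED_MODEL_NAME"

echo "default:  $default_name"
echo "override: $override_name"
```

Anyone who exported the old `gpt-oss-120b` name in their environment would keep it after this commit, since a preset value takes precedence over the script's default.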