sglang / Commits / d0934a51

Commit d0934a51 (unverified), authored Aug 28, 2025 by Liangsheng Yin, committed by GitHub on Aug 28, 2025

gpt-oss blog reproduction document (#9728)

Parent: 3f2d0cef
Showing 1 changed file with 163 additions and 0 deletions

benchmark/gpt_oss/README.md (new file, 0 → 100644, +163 −0)
# How to reproduce the results of GPT-OSS with SGLang
### Install the latest SGLang
```bash
git clone https://github.com/sgl-project/sglang.git
cd sglang
git checkout v0.5.1.post3
pip install --upgrade pip
pip install -e "python[all]"
```
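Optionally, a quick import check can confirm the install before launching a server (a minimal sketch; it only verifies that the package resolves, not that the GPU stack is usable):
```bash
# Sanity check: the package should import and report the checked-out version.
python3 -c "import sglang; print(sglang.__version__)"
```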
### Reproduce the benchmark throughput result (Batch Size 1)
Launch Command
```bash
# MXFP4 120B on H100
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 8 --attention-backend triton

# BF16 120B on H100
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 8 --attention-backend triton

# MXFP4 120B on B200
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 4

# BF16 120B on B200
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 4
```
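The benchmark below assumes the server is already serving on the default port 30000; a small wait loop like the following (a sketch using the OpenAI-compatible `/v1/models` endpoint) can be used to block until the server is ready:
```bash
# Poll until the server answers on the default port (adjust if --port was overridden).
until curl -sf http://localhost:30000/v1/models > /dev/null; do
    sleep 5
done
echo "SGLang server is ready"
```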
Benchmark Command
```bash
# MXFP4 120B on H100
python3 -m sglang.bench_one_batch_server --model openai/gpt-oss-120b --base-url http://localhost:30000 --batch-size 1 --input-len 1024 --output-len 512 --show-report
```
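The command above matches the MXFP4 launch; for the BF16 server the same flags should apply with the model id swapped (a sketch, not separately verified):
```bash
# BF16 120B (same benchmark flags, different model id)
python3 -m sglang.bench_one_batch_server --model lmsys/gpt-oss-120b-bf16 --base-url http://localhost:30000 --batch-size 1 --input-len 1024 --output-len 512 --show-report
```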
### Reproduce the benchmark throughput result (Batch Size 32)
Launch Command
```bash
# MXFP4 120B on H100
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 8

# BF16 120B on H100
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 8

# MXFP4 120B on B200
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 4

# BF16 120B on B200
python3 -m sglang.launch_server --model lmsys/gpt-oss-120b-bf16 --tp 4
```
Benchmark Command
```bash
python3 -m sglang.bench_one_batch_server --model openai/gpt-oss-120b --base-url http://localhost:30000 --batch-size 32 --input-len 1024 8192 --output-len 512 --show-report
```
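Here `--input-len 1024 8192` appears to sweep both input lengths in a single run; if separate runs are preferred, the two points can presumably also be collected one at a time, mirroring the single-value form used in the batch-size-1 command (a sketch):
```bash
# Equivalent sweep as two separate runs
python3 -m sglang.bench_one_batch_server --model openai/gpt-oss-120b --base-url http://localhost:30000 --batch-size 32 --input-len 1024 --output-len 512 --show-report
python3 -m sglang.bench_one_batch_server --model openai/gpt-oss-120b --base-url http://localhost:30000 --batch-size 32 --input-len 8192 --output-len 512 --show-report
```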
### Reproduce the evaluation result
Install gpt-oss
```bash
git clone https://github.com/openai/gpt-oss.git
cd gpt-oss
pip install -e .
```
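The evaluation talks to a running SGLang server through its OpenAI-compatible API, so launch one first. As a sketch, the B200 launch command from the throughput sections can be reused (adjust `--tp` and the attention backend for your hardware):
```bash
# Serve the model the evaluation will query (same command as in the throughput sections).
python3 -m sglang.launch_server --model openai/gpt-oss-120b --tp 4
```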
Evaluation Command
```bash
DATASET=gpqa
BASE_URL=YOUR_BASE_URL

OPENAI_API_KEY=dummy python -m gpt_oss.evals \
    --base-url ${BASE_URL}/v1 \
    --model dummy \
    --reasoning-effort low,medium,high \
    --eval $DATASET \
    --n-threads 1000
```
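If the server was launched locally as above, `YOUR_BASE_URL` is simply the server address; with the SGLang default port this would be:
```bash
# Assuming a local SGLang server on the default port 30000
BASE_URL=http://localhost:30000
```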
### Reproduce the acceptance length benchmark result
```bash
config_list=(
    "1,0,0,0"
    "1,3,1,4"
    "1,5,4,8"
)

python3 bench_model_speedup.py \
    --model-path openai/gpt-oss-120b \
    --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 \
    --port 20001 \
    --trust-remote-code \
    --mem-fraction-static 0.8 \
    --tp-size 4 \
    --attention-backend fa3 \
    --config-list "${config_list[@]}" \
    --benchmark-list mtbench:80 gsm8k:200 humaneval:200 math500:200 \
    --output lmsys_gpt-oss-120b_Eagle3_result.jsonl

python3 bench_model_speedup.py \
    --model-path openai/gpt-oss-120b \
    --speculative-draft-model-path nvidia/gpt-oss-120b-Eagle3 \
    --port 20001 \
    --trust-remote-code \
    --mem-fraction-static 0.8 \
    --tp-size 4 \
    --attention-backend fa3 \
    --config-list "${config_list[@]}" \
    --benchmark-list mtbench:80 gsm8k:200 humaneval:200 math500:200 \
    --output nv_gpt-oss-120b_Eagle3_result.jsonl
```
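Note that `bench_model_speedup.py` lives under `SpecForge/benchmarks` (cloned in the speedup section below), so these commands assume that directory as the working directory. The meaning of each `config_list` entry is not documented here; judging from the launch commands in the next section, the four fields appear to encode batch size, `--speculative-num-steps`, `--speculative-eagle-topk`, and `--speculative-num-draft-tokens`, with `1,0,0,0` acting as the no-speculation baseline. This reading is an assumption, sketched below:
```bash
# Assumed field order per entry (inferred, not documented):
#   "<batch size>,<num steps>,<eagle topk>,<num draft tokens>"
# "1,0,0,0"  -> baseline without speculative decoding
# "1,3,1,4"  -> chain decoding (steps=3, topk=1, draft tokens=4)
# "1,5,4,8"  -> tree decoding  (steps=5, topk=4, draft tokens=8)
```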
### Reproduce the speculative decoding speedup result
Launch Command
```bash
# On Hopper:
# - Tree decoding (topk > 1) and chain decoding (topk = 1) are supported on both FA3 and Triton backends.
python3 -m sglang.launch_server --model openai/gpt-oss-120b --speculative-algorithm EAGLE3 --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --tp 4

python3 -m sglang.launch_server --model openai/gpt-oss-120b --speculative-algorithm EAGLE3 --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 --speculative-num-steps 5 --speculative-eagle-topk 4 --speculative-num-draft-tokens 8 --tp 4

# On Blackwell:
# - Chain decoding (topk = 1) is supported on the TRTLLM-MHA backend. Tree decoding (topk > 1) is in progress, stay tuned!
# - Both tree decoding (topk > 1) and chain decoding (topk = 1) are supported on the Triton backend.
python3 -m sglang.launch_server --model openai/gpt-oss-120b --speculative-algorithm EAGLE3 --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --tp 4

python3 -m sglang.launch_server --model openai/gpt-oss-120b --speculative-algorithm EAGLE3 --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 --speculative-num-steps 5 --speculative-eagle-topk 4 --speculative-num-draft-tokens 8 --attention-backend triton --tp 4
```
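Once a speculative-decoding server is up, a quick smoke test through the OpenAI-compatible chat endpoint confirms it is generating before the benchmark starts (a sketch assuming the default port 30000):
```bash
# Minimal smoke test against the OpenAI-compatible API (default port 30000).
curl http://localhost:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "openai/gpt-oss-120b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'
```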
Benchmark Command
```bash
git clone https://github.com/sgl-project/SpecForge.git
cd SpecForge/benchmarks

config_list=(
    "1,0,0,0"
    "1,3,1,4"
    "1,5,4,8"
)

python3 bench_model_speedup.py \
    --model-path openai/gpt-oss-120b \
    --speculative-draft-model-path lmsys/EAGLE3-gpt-oss-120b-bf16 \
    --port 20001 \
    --trust-remote-code \
    --mem-fraction-static 0.8 \
    --tp-size 4 \
    --attention-backend fa3 \
    --config-list "${config_list[@]}" \
    --benchmark-list gsm8k:200 humaneval:200 math500:200 \
    --output lmsys_gpt-oss-120b_Eagle3_result.jsonl
```
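The results land in the JSONL file named by `--output`; a generic pretty-print like the following can be used to inspect them (field names depend on the script's output format and are not documented here):
```bash
# Pretty-print each JSONL record from the benchmark output.
python3 -c "import json; [print(json.dumps(json.loads(l), indent=2)) for l in open('lmsys_gpt-oss-120b_Eagle3_result.jsonl')]"
```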
The best speedups are obtained with the following settings:

- **1.39x** speedup with `--speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4`.
- **1.52x** speedup with `--speculative-num-steps 5 --speculative-eagle-topk 4 --speculative-num-draft-tokens 8`.