OpenDAS / vllm_cscc
Commit a8ffc4f0 (Unverified)
Authored Sep 23, 2025 by Michael Goin
Committed by GitHub, Sep 23, 2025

[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 (#25508)

Signed-off-by: mgoin <mgoin64@gmail.com>

Parent: d5944d51
Changes: 1 changed file with 5 additions and 5 deletions
vllm/model_executor/models/config.py (+5 -5)
vllm/model_executor/models/config.py @ a8ffc4f0
@@ -266,24 +266,24 @@ class GptOssForCausalLMConfig(VerifyAndUpdateConfig):
         if structured_outputs_config.reasoning_parser == "":
             structured_outputs_config.reasoning_parser = "openai_gptoss"
 
-        # Increase the max capture size from 512 to 1024 for performance.
+        # Increase the max capture size from 512 to 992 for performance.
         # NOTE(woosuk): This will increase the number of CUDA graphs
-        # from 67 to 83.
+        # from 67 to 81.
         scheduler_config = vllm_config.scheduler_config
         if len(scheduler_config.cuda_graph_sizes) == 1:
             max_capture_size = scheduler_config.cuda_graph_sizes[0]
             # FIXME(woosuk): When using full cuda graph with FA3, the max
             # supported size is 992.
-            if max_capture_size < 1024:
+            if max_capture_size < 992:
                 cuda_graph_sizes = [1, 2, 4]
                 # Step size 8 for small batch sizes
                 cuda_graph_sizes += [i for i in range(8, 256, 8)]
                 # Step size 16 for larger batch sizes
-                cuda_graph_sizes += [i for i in range(256, 1025, 16)]
+                cuda_graph_sizes += [i for i in range(256, 993, 16)]
                 scheduler_config.cuda_graph_sizes = cuda_graph_sizes
                 logger.info(
                     "Overriding max cuda graph capture size to "
-                    "%d for performance.", 1024)
+                    "%d for performance.", 992)
 
 
 class MambaModelConfig(VerifyAndUpdateConfig):
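The graph counts quoted in the updated comment (83 before this change, 81 after) can be checked with a short standalone sketch. The helper name `gpt_oss_capture_sizes` is hypothetical and not part of vLLM; it only mirrors the list construction shown in the diff, with the upper bound passed in as a parameter.

```python
def gpt_oss_capture_sizes(max_size: int) -> list[int]:
    """Hypothetical helper (not a vLLM API) mirroring the list built in the diff."""
    sizes = [1, 2, 4]
    # Step size 8 for small batch sizes: 8, 16, ..., 248
    sizes += list(range(8, 256, 8))
    # Step size 16 for larger batch sizes: 256, 272, ..., up to max_size inclusive
    sizes += list(range(256, max_size + 1, 16))
    return sizes


old_sizes = gpt_oss_capture_sizes(1024)  # before this commit
new_sizes = gpt_oss_capture_sizes(992)   # after this commit

print(len(old_sizes), old_sizes[-1])  # 83 1024
print(len(new_sizes), new_sizes[-1])  # 81 992
```

Lowering the bound drops the two largest capture sizes (1008 and 1024), which accounts for the comment changing from 83 to 81 graphs.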