OpenDAS / vllm_cscc
Commit 72273242
Authored Jul 21, 2025 by zhuwenwen
Parent: 267cc5ff

update max_seq_len_to_capture
Showing 1 changed file with 7 additions and 4 deletions
vllm/config.py  +7 -4
--- a/vllm/config.py
+++ b/vllm/config.py
@@ -313,7 +313,8 @@ class ModelConfig:
     graph and always execute the model in eager mode. If False, we will use
     CUDA graph and eager execution in hybrid for maximal performance and
     flexibility."""
-    max_seq_len_to_capture: int = 8192
+    # max_seq_len_to_capture: int = 8192
+    max_seq_len_to_capture: bool = None
     """Maximum sequence len covered by CUDA graphs. When a sequence has context
     length larger than this, we fall back to eager mode. Additionally for
     encoder-decoder models, if the sequence length of the encoder input is
@@ -973,9 +974,11 @@ class ModelConfig:
                     "non-quantized models.", self.quantization)
 
     def _verify_cuda_graph(self) -> None:
-        # self.max_seq_len_to_capture = min(self.max_seq_len_to_capture,
-        #                                   self.max_model_len)
-        self.max_seq_len_to_capture = self.max_model_len
+        if self.max_seq_len_to_capture is None:
+            self.max_seq_len_to_capture = self.max_model_len
+        self.max_seq_len_to_capture = min(self.max_seq_len_to_capture,
+                                          self.max_model_len)
+        # self.max_seq_len_to_capture = self.max_model_len
         # CUDAGraph capture not supported for enc-dec models and mllama on ROCm
         ROCM_UNSUPPORTED_MODELS = ['mllama']
         unsupported_rocm = (self.hf_config.model_type
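
The second hunk changes how the capture length is resolved: an unset value (None) falls back to max_model_len, and any value is then capped at max_model_len. The following is a minimal standalone sketch of that logic, not code from the repository. The helper name `resolve_max_seq_len_to_capture` is hypothetical, and the sketch annotates the parameter as `Optional[int]`, whereas the commit itself declares the field as `bool = None`.

```python
from typing import Optional


def resolve_max_seq_len_to_capture(
    max_seq_len_to_capture: Optional[int],
    max_model_len: int,
) -> int:
    """Hypothetical helper mirroring the new body of _verify_cuda_graph."""
    # An unset value defaults to the model's maximum length.
    if max_seq_len_to_capture is None:
        max_seq_len_to_capture = max_model_len
    # An explicit value is never allowed to exceed the model's maximum length.
    return min(max_seq_len_to_capture, max_model_len)


if __name__ == "__main__":
    # Unset: falls back to max_model_len.
    print(resolve_max_seq_len_to_capture(None, 32768))
    # Explicit and smaller than max_model_len: kept as given.
    print(resolve_max_seq_len_to_capture(8192, 32768))
    # Explicit and larger than max_model_len: capped.
    print(resolve_max_seq_len_to_capture(65536, 32768))
```

Compared with the parent commit, which unconditionally assigned max_model_len, an explicitly configured smaller value is now preserved.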