OpenDAS / vllm_cscc: commit e81d4e69 (Unverified)
Authored Sep 03, 2025 by Jiangyun Zhu; committed by GitHub on Sep 03, 2025
Parent: 02d411fd

[Misc] Add check for dual_chunk_attention (#24070)

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Changes: 1 changed file with 6 additions and 1 deletion: vllm/config/__init__.py (+6 -1)
vllm/config/__init__.py at e81d4e69:

```diff
@@ -49,7 +49,8 @@ from vllm.transformers_utils.config import (
     try_get_tokenizer_config, uses_mrope)
 from vllm.transformers_utils.s3_utils import S3Model
 from vllm.transformers_utils.utils import is_s3, maybe_model_redirect
-from vllm.utils import (DEFAULT_MAX_NUM_BATCHED_TOKENS, LayerBlockType,
+from vllm.utils import (DEFAULT_MAX_NUM_BATCHED_TOKENS,
+                        STR_DUAL_CHUNK_FLASH_ATTN_VAL, LayerBlockType,
                         LazyLoader, common_broadcastable_dtype, random_uuid)
 
 if TYPE_CHECKING:
@@ -1304,6 +1305,10 @@ class ModelConfig:
                     self.hf_config.dual_chunk_attention_config[
                         "sparse_attention_enabled"] = True
 
+            if envs.VLLM_ATTENTION_BACKEND != STR_DUAL_CHUNK_FLASH_ATTN_VAL:
+                raise ValueError("please set VLLM_ATTENTION_BACKEND to "
+                                 f"{STR_DUAL_CHUNK_FLASH_ATTN_VAL}")
+
     def verify_async_output_proc(self, parallel_config, speculative_config,
                                  device_config) -> None:
         if not self.use_async_output_proc:
```
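The effect of the new check can be illustrated with a minimal stand-alone sketch of the same logic in plain Python. It does not import vLLM and is not the project's code: `check_dual_chunk_backend` is a hypothetical helper, treating a `None` config as "model does not use dual chunk attention" is an assumption (the diff only shows the check placed after the `sparse_attention_enabled` assignment), and the string value of `STR_DUAL_CHUNK_FLASH_ATTN_VAL` is assumed to be `"DUAL_CHUNK_FLASH_ATTN"`.

```python
import os
from typing import Any, Mapping, Optional

# ASSUMPTION: the value of vLLM's STR_DUAL_CHUNK_FLASH_ATTN_VAL constant.
STR_DUAL_CHUNK_FLASH_ATTN_VAL = "DUAL_CHUNK_FLASH_ATTN"


def check_dual_chunk_backend(
    dual_chunk_attention_config: Optional[Mapping[str, Any]],
    backend: Optional[str],
) -> None:
    """Hypothetical stand-alone version of the check added in this commit.

    Raises ValueError when a model declares a dual chunk attention config
    but the selected attention backend is not the dual-chunk flash backend.
    """
    if dual_chunk_attention_config is None:
        # ASSUMPTION: no dual chunk attention config means nothing to enforce.
        return
    if backend != STR_DUAL_CHUNK_FLASH_ATTN_VAL:
        # Same message as the check in the diff.
        raise ValueError("please set VLLM_ATTENTION_BACKEND to "
                         f"{STR_DUAL_CHUNK_FLASH_ATTN_VAL}")


def check_from_environment(
        dual_chunk_attention_config: Optional[Mapping[str, Any]]) -> None:
    """Read the backend from the VLLM_ATTENTION_BACKEND environment variable."""
    check_dual_chunk_backend(dual_chunk_attention_config,
                             os.environ.get("VLLM_ATTENTION_BACKEND"))
```

In practice, this means a model whose config declares dual chunk attention now fails fast unless `VLLM_ATTENTION_BACKEND` is set to the dual-chunk flash attention backend before vLLM starts, instead of proceeding with a different backend.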