Commit 08bfedc1
Authored Apr 07, 2026 by Yubo Wang; committed by GitHub, Apr 07, 2026

[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160)

Signed-off-by: Yubo Wang <yubowang2019@gmail.com>

parent 0102bd2f
Showing 1 changed file with 5 additions and 0 deletions

vllm/model_executor/models/extract_hidden_states.py
@@ -9,6 +9,7 @@ extract_hidden_states speculative decoding method.
"""

from collections.abc import Iterable
from dataclasses import replace
from typing import ClassVar

import torch
@@ -352,6 +353,10 @@ class ExtractHiddenStatesModel(nn.Module):
        cache_config = vllm_config.cache_config

        # Hidden states dtype should be independent of KV cache dtype.
        if cache_config is not None and is_quantized_kv_cache(cache_config.cache_dtype):
            cache_config = replace(cache_config, cache_dtype="auto")

        # Create a single cache-only attention layer
        # Note: We set num_heads <- self.num_hidden_states
        # and head_size <- hidden_size so that we can insert
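
The pattern in this hunk can be illustrated in isolation. The following is a minimal sketch, not the vLLM implementation: `CacheConfigSketch` and `is_quantized_sketch` are hypothetical stand-ins for the real config class and for `is_quantized_kv_cache`, and the rule that an `fp8` prefix means "quantized" is an assumption made only for the example. It shows why `dataclasses.replace` suits the fix: it returns a new config with `cache_dtype` overridden, leaving the shared config object unchanged.

```python
from dataclasses import dataclass, replace


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for the real cache config (two fields only)."""

    cache_dtype: str = "auto"
    block_size: int = 16


def is_quantized_sketch(cache_dtype: str) -> bool:
    """Assumed check for this example: an 'fp8' prefix counts as quantized."""
    return cache_dtype.startswith("fp8")


def config_for_hidden_states(cache_config):
    """Return a config whose cache_dtype does not follow a quantized KV cache."""
    if cache_config is not None and is_quantized_sketch(cache_config.cache_dtype):
        # replace() builds a new instance; the caller's config is not mutated.
        return replace(cache_config, cache_dtype="auto")
    return cache_config


shared = CacheConfigSketch(cache_dtype="fp8", block_size=32)
local = config_for_hidden_states(shared)
```

Here `local` carries `cache_dtype="auto"` and keeps the other fields, while `shared` still reports `"fp8"` for any other layer that reads it. A `None` or unquantized config passes through unchanged.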