OpenDAS / vllm_cscc
Commit 6b2ef5cd
Authored Mar 06, 2025 by Michael Goin; committed by GitHub on Mar 06, 2025
[Bug] Fix Attention when ignored in by quant_method (#14313)
Signed-off-by: mgoin <mgoin64@gmail.com>
parent 958adce4
Showing 1 changed file with 3 additions and 1 deletion

vllm/attention/layer.py (+3 -1) @ 6b2ef5cd
...
@@ -11,6 +11,7 @@ from vllm.attention import AttentionType
 from vllm.attention.selector import backend_name_to_enum, get_attn_backend
 from vllm.config import CacheConfig, get_current_vllm_config
 from vllm.forward_context import ForwardContext, get_forward_context
+from vllm.model_executor.layers.linear import UnquantizedLinearMethod
 from vllm.model_executor.layers.quantization.base_config import (
     QuantizationConfig)
 from vllm.model_executor.layers.quantization.kv_cache import BaseKVCacheMethod
...
@@ -97,7 +98,8 @@ class Attention(nn.Module):
         quant_method = quant_config.get_quant_method(
             self, prefix=prefix) if quant_config else None
-        if quant_method is not None:
+        if quant_method is not None and not isinstance(
+                quant_method, UnquantizedLinearMethod):
             assert isinstance(quant_method, BaseKVCacheMethod)
             # TODO (mgoin): kv cache dtype should be specified in the FP8
             # checkpoint config and become the "auto" behavior
...
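
The effect of the changed condition can be illustrated with a small standalone sketch. It does not import vLLM: the two classes are empty stand-ins that only share names with the real ones, and both `needs_kv_cache_setup_*` helpers are hypothetical names introduced here. It also assumes, based on the commit title, that a quantization config may hand back an `UnquantizedLinearMethod` for a layer it ignores.

```python
# Hypothetical stand-ins; these are NOT the real vLLM classes.
class UnquantizedLinearMethod:
    """Stub for the method a quant config returns for an ignored layer."""


class BaseKVCacheMethod:
    """Stub for a quant method that handles KV-cache scaling."""


def needs_kv_cache_setup_old(quant_method):
    # Condition before the patch: any non-None method must be a
    # KV-cache method, so an ignored layer trips the assertion.
    if quant_method is not None:
        assert isinstance(quant_method, BaseKVCacheMethod)
        return True
    return False


def needs_kv_cache_setup_new(quant_method):
    # Condition after the patch: an UnquantizedLinearMethod is treated
    # like "no quantization" and skips the KV-cache branch entirely.
    if quant_method is not None and not isinstance(
            quant_method, UnquantizedLinearMethod):
        assert isinstance(quant_method, BaseKVCacheMethod)
        return True
    return False
```

Under these assumptions, the old condition raises `AssertionError` for an ignored layer, while the new one returns `False` for it and still returns `True` for a genuine KV-cache method.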