OpenDAS / vllm_cscc
Commit 8bf99b0b, authored Mar 12, 2026 by wanglong3
fix: fix bug
http://hpczentao.sugon.com/bug-view-118388.html
Parent: d2c4f48b
Showing 1 changed file with 1 addition and 1 deletion (+1, -1)
vllm/model_executor/layers/attention/mla_attention.py @ 8bf99b0b

@@ -1249,7 +1249,7 @@ class MLACommonBaseImpl(MLAAttentionImpl[A], Generic[A]):
         # `W_UV` and `W_UK_T`, we just store fp16/bf16 copies and perform
         # the bmm's in 16-bit, the extra memory overhead of this is fairly low
         from vllm.model_executor.layers.linear import UnquantizedLinearMethod
-        if (envs.VLLM_USE_NN or self.use_llama_nn) and isinstance(self.kv_b_proj.quant_method, UnquantizedLinearMethod):
+        if (envs.VLLM_USE_NN or self.use_llama_nn):  # and isinstance(self.kv_b_proj.quant_method, UnquantizedLinearMethod):
             kv_b_proj_weight = get_and_maybe_dequant_weights(self.kv_b_proj, out_dtype=act_dtype)
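
The change removes the `isinstance(..., UnquantizedLinearMethod)` clause from the branch condition, so the branch no longer depends on the layer's quantization method. A minimal sketch of that difference follows. It is illustrative only: the two classes are hypothetical stand-ins rather than vLLM's real ones, the helper names `takes_branch_before` and `takes_branch_after` are invented here, and the flags are passed as plain booleans instead of being read from `envs.VLLM_USE_NN` and `self.use_llama_nn`.

```python
# Hypothetical stand-ins for illustration; these are NOT the real vLLM classes.
class UnquantizedLinearMethod:
    pass


class SomeQuantizedLinearMethod:
    """Placeholder for any quant method other than UnquantizedLinearMethod."""


def takes_branch_before(use_nn: bool, use_llama_nn: bool, quant_method: object) -> bool:
    # Condition before the commit: flag set AND the layer is unquantized.
    return (use_nn or use_llama_nn) and isinstance(quant_method, UnquantizedLinearMethod)


def takes_branch_after(use_nn: bool, use_llama_nn: bool, quant_method: object) -> bool:
    # Condition after the commit: only the flags matter.
    return use_nn or use_llama_nn


if __name__ == "__main__":
    for method in (UnquantizedLinearMethod(), SomeQuantizedLinearMethod()):
        for flag in (False, True):
            print(
                type(method).__name__,
                f"flag={flag}",
                f"before={takes_branch_before(flag, False, method)}",
                f"after={takes_branch_after(flag, False, method)}",
            )
```

Under these assumptions, the only case whose outcome changes is a quantized layer with one of the flags enabled: it skipped the branch before the commit and enters it afterwards.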