OpenDAS / vllm_cscc
Commit 3f80d9ac
Authored May 22, 2025 by lizhigong
Fix the Device2Host copy synchronization that occurs during the MLA prefill stage.
Parent: cb563bb5
Showing 1 changed file with 2 additions and 0 deletions
vllm/attention/backends/mla/common.py (+2 -0)
@@ -1072,6 +1072,8 @@ class MLACommonImpl(MLAAttentionImpl[T], Generic[T]):
         self.use_llama_nn = os.environ.get('LLAMA_NN') == '1'
+        self.has_context_default = os.environ.get('VLLM_HAS_CONTEXT_DEFAULT') == '1'
         # For MLA the v head dim is smaller than qk head dim so we pad out
         # v with 0s to match the qk head dim for attention backends that do
         # not support different headdims
...
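The added line reads an environment variable once, at construction time, and stores the result as a plain Python bool. A minimal sketch of that pattern follows. `read_flag` and `PrefillConfig` are hypothetical names used only for illustration and are not part of vLLM. The diff does not show where `has_context_default` is consumed, so this sketch covers only how the flag is parsed.

```python
import os


def read_flag(name, env=None):
    """Return True only when the variable is set to the exact string '1'.

    Hypothetical helper; it mirrors the `os.environ.get(NAME) == '1'`
    comparison used in the diff.
    """
    env = os.environ if env is None else env
    return env.get(name) == '1'


class PrefillConfig:
    """Hypothetical stand-in for the part of __init__ shown in the diff."""

    def __init__(self, env=None):
        # Both flags are resolved once here, so later code can branch on a
        # host-side bool instead of re-reading the environment.
        self.use_llama_nn = read_flag('LLAMA_NN', env)
        self.has_context_default = read_flag('VLLM_HAS_CONTEXT_DEFAULT', env)


if __name__ == '__main__':
    cfg = PrefillConfig({'VLLM_HAS_CONTEXT_DEFAULT': '1'})
    print(cfg.use_llama_nn, cfg.has_context_default)
```

Note that with this comparison any value other than `'1'` (for example `'true'`, or an unset variable) leaves the flag disabled.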