Address the Device2Host copy synchronization that occurs during the MLA prefill stage.

Commit 3f80d9ac authored May 22, 2025 by lizhigong

Showing with 2 additions and 0 deletions

vllm/attention/backends/mla/common.py +2 -0

--- a/vllm/attention/backends/mla/common.py
+++ b/vllm/attention/backends/mla/common.py
@@ -1072,6 +1072,8 @@ class MLACommonImpl(MLAAttentionImpl[T], Generic[T]):
        
        self.use_llama_nn = os.environ.get('LLAMA_NN') == '1'

+        self.has_context_default = os.environ.get('VLLM_HAS_CONTEXT_DEFAULT') == '1'
+
        # For MLA the v head dim is smaller than qk head dim so we pad out
        # v with 0s to match the qk head dim for attention backends that do
        # not support different headdims
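
The hunk above only shows the flag being read; how `has_context_default` is consumed is not part of this diff. The sketch below is one plausible reading, under stated assumptions: when the flag is on, the prefill path assumes context is present and skips a check that would otherwise force a device-to-host copy. `read_flag`, `resolve_has_context`, and `sync_check` are hypothetical names for illustration and are not vLLM APIs.

```python
import os


def read_flag(name, env=os.environ):
    """Return True only when the variable is exactly '1', as in the diff."""
    return env.get(name) == '1'


def resolve_has_context(has_context_default, sync_check):
    """Hypothetical helper: decide whether the prefill has context.

    sync_check stands in for a check that would block on a
    device-to-host copy (an assumption, not taken from the diff).
    """
    if has_context_default:
        # Assumed behaviour: trust the default and never call sync_check,
        # so no device-to-host synchronization is triggered.
        return True
    return bool(sync_check())


if __name__ == '__main__':
    flag = read_flag('VLLM_HAS_CONTEXT_DEFAULT')
    print(resolve_has_context(flag, lambda: False))
```

Note that the comparison is against the exact string `'1'`, so values such as `true` or `yes` leave the flag disabled.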