Commit 3f80d9ac authored by lizhigong

Fix the Device2Host copy synchronization that occurs during the MLA prefill stage.

parent cb563bb5
......
@@ -1072,6 +1072,8 @@ class MLACommonImpl(MLAAttentionImpl[T], Generic[T]):
self.use_llama_nn = os.environ.get('LLAMA_NN') == '1'
self.has_context_default = os.environ.get('VLLM_HAS_CONTEXT_DEFAULT') == '1'
# For MLA the v head dim is smaller than qk head dim so we pad out
# v with 0s to match the qk head dim for attention backends that do
# not support different headdims
......
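
Note on the flags in the hunk above: both are read with `os.environ.get(...) == '1'`, so each is true only when the variable is set to exactly `1`. The visible diff does not show how `has_context_default` is consumed, so the following is only a minimal sketch of the flag-reading pattern. The helper `env_flag` and the class `FlagsSketch` are hypothetical names for illustration and are not part of the commit.

```python
import os


def env_flag(name: str) -> bool:
    """Return True only when the environment variable is exactly '1'.

    Hypothetical helper: it mirrors the `os.environ.get(NAME) == '1'`
    comparison used in the diff, nothing more.
    """
    return os.environ.get(name) == '1'


class FlagsSketch:
    """Hypothetical stand-in for the attribute setup shown in the diff."""

    def __init__(self) -> None:
        # Read once at construction time; later changes to the
        # environment do not affect an existing instance.
        self.use_llama_nn = env_flag('LLAMA_NN')
        self.has_context_default = env_flag('VLLM_HAS_CONTEXT_DEFAULT')
```

With this comparison, values such as `true`, `0`, or an unset variable all leave the flag disabled, so the option has to be enabled with `VLLM_HAS_CONTEXT_DEFAULT=1` in the environment of the process.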