Unverified commit 72aaac5b, authored by Andreas Karatzas, committed by GitHub

[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding (#30430)


Signed-off-by: Andreas Karatzas <akaratza@amd.com>
parent 0e71eaa6
@@ -178,6 +178,12 @@ class EagleProposer:
 )
 rocm_types.append(AiterFlashAttentionMetadata)
+# TRITON_MLA backend support for MLA models (e.g., DeepSeek)
+from vllm.v1.attention.backends.mla.common import MLACommonMetadata
+rocm_types.append(MLACommonMetadata)
 self.allowed_attn_types = tuple(rocm_types)
 # Parse the speculative token tree.
......
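Note on the change: the hunk only shows the allow-list being built, not how it is consumed. The sketch below is one plausible reading, assuming the tuple is later passed to `isinstance` to validate attention metadata. The classes are local stand-ins rather than the real vLLM types, and `build_allowed_types`, `is_supported`, and `DeepSeekLikeMetadata` are hypothetical names used for illustration only.

```python
from typing import Optional, Tuple


# Stand-in classes (assumption): the real types live in vLLM and are not imported here.
class AiterFlashAttentionMetadata:
    pass


class MLACommonMetadata:
    pass


class DeepSeekLikeMetadata(MLACommonMetadata):
    """Hypothetical backend-specific subclass of the common MLA metadata."""


def build_allowed_types(include_mla: bool) -> Tuple[type, ...]:
    """Build the allow-list as a tuple, mirroring tuple(rocm_types) in the diff."""
    rocm_types = [AiterFlashAttentionMetadata]
    if include_mla:
        rocm_types.append(MLACommonMetadata)
    return tuple(rocm_types)


def is_supported(attn_metadata: object, allowed: Optional[Tuple[type, ...]]) -> bool:
    """Assumed check: None means no restriction, otherwise use isinstance."""
    return allowed is None or isinstance(attn_metadata, allowed)


before = build_allowed_types(include_mla=False)
after = build_allowed_types(include_mla=True)

# Without the MLA entry, MLA metadata is rejected by the isinstance check.
print(is_supported(MLACommonMetadata(), before))
# With the entry, the base class and any subclass are both accepted.
print(is_supported(MLACommonMetadata(), after))
print(is_supported(DeepSeekLikeMetadata(), after))
```

Because `isinstance` also accepts subclasses, appending the common base class would cover backend-specific metadata classes derived from it, if the check works as assumed here.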