Commit 0e5d399a authored by 王敏

[fix] Prevent the CUDA graph adaptation in MLA from affecting the non-parallel-decoding logic

parent fe393be8
@@ -690,7 +690,8 @@ class MLACommonMetadataBuilder(AttentionMetadataBuilder[M]):
```python
    def can_run_in_cudagraph(
            self, common_attn_metadata: CommonAttentionMetadata) -> bool:
        #return common_attn_metadata.max_query_len == 1
        if not self.use_spec_decode:
            return common_attn_metadata.max_query_len == 1
        return self._num_prefills == 0
```
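The patched method now takes one of two paths depending on whether speculative decoding is enabled. The sketch below models that decision in isolation. `FakeCommonAttentionMetadata` and `FakeMLABuilder` are hypothetical stand-ins, not vLLM's real classes; they keep only the attributes the check reads (`max_query_len`, `use_spec_decode`, `_num_prefills`). The comment on the speculative branch is our reading and is not stated in the commit: it assumes that a speculative decode step verifies draft tokens and therefore legitimately has `max_query_len > 1`.

```python
from dataclasses import dataclass


@dataclass
class FakeCommonAttentionMetadata:
    """Hypothetical stand-in for CommonAttentionMetadata.

    Only the field consulted by the CUDA graph check is modeled.
    """
    max_query_len: int


@dataclass
class FakeMLABuilder:
    """Hypothetical stand-in for MLACommonMetadataBuilder.

    Reduced to the two attributes the patched check reads.
    """
    use_spec_decode: bool
    _num_prefills: int

    def can_run_in_cudagraph(
            self, common_attn_metadata: FakeCommonAttentionMetadata) -> bool:
        # Non-speculative path: keep the plain rule that every request in
        # the batch decodes exactly one token.
        if not self.use_spec_decode:
            return common_attn_metadata.max_query_len == 1
        # Speculative path (assumption: decode steps carry draft tokens, so
        # max_query_len > 1 is expected here); only prefills are excluded.
        return self._num_prefills == 0


if __name__ == "__main__":
    meta = FakeCommonAttentionMetadata(max_query_len=3)
    print(FakeMLABuilder(use_spec_decode=False, _num_prefills=0)
          .can_run_in_cudagraph(meta))  # False
    print(FakeMLABuilder(use_spec_decode=True, _num_prefills=0)
          .can_run_in_cudagraph(meta))  # True
```

The two paths disagree only on a batch that has no prefills but `max_query_len > 1`. Without speculative decoding, that batch is rejected by the `max_query_len == 1` rule. With speculative decoding, it is accepted.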