Commit 2db9a54d authored by zhuwenwen

Merge branch 'v0.9.2-dev-wm' into 'v0.9.2-dev'

[fix] Prevent the cudagraph adaptation in MLA from affecting the non-parallel-decoding logic

See merge request dcutoolkit/deeplearing/vllm!165
parents d0cc5577 0e5d399a
@@ -690,7 +690,8 @@ class MLACommonMetadataBuilder(AttentionMetadataBuilder[M]):
    def can_run_in_cudagraph(
            self, common_attn_metadata: CommonAttentionMetadata) -> bool:
        #return common_attn_metadata.max_query_len == 1
        if not self.use_spec_decode:
            return common_attn_metadata.max_query_len == 1
        return self._num_prefills == 0
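
The branching in this hunk can be exercised in isolation. The following is a minimal sketch, not the vLLM implementation: `FakeCommonAttentionMetadata` and `FakeMLABuilder` are hypothetical stand-ins that carry only the three fields the method reads (`max_query_len`, `use_spec_decode`, `_num_prefills`), and how the real builder populates those fields is assumed rather than shown.

```python
from dataclasses import dataclass


@dataclass
class FakeCommonAttentionMetadata:
    """Hypothetical stand-in: only the field the method reads."""
    max_query_len: int


class FakeMLABuilder:
    """Hypothetical stand-in for the metadata builder in the diff."""

    def __init__(self, use_spec_decode: bool, num_prefills: int) -> None:
        self.use_spec_decode = use_spec_decode
        self._num_prefills = num_prefills

    def can_run_in_cudagraph(
            self, common_attn_metadata: FakeCommonAttentionMetadata) -> bool:
        # Without parallel (speculative) decoding, keep the plain check:
        # every request in the batch contributes exactly one query token.
        if not self.use_spec_decode:
            return common_attn_metadata.max_query_len == 1
        # With it enabled, a decode step may carry several query tokens,
        # so the check is instead that the batch contains no prefills.
        return self._num_prefills == 0


# Same batch shape (max_query_len=4, no prefills), two configurations.
batch = FakeCommonAttentionMetadata(max_query_len=4)
plain = FakeMLABuilder(use_spec_decode=False, num_prefills=0)
spec = FakeMLABuilder(use_spec_decode=True, num_prefills=0)
print(plain.can_run_in_cudagraph(batch))  # False
print(spec.can_run_in_cudagraph(batch))   # True
```

The two calls at the end show the case the guard separates: a batch with no prefills but a query length greater than 1 is rejected when `use_spec_decode` is off and accepted when it is on.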