[Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487)

Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>

Commit d698bb38, authored Dec 05, 2025 by Jingchun Gao, committed by GitHub Dec 05, 2025

Showing 1 changed file with 2 additions and 3 deletions

vllm/v1/attention/backends/flashinfer.py +2 -3

--- a/vllm/v1/attention/backends/flashinfer.py
+++ b/vllm/v1/attention/backends/flashinfer.py
@@ -482,9 +482,8 @@ class FlashInferMetadataBuilder(AttentionMetadataBuilder[FlashInferMetadata]):
            self.dcp_rank = 0
            self.dcp_kv_cache_interleave_size = 1

-        self.num_qo_heads = (
-            self.model_config.get_num_attention_heads(self.vllm_config.parallel_config)
-            * self.dcp_world_size
+        self.num_qo_heads = self.model_config.get_num_attention_heads(
+            self.vllm_config.parallel_config
        )

        self.num_kv_heads = self.kv_cache_spec.num_kv_heads