Commit d698bb38 authored by Jingchun Gao, committed by GitHub

[Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487)


Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
parent 2c22c4ca
@@ -482,9 +482,8 @@ class FlashInferMetadataBuilder(AttentionMetadataBuilder[FlashInferMetadata]):
 self.dcp_rank = 0
 self.dcp_kv_cache_interleave_size = 1
-self.num_qo_heads = (
-    self.model_config.get_num_attention_heads(self.vllm_config.parallel_config)
-    * self.dcp_world_size
+self.num_qo_heads = self.model_config.get_num_attention_heads(
+    self.vllm_config.parallel_config
 )
 self.num_kv_heads = self.kv_cache_spec.num_kv_heads
......