[Bug] Fix `Number of dimensions of tensors must match.` for Deepseek V3.2 (#31160)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

Unverified commit 76e6a951: authored Dec 23, 2025 by Wentao Ye, committed by GitHub Dec 24, 2025.

Showing 6 additions and 3 deletions.

vllm/model_executor/models/deepseek_v2.py (+6 -3)

--- a/vllm/model_executor/models/deepseek_v2.py
+++ b/vllm/model_executor/models/deepseek_v2.py
@@ -878,11 +878,14 @@ class Indexer(nn.Module):
        )

        q_pe, k_pe = rotary_emb(positions, q_pe, k_pe.unsqueeze(1))
-        # `rotary_emb` is shape-preserving; `q_pe` is already
-        # [num_tokens, n_head, rope_dim].
+        # Note: RoPE (NeoX) can introduce extra leading dimensions during compilation
+        # so we need to reshape back to token-flattened shapes
+        q_pe = q_pe.reshape(-1, self.n_head, self.rope_dim)
+        k_pe = k_pe.reshape(-1, 1, self.rope_dim)
+
        q = torch.cat([q_pe, q_nope], dim=-1)
        # `k_pe` is [num_tokens, 1, rope_dim] (MQA).
-        k = torch.cat([k_pe.squeeze(1), k_nope], dim=-1)
+        k = torch.cat([k_pe.squeeze(-2), k_nope], dim=-1)

        # we only quant q here since k quant is fused with cache insertion
        q = q.view(-1, self.head_dim)
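The sketch below reproduces the shape problem in isolation with plain PyTorch tensors. It assumes, hypothetically, that the compiled RoPE path hands back `q_pe` and `k_pe` with one extra leading dimension of size 1. The real extra dimensions may differ. The helper name `assemble_qk` and the sizes (5 tokens, 4 heads, `rope_dim=8`, `nope_dim=16`) are illustrative and not taken from vLLM. The `q_nope` shape `[num_tokens, n_head, nope_dim]` and the `k_nope` shape `[num_tokens, nope_dim]` are also assumptions. The sketch mirrors only the reshape and `torch.cat` steps from the diff.

```python
import torch


def assemble_qk(q_pe, k_pe, q_nope, k_nope, n_head, rope_dim):
    """Hypothetical helper mirroring the patched reshape/cat steps only."""
    # Collapse any extra leading dims back to token-flattened shapes.
    q_pe = q_pe.reshape(-1, n_head, rope_dim)  # [num_tokens, n_head, rope_dim]
    k_pe = k_pe.reshape(-1, 1, rope_dim)       # [num_tokens, 1, rope_dim] (MQA)
    q = torch.cat([q_pe, q_nope], dim=-1)      # [num_tokens, n_head, rope_dim + nope_dim]
    # squeeze(-2) drops the single-head axis counted from the end.
    k = torch.cat([k_pe.squeeze(-2), k_nope], dim=-1)  # [num_tokens, rope_dim + nope_dim]
    return q, k


num_tokens, n_head, rope_dim, nope_dim = 5, 4, 8, 16

# Assumed (hypothetical) RoPE output: one extra leading dim of size 1.
q_pe = torch.randn(1, num_tokens, n_head, rope_dim)
k_pe = torch.randn(1, num_tokens, 1, rope_dim)
q_nope = torch.randn(num_tokens, n_head, nope_dim)
k_nope = torch.randn(num_tokens, nope_dim)

# Pre-patch logic: squeeze(1) is a no-op here because dim 1 has size
# num_tokens, so torch.cat receives a 4-D and a 2-D tensor and raises.
try:
    torch.cat([k_pe.squeeze(1), k_nope], dim=-1)
    old_path_failed = False
except RuntimeError:
    old_path_failed = True

q, k = assemble_qk(q_pe, k_pe, q_nope, k_nope, n_head, rope_dim)
print(old_path_failed, tuple(q.shape), tuple(k.shape))
# prints: True (5, 4, 24) (5, 24)
```

After the reshape, `squeeze(1)` and `squeeze(-2)` select the same axis. Indexing from the end keeps the call pointing at the single-head axis regardless of how many leading dimensions a tensor carries.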