Commit 183a88cf authored by zhaochao's avatar zhaochao
Browse files

fix some notes


Signed-off-by: default avatarzhaochao <zhaochao1@sugon.com>
parent ca2958a8
...@@ -890,12 +890,12 @@ class FlashAttention(torch.nn.Module): ...@@ -890,12 +890,12 @@ class FlashAttention(torch.nn.Module):
elif q_format == "thd": elif q_format == "thd":
# thd -> t(hd) # thd -> t(hd)
output = output.reshape(output.shape[0], -1) output = output.reshape(output.shape[0], -1)
# Handle output shape when V head dim differs from Q/K head dim
if value_layer.shape[-1] != query_layer.shape[-1]: if value_layer.shape[-1] != query_layer.shape[-1]:
v_dim = value_layer.shape[-1] v_dim = value_layer.shape[-1]
num_heads = query_layer.shape[-2] num_heads = query_layer.shape[-2]
# 恢复为 (..., num_heads, head_dim_qk)
out_shape_heads = output.shape[:-1] + (num_heads, query_layer.shape[-1]) out_shape_heads = output.shape[:-1] + (num_heads, query_layer.shape[-1])
output = output.view(out_shape_heads)[..., :v_dim] # 裁剪到 V 的维度 output = output.view(out_shape_heads)[..., :v_dim]
output = output.reshape(output.shape[:-2] + (num_heads * v_dim,)) output = output.reshape(output.shape[:-2] + (num_heads * v_dim,))
return output.contiguous() return output.contiguous()
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment