Unverified Commit 0b303dad authored by Charlene Yang's avatar Charlene Yang Committed by GitHub

[PyTorch] Fix tp_size for MQA/GQA (#1044)



fix tp_size for GQA
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
parent 4cc220c9
@@ -5125,7 +5125,7 @@ class DotProductAttention(TransformerEngineBaseModule):
         self.hidden_size_per_attention_head = kv_channels
         self.num_gqa_groups = num_attention_heads if num_gqa_groups is None else num_gqa_groups
-        self.num_gqa_groups_per_partition = int(self.num_gqa_groups // tp_size)
+        self.num_gqa_groups_per_partition = int(self.num_gqa_groups // self.tp_size)
         assert (
             num_attention_heads % self.num_gqa_groups == 0
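The one-line change divides the number of GQA key/value groups by the tensor-parallel size stored on the module (`self.tp_size`) instead of the raw `tp_size` constructor argument. A minimal sketch of the partitioning arithmetic, assuming a simplified stand-in class (not the real Transformer Engine `DotProductAttention`):

```python
class DotProductAttention:
    """Simplified stand-in illustrating the MQA/GQA partitioning math."""

    def __init__(self, num_attention_heads, num_gqa_groups=None, tp_size=1):
        # Store tp_size on the instance; the fix reads this stored value
        # rather than the constructor local.
        self.tp_size = tp_size
        # MQA/GQA: with no group count given, fall back to one KV group
        # per query head, i.e. plain multi-head attention.
        self.num_gqa_groups = (
            num_attention_heads if num_gqa_groups is None else num_gqa_groups
        )
        # KV groups are sharded across tensor-parallel ranks.
        self.num_gqa_groups_per_partition = int(self.num_gqa_groups // self.tp_size)
        # Query heads must divide evenly into KV groups.
        assert num_attention_heads % self.num_gqa_groups == 0

# e.g. 32 query heads, 8 KV groups (GQA), tensor parallelism over 4 GPUs:
attn = DotProductAttention(32, num_gqa_groups=8, tp_size=4)
print(attn.num_gqa_groups_per_partition)  # 2 KV groups per TP rank
```

MQA is the special case `num_gqa_groups=1`, where the single KV head is replicated rather than split once `tp_size` exceeds the group count, which is why getting the divisor right matters.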