[PyTorch] Reduce the amount of roundup for max_seqlen in THD (#1079)

Reduce the roundup of max_seqlen for THD from the next power of two to the next multiple of 64.

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
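
To illustrate the effect of the change, the following minimal sketch compares the two rounding rules that appear in the diff. The helper names `roundup_pow2` and `roundup_multiple_of_64` are hypothetical and are not part of Transformer Engine. The sketch assumes the longest sequence length is a positive Python integer, as returned by `.item()` on an integer tensor.

```python
import math


def roundup_pow2(n: int) -> int:
    """Old rule from the diff: round n up to the next power of two."""
    return pow(2, math.ceil(math.log2(n)))


def roundup_multiple_of_64(n: int) -> int:
    """New rule from the diff: round n up to the next multiple of 64."""
    return int((n + 63) // 64 * 64)


if __name__ == "__main__":
    # Print the padded max_seqlen under both rules for a few sample lengths.
    for longest in (64, 1000, 1025, 4097):
        print(longest, roundup_pow2(longest), roundup_multiple_of_64(longest))
```

For a longest sequence of 1025 tokens, the old rule yields 2048 while the new rule yields 1088, so the rounded `max_seqlen` stays much closer to the actual length just above a power of two.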

8833a8d0 · Charlene Yang · GitHub · 121ff62a · 8833a8d0
Commit 8833a8d0 authored Aug 06, 2024 by Charlene Yang, committed by GitHub Aug 06, 2024

Showing with 2 additions and 2 deletions

transformer_engine/pytorch/attention.py +2 -2

--- a/transformer_engine/pytorch/attention.py
+++ b/transformer_engine/pytorch/attention.py
@@ -5725,13 +5725,13 @@ class DotProductAttention(TransformerEngineBaseModule):
                        seqlens_q = cu_seqlens_q_padded[1:] - cu_seqlens_q_padded[:-1]
                    else:
                        seqlens_q = cu_seqlens_q[1:] - cu_seqlens_q[:-1]
-                    max_seqlen_q = pow(2, math.ceil(math.log2(seqlens_q.max().item())))
+                    max_seqlen_q = int((seqlens_q.max().item() + 63) // 64 * 64)
                if max_seqlen_kv is None:
                    if cu_seqlens_kv_padded is not None:
                        seqlens_kv = cu_seqlens_kv_padded[1:] - cu_seqlens_kv_padded[:-1]
                    else:
                        seqlens_kv = cu_seqlens_kv[1:] - cu_seqlens_kv[:-1]
-                    max_seqlen_kv = pow(2, math.ceil(math.log2(seqlens_kv.max().item())))
+                    max_seqlen_kv = int((seqlens_kv.max().item() + 63) // 64 * 64)
                batch_size = len(cu_seqlens_q) - 1
            cp_size = 1 if self.cp_group is None else get_distributed_world_size(self.cp_group)