Unverified Commit e531cd2f authored by Isaac Ong, committed by GitHub

[PyTorch] Fix MultiheadAttention docstring (#634)



Fix the MultiheadAttention docstring for input_layernorm: the default is False, and layer normalization is applied to the input only when it is set to True.
Signed-off-by: Isaac Ong <isaacong.jw@gmail.com>
parent 6c1a8bb5
@@ -2909,8 +2909,8 @@ class MultiheadAttention(torch.nn.Module):
                      together with the output of the linear transformation.
                      Example use case: residual connection for transformer module is
                      taken post layernorm.
-    input_layernorm: bool, default = `True`
-                    if set to `False`, layer normalization to the input is not applied.
+    input_layernorm: bool, default = `False`
+                    if set to `True`, layer normalization to the input is applied.
     attention_type: { 'self', 'cross' }, default = 'self'
                    type of attention applied.
     zero_centered_gamma : bool, default = 'False'
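For context, a minimal usage sketch of the corrected behavior (not part of the commit; assumes transformer_engine.pytorch.MultiheadAttention with hidden_size and num_attention_heads arguments, sequence-first input, and an available CUDA device; the sizes are hypothetical):

    # Sketch: input_layernorm defaults to False, so no layer normalization
    # is applied to the input unless you opt in with input_layernorm=True
    # (e.g. for pre-layernorm transformer blocks).
    import torch
    import transformer_engine.pytorch as te

    mha = te.MultiheadAttention(
        hidden_size=1024,
        num_attention_heads=16,
        input_layernorm=True,   # opt in; the default is False
        attention_type="self",  # default; "cross" for encoder-decoder attention
    ).cuda()

    # TE modules expect (seq_len, batch, hidden) layout by default.
    x = torch.randn(128, 2, 1024, device="cuda")
    out = mha(x)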