"docs/source/vscode:/vscode.git/clone" did not exist on "7ba028fccb82cbee792b67d596120da8ae9397c9"
Unverified Commit 68a894a5 authored by Piotr Dabkowski, committed by GitHub

Fix uninitialized parameter in conformer relative attention. (#18368)

`torch.Tensor` creates an uninitialized tensor (as via `torch.empty`). This leads to nondeterministic behavior, poor initialization, and NaNs if you get an unlucky init. The paper does not specify the initialization for the bias terms, so zero seems like a good choice: no bias initially. In practice the uninitialized memory usually happens to be zero, so this fix stays close to the behavior most users saw:

```
>>> torch.Tensor(100, 100).sum()
tensor(0.)
>>> torch.Tensor(100, 100).sum()
tensor(nan)
>>> torch.Tensor(100, 100).sum()
tensor(0.)
```
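For illustration, here is a minimal sketch (a toy module with hypothetical names, not the actual `Wav2Vec2ConformerSelfAttention` code) contrasting a zero-initialized bias parameter with an uninitialized one:

```python
import torch
import torch.nn as nn

class RelPosBias(nn.Module):
    """Toy module (hypothetical), illustrating the two initialization styles."""

    def __init__(self, num_heads: int, head_size: int):
        super().__init__()
        # zero init: deterministic, acts as "no bias" until training updates it
        self.pos_bias_u = nn.Parameter(torch.zeros(num_heads, head_size))
        # uninitialized memory: contents are arbitrary and can contain NaN/inf
        self.pos_bias_bad = nn.Parameter(torch.Tensor(num_heads, head_size))

m = RelPosBias(num_heads=8, head_size=64)
print(m.pos_bias_u.sum())    # always tensor(0., grad_fn=<SumBackward0>)
print(m.pos_bias_bad.sum())  # arbitrary value, occasionally NaN
```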
parent df5e4232
@@ -670,8 +670,8 @@ class Wav2Vec2ConformerSelfAttention(nn.Module):
         self.linear_pos = nn.Linear(config.hidden_size, config.hidden_size, bias=False)
         # these two learnable bias are used in matrix c and matrix d
         # as described in https://arxiv.org/abs/1901.02860 Section 3.3
-        self.pos_bias_u = nn.Parameter(torch.Tensor(self.num_heads, self.head_size))
-        self.pos_bias_v = nn.Parameter(torch.Tensor(self.num_heads, self.head_size))
+        self.pos_bias_u = nn.Parameter(torch.zeros(self.num_heads, self.head_size))
+        self.pos_bias_v = nn.Parameter(torch.zeros(self.num_heads, self.head_size))

     def forward(
         self,