"docs/source/vscode:/vscode.git/clone" did not exist on "7ba028fccb82cbee792b67d596120da8ae9397c9"
Fix uninitialized parameter in conformer relative attention. (#18368)
`torch.Tensor` creates an unitialized tensor (as via `torch.empty`), this leads to undeterministic behavior, poor initialization, and nans if you have unlucky init. The paper does not specify the initialization for bias terms, so I guess zero seems like a good choice - no bias initially. `torch.Tensor` is usually populated with zeros, so this fix will be close to the intended behavior: ``` >>> torch.Tensor(100, 100).sum() tensor(0.) >>> torch.Tensor(100, 100).sum() tensor(nan) >>> torch.Tensor(100, 100).sum() tensor(0.) ```
Showing
Please register or sign in to comment