Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)

Apply nn.init.normal_ after assigning weight as an nn.Parameter, to avoid unnecessary initialization computation.

Commit 45e11091, authored Mar 02, 2023 by twaka, committed by GitHub Mar 01, 2023

Showing 2 additions and 3 deletions

src/transformers/pytorch_utils.py +2 -3

--- a/src/transformers/pytorch_utils.py
+++ b/src/transformers/pytorch_utils.py
@@ -105,10 +105,9 @@ class Conv1D(nn.Module):
    def __init__(self, nf, nx):
        super().__init__()
        self.nf = nf
-        w = torch.empty(nx, nf)
-        nn.init.normal_(w, std=0.02)
-        self.weight = nn.Parameter(w)
+        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))
+        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        size_out = x.size()[:-1] + (self.nf,)
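
The hunk above only reorders the initialization: the weight is first registered as an nn.Parameter and then filled in place. As a rough sanity check, the sketch below compares the two orderings under the same random seed. The helper names `init_before` and `init_after` are hypothetical and are not part of transformers. The sketch only checks that the resulting values match; it does not measure the loading speedup the commit message describes.

```python
import torch
from torch import nn


def init_before(nx: int, nf: int) -> nn.Parameter:
    """Old order (hypothetical helper): fill a plain tensor, then wrap it."""
    w = torch.empty(nx, nf)
    nn.init.normal_(w, std=0.02)
    return nn.Parameter(w)


def init_after(nx: int, nf: int) -> nn.Parameter:
    """New order (hypothetical helper): wrap first, then fill in place."""
    weight = nn.Parameter(torch.empty(nx, nf))
    # nn.init.normal_ runs under torch.no_grad(), so filling a Parameter
    # that requires grad in place is allowed.
    nn.init.normal_(weight, std=0.02)
    return weight


# Re-seed before each call so both orderings draw the same random numbers.
torch.manual_seed(0)
old = init_before(4, 3)
torch.manual_seed(0)
new = init_after(4, 3)

same_values = torch.equal(old, new)
print(same_values)
```

If the two orderings agree here, the change should not alter freshly initialized values. Any benefit would then come from code that treats a registered parameter differently from a bare tensor during loading, which this sketch does not exercise.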