Make loading of pretrained gpt2 faster by avoiding initialization of Conv1D's weights (#21879)
Apply `normal_` after assigning the weight as an `nn.Parameter`, to avoid unnecessary initialization computation.
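
The reordering described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual diff: the class name `Conv1DSketch`, the argument names `nf` and `nx`, the `std=0.02` value, and the `forward` body are assumptions made for the example. The only point it demonstrates is the order of operations: the empty tensor is registered as a parameter first, and `nn.init.normal_` is then applied to the registered parameter rather than to a temporary tensor beforehand.

```python
import torch
from torch import nn


class Conv1DSketch(nn.Module):
    """Illustrative stand-in for a GPT-2 style Conv1D layer (hypothetical name)."""

    def __init__(self, nf: int, nx: int) -> None:
        super().__init__()
        self.nf = nf
        # Register the uninitialized tensor as a parameter first ...
        self.weight = nn.Parameter(torch.empty(nx, nf))
        self.bias = nn.Parameter(torch.zeros(nf))
        # ... then initialize the registered parameter in place.
        # std=0.02 is an assumed value for this sketch.
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Linear map over the last dimension: (..., nx) -> (..., nf).
        size_out = x.size()[:-1] + (self.nf,)
        x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
        return x.view(size_out)
```

With this ordering, any loading path that intercepts or skips initialization of registered parameters has the chance to apply to the weight as well; whether and how much time that saves depends on the loading code, which is not shown here.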