Unverified Commit e02037b3 authored by Andrej, committed by GitHub

Fix bug in gpt2's (from-scratch) special scaled weight initialization (#17877)



* only special scale init each gpt2 c_proj weight once, on exact match

* fix double quotes
Co-authored-by: leandro <leandro.vonwerra@spoud.io>
parent 6dd00f6b
@@ -484,7 +484,7 @@ class GPT2PreTrainedModel(PreTrainedModel):
         #
         # Reference (Megatron-LM): https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/model/gpt_model.py
         for name, p in module.named_parameters():
-            if "c_proj" in name and "weight" in name:
+            if name == "c_proj.weight":
                 # Special Scaled Initialization --> There are 2 Layer Norms per Transformer Block
                 p.data.normal_(mean=0.0, std=(self.config.initializer_range / math.sqrt(2 * self.config.n_layer)))
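For context, here is a minimal standalone sketch of the bug (the Block and Model classes below are hypothetical stand-ins, not the actual transformers modules): _init_weights is invoked once per submodule via .apply(), and named_parameters() recurses, so the same c_proj weight appears under every one of its ancestor modules. The old substring check therefore re-initialized each weight once per ancestor; the exact match fires only on the weight's direct parent, so the scaled init runs exactly once per projection.

import torch.nn as nn

class Block(nn.Module):
    """Hypothetical stand-in for a transformer block holding a c_proj layer."""
    def __init__(self):
        super().__init__()
        self.c_proj = nn.Linear(4, 4)  # stand-in for the Conv1D projection

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.h = nn.ModuleList([Block() for _ in range(2)])

model = Model()
substring_hits = exact_hits = 0

def count(module):
    # Mirrors how _init_weights is called: once per submodule via .apply().
    global substring_hits, exact_hits
    for name, p in module.named_parameters():
        if "c_proj" in name and "weight" in name:  # old check
            substring_hits += 1
        if name == "c_proj.weight":  # fixed check
            exact_hits += 1

model.apply(count)
# Each c_proj weight is visible under Model ("h.0.c_proj.weight"), the
# ModuleList ("0.c_proj.weight"), and its Block ("c_proj.weight"), so the
# substring check matches it three times; the exact match fires once.
print(substring_hits)  # 6  (2 weights x 3 ancestor modules)
print(exact_hits)      # 2  (once per c_proj, on its direct parent Block)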