GPTNeo: handle padded wte (#11079)

* GPTNeo: handle padded wte
* Switch to config.vocab_size
* apply review suggestion

Co-authored-by: Suraj Patil <surajp815@gmail.com>

Commit 247bed38 (unverified) · 083ad7d4 · authored by Leo Gao, Apr 07, 2021 · committed by GitHub, Apr 07, 2021

Showing with 4 additions and 0 deletions

src/transformers/models/gpt_neo/modeling_gpt_neo.py +4 -0

--- a/src/transformers/models/gpt_neo/modeling_gpt_neo.py
+++ b/src/transformers/models/gpt_neo/modeling_gpt_neo.py
@@ -112,6 +112,10 @@ def load_tf_weights_in_gpt_neo(model, config, gpt_neo_checkpoint_path):
        if name[-1] == "w" and name[-2] in ["out_proj", "k_proj", "q_proj", "v_proj", "c_proj", "c_fc"]:
            array = array.transpose()
+        if name == ["wte"]:
+            # if vocab is padded, then trim off the padding embeddings
+            array = array[: config.vocab_size]
        try:
            assert (
                pointer.shape == array.shape
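
The trimming step added above can be sketched in isolation. This is a minimal illustration, not the loader itself: `trim_padded_embeddings` is a hypothetical helper, the sizes are made up, and it assumes `name` is a list of path components (as the `name[-1]` / `name[-2]` indexing in the hunk suggests) and that `array` is a NumPy array whose first axis is the vocabulary.

```python
import numpy as np


def trim_padded_embeddings(name, array, vocab_size):
    """Hypothetical helper mirroring the added branch.

    If the variable is the token embedding matrix ("wte"), keep only the
    first `vocab_size` rows so that padding rows are dropped.
    """
    if name == ["wte"]:
        # Slicing past the end of an array is a no-op, so a checkpoint
        # whose vocabulary is not padded passes through unchanged.
        array = array[:vocab_size]
    return array


# Illustrative sizes only: a vocabulary of 10 padded up to 16 rows.
vocab_size = 10
hidden_size = 4
padded_wte = np.zeros((16, hidden_size), dtype=np.float32)

trimmed = trim_padded_embeddings(["wte"], padded_wte, vocab_size)
other = trim_padded_embeddings(["wpe"], padded_wte, vocab_size)

print(trimmed.shape)  # rows beyond vocab_size are removed
print(other.shape)    # non-wte variables are left untouched
```

With the padding removed, the shape comparison between `pointer` and `array` that follows in the loader would be expected to pass when the checkpoint's embedding matrix was padded beyond `config.vocab_size`.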