chenpangpang / transformers · Commit feb83521 (unverified)
Authored Jun 22, 2023 by Weiming Zhao; committed by GitHub on Jun 22, 2023
[llama] Fix comments in weights converter (#24436)
Explain the reason to clone tensor
Parent: 2c977e4a
Showing 1 changed file with 4 additions and 2 deletions.
src/transformers/models/llama/convert_llama_weights_to_hf.py (+4, -2)
@@ -136,8 +136,10 @@ def write_model(model_path, input_base_path, model_size):
             }
         else:
             # Sharded
-            # Note that in the 13B checkpoint, not cloning the two following weights will result in the checkpoint
-            # becoming 37GB instead of 26GB for some reason.
+            # Note that attention.w{q,k,v,o}, feed_fordward.w[1,2,3], attention_norm.weight and ffn_norm.weight share
+            # the same storage object, saving attention_norm and ffn_norm will save other weights too, which is
+            # redundant as other weights will be stitched from multiple shards. To avoid that, they are cloned.
             state_dict = {
                 f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
                     f"layers.{layer_i}.attention_norm.weight"
...
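The new comment relies on standard PyTorch serialization behavior: tensors that are views of the same storage are saved together, so calling torch.save on an un-cloned view writes out the entire shared storage. Below is a minimal, self-contained sketch of that behavior (not part of the commit; the tensor names and sizes are illustrative only) showing why .clone() keeps the saved file small.

# A minimal sketch (not part of the commit) of the storage-sharing behavior the
# new comment describes. torch.save serializes a tensor's underlying storage, so
# saving a view of a large tensor without .clone() writes out the whole shared
# storage; cloning first copies only the data the view actually covers.
import os
import tempfile

import torch

big = torch.randn(4096, 4096)  # stand-in for a shard whose weights share one storage
view = big[:1]                 # a small view; it still points at big's storage
assert view.data_ptr() == big.data_ptr()

with tempfile.TemporaryDirectory() as tmp:
    torch.save(view, os.path.join(tmp, "view.pt"))           # ~64 MB: full storage is saved
    torch.save(view.clone(), os.path.join(tmp, "clone.pt"))  # ~16 KB: only the copied row
    print(os.path.getsize(os.path.join(tmp, "view.pt")),
          os.path.getsize(os.path.join(tmp, "clone.pt")))

This mirrors what the converter does: attention_norm.weight and ffn_norm.weight are views into the same storage as the attention and feed-forward weights, so they are cloned before being written into the sharded Hugging Face checkpoint.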