Unverified Commit feb83521 authored by Weiming Zhao, committed by GitHub

[llama] Fix comments in weights converter (#24436)

Explain the reason for cloning the tensors.
parent 2c977e4a
@@ -136,8 +136,10 @@ def write_model(model_path, input_base_path, model_size):
             }
         else:
             # Sharded
-            # Note that in the 13B checkpoint, not cloning the two following weights will result in the checkpoint
-            # becoming 37GB instead of 26GB for some reason.
+            # Note that attention.w{q,k,v,o}, feed_forward.w{1,2,3}, attention_norm.weight and ffn_norm.weight share
+            # the same storage object; saving attention_norm and ffn_norm would save the other weights too, which is
+            # redundant, as those weights are stitched together from multiple shards. To avoid that, they are cloned.
             state_dict = {
                 f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
                     f"layers.{layer_i}.attention_norm.weight"
...
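
For context, below is a minimal sketch (not part of the converter, assuming only PyTorch) of the serialization behavior the new comment describes: `torch.save` writes a tensor's entire underlying storage, so a small tensor that is merely a view into a large weight gets saved at the large weight's full size unless it is cloned into its own storage first. The names, shapes, and file sizes are illustrative.

```python
import os
import tempfile

import torch

# Stand-in for a large weight and a small tensor sharing its storage
# (names and shapes are illustrative, not taken from the converter).
big = torch.zeros(4096, 4096)  # ~64 MB of float32 data
small_view = big[0]            # 4096 floats, but backed by big's storage

with tempfile.TemporaryDirectory() as tmp:
    shared_path = os.path.join(tmp, "shared.pt")
    cloned_path = os.path.join(tmp, "cloned.pt")

    # Saving the view as-is serializes the entire shared storage.
    torch.save({"norm": small_view}, shared_path)
    # Cloning first gives the tensor its own storage, so only its data is written.
    torch.save({"norm": small_view.clone()}, cloned_path)

    print(os.path.getsize(shared_path))  # roughly 64 MB
    print(os.path.getsize(cloned_path))  # roughly 16 KB
```

Storage sharing can be checked by comparing `small_view.data_ptr()` with `big.data_ptr()`; the `.clone()` in the converter guards against exactly this kind of checkpoint bloat.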