chenpangpang / transformers · Commit feb83521 (unverified)
Authored Jun 22, 2023 by Weiming Zhao; committed by GitHub on Jun 22, 2023
[llama] Fix comments in weights converter (#24436)
Explain the reason to clone tensor
Parent: 2c977e4a
Showing 1 changed file with 4 additions and 2 deletions.
src/transformers/models/llama/convert_llama_weights_to_hf.py (+4, -2)
@@ -136,8 +136,10 @@ def write_model(model_path, input_base_path, model_size):
             }
         else:
             # Sharded
-            # Note that in the 13B checkpoint, not cloning the two following weights will result in the checkpoint
-            # becoming 37GB instead of 26GB for some reason.
+            # Note that attention.w{q,k,v,o}, feed_fordward.w[1,2,3], attention_norm.weight and ffn_norm.weight share
+            # the same storage object, saving attention_norm and ffn_norm will save other weights too, which is
+            # redundant as other weights will be stitched from multiple shards. To avoid that, they are cloned.
             state_dict = {
                 f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
                     f"layers.{layer_i}.attention_norm.weight"
...
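The new comment relies on standard PyTorch serialization behavior: tensors that are views of the same storage are saved together, so calling torch.save on an un-cloned view writes out the entire shared storage. Below is a minimal, self-contained sketch of that behavior (not part of the commit; the tensor names and sizes are illustrative only) showing why .clone() keeps the saved file small.

# A minimal sketch (not part of the commit) of the storage-sharing behavior the
# new comment describes. torch.save serializes a tensor's underlying storage, so
# saving a view of a large tensor without .clone() writes out the whole shared
# storage; cloning first copies only the data the view actually covers.
import os
import tempfile

import torch

big = torch.randn(4096, 4096)  # stand-in for a shard whose weights share one storage
view = big[:1]                 # a small view; it still points at big's storage
assert view.data_ptr() == big.data_ptr()

with tempfile.TemporaryDirectory() as tmp:
    torch.save(view, os.path.join(tmp, "view.pt"))           # ~64 MB: full storage is saved
    torch.save(view.clone(), os.path.join(tmp, "clone.pt"))  # ~16 KB: only the copied row
    print(os.path.getsize(os.path.join(tmp, "view.pt")),
          os.path.getsize(os.path.join(tmp, "clone.pt")))

This mirrors what the converter does: attention_norm.weight and ffn_norm.weight are views into the same storage as the attention and feed-forward weights, so they are cloned before being written into the sharded Hugging Face checkpoint.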