OpenDAS / Megatron-LM · Commits

Commit ee38e7f9 authored Oct 09, 2019 by Mohammad Shoeybi

fixed deserializing issue with old checkpoint

parent 9993ea25
Showing 1 changed file with 13 additions and 1 deletion.

megatron/utils.py (+13, -1)
@@ -338,7 +338,19 @@ def load_checkpoint(model, optimizer, lr_scheduler, args):
         torch.distributed.get_rank(), checkpoint_name))
 
     # Load the checkpoint.
-    sd = torch.load(checkpoint_name, map_location='cpu')
+    try:
+        sd = torch.load(checkpoint_name, map_location='cpu')
+    except ModuleNotFoundError:
+        # For backward compatibility.
+        print_rank_0(' > deserializing using the old code structure ...')
+        import sys
+        sys.modules['fp16.loss_scaler'] = sys.modules[
+            'megatron.fp16.loss_scaler']
+        sd = torch.load(checkpoint_name, map_location='cpu')
+        sys.modules.pop('fp16.loss_scaler', None)
+    except:
+        print_rank_0('could not load the checkpoint')
+        exit()
 
     # Iterations.
     if args.finetune or release:
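For context: torch.load deserializes checkpoints with Python's pickle, which records the fully qualified module path of every class it serializes. Checkpoints written before the code was reorganized under the megatron package reference the loss scaler as fp16.loss_scaler, so unpickling them against the new layout raises ModuleNotFoundError. The commit works around this by temporarily registering the relocated module under its old import path in sys.modules, retrying the load, and then removing the alias. Below is a minimal, self-contained sketch of the same aliasing trick; the oldpkg/newpkg module names and the LossScaler class are hypothetical stand-ins, not part of the commit.

import pickle
import sys
import types

# Build a module that originally lived at 'oldpkg.scaler' (hypothetical name).
old_mod = types.ModuleType('oldpkg.scaler')

class LossScaler:
    pass

LossScaler.__module__ = 'oldpkg.scaler'
old_mod.LossScaler = LossScaler
sys.modules['oldpkg.scaler'] = old_mod

# Stands in for an old checkpoint: the pickle records 'oldpkg.scaler.LossScaler'.
payload = pickle.dumps(LossScaler())

# The code base is later reorganized: same module, new import path.
sys.modules['newpkg.scaler'] = sys.modules.pop('oldpkg.scaler')

try:
    pickle.loads(payload)  # pickle tries to import 'oldpkg.scaler' and fails
except ModuleNotFoundError:
    # The fix: alias the old path to the relocated module, retry, clean up.
    sys.modules['oldpkg.scaler'] = sys.modules['newpkg.scaler']
    obj = pickle.loads(payload)
    sys.modules.pop('oldpkg.scaler', None)
    print(type(obj))  # <class 'oldpkg.scaler.LossScaler'>

The same caveat applies here as in the commit: the alias is process-global while installed, which is why it is popped with sys.modules.pop(..., None) as soon as the retried load succeeds.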