OpenDAS / Megatron-LM
Commit 5d2e13a6, authored Apr 28, 2022 by Vijay Korthikanti
Parent: cfd2e216

reverting pre allocation of optimizer states; it does not seem to help with fragmentation
Showing 1 changed file with 0 additions and 12 deletions

megatron/optimizer/__init__.py (+0, -12)
@@ -91,18 +91,6 @@ def get_megatron_optimizer(model,
                          weight_decay=args.weight_decay,
                          betas=(args.adam_beta1, args.adam_beta2),
                          eps=args.adam_eps)
-        # preallocating state tensors to avoid fragmentation
-        for param_group in optimizer.param_groups:
-            for i, param in enumerate(param_group['params']):
-                if param.requires_grad:
-                    state = optimizer.state[param]
-                    if len(state) == 0:
-                        # Exponential moving average of gradient values
-                        state['exp_avg'] = torch.zeros_like(param.data,
-                                                            dtype=torch.float)
-                        # Exponential moving average of squared gradient values
-                        state['exp_avg_sq'] = torch.zeros_like(param.data,
-                                                               dtype=torch.float)
     elif args.optimizer == 'sgd':
         optimizer = SGD(param_groups,
                         lr=args.lr,
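
The reverted block eagerly materialized the exp_avg / exp_avg_sq moment buffers that the Adam optimizer in this file would otherwise allocate lazily on its first step(), in the hope of reducing CUDA caching-allocator fragmentation. The sketch below is not from the repository; it is a minimal, hypothetical illustration of the same pattern against stock torch.optim.Adam, with an invented helper (allocator_slack_mib) that uses PyTorch's public memory counters as a rough proxy for the fragmentation the commit message refers to.

# Hypothetical sketch (not part of this commit): the preallocation pattern
# that was reverted, applied to stock torch.optim.Adam, plus a crude way to
# watch the CUDA caching allocator's slack. Helper names are invented.
import torch


def preallocate_adam_state(optimizer):
    """Eagerly create the moment buffers that Adam-style optimizers normally
    allocate lazily on the first step().

    Note: stock torch.optim.Adam also expects a 'step' entry once a
    parameter's state is non-empty, so this exact form only suits optimizers
    that keep just the two moment buffers per parameter.
    """
    for param_group in optimizer.param_groups:
        for param in param_group['params']:
            if param.requires_grad:
                state = optimizer.state[param]
                if len(state) == 0:
                    # Exponential moving averages of gradients and of
                    # squared gradients, as in the reverted block above.
                    state['exp_avg'] = torch.zeros_like(param.data, dtype=torch.float)
                    state['exp_avg_sq'] = torch.zeros_like(param.data, dtype=torch.float)


def allocator_slack_mib():
    """Reserved-but-unallocated CUDA memory, a rough proxy for how much the
    caching allocator is holding in unused (potentially fragmented) blocks."""
    if not torch.cuda.is_available():
        return 0.0
    return (torch.cuda.memory_reserved() - torch.cuda.memory_allocated()) / 2**20


if __name__ == '__main__':
    model = torch.nn.Linear(1024, 1024)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    preallocate_adam_state(optimizer)
    print(f'allocator slack after preallocation: {allocator_slack_mib():.1f} MiB')

Because the sketch only allocates the buffers and never calls step(), it sidesteps the 'step' bookkeeping that stock torch.optim.Adam would require; it is meant to show the shape of the reverted technique, not to be dropped into training code.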