OpenDAS / Megatron-LM · Commit c52c9dbf

Authored Mar 23, 2022 by Lawrence McAfee
Parent: dcf2ccc9

    fixed clip_grads for when len(grads_for_norm) == 0.

1 changed file with 9 additions and 6 deletions:

megatron/optimizer/clip_grads.py (+9 -6)
@@ -75,12 +75,15 @@ def clip_grad_norm_fp32(parameters, grads_for_norm,
             # Use apex's multi-tensor applier for efficiency reasons.
             # Multi-tensor applier takes a function and a list of list
             # and performs the operation on that list all in one kernel.
-            grad_norm, _ = multi_tensor_applier(
-                amp_C.multi_tensor_l2norm,
-                dummy_overflow_buf,
-                [grads_for_norm],
-                False # no per-parameter norm
-            )
+            if grads_for_norm:
+                grad_norm, _ = multi_tensor_applier(
+                    amp_C.multi_tensor_l2norm,
+                    dummy_overflow_buf,
+                    [grads_for_norm],
+                    False # no per-parameter norm
+                )
+            else:
+                grad_norm = torch.cuda.FloatTensor([0])
             # Since we will be summing across data parallel groups,
             # we need the pow(norm-type).
             total_norm = grad_norm ** norm_type
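
The guarded pattern is easy to reproduce outside Megatron-LM. Below is a minimal sketch of the same fallback in plain PyTorch on CPU: compute_local_l2_norm is a hypothetical stand-in for the guarded block inside clip_grad_norm_fp32, with torch.norm used in place of apex's fused amp_C.multi_tensor_l2norm kernel, which presumably cannot be invoked with an empty tensor list.

import torch

def compute_local_l2_norm(grads_for_norm, norm_type=2.0):
    # Hypothetical stand-in for the guarded block in clip_grad_norm_fp32.
    if grads_for_norm:
        # p-norm of the per-tensor p-norms equals the p-norm of the
        # concatenated gradients; torch.norm replaces the fused apex kernel.
        grad_norm = torch.norm(
            torch.stack([torch.norm(g, norm_type) for g in grads_for_norm]),
            norm_type)
    else:
        # A rank may hold no gradients that count toward the norm
        # (len(grads_for_norm) == 0); contribute zero instead of calling
        # the fused kernel with an empty list.
        grad_norm = torch.zeros(1)
    # Raised to pow(norm_type) because, as in clip_grads.py, the caller
    # later sums these contributions across parallel groups.
    return grad_norm ** norm_type

print(compute_local_l2_norm([]))                                  # tensor([0.])
print(compute_local_l2_norm([torch.ones(4), 2 * torch.ones(3)]))  # tensor(16.)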