Commit 571f10a0 authored by mohammad

added comment about norm power

parent db88a27b
@@ -92,6 +92,8 @@ def clip_grad_norm_fp32(parameters, max_norm, norm_type=2):
[grads_for_norm],
False # no per-parameter norm
)
# Since we will be summing across data parallel groups,
# we need the pow(norm-type).
total_norm = grad_norm ** norm_type
else:
......
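
Note on the added comment: a p-norm is not additive across shards, but its p-th power is. A minimal sketch of that reasoning in plain Python follows, assuming each rank holds a disjoint slice of the gradients. The helper names `shard_norm` and `combine_shard_norms` are hypothetical and do not appear in the commit, and the plain `sum` stands in for whatever cross-group reduction the surrounding code performs.

```python
def shard_norm(values, norm_type=2.0):
    """p-norm of the gradient values held by one shard (hypothetical helper)."""
    return sum(abs(v) ** norm_type for v in values) ** (1.0 / norm_type)


def combine_shard_norms(shard_norms, norm_type=2.0):
    """Combine per-shard p-norms into the p-norm of the concatenated vector."""
    # Raise each local norm to the p-th power so the terms become additive.
    # In a distributed run this sum would be a reduction across ranks
    # (an assumption here; the reduction is outside the lines shown in the diff).
    total = sum(n ** norm_type for n in shard_norms)
    # Undo the power once, after the sum.
    return total ** (1.0 / norm_type)


# Two shards of a four-element gradient vector.
shards = [[3.0, 0.0], [0.0, 4.0]]
local_norms = [shard_norm(s) for s in shards]
global_norm = combine_shard_norms(local_norms)

# Adding the local norms directly does not give the norm of the full vector.
naive_sum = sum(local_norms)
```

Here the local 2-norms are 3 and 4, the combined norm is 5, and the naive sum is 7, which is why `grad_norm ** norm_type` is taken before the reduction.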