Commit 14efeaa5 (ModelZoo / ResNet50_tensorflow)
Authored Nov 07, 2019 by Yeqing Li; committed by A. Unique TensorFlower on Nov 07, 2019.
Check NaN during training loop.
PiperOrigin-RevId: 279116219
Parent: 98db9b25
1 changed file with 3 additions and 0 deletions (+3 -0)
official/modeling/training/distributed_executor.py
```diff
@@ -22,6 +22,7 @@ from __future__ import print_function
 import json
 import os
+import numpy as np
 from absl import flags
 from absl import logging
 import tensorflow as tf
@@ -512,6 +513,8 @@ class DistributedExecutor(object):
         train_loss)
 if not isinstance(train_loss, dict):
   train_loss = {'total_loss': train_loss}
+if np.isnan(train_loss['total_loss']):
+  raise ValueError('total loss is NaN.')
 if train_metric:
   train_metric_result = train_metric.result()
```
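The two added lines can be exercised outside the executor. Below is a minimal sketch that assumes the step's loss has already been converted to a Python or NumPy scalar (that conversion is not shown in this diff). `check_total_loss` is a hypothetical helper name used only for illustration; it is not part of `DistributedExecutor` or the repository.

```python
import numpy as np


def check_total_loss(train_loss):
  """Normalizes a step's loss to a dict and fails fast on a NaN total loss.

  Hypothetical standalone helper mirroring the logic added in this commit;
  it is not an API of DistributedExecutor.
  """
  # A bare scalar loss is wrapped so later code can always index by key.
  if not isinstance(train_loss, dict):
    train_loss = {'total_loss': train_loss}
  # np.isnan accepts Python floats, NumPy scalars and 0-d arrays alike.
  if np.isnan(train_loss['total_loss']):
    raise ValueError('total loss is NaN.')
  return train_loss


print(check_total_loss(0.25))  # {'total_loss': 0.25}
try:
  check_total_loss({'total_loss': float('nan'), 'l2_loss': 0.1})
except ValueError as e:
  print(e)  # total loss is NaN.
```

Only the `'total_loss'` entry is tested, so a NaN in another dict entry passes unless it also propagates into the total. The check also assumes a scalar: for a multi-element array, `np.isnan` returns an array and the `if` raises NumPy's ambiguous-truth-value `ValueError` instead of the intended message.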