Commits · 682d36b5c7c88ef3eaa4322b0af9727d40e41d55 · ModelZoo / ResNet50_tensorflow


10 Mar, 2020 1 commit

Save to tmp directory on non-chief workers in model_training_utils · 682d36b5

Ran Chen authored Mar 10, 2020

In a multi-worker setup, saving is done on each worker. If the workers save to the same location, e.g. GCS, there will be conflicts. With this change we save to a temporary directory on non-chief workers.

Note that saving may involve synchronization that requires all workers to participate, so we cannot save on only one worker.
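
The directory choice described above could be sketched as follows. This is a minimal, standard-library-only illustration and not the code from this commit: `choose_save_dir`, `save_on_worker`, and the `is_chief` flag are hypothetical names, and a plain file write stands in for the real checkpoint save.

```python
import os
import shutil
import tempfile


def choose_save_dir(model_dir, is_chief):
    """Return (save_dir, is_temporary) for this worker.

    Hypothetical helper: the chief writes to the shared model_dir, while
    every other worker writes to a private temporary directory.
    """
    if is_chief:
        return model_dir, False
    return tempfile.mkdtemp(), True


def save_on_worker(model_dir, is_chief, payload):
    """Every worker performs the save; only the chief's output is kept."""
    save_dir, is_temporary = choose_save_dir(model_dir, is_chief)
    os.makedirs(save_dir, exist_ok=True)
    path = os.path.join(save_dir, "ckpt")
    with open(path, "w") as f:  # stand-in for the real checkpoint write
        f.write(payload)
    if is_temporary:
        # Non-chief output is only a by-product of taking part in the save.
        shutil.rmtree(save_dir)
        return None
    return path
```

In this sketch every worker still runs the save, which keeps any collective step inside saving from stalling, while only the chief touches the shared location.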

PiperOrigin-RevId: 300141152

682d36b5

07 Mar, 2020 1 commit
- Add TimeHistory callback to BERT. · 7d86c317
  Hongkun Yu authored Mar 07, 2020
```
PiperOrigin-RevId: 299594839
```
  7d86c317
06 Mar, 2020 1 commit

Temporarily disable explicit allreduce in BERT SQuAD · 11ccb99e

Zongwei Zhou authored Mar 05, 2020

In BERT SQuAD, disable explicit allreduce for now to keep the original clip_by_global_norm math. With explicit allreduce, the gradients are scaled before allreduce, so even if we move clip_by_global_norm before allreduce (as in TF1 and pre-TF 2.2), it will operate on scaled gradients and the math will change. With explicit allreduce, it is therefore better to move clip_by_global_norm to after allreduce.
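
A toy calculation can show why the placement of the clip matters. The sketch below is pure Python and does not use TensorFlow; it assumes that each replica's gradients are scaled by 1/num_replicas before a summing allreduce, and the helper names are illustrative only.

```python
import math


def global_norm(grads):
    """L2 norm over all gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, clip_norm):
    """Rescale grads so that their global norm is at most clip_norm."""
    norm = global_norm(grads)
    scale = clip_norm / max(norm, clip_norm)
    return [g * scale for g in grads]


def allreduce_sum(per_replica_grads):
    """Element-wise sum across replicas."""
    return [sum(vals) for vals in zip(*per_replica_grads)]


num_replicas = 2
clip_norm = 1.0
raw = [[3.0, 4.0], [3.0, 4.0]]  # per-replica gradients
# Assumption: gradients are scaled by 1/num_replicas before allreduce.
scaled = [[g / num_replicas for g in grads] for grads in raw]

# Clip after allreduce: the clip sees the aggregated gradient.
clip_after = clip_by_global_norm(allreduce_sum(scaled), clip_norm)

# Clip before allreduce: each replica clips its own scaled gradient.
clip_before = allreduce_sum(
    [clip_by_global_norm(grads, clip_norm) for grads in scaled])

print(global_norm(clip_after), global_norm(clip_before))
```

With these numbers the gradient clipped after allreduce has norm 1.0, while clipping each replica's scaled gradient first yields an aggregated norm of about 2.0, so the two orderings are not equivalent.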

PiperOrigin-RevId: 299278082

11ccb99e

05 Mar, 2020 1 commit
- Internal change · cf01596c
  Zongwei Zhou authored Mar 04, 2020
```
PiperOrigin-RevId: 299007295
```
  cf01596c
02 Mar, 2020 2 commits
- Add TimeHistory callback to BERT. · 533d1e6b
  Will Cromar authored Mar 02, 2020
```
PiperOrigin-RevId: 298466825
```
  533d1e6b
- Remove an assert. · 98abe4b8
  Hongkun Yu authored Mar 02, 2020
```
PiperOrigin-RevId: 298402269
```
  98abe4b8
26 Feb, 2020 1 commit
- Internal change · ce83a9db
  Hongkun Yu authored Feb 26, 2020
```
PiperOrigin-RevId: 297383836
```
  ce83a9db
25 Feb, 2020 1 commit
- Internal change · 2781377d
  Zongwei Zhou authored Feb 25, 2020
```
PiperOrigin-RevId: 297222995
```
  2781377d
20 Feb, 2020 1 commit
- Remove some tpu_lib not used at all · 58df0dc6
  Hongkun Yu authored Feb 19, 2020
```
PiperOrigin-RevId: 296140699
```
  58df0dc6
19 Feb, 2020 1 commit
- Use host python training loop in GPU BERT tests · 1ca9e3e4
  Zongwei Zhou authored Feb 18, 2020
```
PiperOrigin-RevId: 295869937
```
  1ca9e3e4
13 Feb, 2020 1 commit
- Revert log passing change since it might hurt performance. · 91c681af
  Haitang Hu authored Feb 13, 2020
```
PiperOrigin-RevId: 294922828
```
  91c681af
07 Feb, 2020 1 commit

Pass training_loss in logs dict to customized training loop. · 67f6015a

Haitang Hu authored Feb 07, 2020

This matches the behavior described for the on_batch_end() method of Keras callbacks.
See https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/Callback.
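
As a rough illustration of the idea, the sketch below passes a per-batch loss to callbacks through a logs dict. It does not import Keras: `LossRecorder` is a stand-in that only mirrors the `on_batch_begin` / `on_batch_end` hook names, `run_custom_loop` is a hypothetical loop, and the `"loss"` key is an assumption about the dict layout.

```python
class LossRecorder:
    """Stand-in with the same hook names as a Keras callback."""

    def __init__(self):
        self.losses = []

    def on_batch_begin(self, batch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get("loss"))


def run_custom_loop(batch_losses, callbacks):
    """Toy custom training loop; batch_losses stands in for real train steps."""
    for step, loss in enumerate(batch_losses):
        for cb in callbacks:
            cb.on_batch_begin(step)
        # A real loop would run the train step here and compute `loss`.
        for cb in callbacks:
            cb.on_batch_end(step, logs={"loss": loss})


recorder = LossRecorder()
run_custom_loop([0.9, 0.5], [recorder])
```

After the call, `recorder.losses` holds the loss values the loop handed over, one per batch.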

PiperOrigin-RevId: 293887583

67f6015a

23 Nov, 2019 1 commit
- Internal change · 83f0a576
  Chen Chen authored Nov 22, 2019
```
PiperOrigin-RevId: 282096004
```
  83f0a576
18 Nov, 2019 1 commit
- Internal change · 81d031d0
  Hongkun Yu authored Nov 18, 2019
```
PiperOrigin-RevId: 281117886
```
  81d031d0
28 Oct, 2019 1 commit
- remove use_remote_tpu as it is deprecated. · 4457c1a8
  Hongkun Yu authored Oct 28, 2019
```
PiperOrigin-RevId: 277087491
```
  4457c1a8
18 Oct, 2019 1 commit
- Fix xlnet save_steps and steps_per_epoch conflicts · e6750c5d
  Hongkun Yu authored Oct 18, 2019
```
Remove train_data_size flag

PiperOrigin-RevId: 275545035
```
  e6750c5d
11 Oct, 2019 1 commit

Change summary directory and model checkpoint directory so that training via... · e67a2064

A. Unique TensorFlower authored Oct 11, 2019

Change the summary directory and model checkpoint directory so that training via Keras Compile/Fit() and via a custom training loop is consistent.

PiperOrigin-RevId: 274202793

e67a2064

07 Oct, 2019 1 commit
- Internal change · 482784ab
  Jing Li authored Oct 07, 2019
```
PiperOrigin-RevId: 273233857
```
  482784ab
24 Sep, 2019 2 commits
- Use experimental_connect_to_cluster API in TPU lib to support training on a slice of a TPU pod. · 497989e0
  Bruce Fontaine authored Sep 24, 2019
```
PiperOrigin-RevId: 270926016
```
  497989e0
- Perfzero XLNet classifier Imdb accuracy test on 8 GPUs. · a52564cb
  Hongkun Yu authored Sep 23, 2019
```
PiperOrigin-RevId: 270817869
```
  a52564cb
23 Sep, 2019 1 commit
- OSS model_training_utils_test · c27127b8
  Hongkun Yu authored Sep 23, 2019
```
PiperOrigin-RevId: 270749832
```
  c27127b8
17 Sep, 2019 1 commit
- Move Bert to NLP. Tasks are moved to nlp/bert/ · 1862b9c3
  Hongkun Yu authored Sep 17, 2019
```
Refactor basic utils to modeling/

PiperOrigin-RevId: 269600561
```
  1862b9c3
06 Sep, 2019 2 commits
- Internal change · 4dbdb450
  A. Unique TensorFlower authored Sep 06, 2019
```
PiperOrigin-RevId: 267607964
```
  4dbdb450
- Remove vars dedup as keras fixed it. · 6d1dd03d
  Hongkun Yu authored Sep 05, 2019
```
PiperOrigin-RevId: 267525663
```
  6d1dd03d
04 Sep, 2019 1 commit
- Move collection of trainable variables outside the loop. · a009f4fb
  Hongkun Yu authored Sep 03, 2019
```
add a flag to control loss scaling.

PiperOrigin-RevId: 267091566
```
  a009f4fb
03 Sep, 2019 1 commit
- Avoid importing private ObjectIdentitySet class · bd211e3e
  Gaurav Jain authored Sep 02, 2019
```
PiperOrigin-RevId: 266848625
```
  bd211e3e
16 Aug, 2019 1 commit
- Use get_primary_cpu_task from tpu_lib · cd85fd8a
  Hongkun Yu authored Aug 16, 2019
```
PiperOrigin-RevId: 263874363
```
  cd85fd8a
07 Aug, 2019 1 commit

Merged commit includes the following changes: (#7404) · e38d570e

Hongkun Yu authored Aug 07, 2019

262178259  by hongkuny<hongkuny@google.com>:

    We should pass training=True in the CTL train step.

--
262081759  by akuegel<akuegel@google.com>:

    Internal change

PiperOrigin-RevId: 262178259

e38d570e

06 Aug, 2019 1 commit

Merged commit includes the following changes: (#7385) · 8384b05d

Hongkun Yu authored Aug 05, 2019

261786323  by yanhuasun<yanhuasun@google.com>:

    Replace set, dict with ObjectIdentityDict/Set to prepare for eq implementation

--

PiperOrigin-RevId: 261786323

8384b05d

26 Jul, 2019 1 commit

Merged commit includes the following changes: (#7309) · 8c7a0e75

Hongkun Yu authored Jul 26, 2019

260060237  by zongweiz<zongweiz@google.com>:

    [BERT SQuAD] Enable mixed precision training

    Add mixed precision training support for the BERT SQuAD model, using the experimental Keras mixed precision API. For numerical stability, use fp32 for layer normalization, dense layers with GELU activation, etc.

--

PiperOrigin-RevId: 260060237

8c7a0e75

25 Jul, 2019 1 commit

Merged commit includes the following changes: (#7301) · 53e3adb8

Hongkun Yu authored Jul 24, 2019

259889221  by hongkuny<hongkuny@google.com>:

    Add no ds / xla / eager perfzero tests

--

PiperOrigin-RevId: 259889221

53e3adb8

24 Jul, 2019 1 commit

Merged commit includes the following changes: (#7289) · ab8febd4

Hongkun Yu authored Jul 23, 2019

259649972  by hongkuny<hongkuny@google.com>:

    Update docs.

--
259470074  by hongkuny<hongkuny@google.com>:

    Adds a dedup phase for trainable variables.

--

PiperOrigin-RevId: 259649972

ab8febd4

19 Jul, 2019 1 commit

Merged commit includes the following changes: (#7263) · c5a4978d

Jing Li authored Jul 19, 2019

* Merged commit includes the following changes:
258867180  by jingli<jingli@google.com>:

    Add new folders for upcoming reorg in model garden.

--
258893811  by hongkuny<hongkuny@google.com>:

    Adds summaries for metrics, allowing metrics inside keras.model.

--
258893048  by isaprykin<isaprykin@google.com>:

    Remove the `cloning` argument to `compile()`.

    Keras models are distributed by cloning in graph mode and without cloning in eager mode as of the change # 258652546.

--
258881002  by hongkuny<hongkuny@google.com>:

    Fix lint.

--
258874998  by hongkuny<hongkuny@google.com>:

    Internal

--
258872662  by hongkuny<hongkuny@google.com>:

    Fix doc

--

PiperOrigin-RevId: 258867180

* Create __init__.py

* Update __init__.py

* Update __init__.py

* Update __init__.py

c5a4978d

16 Jul, 2019 1 commit

Merged commit includes the following changes: (#7221) · e21dcdd0

Hongkun Yu authored Jul 16, 2019

258208153  by hongkuny<hongkuny@google.com>:

    Adds run_eagerly option for bert.

--

PiperOrigin-RevId: 258208153

e21dcdd0

15 Jul, 2019 1 commit

Merged commit includes the following changes: (#7209) · dc8c6ce1

Hongkun Yu authored Jul 15, 2019

257883986  by hongkuny<hongkuny@google.com>:

    Adds tf.summary for bert training

--

PiperOrigin-RevId: 257883986

dc8c6ce1

24 Jun, 2019 1 commit

Merged commit includes the following changes: (#7093) · 240623ac

saberkun authored Jun 24, 2019

254785517  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Use train_single_step for BERT GPU models to temporarily work around some performance bugs in GPU runs

--
254497647  by hongkuny<hongkuny@google.com>:

    Fix device placement for TPU export model.

--

PiperOrigin-RevId: 254785517

240623ac

20 Jun, 2019 1 commit

Merged commit includes the following changes: (#7060) · e0e6d981

saberkun authored Jun 19, 2019

254069984  by hongkuny<hongkuny@google.com>:
    Automated rollback of changelist 254060732.

254061429  by hongkuny<hongkuny@google.com>:

    Use host while loop for training steps.

--
254060732  by yifeif<yifeif@google.com>:
    Automated rollback of changelist 254027750.

254027750  by hongkuny<hongkuny@google.com>:

    Internal change

PiperOrigin-RevId: 254069984

e0e6d981

18 Jun, 2019 1 commit

Merged commit includes the following changes: (#7049) · a1c47f28

saberkun authored Jun 18, 2019

253850824  by hongkuny<hongkuny@google.com>:

    Improve bert training utils.

--
253818191  by hongkuny<hongkuny@google.com>:

    Update savedmodel export to use new model.save() api.

--

PiperOrigin-RevId: 253850824

a1c47f28

12 Jun, 2019 1 commit

Merged commit includes the following changes: (#6998) · ce03903f

David M. Chen authored Jun 11, 2019

252697519 by dmchen<dmchen@google.com>:

        BERT SQuAD accuracy test

25266352 by hongjunchoi<hongjunchoi@google.com>:

        Internal change

252647871 by hongjunchoi<hongjunchoi@google.com>:

        Enable multi worker TPU training for BERT pretraining.

ce03903f

11 Jun, 2019 1 commit

Merged commit includes the following changes: (#6992) · f2eb1701

saberkun authored Jun 10, 2019

252522861  by hongkuny<hongkuny@google.com>:

    Remove export using trained model due to implementation error

--
252156812  by yuefengz<yuefengz@google.com>:

    Fix the callback method name in BERT: replaced on_batch_start with on_batch_begin. Without the fix, it won't work with Keras callbacks.

--
251782065  by A. Unique TensorFlower<gardener@tensorflow.org>:

    Internal change

PiperOrigin-RevId: 252522861

f2eb1701