Commits · 713e0fb859c0299269eb038cf95e232ae2086444 · OpenDAS / apex

05 Feb, 2019 1 commit

- Better FP16 support in pytorch fp16 utils. · 713e0fb8
  Jerry Ma authored Feb 01, 2019

  This commit adds an FP16Model class as a successor to network_to_half.

  The benefits of this class are:

  - Preservation of single-precision for BatchNorm layers. The models
    generated by network_to_half() convert BatchNorm moment tensors to
    half-precision, then back to single-precision, which hurts the
    accuracy of the moment estimators and occasionally results in NaNs.
  - Support for multi-argument nn.Modules (self-explanatory from code).

  713e0fb8
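
The precision loss described in the commit message above can be illustrated without PyTorch. The following is a minimal sketch, not code from the commit: it uses the standard-library `struct` module (whose `e` format is IEEE 754 half precision) to round a value to half precision and back. The helper name `round_trip_fp16` and the two sample statistics are hypothetical, chosen only for illustration.

```python
import struct


def round_trip_fp16(value: float) -> float:
    """Round a value to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical BatchNorm running statistics, chosen only for illustration.
running_mean = 0.1234567
running_var = 1e-8

mean_after = round_trip_fp16(running_mean)
var_after = round_trip_fp16(running_var)

# Half precision keeps an 11-bit significand, so the relative rounding
# error of a normal value can be as large as 2**-11.
mean_rel_error = abs(mean_after - running_mean) / running_mean

print(f"mean: {running_mean} -> {mean_after} (relative error {mean_rel_error:.1e})")
print(f"var:  {running_var} -> {var_after}")
```

The mean comes back slightly perturbed, and the small variance underflows to zero because it lies below the smallest positive half-precision value. This sketch does not reproduce the NaNs the commit message mentions; it only shows the rounding and underflow that a half-precision round trip introduces.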

01 Feb, 2019 6 commits
- async->non_blocking, module-specific logging · cc85a2e5
  Michael Carilli authored Feb 01, 2019
  
  cc85a2e5
- Making note of loss scaling in README · 859f528b
  Michael Carilli authored Feb 01, 2019
  
  859f528b
- Merge branch 'master' of https://github.com/NVIDIA/apex · ae5982cb
  Michael Carilli authored Feb 01, 2019
  
  ae5982cb
- Making static loss scale the default, and clipping master grads when running with --fp16 · 43522e63
  Michael Carilli authored Feb 01, 2019
  
  43522e63
- Update README.md · 33512f93
  mcarilli authored Jan 31, 2019
  
  33512f93
- Update README.md · b83e38a6
  mcarilli authored Jan 31, 2019
  
  b83e38a6
31 Jan, 2019 2 commits
- Merging in master · aed3086a
  Michael Carilli authored Jan 31, 2019
  
  aed3086a
- Removing spurious references to Penn Tree Bank results · b5465fe6
  Michael Carilli authored Jan 31, 2019
  
  b5465fe6
30 Jan, 2019 4 commits
- Merge pull request #142 from NVIDIA/update_word_language_model · 9041a868
  mcarilli authored Jan 30, 2019
```
Update default dims in word_language_model to be multiples of 8 to enable Tensor Core use
```
  9041a868
- clean README · d9be3f90
  Michael Carilli authored Jan 30, 2019
  
  d9be3f90
- Updated default sizes to be multiples of 8 to enable Tensor Core use. Added... · e21946e0
  Michael Carilli authored Jan 30, 2019
```
Updated default sizes to be multiples of 8 to enable Tensor Core use.  Added performance guidelines to README.
```
  e21946e0
- Merge pull request #100 from FDecaYed/deyuf/optimizer_unittests · def8fb85
  mcarilli authored Jan 30, 2019
```
add unit tests for optimizers/fp16_optimizer
```
  def8fb85
29 Jan, 2019 5 commits
- Merge pull request #137 from ngimel/bn_convert · 0b848f0d
  mcarilli authored Jan 28, 2019
```
don't convert to float bn with affine=False
```
  0b848f0d
- Update two_gpu_unit_test.py · 8b9ce244
  mcarilli authored Jan 28, 2019
  
  8b9ce244
- Merge pull request #138 from NVIDIA/sbn_test_cases · 878ba512
  mcarilli authored Jan 28, 2019
```
[syncBN]
```
  878ba512
- Update two_gpu_unit_test.py · d0624f4f
  mcarilli authored Jan 28, 2019
  
  d0624f4f
- adding comment to explain single process gradient averaging · c8d7c9f1
  jiej authored Jan 28, 2019
  
  c8d7c9f1
28 Jan, 2019 3 commits

- [syncBN] · 63e47d29
  jiej authored Jan 28, 2019

  test update to resolve
  https://github.com/NVIDIA/apex/issues/134#issue-403525480

  Use an identical learning rate for both DDP with sync BN and single-process BN.
  The previous configuration gave the impression that sync BN requires adjusting lr
  in the script, which is not true.

  63e47d29
- Update README.md · 95fe7f6a
  mcarilli authored Jan 28, 2019

  95fe7f6a
- don't convert to float bn with affine=False · fe365d58
  Natalia Gimelshein authored Jan 28, 2019

  fe365d58

25 Jan, 2019 7 commits
- Merge pull request #132 from NVIDIA/testing_cache_fix · c8bc3e62
  mcarilli authored Jan 25, 2019
```
Fix + tests for the eval->training caching issue
```
  c8bc3e62
- Update explanation of is_grad_enabled() use · a88c09cf
  mcarilli authored Jan 25, 2019
  
  a88c09cf
- Adding tests, also, don't drop cache during eval. · dfd40f9a
  Michael Carilli authored Jan 24, 2019
  
  dfd40f9a
- Removing some whitespace · 3b8e5c4a
  Michael Carilli authored Jan 24, 2019
  
  3b8e5c4a
- Removing some print statements · a0ae9e91
  Michael Carilli authored Jan 24, 2019
  
  a0ae9e91
- Merge branch 'master' into testing_cache_fix · c7dcb0e1
  Michael Carilli authored Jan 24, 2019
  
  c7dcb0e1
- Updating comments · c619fe6e
  Michael Carilli authored Jan 24, 2019
  
  c619fe6e
24 Jan, 2019 2 commits
- commenting out print statements · 646fc0d0
  Michael Carilli authored Jan 23, 2019
  
  646fc0d0
- saving for carl to review · 56ea6d78
  Michael Carilli authored Jan 23, 2019
  
  56ea6d78
23 Jan, 2019 1 commit
- Adding dummy _deactivate method · 06e11bd3
  Michael Carilli authored Jan 23, 2019
  
  06e11bd3
18 Jan, 2019 5 commits
- Update README.md · 79ad5a88
  mcarilli authored Jan 18, 2019
  
  79ad5a88
- Update README.md · d9bce818
  mcarilli authored Jan 18, 2019
  
  d9bce818
- Update Dockerfile · 4ef4fabf
  mcarilli authored Jan 18, 2019
  
  4ef4fabf
- Merge pull request #126 from NVIDIA/nhwc_sbn_patch_Pr · 7d05704c
  mcarilli authored Jan 17, 2019
```
patching grid reduction to be volta-safe
```
  7d05704c
- patching grid reduction to be volta-safe · 38bada23
  Jie authored Jan 17, 2019
  
  38bada23
17 Jan, 2019 1 commit
- Merge pull request #125 from NVIDIA/nhwc_sbn_pr · 438f6f9f
  mcarilli authored Jan 17, 2019
```
[sync BN nhwc]
```
  438f6f9f
15 Jan, 2019 2 commits
- fixing utility function convert_syncbn_model to accept channel_last flag and... · a62b87ea
  Jie authored Jan 14, 2019
```
fixing utility function convert_syncbn_model to accept channel_last flag and properly set attribute for nested layers
```
  a62b87ea
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
  443fa76e
05 Jan, 2019 1 commit
- Adding amp imagenet example · 3c7a0e44
  Michael Carilli authored Jan 04, 2019
  
  3c7a0e44