- 08 Feb, 2019 (2 commits)
  - Michael Carilli authored
  - Michael Carilli authored
- 06 Feb, 2019 (7 commits)
  - ngimel authored: Better FP16 support in the PyTorch fp16 utils.
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
- 05 Feb, 2019 (8 commits)
  - Jerry Ma authored: Add an FP16Model class as a successor to network_to_half. Its benefits (see the sketch after this list):
    - Preservation of single precision for BatchNorm layers. The models generated by network_to_half() convert BatchNorm moment tensors to half precision and then back to single precision, which hurts the accuracy of the moment estimators and occasionally results in NaNs.
    - Support for multi-argument nn.Modules (self-explanatory from the code).
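A minimal sketch of both points, with hypothetical names (bn_to_float, FP16ModelSketch); the actual apex implementation may differ:

```python
import torch.nn as nn

def bn_to_float(module):
    # Keep BatchNorm parameters and running mean/var buffers in FP32 so
    # the moment estimators stay accurate and do not produce NaNs in FP16.
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.float()
    for child in module.children():
        bn_to_float(child)
    return module

class FP16ModelSketch(nn.Module):
    # Hypothetical wrapper: casts every positional input to half, so
    # nn.Modules whose forward() takes multiple arguments are supported.
    def __init__(self, network):
        super().__init__()
        self.network = bn_to_float(network.half())

    def forward(self, *inputs):
        return self.network(*(t.half() for t in inputs))
```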
  - Michael Carilli authored: Remove the patching of loss.backward, which appears to cause memory leaks (reference cycles?) in some models.
  - Michael Carilli authored
  - mcarilli authored: apex.optimizers.FP16_Optimizer: add state_dict() and load_state_dict() (usage sketch below).
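A hedged usage sketch of the new checkpointing methods; the FusedAdam wrapping shown here is an assumption about how the optimizer is constructed, and the exact contents of the state dict are not specified by the commit:

```python
import torch
import torch.nn as nn
from apex.optimizers import FusedAdam, FP16_Optimizer

model = nn.Linear(16, 16).cuda().half()
optimizer = FP16_Optimizer(FusedAdam(model.parameters(), lr=1e-3))

# Save model and optimizer state together, so the optimizer's internal
# state travels with the model across restarts.
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict()}, "checkpoint.pt")

# Restore both before resuming training.
ckpt = torch.load("checkpoint.pt")
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])
```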
  - mcarilli authored: Restore fused kernel.
  - Michael Carilli authored
  - Michael Carilli authored
- 04 Feb, 2019 (2 commits)
  - mcarilli authored: Allow syncBN to run with affine=False (usage sketch below).
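A brief usage sketch of the change; the model itself is illustrative, and the distributed setup that sync BN needs is omitted:

```python
import torch.nn as nn
from apex.parallel import SyncBatchNorm

# affine=False: normalization uses cross-process batch statistics only,
# with no learnable gamma/beta parameters.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    SyncBatchNorm(64, affine=False),
    nn.ReLU(inplace=True),
)
```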
  - Michael Carilli authored
- 03 Feb, 2019 (1 commit)
  - Michael Carilli authored
- 01 Feb, 2019 (8 commits)
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - Michael Carilli authored
  - mcarilli authored
  - mcarilli authored
  - jiej authored
- 31 Jan, 2019 (2 commits)
  - Michael Carilli authored
  - Michael Carilli authored
- 30 Jan, 2019 (4 commits)
  - mcarilli authored: Update default dims in word_language_model to be multiples of 8 to enable Tensor Core use.
  - Michael Carilli authored
  - Michael Carilli authored: Updated default sizes to be multiples of 8 to enable Tensor Core use. Added performance guidelines to the README. (The rounding rule is sketched below.)
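The rule of thumb behind both commits, as a hedged sketch; the helper and the example numbers are illustrative, not taken from the commits:

```python
def round_up(n, multiple=8):
    # Round n up to the next multiple of `multiple`. FP16 GEMM dimensions
    # that are multiples of 8 are eligible for Tensor Cores.
    return ((n + multiple - 1) // multiple) * multiple

# e.g. pad a 33278-word vocabulary to 33280 so the embedding and
# output-projection GEMMs have Tensor Core friendly dimensions.
padded_vocab = round_up(33278)  # -> 33280
```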
  - mcarilli authored: Add unit tests for optimizers/fp16_optimizer.
- 29 Jan, 2019 (5 commits)
- 28 Jan, 2019 (1 commit)
  - jiej authored: Test update to resolve https://github.com/NVIDIA/apex/issues/134#issue-403525480. Use an identical learning rate for both DDP with sync BN and single-process BN; the previous configuration left the impression that sync BN requires adjusting the learning rate in the script, which is not true. (Sketch below.)
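A minimal sketch of the point being tested; the model factory is illustrative, and the DDP/SyncBatchNorm wrapping for the second configuration is omitted:

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8))

lr = 0.1  # one identical learning rate for both configurations
opt_single = torch.optim.SGD(make_model().parameters(), lr=lr)

# In the DDP + sync BN process the same value is used; switching BN to
# its synchronized variant does not call for rescaling the learning rate.
opt_ddp = torch.optim.SGD(make_model().parameters(), lr=lr)
```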