Commits · d594826cceb79875663b798b5cc38daceaf8ce8b · OpenDAS / apex

30 Oct, 2018 5 commits
- Adam tests (#67) · d594826c
  ngimel authored Oct 30, 2018
```
* Add unittest for FusedAdam.

* Fix some bugs.

* set seed for adam test
```
- Merge pull request #68 from ngimel/includes · a01a7326
  ngimel authored Oct 30, 2018
```
update includes
```
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
- Updating documentation for merged utilities · 8124fba2
  Michael Carilli authored Oct 30, 2018
  
- Warning message for FusedAdam import if unavailable · 1fa1a073
  Michael Carilli authored Oct 30, 2018
  
29 Oct, 2018 1 commit

- Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62
  mcarilli authored Oct 29, 2018
```
* test passes

* notes

* Using C++-side flatten and unflatten functions

* Adding csrc

* Persistent synchronization event so it doesn't need to be created and destroyed each time

* Interop with parameter flattening in SSD

* Added deterministic option to imagenet main.py

* Adding options to split gradient averaging and allreduce in pure fp32

* Fixing allreduce_maybe_retain call

* Fixing allreduce_fallback

* Also sync active_i_buckets from rank 0

* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False

* Correcting syntax error, now all seems to work with SSD

* Optional cpp extension build

* Add mixed precision adam optimizer (#59)

* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.

* Added fixes to fused_adam to get it to work with network.

* wip work on python interface for adam with options

* fix dispatch for halfs, add python options to handle optional half gradients and params

* cleanup, get rid of grid-stride loop
```

23 Oct, 2018 1 commit

- [syncBN] (#48) · 81eef1ef
  jjsjann123 authored Oct 23, 2018
```
* [syncBN]
  added syncBN in native pure python apex
  added fused cuda kernels used for sync BN. Using welford for mean/var
    optional installation using 'python setup.py install --cuda_ext'
  added unit test with side to side comparison between apex sync BN with
    PyTorch BN. Notice that for pytorch BN implementation, because of
    numerical issue for mean/var, the output will be slightly off.

* [syncBN PR]
  added fp16 support
  addressing review comments on:
    1. updating last pow 2
    2. look for import error when importing syncBN kernel

* [syncBN PR]
  added convert function to insert SyncBatchNorm
  refactored some kernel code

* fixing type issue (fp16/fp32/fp64)
  added Kahan summation
  editing unit test to use pytorch primitive ops with double, passing reasonable tests now

* updating tensor creation calls

* fixing the all_reduce contiguous tensor

* transposed all reduce results

* [syncBN]
  support fp16 input & fp32 layer for apex fp16
  partially fixing launch configs
  enabling imagenet example to run with --sync_bn

* [syncBN PR]
  Documentation added

* adjusting README

* adjusting again

* added some doc to imagenet example

* [syncBN]
  warp-level reduction
  bug fix: warp reduction logic updated. check for dummy element to avoid nan.
  improved launch config for better reduction kernels. Further improvements
    would be to increase grid size.

* [syncBN]
  fixing undefined behavior in __shfl_down_sync from divergent threads in warp
    reduction.
  changing at::native::empty to at::empty (upstream comments)
```
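The syncBN commit message mentions "Using welford for mean/var". As a rough illustration of that technique only (this is not the CUDA kernel from the commit), here is a minimal pure-Python sketch. The function names `welford_stats`, `merge_stats`, and `biased_variance` are hypothetical, and the pairwise merge step is an assumption about how partial statistics from several workers could be combined, not a description of what apex does.

```python
def welford_stats(values):
    """Single-pass Welford update. Returns (count, mean, m2),
    where m2 is the running sum of squared deviations from the mean."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the already-updated mean
    return count, mean, m2


def merge_stats(a, b):
    """Combine two partial (count, mean, m2) results, e.g. from two workers
    (hypothetical helper; pairwise combination formula)."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def biased_variance(stats):
    """Population variance (divide by count), as used to normalize a batch."""
    count, _, m2 = stats
    return m2 / count
```

Computing the statistics over a whole batch, or merging the statistics of two halves of it, should give the same mean and variance up to rounding.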

10 Oct, 2018 2 commits
- Docstring updates · e12c1ec3
  Michael Carilli authored Oct 10, 2018
  
- Docstring updates · 8add2b07
  Michael Carilli authored Oct 10, 2018
  
08 Oct, 2018 2 commits
- Update README.md · cd788317
  mcarilli authored Oct 08, 2018
  
- Moving gradient division back to after the allreduce. Empirically, it appears... · fd9b02c0
  Michael Carilli authored Oct 08, 2018
```
Moving gradient division back to after the allreduce.  Empirically, it appears underflow is more of a danger than overflow.
```
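The two commits on gradient division (before the allreduce on 03 Oct, back to after it on 08 Oct) trade overflow risk against underflow risk. The sketch below simulates that trade-off in plain Python, under loud assumptions: fp16 rounding is emulated with the standard-library `struct` format `"e"`, the allreduce is modeled as a simple sequential sum, and all function names are hypothetical. It is not apex's implementation.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    if math.isinf(x) or math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses out-of-range values; model them as fp16 infinity.
        return math.copysign(math.inf, x)


def allreduce_sum_fp16(values):
    """Toy stand-in for an allreduce: sequential sum rounded to fp16."""
    total = 0.0
    for v in values:
        total = to_fp16(total + v)
    return total


def average_divide_first(grads):
    """Divide each rank's gradient by world size, then sum."""
    n = len(grads)
    return allreduce_sum_fp16([to_fp16(g / n) for g in grads])


def average_divide_after(grads):
    """Sum across ranks first, then divide by world size."""
    n = len(grads)
    return to_fp16(allreduce_sum_fp16([to_fp16(g) for g in grads]) / n)
```

With eight simulated ranks, a gradient at the smallest fp16 subnormal (`2.0 ** -24`) is flushed to zero when divided first but survives when divided after, while a large gradient such as `16384.0` overflows to infinity only when summed before dividing.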
07 Oct, 2018 2 commits
- Merge branch 'master' of https://github.com/NVIDIA/apex · 9eab1ac3
  Michael Carilli authored Oct 07, 2018
  
- Updating imagenet FP16_Optimizer example for new syntax · 2361a646
  Michael Carilli authored Oct 07, 2018
  
05 Oct, 2018 1 commit
- Adding set_grads_to_None option to fp16_optimizer · e4c97f32
  Michael Carilli authored Oct 05, 2018
  
03 Oct, 2018 1 commit
- Move gradient division to before the allreduce · e4af2d90
  mcarilli authored Oct 03, 2018
```
This is consistent with upstream, and safer against overflow.
```
29 Sep, 2018 4 commits
- fix error message · 2f204bca
  mcarilli authored Sep 29, 2018
  
- Move other logic after forward to take advantage of GPU skew · 89fa152b
  mcarilli authored Sep 28, 2018
  
- Clean up race condition test, need to figure out a clean way to create distributed unit tests · 9d731777
  Michael Carilli authored Sep 29, 2018
  
- Efficient bucketing (#49) · fa183ee8
  mcarilli authored Sep 28, 2018
```
* beautiful

* IT'S WORKING

* Hopefully fix race condition for fallback hook

* Updating test

* shared_param -> delayed_allreduce

* Adding a safety check

* One more check

* syntax...
```
19 Sep, 2018 1 commit

- Fix param freezing (#47) · 53e1b61a
  mcarilli authored Sep 18, 2018
```
* Fix appears to work in Tomasz's example.

* Somehow shared_param got de-enabled again?
```

18 Sep, 2018 1 commit
- Forward compatibility fixes for distributed backend, thanks to @Ssnl · ed47ebff
  Michael Carilli authored Sep 18, 2018
  
17 Sep, 2018 1 commit

- Remove some fp16 examples that don't converge (#45) · 0ec8addb
  Christian Sarofeen authored Sep 17, 2018
```
* Remove some fp16 examples that don't converge

Default static loss scale of 1.0 (default value) for resnet50 doesn't converge. Either remove example or put static loss scale 128 on it, which is known to converge well.

* Update README.md
```
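The commit above contrasts a static loss scale of 1.0 with one of 128. As a hedged illustration of why a larger static scale can help in fp16 (not a reproduction of the resnet50 result), the sketch below emulates fp16 rounding with the standard-library `struct` format `"e"`. The helper names and the example gradient value are assumptions chosen for illustration.

```python
import math
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    if math.isinf(x) or math.isnan(x):
        return x
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses out-of-range values; model them as fp16 infinity.
        return math.copysign(math.inf, x)


def recovered_grad(true_grad, loss_scale):
    """Scale the gradient (as scaling the loss would), store it in fp16,
    then unscale in higher precision, as a master-weight update would."""
    stored = to_fp16(true_grad * loss_scale)
    return stored / loss_scale
```

For a hypothetical gradient of `2.0 ** -27`, which is below the fp16 subnormal range, `recovered_grad(g, 1.0)` returns zero, while `recovered_grad(g, 128.0)` returns the original value.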

14 Sep, 2018 2 commits
- Only save and load master params if training with FP16 · 48f105d9
  Michael Carilli authored Sep 14, 2018
  
- Fixing imagenet main.py and main_reducer.py to save and load master params · 327b2446
  Michael Carilli authored Sep 13, 2018
  
13 Sep, 2018 1 commit
- Skeleton for modular tests · b7025fc9
  Michael Carilli authored Sep 13, 2018
  
11 Sep, 2018 1 commit

- amp support for Aten RNNs (#41) · 3e1a1c09
  Carl Case authored Sep 11, 2018
```
* WIP: update to support new RNN backend code

* small refactor

* add test for rnn w/ packed sequences
```

10 Sep, 2018 4 commits
- Merge pull request #40 from NVIDIA/amp_tests · 1579b9e3
  Carl Case authored Sep 10, 2018
```
amp unit tests
```
- add rnn tests · 5f5dfa42
  Carl Case authored Sep 10, 2018
  
- tests on banned methods · 39928327
  Carl Case authored Sep 10, 2018
  
- add more promotion testing · 22920fe0
  Carl Case authored Sep 10, 2018
  
07 Sep, 2018 1 commit
- Adding verbose option to FP16_Optimizer · 75a865e3
  Michael Carilli authored Sep 07, 2018
  
06 Sep, 2018 2 commits
- Enabling single-process fallback for examples/imagenet/main_reducer.py · cb6d8f1a
  Michael Carilli authored Sep 06, 2018
  
- Revising LR scaling to account for any choice of num processes, batch size per process · a2801d91
  Michael Carilli authored Sep 06, 2018
  
05 Sep, 2018 2 commits
- minor fix · 01e29c97
  Michael Carilli authored Sep 05, 2018
  
- Fixing needs_refresh logic to allow multiple forwards between each backward · ed14f39c
  Michael Carilli authored Sep 05, 2018
  
30 Aug, 2018 2 commits
- Update README.md · 586c507e
  mcarilli authored Aug 30, 2018
  
- Update distributed.py · 5a39c5e3
  mcarilli authored Aug 30, 2018
  
28 Aug, 2018 3 commits
- Reformatting · 559141e8
  Michael Carilli authored Aug 28, 2018
  
- Updating imagenet README · 034b8f02
  Michael Carilli authored Aug 28, 2018
  
- Cleaning up git weirdness + updating docs for Reducer · 37a4b221
  Michael Carilli authored Aug 28, 2018
  