- 10 Dec, 2018 1 commit
Jie authored
- 04 Dec, 2018 2 commits
- 03 Dec, 2018 2 commits
mcarilli authored
Adjusted kernel config for better performance. Removed divergence in the Welford warp reduction.
jjsjann123 authored
Supporting a user-specified process group.
- 02 Dec, 2018 1 commit
ptrblck authored
- 30 Nov, 2018 2 commits
Michael Carilli authored
Michael Carilli authored
- 28 Nov, 2018 3 commits
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored
- 14 Nov, 2018 1 commit
mcarilli authored
- 10 Nov, 2018 1 commit
Michael Carilli authored
- 06 Nov, 2018 1 commit
Jie authored
Adjusted kernel config for better performance. Removed divergence in the Welford warp reduction.
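For context on what these kernels reduce: the sync BN statistics are accumulated with Welford's online algorithm, and a warp reduction repeatedly merges partial (count, mean, M2) triples. The following is a minimal Python sketch of those update and merge formulas, not the CUDA kernel itself.

```python
# Minimal illustration of Welford's online mean/variance, the numerics the
# sync BN reduction kernels are built around (not the CUDA code itself).

def welford_update(count, mean, m2, x):
    # Incorporate one new sample x into the running (count, mean, M2) triple.
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return count, mean, m2

def welford_merge(count_a, mean_a, m2_a, count_b, mean_b, m2_b):
    # Combine two partial results, e.g. what two lanes of a warp reduction hold.
    count = count_a + count_b
    if count == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * count_b / count
    m2 = m2_a + m2_b + delta * delta * count_a * count_b / count
    return count, mean, m2

# Example: statistics of [1, 2, 3, 4] computed in two halves and merged.
a = (0, 0.0, 0.0)
for x in (1.0, 2.0):
    a = welford_update(*a, x)
b = (0, 0.0, 0.0)
for x in (3.0, 4.0):
    b = welford_update(*b, x)
count, mean, m2 = welford_merge(*a, *b)
print(mean, m2 / count)  # population mean and variance: 2.5, 1.25
```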
- 01 Nov, 2018 4 commits
Michael Carilli authored
schetlur authored
* Adding some missing fields to adamopt documentation.
* Adding some clarification to documentation.
Michael Carilli authored
Michael Carilli authored
- 31 Oct, 2018 1 commit
Thor Johnsen authored
* Pre-release of fused layer norm apex extension
* Remove half and __half2 specializations
* Code changes from review
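A rough usage sketch for this extension (assuming apex was installed with its optional CUDA extensions and that FusedLayerNorm mirrors the torch.nn.LayerNorm constructor; the fallback is only for illustration):

```python
# Rough usage sketch; assumes the extension is built and that FusedLayerNorm
# takes the same constructor arguments as torch.nn.LayerNorm.
import torch
import torch.nn as nn

try:
    from apex.normalization import FusedLayerNorm as LayerNorm
except ImportError:
    LayerNorm = nn.LayerNorm  # fall back to the stock implementation

norm = LayerNorm(512).cuda()                 # normalize over the last dim of size 512
x = torch.randn(8, 128, 512, device="cuda")
y = norm(x)                                  # same shape as x, learnable affine applied
```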
- 30 Oct, 2018 7 commits
ngimel authored
mcarilli authored
ngimel authored
* Add unittest for FusedAdam.
* Fix some bugs.
* set seed for adam test
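A hedged sketch of what such a seeded comparison test can look like; the model, step count, and tolerance below are illustrative rather than the actual unit test, and FusedAdam is assumed to accept the same (params, lr) arguments as torch.optim.Adam.

```python
# Illustrative seeded comparison (not the actual unit test): run FusedAdam and
# torch.optim.Adam on identical copies of a model and compare the resulting params.
import copy
import torch
from apex.optimizers import FusedAdam

torch.manual_seed(0)
ref_model = torch.nn.Linear(32, 32).cuda()
fused_model = copy.deepcopy(ref_model)

ref_opt = torch.optim.Adam(ref_model.parameters(), lr=1e-3)
fused_opt = FusedAdam(fused_model.parameters(), lr=1e-3)

for _ in range(10):
    x = torch.randn(16, 32, device="cuda")
    for model, opt in ((ref_model, ref_opt), (fused_model, fused_opt)):
        opt.zero_grad()
        model(x).sum().backward()
        opt.step()

for p_ref, p_fused in zip(ref_model.parameters(), fused_model.parameters()):
    assert torch.allclose(p_ref, p_fused, atol=1e-5)
```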
ngimel authored
update includes
Natalia Gimelshein authored
Michael Carilli authored
Michael Carilli authored
- 29 Oct, 2018 1 commit
mcarilli authored
* test passes
* notes
* Using C++-side flatten and unflatten functions
* Adding csrc
* Persistent synchronization event so it doesn't need to be created and destroyed each time
* Interop with parameter flattening in SSD
* Added deterministic option to imagenet main.py
* Adding options to split gradient averaging and allreduce in pure fp32
* Fixing allreduce_maybe_retain call
* Fixing allreduce_fallback
* Also sync active_i_buckets from rank 0
* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False
* Correcting syntax error, now all seems to work with SSD
* Optional cpp extension build
* Add mixed precision adam optimizer (#59)
* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.
* Added fixes to fused_adam to get it to work with network.
* wip work on python interface for adam with options
* fix dispatch for halfs, add python options to handle optional half gradients and params
* cleanup, get rid of grid-stride loop
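A minimal sketch of how the distributed options named above surface to the user; delay_allreduce and retain_allreduce_buffers are taken from the commit message, while the process-group setup and model here are illustrative.

```python
# Minimal sketch of the options named above; the launcher/env setup and model are
# illustrative, only delay_allreduce and retain_allreduce_buffers come from the commit.
import torch
import torch.distributed as dist
from apex.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.Linear(1024, 1024).cuda()

# delay_allreduce=True defers the gradient allreduce until the whole backward pass
# has finished instead of overlapping bucketed allreduces with backward;
# retain_allreduce_buffers=True would keep the flattened allreduce buffers accessible.
model = DDP(model, delay_allreduce=True, retain_allreduce_buffers=False)
```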
- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] Added syncBN in native pure-python apex; added fused CUDA kernels used for sync BN, using Welford for mean/var. Optional installation using 'python setup.py install --cuda_ext'. Added unit test with side-by-side comparison between apex sync BN and PyTorch BN. Notice that for the PyTorch BN implementation, because of numerical issues for mean/var, the output will be slightly off.
* [syncBN PR] Added fp16 support. Addressed review comments on: 1. updating last pow 2; 2. looking for import error when importing the syncBN kernel.
* [syncBN PR] Added convert function to insert SyncBatchNorm; refactored some kernel code.
* Fixing type issue (fp16/fp32/fp64); added Kahan summation; editing unit test to use PyTorch primitive ops with double. Passing reasonable tests now.
* Updating tensor creation calls.
* Fixing the all_reduce contiguous tensor.
* Transposed all reduce results.
* [syncBN] Support fp16 input & fp32 layer for apex fp16; partially fixing launch configs; enabling imagenet example to run with --sync_bn.
* [syncBN PR] Documentation added.
* Adjusting README.
* Adjusting again.
* Added some doc to imagenet example.
* [syncBN] Warp-level reduction bug fix: warp reduction logic updated; check for dummy element to avoid NaN; improved launch config for better reduction kernels. Further improvement would be to increase grid size.
* [syncBN] Fixing undefined behavior in __shfl_down_sync from divergent threads in warp reduction; changing at::native::empty to at::empty (upstream comments).
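A usage sketch of the convert function mentioned above, assuming it walks the module tree and swaps BatchNorm layers for the apex SyncBatchNorm. The process_group argument echoes the "supporting user specified process group" commit from 03 Dec near the top of this log and is an assumption for this snapshot; None means the default (world) group.

```python
# Sketch of the conversion path described above: build the model with ordinary
# BatchNorm, then swap in apex's SyncBatchNorm so mean/var are reduced across ranks
# during training. process_group=None (default/world group) is an assumption here.
import torch
import torch.distributed as dist
from apex.parallel import convert_syncbn_model

dist.init_process_group(backend="nccl", init_method="env://")

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3),
    torch.nn.BatchNorm2d(64),
    torch.nn.ReLU(),
).cuda()

model = convert_syncbn_model(model, process_group=None)
```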
- 10 Oct, 2018 2 commits
Michael Carilli authored
Michael Carilli authored
- 08 Oct, 2018 2 commits
mcarilli authored
Michael Carilli authored
Moving gradient division back to after the allreduce. Empirically, it appears underflow is more of a danger than overflow.
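In code terms, the ordering change amounts to something like the following simplified sketch (not the actual allreduce hook): sum the raw gradients across ranks first, then divide by the world size, so small per-rank fp16 gradients are not pushed toward zero before they are accumulated.

```python
# Simplified sketch of the ordering, not the actual hook: allreduce (sum) first,
# divide by world size afterwards, so small gradients are less likely to underflow.
import torch.distributed as dist

def average_gradients(params, world_size):
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad)      # sum across ranks (default op is SUM)
            p.grad.div_(world_size)      # average after the allreduce
```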
- 07 Oct, 2018 2 commits
Michael Carilli authored
- 05 Oct, 2018 1 commit
Michael Carilli authored
- 03 Oct, 2018 1 commit
mcarilli authored
This is consistent with upstream, and safer against overflow.
- 29 Sep, 2018 4 commits
mcarilli authored
mcarilli authored
Michael Carilli authored
mcarilli authored
* beautiful
* IT'S WORKING
* Hopefully fix race condition for fallback hook
* Updating test
* shared_param -> delayed_allreduce
* Adding a safety check
* One more check
* syntax...