Commits · a99e18758090e859238e531702915aeeaaaed8f6 · OpenDAS / apex

17 Dec, 2018 3 commits
- Fix deprecation warnings for ReduceOp · a99e1875
  Michael Carilli authored Dec 17, 2018
  
  a99e1875
- Compatibility with new_group() API · 35891b28
  Michael Carilli authored Dec 17, 2018
  
  35891b28
- Adding retain_graph=True option · 48343d94
  Michael Carilli authored Dec 17, 2018
  
  48343d94
14 Dec, 2018 1 commit
- Attempt to fix 97 (not sure why it's happening to begin with) · 4212b3e9
  Michael Carilli authored Dec 13, 2018
  
  4212b3e9
12 Dec, 2018 1 commit
- Warning instead of error if nvcc is not found · 197bcc48
  Michael Carilli authored Dec 12, 2018
  
  197bcc48
11 Dec, 2018 5 commits
- Merge pull request #92 from FDecaYed/deyuf/fused_adam · b56a2088
  mcarilli authored Dec 11, 2018
```
WIP: improve fused adam
```
  b56a2088
- address comment · b6188fc4
  Deyu Fu authored Dec 11, 2018
  
  b6188fc4
- improve backward compatibility · d665ab90
  Deyu Fu authored Dec 10, 2018
  
  d665ab90
- better print · f06feced
  Deyu Fu authored Dec 10, 2018
  
  f06feced
- remove temporarily added lr warmup · c8ca4bf4
  Deyu Fu authored Dec 10, 2018
```
This way optimizer will be more general, and warm up should be handled by user
```
  c8ca4bf4
10 Dec, 2018 2 commits
- Merge pull request #94 from NVIDIA/sbn_util · 3f9b5c98
  mcarilli authored Dec 10, 2018
```
Adding process group in convert_syncbn_model
```
  3f9b5c98
- Adding process group in convert_syncbn_model · 6d3c75e5
  Jie authored Dec 10, 2018
  
  6d3c75e5
07 Dec, 2018 1 commit
- address comments · 4d9dcb57
  Deyu Fu authored Dec 07, 2018
  
  4d9dcb57
06 Dec, 2018 1 commit
- WIP: improve fused adam · be42aad5
  Deyu Fu authored Dec 05, 2018
  
  be42aad5
04 Dec, 2018 2 commits
- Merge pull request #89 from ptrblck/update_examples · 920da6da
  mcarilli authored Dec 04, 2018
```
Update examples to PyTorch >=0.4.0
```
  920da6da
- call .float() on GPU, remove unnecessary push to GPU · 9ccebe5b
  ptrblck authored Dec 04, 2018
  
  9ccebe5b
03 Dec, 2018 2 commits

mcarilli authored Dec 03, 2018

adjusted kernel config for better perf.
removed divergence in welford warp reduction.

0273d7ad

[syncBN] (#90) · 5dad4c21
jjsjann123 authored Dec 03, 2018
```
supporting user specified process group
```
5dad4c21

02 Dec, 2018 1 commit
- update examples to PyTorch >=0.4.0 · 28bdc04e
  ptrblck authored Dec 02, 2018
  
  28bdc04e
30 Nov, 2018 2 commits
- Syncing main.py a bit more · bc62f325
  Michael Carilli authored Nov 30, 2018
  
  bc62f325
- Adding deterministic option to main_fp16_optimizer.py · 2a8022ca
  Michael Carilli authored Nov 30, 2018
  
  2a8022ca
28 Nov, 2018 3 commits
- minor latex touchup · b436213e
  Michael Carilli authored Nov 28, 2018
  
  b436213e
- Shortening import path for layernorm · 98b76a86
  Michael Carilli authored Nov 28, 2018
  
  98b76a86
- Adding layernorm docs · 67ad3065
  Michael Carilli authored Nov 28, 2018
  
  67ad3065
14 Nov, 2018 1 commit
- Distributed backend compatibility update · 64f3d362
  mcarilli authored Nov 14, 2018
  
  64f3d362
10 Nov, 2018 1 commit
- Updating example instructions to use batch size 224 for safety · 2b8277e5
  Michael Carilli authored Nov 10, 2018
  
  2b8277e5
06 Nov, 2018 1 commit

[syncBN] · ee67e56a

Jie authored Oct 24, 2018

adjusted kernel config for better perf.
removed divergence in welford warp reduction.

ee67e56a

01 Nov, 2018 4 commits
- Minor docstring update · 8bd382fa
  Michael Carilli authored Nov 01, 2018
  
  8bd382fa
- Update adamopt docs (#73) · da515dca
  schetlur authored Nov 01, 2018
```
* Adding some missing fields to adamopt documentation.

* Adding some clarification to documentation.
```
  da515dca
- Docstring updates · 97ab5ad3
  Michael Carilli authored Nov 01, 2018
  
  97ab5ad3
- Adding switch to control averaging of gradients. · efc561ba
  Michael Carilli authored Nov 01, 2018
  
  efc561ba
31 Oct, 2018 1 commit

[WIP] Fused layer norm cuda (#69) · 1b9b65ca

Thor Johnsen authored Oct 31, 2018

* Pre-release of fused layer norm apex extension

* Remove half and __half2 specializations

* Code changes from review

1b9b65ca

30 Oct, 2018 7 commits
- Remove arch from adam compile options · 45f030db
  ngimel authored Oct 30, 2018
  
  45f030db
- Adding some missing fields to adamopt documentation. (#70) · daea4188
  mcarilli authored Oct 30, 2018
  
  daea4188
- Adam tests (#67) · d594826c
  ngimel authored Oct 30, 2018
```
* Add unittest for FusedAdam.

* Fix some bugs.

* set seed for adam test
```
  d594826c
- Merge pull request #68 from ngimel/includes · a01a7326
  ngimel authored Oct 30, 2018
```
update includes
```
  a01a7326
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
  ef3a0025
- Updating documentation for merged utilities · 8124fba2
  Michael Carilli authored Oct 30, 2018
  
  8124fba2
- Warning message for FusedAdam import if unavailable · 1fa1a073
  Michael Carilli authored Oct 30, 2018
  
  1fa1a073
29 Oct, 2018 1 commit

Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62

mcarilli authored Oct 29, 2018

* test passes

* notes

* Using C++-side flatten and unflatten functions

* Adding csrc

* Persistent synchronization event so it doesn't need to be created and destroyed each time

* Interop with parameter flattening in SSD

* Added deterministic option to imagenet main.py

* Adding options to split gradient averaging and allreduce in pure fp32

* Fixing allreduce_maybe_retain call

* Fixing allreduce_fallback

* Also sync active_i_buckets from rank 0

* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False

* Correcting syntax error, now all seems to work with SSD

* Optional cpp extension build

* Add mixed precision adam optimizer (#59)

* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.

* Added fixes to fused_adam to get it to work with network.

* wip work on python interface for adam with options

* fix dispatch for halfs, add python options to handle optional half gradients and params

* cleanup, get rid of grid-stride loop

e0bc5d62