Commits · 75c8a97a3c4bc889875ea7f9a56dcf89f817ac30 · OpenDAS / apex

08 Mar, 2019 3 commits

Simon Layton authored Mar 08, 2019

Only support the 4 specific cases we care about
Remove more general set of switch statements

75c8a97a

- Code cleanup, add fused fp16 read / write · cac061a1
  Simon Layton authored Mar 08, 2019
```
Fuse in fp16 gradient -> fp32 convert
Additional option fp16 weight copy written out
```
- Fused multi-tensor SGD · cadad920
  Simon Layton authored Mar 08, 2019
```
Initial implementation, all fp32
Tested against torch.optim.sgd
```

03 Mar, 2019 1 commit
- Bug fix in next power of 2 · ca6c2760
  Marek Kolodziej authored Mar 03, 2019
  
28 Feb, 2019 1 commit
- Comprehensive tests for cross product of options · d24c25b9
  Michael Carilli authored Feb 27, 2019
  
24 Feb, 2019 1 commit
- Stashing work · d137b800
  Michael Carilli authored Feb 24, 2019
  
22 Feb, 2019 1 commit
- Allow multi-tensor unscale to handle FP16 output, so it can also be used for... · 80a3f3ca
  Michael Carilli authored Feb 21, 2019
```
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
```
19 Feb, 2019 1 commit
- Reworked multi tensor apply, added tests · 6763a8be
  Michael Carilli authored Feb 18, 2019
  
13 Feb, 2019 1 commit
- New API tentatively works on resnet50, ready for stress testing. · 889d1712
  Michael Carilli authored Feb 12, 2019
  
11 Feb, 2019 1 commit
- Stashing work · fad78c16
  Michael Carilli authored Feb 10, 2019
  
08 Feb, 2019 1 commit
- stashing work · 1f693b92
  Michael Carilli authored Feb 08, 2019
  
06 Feb, 2019 2 commits
- Tests and resnet50 example work · a5bc76db
  Michael Carilli authored Feb 05, 2019
  
- ready for testing · 6e9159d8
  Michael Carilli authored Feb 05, 2019
  
05 Feb, 2019 1 commit
- New downscale kernel is working but not perf tested · 337056c1
  Michael Carilli authored Feb 05, 2019
  
04 Feb, 2019 1 commit
- Restoring fused inf/nan check + downscale kernel · fd03f26a
  Michael Carilli authored Feb 03, 2019
  
01 Feb, 2019 1 commit
- allowing syncBN to run with affine = False · 223a47e9
  jiej authored Jan 31, 2019
  
18 Jan, 2019 1 commit
- patching grid reduction to be volta-safe · 38bada23
  Jie authored Jan 17, 2019
  
15 Jan, 2019 1 commit
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
06 Nov, 2018 1 commit

- [syncBN] · ee67e56a
  Jie authored Oct 24, 2018

adjusted kernel config for better perf.
removed divergence in welford warp reduction.

30 Oct, 2018 1 commit
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
29 Oct, 2018 1 commit

- Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62
  mcarilli authored Oct 29, 2018

* test passes

* notes

* Using C++-side flatten and unflatten functions

* Adding csrc

* Persistent synchronization event so it doesn't need to be created and destroyed each time

* Interop with parameter flattening in SSD

* Added deterministic option to imagenet main.py

* Adding options to split gradient averaging and allreduce in pure fp32

* Fixing allreduce_maybe_retain call

* Fixing allreduce_fallback

* Also sync active_i_buckets from rank 0

* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False

* Correcting syntax error, now all seems to work with SSD

* Optional cpp extension build

* Add mixed precision adam optimizer (#59)

* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.

* Added fixes to fused_adam to get it to work with network.

* wip work on python interface for adam with options

* fix dispatch for halfs, add python options to handle optional half gradients and params

* cleanup, get rid of grid-stride loop

23 Oct, 2018 1 commit

- [syncBN] (#48) · 81eef1ef
  jjsjann123 authored Oct 23, 2018

* [syncBN]
  added syncBN in native pure-Python apex
  added fused CUDA kernels used for sync BN, using Welford for mean/var;
    optional installation using 'python setup.py install --cuda_ext'
  added unit test with side-by-side comparison between apex sync BN and
    PyTorch BN. Note that for the PyTorch BN implementation, because of
    numerical issues in mean/var, the output will be slightly off.

* [syncBN PR]
  added fp16 support
  addressing review comments on:
    1. updating last pow 2
    2. look for import error when importing syncBN kernel

* [syncBN PR]
  added convert function to insert SyncBatchNorm
  refactored some kernel code

* fixing type issue (fp16/fp32/fp64)
  added Kahan summation
  editing unit test to use PyTorch primitive ops with double, passing reasonable tests now

* updating tensor creation calls

* fixing the all_reduce contiguous tensor

* transposed all reduce results

* [syncBN]
  support fp16 input & fp32 layer for apex fp16
  partially fixing launch configs
  enabling imagenet example to run with --sync_bn

* [syncBN PR]
  Documentation added

* adjusting README

* adjusting again

* added some doc to imagenet example

* [syncBN]
  warp-level reduction
  bug fix: warp reduction logic updated. check for dummy element to avoid nan.
  improved launch config for better reduction kernels. Further improvements
    would be to increase grid size.

* [syncBN]
  fixing undefined behavior in __shfl_down_sync from divergent threads in warp
    reduction.
  changing at::native::empty to at::empty (upstream comments)

23 Jul, 2018 1 commit
- Switch to simple Python-only install, in preparation for upstreaming C++ backend. · d695b68b
  Michael Carilli authored Jul 23, 2018
  
22 Jun, 2018 1 commit
- Reverting changes. · bddbbdcb
  Syed Tousif Ahmed authored Jun 22, 2018
```
Co-authored-by: Michael Carilli <mcarilli@gmail.com>
```
08 Jun, 2018 1 commit
- Compilation succeeds on 0.4, 18.04-6 containers, and current upstream master · fb075b86
  Michael Carilli authored Jun 08, 2018
  
06 Jun, 2018 1 commit
- Macros based on torch.__version__ to compile with 0.4 and 0.5 · d506eff2
  Michael Carilli authored Jun 06, 2018
  
26 May, 2018 1 commit
- Fleshed out Cuda version checking and compiling for multiple arches · fb7d4e1d
  Michael Carilli authored May 25, 2018
  
25 May, 2018 1 commit
- Transferred backend and build system to use Pytorch C++ extension + ATen dispatch. · d17a015f
  Michael Carilli authored May 25, 2018
  
18 May, 2018 1 commit
- Initial support for automatic mixed precision · e733e78c
  Carl Case authored May 16, 2018
  
25 Apr, 2018 2 commits
- Cleaned comments in fp16_utils and csrc. Keeping comments that are non-docstring but informative. · a3e2776a
  Michael Carilli authored Apr 25, 2018
  
- Initial release · 2fa4dbaf
  Christian Sarofeen authored Apr 25, 2018
  