Commits · 855808f3fc268e9715d613f3c2e56469d8c986d8 · OpenDAS / apex

26 Apr, 2019 1 commit

Replace type().ScalarType() with scalar_type() (#272) · 855808f3

ptrblck authored Apr 26, 2019

* change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF

* revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu

* revert scalar_type() to type() in layer_norm_cuda_kernel.cu

* revert at::kType to at::ScalarType::Type

* use DISPATCH_FLOAT_AND_HALF to get rid of warnings

* add dispatch mechanisms for double+float and double+float+half


10 Apr, 2019 2 commits
- Quick kernel to clean up l2norm · 683b6e0e
  Michael Carilli authored Apr 10, 2019
  
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
  
09 Apr, 2019 1 commit
- Simple cut of the kernel in place · e57f5d0e
  Michael Carilli authored Apr 09, 2019
  
08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
  
04 Apr, 2019 1 commit

WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f

mcarilli authored Apr 03, 2019

* Refactor to allow more flexible treatment of multiple optimizers/models/losses

* Adding _process_optimizers.py

* Created L0 tests (now passing).

* fix: minor print typo (#234)

* make L1 results easier to read

* L0 multiple model/optimizer/loss test fleshed out

* Adding test that master params remain synced across distributed processes

* Docstring updates

* Docstring updates


21 Mar, 2019 2 commits
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
  
- Rename IntList to IntArrayRef · 2a467090
  Syed Tousif Ahmed authored Feb 22, 2019
  
19 Mar, 2019 2 commits
- Fixing interaction of DDP with dynamic loss scaling · 8437d295
  Michael Carilli authored Mar 19, 2019
  
- Multi-tensor axpby kernel for more flexible unscaling (groundwork for #163 and #179 fix) · 5e552004
  Michael Carilli authored Mar 18, 2019
  
15 Mar, 2019 1 commit
- Anticipating upstream #17996 · 2c8e1c86
  Michael Carilli authored Mar 15, 2019
  
12 Mar, 2019 1 commit
- Forward/backward compatibility around pytorch 3aeb78, to fix #191 · 42180bd9
  Michael Carilli authored Mar 11, 2019
  
10 Mar, 2019 2 commits
- fix includes · f34686f1
  Natalia Gimelshein authored Mar 09, 2019
  
- Removing deprecated scale_check_overflow kernel · 8f53411a
  Michael Carilli authored Mar 10, 2019
  
03 Mar, 2019 1 commit
- Bug fix in next power of 2 · ca6c2760
  Marek Kolodziej authored Mar 03, 2019
  
28 Feb, 2019 1 commit
- Comprehensive tests for cross product of options · d24c25b9
  Michael Carilli authored Feb 27, 2019
  
24 Feb, 2019 1 commit
- Stashing work · d137b800
  Michael Carilli authored Feb 24, 2019
  
22 Feb, 2019 1 commit
- Allow multi-tensor unscale to handle FP16 output, so it can also be used for... · 80a3f3ca
  Michael Carilli authored Feb 21, 2019
```
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
```
19 Feb, 2019 1 commit
- Reworked multi tensor apply, added tests · 6763a8be
  Michael Carilli authored Feb 18, 2019
  
13 Feb, 2019 1 commit
- New API tentatively works on resnet50, ready for stress testing. · 889d1712
  Michael Carilli authored Feb 12, 2019
  
11 Feb, 2019 1 commit
- Stashing work · fad78c16
  Michael Carilli authored Feb 10, 2019
  
08 Feb, 2019 1 commit
- stashing work · 1f693b92
  Michael Carilli authored Feb 08, 2019
  
06 Feb, 2019 2 commits
- Tests and resnet50 example work · a5bc76db
  Michael Carilli authored Feb 05, 2019
  
- ready for testing · 6e9159d8
  Michael Carilli authored Feb 05, 2019
  
05 Feb, 2019 1 commit
- New downscale kernel is working but not perf tested · 337056c1
  Michael Carilli authored Feb 05, 2019
  
04 Feb, 2019 1 commit
- Restoring fused inf/nan check + downscale kernel · fd03f26a
  Michael Carilli authored Feb 03, 2019
  
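
The commit title above refers to a fused inf/nan check and downscale kernel. As a rough illustration of what "fused" means here, the sketch below does both jobs in a single pass over the gradients: it multiplies each value by the inverse loss scale and, in the same loop, records whether any value was non-finite. This is plain Python written for explanation only, not the CUDA kernel from the commit; the function name `downscale_and_check` and its signature are hypothetical.

```python
import math
from typing import List, Tuple


def downscale_and_check(grads: List[float], loss_scale: float) -> Tuple[List[float], bool]:
    """Unscale gradients and detect overflow in one pass.

    Hypothetical helper for illustration; not part of the apex API.
    Returns the unscaled gradients and a flag that is True when any
    input gradient was inf or nan.
    """
    inv_scale = 1.0 / loss_scale
    overflow = False
    unscaled = []
    for g in grads:
        # The check and the multiply share one read of each element,
        # which is the point of fusing the two operations.
        if not math.isfinite(g):
            overflow = True
        unscaled.append(g * inv_scale)
    return unscaled, overflow


if __name__ == "__main__":
    print(downscale_and_check([2.0, 4.0], loss_scale=2.0))
    print(downscale_and_check([1.0, float("inf")], loss_scale=2.0))
```

In a mixed-precision training loop, a caller would typically skip the optimizer step and lower the loss scale when the overflow flag comes back set.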
01 Feb, 2019 1 commit
- allowing syncBN to run with affine = False · 223a47e9
  jiej authored Jan 31, 2019
  
18 Jan, 2019 1 commit
- patching grid reduction to be volta-safe · 38bada23
  Jie authored Jan 17, 2019
  
15 Jan, 2019 1 commit
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
06 Nov, 2018 1 commit

[syncBN] · ee67e56a

Jie authored Oct 24, 2018

adjusted kernel config for better perf.
removed divergence in welford warp reduction.


30 Oct, 2018 1 commit
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
29 Oct, 2018 1 commit

Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62

mcarilli authored Oct 29, 2018

* test passes

* notes

* Using C++-side flatten and unflatten functions

* Adding csrc

* Persistent synchronization event so it doesn't need to be created and destroyed each time

* Interop with parameter flattening in SSD

* Added deterministic option to imagenet main.py

* Adding options to split gradient averaging and allreduce in pure fp32

* Fixing allreduce_maybe_retain call

* Fixing allreduce_fallback

* Also sync active_i_buckets from rank 0

* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False

* Correcting syntax error, now all seems to work with SSD

* Optional cpp extension build

* Add mixed precision adam optimizer (#59)

* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.

* Added fixes to fused_adam to get it to work with network.

* wip work on python interface for adam with options

* fix dispatch for halfs, add python options to handle optional half gradients and params

* cleanup, get rid of grid-stride loop

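
The FusedAdam item above says the optimizer places all of the math into one CUDA kernel. For readers unfamiliar with what that math is, here is a minimal sketch of a textbook Adam update applied element by element in plain Python. It is not the apex kernel and makes no claim about its exact formula or options; the function name `adam_step` and the default hyperparameters are assumptions chosen for illustration.

```python
import math
from typing import List


def adam_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    exp_avg_sq: List[float],
    step: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> None:
    """Apply one in-place textbook Adam update (hypothetical helper).

    `step` is the 1-based iteration count. `exp_avg` and `exp_avg_sq`
    hold the running first and second moment estimates.
    """
    # Bias corrections are folded into a single scalar step size.
    bias_correction1 = 1.0 - beta1 ** step
    bias_correction2 = 1.0 - beta2 ** step
    step_size = lr * math.sqrt(bias_correction2) / bias_correction1

    # Everything below touches each element exactly once, which is the
    # work a fused kernel would perform in a single launch.
    for i, g in enumerate(grads):
        exp_avg[i] = beta1 * exp_avg[i] + (1.0 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1.0 - beta2) * g * g
        params[i] -= step_size * exp_avg[i] / (math.sqrt(exp_avg_sq[i]) + eps)


if __name__ == "__main__":
    p, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
    adam_step(p, [1.0, -2.0], m, v, step=1)
    print(p, m, v)
```

On the first step each parameter moves by roughly `lr` in the direction opposite to its gradient's sign, regardless of gradient magnitude, which is a quick sanity check for any Adam implementation.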

23 Oct, 2018 1 commit

[syncBN] (#48) · 81eef1ef

jjsjann123 authored Oct 23, 2018

* [syncBN]
  added syncBN in native pure python apex
  added fused cuda kernels used for sync BN. Using welford for mean/var
    optional installation using 'python setup.py install --cuda_ext'
  added unit test with side-by-side comparison between apex sync BN and
    PyTorch BN. Note that the PyTorch BN output will be slightly off
    because of numerical issues in its mean/var computation.

* [syncBN PR]
  added fp16 support
  addressing review comments on:
    1. updating last pow 2
    2. look for import error when importing syncBN kernel

* [syncBN PR]
  added convert function to insert SyncBatchNorm
  refactored some kernel code

* fixing type issue (fp16/fp32/fp64)
  added Kahan summation
  editing unit test to use pytorch primitive ops with double, passing reasonable tests now

* updating tensor creation calls

* fixing the all_reduce contiguous tensor

* transposed all reduce results

* [syncBN]
  support fp16 input & fp32 layer for apex fp16
  partially fixing launch configs
  enabling imagenet example to run with --sync_bn

* [syncBN PR]
  Documentation added

* adjusting README

* adjusting again

* added some doc to imagenet example

* [syncBN]
  warp-level reduction
  bug fix: warp reduction logic updated. check for dummy element to avoid nan.
  improved launch config for better reduction kernels. Further improvements
    would be to increase grid size.

* [syncBN]
  fixing undefined behavior in __shfl_down_sync from divergent threads in warp
    reduction.
  changing at::native::empty to at::empty (upstream comments)

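
The syncBN commit messages above mention Welford for mean/var and Kahan summation. The sketch below shows both techniques in plain Python, assuming scalar inputs: a streaming Welford update, the standard pairwise merge of two partial Welford states (the step that lets per-device statistics be combined), and a compensated Kahan sum. It illustrates the general algorithms only and is not the CUDA code from the commit; all function names are hypothetical.

```python
from typing import Iterable, Tuple

# (count, mean, m2), where m2 is the running sum of squared deviations
# from the current mean.
WelfordState = Tuple[int, float, float]


def welford_update(state: WelfordState, x: float) -> WelfordState:
    """Fold one new value into a running Welford state."""
    count, mean, m2 = state
    count += 1
    delta = x - mean
    mean += delta / count
    m2 += delta * (x - mean)
    return count, mean, m2


def welford_stats(values: Iterable[float]) -> WelfordState:
    """Single pass over `values`, returning (count, mean, m2)."""
    state: WelfordState = (0, 0.0, 0.0)
    for x in values:
        state = welford_update(state, x)
    return state


def welford_merge(a: WelfordState, b: WelfordState) -> WelfordState:
    """Combine two partial states as if their inputs had been concatenated."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated summation that carries the rounding error forward."""
    total = 0.0
    compensation = 0.0
    for x in values:
        y = x - compensation
        t = total + y
        # (t - total) is what was actually added; subtracting y leaves
        # the part of y that was lost to rounding.
        compensation = (t - total) - y
        total = t
    return total


if __name__ == "__main__":
    # Two hypothetical per-device shards of one batch.
    shard_a = [1.0, 2.0, 3.0]
    shard_b = [4.0, 5.0]
    count, mean, m2 = welford_merge(welford_stats(shard_a), welford_stats(shard_b))
    print(count, mean, m2 / count)  # m2 / count is the biased variance
```

Merging partial states rather than raw sums is what makes this approach attractive for synchronized batch norm: each device can reduce its own shard locally, and only three numbers per channel need to be exchanged.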

23 Jul, 2018 1 commit
- Switch to simple Python-only install, in preparation for upstreaming C++ backend. · d695b68b
  Michael Carilli authored Jul 23, 2018
  
22 Jun, 2018 1 commit
- Reverting changes. · bddbbdcb
  Syed Tousif Ahmed authored Jun 22, 2018
```
Co-authored-by: Michael Carilli <mcarilli@gmail.com>
```
08 Jun, 2018 1 commit
- Compilation succeeds on 0.4, 18.04-6 containers, and current upstream master · fb075b86
  Michael Carilli authored Jun 08, 2018
  
06 Jun, 2018 1 commit
- Macros based on torch.__version__ to compile with 0.4 and 0.5 · d506eff2
  Michael Carilli authored Jun 06, 2018
  
26 May, 2018 1 commit
- Fleshed out Cuda version checking and compiling for multiple arches · fb7d4e1d
  Michael Carilli authored May 25, 2018
  
25 May, 2018 1 commit
- Transferred backend and build system to use Pytorch C++ extension + ATen dispatch. · d17a015f
  Michael Carilli authored May 25, 2018
  
18 May, 2018 1 commit
- Initial support for automatic mixed precision · e733e78c
  Carl Case authored May 16, 2018
  