Commits · 3d01e4a0a188cc8df54bc6e44cf5eb40ff6b4cc5 · OpenDAS / apex

12 Jul, 2019 1 commit
- Add missing semicolon. (#390) · 80e0143e
  Edward Z. Yang authored Jul 12, 2019
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
03 Jul, 2019 2 commits
- Pulling in deprecation warning changes · 665b2dd7
  Michael Carilli authored Jul 03, 2019
  
- Remove deprecated Type.h · 816813f9
  Michael Carilli authored Jul 03, 2019
  
28 Jun, 2019 1 commit
- Add support for fp16 update term (new UPD_T typename in template) · 3aeea0d8
  Thor Johnsen authored Jun 28, 2019
  
14 Jun, 2019 1 commit
- Separate LDG/STG from compute loop (#359) · 121a2500
  Thor Johnsen authored Jun 13, 2019
  
11 Jun, 2019 1 commit
- Allow multi_tensor_lamb to update fp16 params · 47e3367f
  Michael Carilli authored Jun 11, 2019
  
31 May, 2019 2 commits

- Multi tensor lamb optimizer (#334) · 8be5b6be
  Thor Johnsen authored May 31, 2019

  * First draft, for discussion
  * Fix mistakes in LAMB equations
  * Add loop over chunk
  * Bug fix
  * Bug fix
  * Bug fix
  * Undo bug fix
  * Bug fix
  * Add multi tensor LAMB optimizer to setup.py
  * Rename step_size to learning_rate
  * Fix compilation errors

- Give multi-tensor L2 norm the ability to compute norms per-tensor as well as globally (#333) · 93338e62
  mcarilli authored May 31, 2019

  * Existing tests passing, still need to add per-tensor tests
  * Test is passing, still need to measure performance
  * ILP for l2norm functor

26 Apr, 2019 1 commit

- Replace type().ScalarType() with scalar_type() (#272) · 855808f3
  ptrblck authored Apr 26, 2019

  * change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX
  * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF
  * revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES
  * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu
  * revert scalar_type() to type() in layer_norm_cuda_kernel.cu
  * revert at::kType to at::ScalarType::Type
  * use DISPATCH_FLOAT_AND_HALF to get rid of warnings
  * add dispatch mechanisms for double+float and double+float+half

10 Apr, 2019 2 commits
- Quick kernel to clean up l2norm · 683b6e0e
  Michael Carilli authored Apr 10, 2019
  
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
  
09 Apr, 2019 1 commit
- Simple cut of the kernel in place · e57f5d0e
  Michael Carilli authored Apr 09, 2019
  
08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
  
04 Apr, 2019 1 commit

- WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f
  mcarilli authored Apr 03, 2019

  * Refactor to allow more flexible treatment of multiple optimizers/models/losses
  * Adding _process_optimizers.py
  * Created L0 tests (now passing).
  * fix: minor print typo (#234)
  * make L1 results easier to read
  * L0 multiple model/optimizer/loss test fleshed out
  * Adding test that master params remain synced across distributed processes
  * Docstring updates
  * Docstring updates

21 Mar, 2019 2 commits
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
  
- Rename IntList to IntArrayRef · 2a467090
  Syed Tousif Ahmed authored Feb 22, 2019
  
19 Mar, 2019 2 commits
- Fixing interaction of DDP with dynamic loss scaling · 8437d295
  Michael Carilli authored Mar 19, 2019
  
- Multi-tensor axpby kernel for more flexible unscaling (groundwork for #163 and #179 fix) · 5e552004
  Michael Carilli authored Mar 18, 2019
  
15 Mar, 2019 1 commit
- Anticipating upstream #17996 · 2c8e1c86
  Michael Carilli authored Mar 15, 2019
  
12 Mar, 2019 1 commit
- Forward/backward compatibility around pytorch 3aeb78, to fix #191 · 42180bd9
  Michael Carilli authored Mar 11, 2019
  
10 Mar, 2019 2 commits
- fix includes · f34686f1
  Natalia Gimelshein authored Mar 09, 2019
  
- Removing deprecated scale_check_overflow kernel · 8f53411a
  Michael Carilli authored Mar 10, 2019
  
03 Mar, 2019 1 commit
- Bug fix in next power of 2 · ca6c2760
  Marek Kolodziej authored Mar 03, 2019
  
28 Feb, 2019 1 commit
- Comprehensive tests for cross product of options · d24c25b9
  Michael Carilli authored Feb 27, 2019
  
24 Feb, 2019 1 commit
- Stashing work · d137b800
  Michael Carilli authored Feb 24, 2019
  
22 Feb, 2019 1 commit
- Allow multi-tensor unscale to handle FP16 output, so it can also be used for... · 80a3f3ca
  Michael Carilli authored Feb 21, 2019
```
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
```
19 Feb, 2019 1 commit
- Reworked multi tensor apply, added tests · 6763a8be
  Michael Carilli authored Feb 18, 2019
  
13 Feb, 2019 1 commit
- New API tentatively works on resnet50, ready for stress testing. · 889d1712
  Michael Carilli authored Feb 12, 2019
  
11 Feb, 2019 1 commit
- Stashing work · fad78c16
  Michael Carilli authored Feb 10, 2019
  
08 Feb, 2019 1 commit
- stashing work · 1f693b92
  Michael Carilli authored Feb 08, 2019
  
06 Feb, 2019 2 commits
- Tests and resnet50 example work · a5bc76db
  Michael Carilli authored Feb 05, 2019
  
- ready for testing · 6e9159d8
  Michael Carilli authored Feb 05, 2019
  
05 Feb, 2019 1 commit
- New downscale kernel is working but not perf tested · 337056c1
  Michael Carilli authored Feb 05, 2019
  
04 Feb, 2019 1 commit
- Restoring fused inf/nan check + downscale kernel · fd03f26a
  Michael Carilli authored Feb 03, 2019
  
01 Feb, 2019 1 commit
- allowing syncBN to run with affine = False · 223a47e9
  jiej authored Jan 31, 2019
  
18 Jan, 2019 1 commit
- patching grid reduction to be volta-safe · 38bada23
  Jie authored Jan 17, 2019
  
15 Jan, 2019 1 commit
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
06 Nov, 2018 1 commit

- [syncBN] · ee67e56a
  Jie authored Oct 24, 2018

  adjusted kernel config for better perf.
  removed divergence in welford warp reduction.

30 Oct, 2018 1 commit
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
29 Oct, 2018 1 commit

- Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62
  mcarilli authored Oct 29, 2018

  * test passes
  * notes
  * Using C++-side flatten and unflatten functions
  * Adding csrc
  * Persistent synchronization event so it doesn't need to be created and destroyed each time
  * Interop with parameter flattening in SSD
  * Added deterministic option to imagenet main.py
  * Adding options to split gradient averaging and allreduce in pure fp32
  * Fixing allreduce_maybe_retain call
  * Fixing allreduce_fallback
  * Also sync active_i_buckets from rank 0
  * Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False
  * Correcting syntax error, now all seems to work with SSD
  * Optional cpp extension build
  * Add mixed precision adam optimizer (#59)
  * Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.
  * Added fixes to fused_adam to get it to work with network.
  * wip work on python interface for adam with options
  * fix dispatch for halfs, add python options to handle optional half gradients and params
  * cleanup, get rid of grid-stride loop