Commits · 651150cb285f33541e07372819f559c9f5083cc1 · OpenDAS / apex

18 Apr, 2019 1 commit
- cleanup · 651150cb
  Michael Carilli authored Apr 18, 2019
  
  651150cb
10 Apr, 2019 2 commits
- Quick kernel to clean up l2norm · 683b6e0e
  Michael Carilli authored Apr 10, 2019
  
  683b6e0e
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
  
  1a48b26b
09 Apr, 2019 1 commit
- Simple cut of the kernel in place · e57f5d0e
  Michael Carilli authored Apr 09, 2019
  
  e57f5d0e
08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
  
  03100f46
04 Apr, 2019 1 commit

WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f

mcarilli authored Apr 03, 2019

* Refactor to allow more flexible treatment of multiple optimizers/models/losses

* Adding _process_optimizers.py

* Created L0 tests (now passing).

* fix: minor print typo (#234)

* make L1 results easier to read

* L0 multiple model/optimizer/loss test fleshed out

* Adding test that master params remain synced across distributed processes

* Docstring updates

* Docstring updates

3f87614f

21 Mar, 2019 2 commits
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
  
  0f5e3fe0
- Rename IntList to IntArrayRef · 2a467090
  Syed Tousif Ahmed authored Feb 22, 2019
  
  2a467090
19 Mar, 2019 2 commits
- Fixing interaction of DDP with dynamic loss scaling · 8437d295
  Michael Carilli authored Mar 19, 2019
  
  8437d295
- Multi-tensor axpby kernel for more flexible unscaling (groundwork for #163 and #179 fix) · 5e552004
  Michael Carilli authored Mar 18, 2019
  
  5e552004
15 Mar, 2019 1 commit
- Anticipating upstream #17996 · 2c8e1c86
  Michael Carilli authored Mar 15, 2019
  
  2c8e1c86
12 Mar, 2019 1 commit
- Forward/backward compatibility around pytorch 3aeb78, to fix #191 · 42180bd9
  Michael Carilli authored Mar 11, 2019
  
  42180bd9
11 Mar, 2019 2 commits
- Fix momentum initialization with weight decay · 724672d7
  Simon Layton authored Mar 11, 2019
  
  724672d7
- Fix dispatch, add wd after momentum option · b265b0b5
  Simon Layton authored Mar 11, 2019
```
Fix dispatch where we have a parameter group with multiple
combinations of types
Optionally apply weight decay after momentum
```
  b265b0b5
10 Mar, 2019 2 commits
- fix includes · f34686f1
  Natalia Gimelshein authored Mar 09, 2019
  
  f34686f1
- Removing deprecated scale_check_overflow kernel · 8f53411a
  Michael Carilli authored Mar 10, 2019
  
  8f53411a
09 Mar, 2019 1 commit
- Fix momentum in non-nesterov case · ac74f345
  Simon Layton authored Mar 08, 2019
  
  ac74f345
08 Mar, 2019 5 commits
- Simplify noop exit condition · 6d6f0bc2
  Simon Layton authored Mar 08, 2019
  
  6d6f0bc2
- Handle fp16 weights case without forcing fp16 math · a2799893
  Simon Layton authored Mar 08, 2019
```
Incorrect types used in a few places
```
  a2799893
- Simplify C++-side logic · 75c8a97a
  Simon Layton authored Mar 08, 2019
```
Only support the 4 specific cases we care about
Remove more general set of switch statements
```
  75c8a97a
- Code cleanup, add fused fp16 read / write · cac061a1
  Simon Layton authored Mar 08, 2019
```
Fuse in fp16 gradient -> fp32 convert
Additional option fp16 weight copy written out
```
  cac061a1
- Fused multi-tensor SGD · cadad920
  Simon Layton authored Mar 08, 2019
```
Initial implementation, all fp32
Tested against torch.optim.sgd
```
  cadad920
03 Mar, 2019 1 commit
- Bug fix in next power of 2 · ca6c2760
  Marek Kolodziej authored Mar 03, 2019
  
  ca6c2760
28 Feb, 2019 1 commit
- Comprehensive tests for cross product of options · d24c25b9
  Michael Carilli authored Feb 27, 2019
  
  d24c25b9
24 Feb, 2019 1 commit
- Stashing work · d137b800
  Michael Carilli authored Feb 24, 2019
  
  d137b800
22 Feb, 2019 1 commit
- Allow multi-tensor unscale to handle FP16 output, so it can also be used for... · 80a3f3ca
  Michael Carilli authored Feb 21, 2019
```
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
```
  80a3f3ca
19 Feb, 2019 1 commit
- Reworked multi tensor apply, added tests · 6763a8be
  Michael Carilli authored Feb 18, 2019
  
  6763a8be
13 Feb, 2019 1 commit
- New API tentatively works on resnet50, ready for stress testing. · 889d1712
  Michael Carilli authored Feb 12, 2019
  
  889d1712
11 Feb, 2019 1 commit
- Stashing work · fad78c16
  Michael Carilli authored Feb 10, 2019
  
  fad78c16
08 Feb, 2019 1 commit
- stashing work · 1f693b92
  Michael Carilli authored Feb 08, 2019
  
  1f693b92
06 Feb, 2019 2 commits
- Tests and resnet50 example work · a5bc76db
  Michael Carilli authored Feb 05, 2019
  
  a5bc76db
- ready for testing · 6e9159d8
  Michael Carilli authored Feb 05, 2019
  
  6e9159d8
05 Feb, 2019 1 commit
- New downscale kernel is working but not perf tested · 337056c1
  Michael Carilli authored Feb 05, 2019
  
  337056c1
04 Feb, 2019 1 commit
- Restoring fused inf/nan check + downscale kernel · fd03f26a
  Michael Carilli authored Feb 03, 2019
  
  fd03f26a
01 Feb, 2019 1 commit
- allowing syncBN to run with affine = False · 223a47e9
  jiej authored Jan 31, 2019
  
  223a47e9
18 Jan, 2019 1 commit
- patching grid reduction to be volta-safe · 38bada23
  Jie authored Jan 17, 2019
  
  38bada23
15 Jan, 2019 1 commit
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
  443fa76e
06 Nov, 2018 1 commit

[syncBN] · ee67e56a

Jie authored Oct 24, 2018

adjusted kernel config for better perf.
removed divergence in welford warp reduction.

ee67e56a

30 Oct, 2018 1 commit
- update includes · ef3a0025
  Natalia Gimelshein authored Oct 30, 2018
  
  ef3a0025
29 Oct, 2018 1 commit

Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62

mcarilli authored Oct 29, 2018

* test passes

* notes

* Using C++-side flatten and unflatten functions

* Adding csrc

* Persistent synchronization event so it doesn't need to be created and destroyed each time

* Interop with parameter flattening in SSD

* Added deterministic option to imagenet main.py

* Adding options to split gradient averaging and allreduce in pure fp32

* Fixing allreduce_maybe_retain call

* Fixing allreduce_fallback

* Also sync active_i_buckets from rank 0

* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False

* Correcting syntax error, now all seems to work with SSD

* Optional cpp extension build

* Add mixed precision adam optimizer (#59)

* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.

* Added fixes to fused_adam to get it to work with network.

* wip work on python interface for adam with options

* fix dispatch for halfs, add python options to handle optional half gradients and params

* cleanup, get rid of grid-stride loop

e0bc5d62