- 04 Apr, 2019 1 commit
mcarilli authored
* Refactor to allow more flexible treatment of multiple optimizers/models/losses
* Add _process_optimizers.py
* Create L0 tests (now passing)
* Fix a minor print typo (#234)
* Make L1 results easier to read
* Flesh out the L0 multiple model/optimizer/loss test
* Add a test that master params remain synced across distributed processes
* Docstring updates
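A minimal sketch of the multiple-model/optimizer/loss usage this refactor targets, assuming the apex.amp frontend (amp.initialize with lists, num_losses, and a per-loss loss_id); the models, optimizers, and data below are illustrative placeholders.

```python
# Illustrative sketch (assumes the apex.amp frontend and a CUDA device);
# the models, optimizers, and data below are placeholders.
import torch
from apex import amp

model0 = torch.nn.Linear(10, 10).cuda()
model1 = torch.nn.Linear(10, 10).cuda()
opt0 = torch.optim.SGD(model0.parameters(), lr=1e-3)
opt1 = torch.optim.SGD(model1.parameters(), lr=1e-3)

# Lists of models/optimizers are accepted; num_losses keeps a separate
# loss scale per loss.
[model0, model1], [opt0, opt1] = amp.initialize(
    [model0, model1], [opt0, opt1], opt_level="O1", num_losses=2)

data = torch.randn(4, 10, device="cuda")
loss0 = model0(data).sum()
loss1 = model1(data).sum()

# Each backward pass is wrapped in scale_loss with its own loss_id.
with amp.scale_loss(loss0, opt0, loss_id=0) as scaled_loss:
    scaled_loss.backward()
with amp.scale_loss(loss1, opt1, loss_id=1) as scaled_loss:
    scaled_loss.backward()

opt0.step()
opt1.step()
```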
- 22 Mar, 2019 1 commit
mcarilli authored
* Add Torch + bare-metal nvcc version check and container build tests
* Put a canary in the coalmine
* The canary proved elusive
* Try a direct setup.py install; this should work
* Remove the canary; hopefully this works
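A rough sketch of the kind of Torch vs. bare-metal nvcc version check described above; this is not apex's exact setup.py code, and the helper name is hypothetical.

```python
# Illustrative sketch of a Torch-vs-bare-metal nvcc version check
# (not apex's exact setup.py code; the function name is hypothetical).
import re
import subprocess
import torch

def check_cuda_versions(nvcc="nvcc"):
    # CUDA version PyTorch was compiled against, e.g. "10.0"
    torch_cuda = torch.version.cuda
    # CUDA version of the bare-metal nvcc on PATH, parsed from "release X.Y"
    out = subprocess.check_output([nvcc, "--version"]).decode()
    match = re.search(r"release (\d+\.\d+)", out)
    bare_metal_cuda = match.group(1) if match else None
    if bare_metal_cuda != torch_cuda:
        raise RuntimeError(
            "CUDA mismatch: PyTorch built with {} but nvcc reports {}".format(
                torch_cuda, bare_metal_cuda))

check_cuda_versions()
```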
- 19 Mar, 2019 1 commit
Michael Carilli authored

- 13 Mar, 2019 1 commit
Michael Carilli authored

- 12 Mar, 2019 1 commit
Michael Carilli authored

- 10 Mar, 2019 1 commit
Michael Carilli authored

- 08 Mar, 2019 3 commits
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored

- 07 Mar, 2019 1 commit
Michael Carilli authored

- 02 Mar, 2019 1 commit
Michael Carilli authored
- 01 Mar, 2019 4 commits
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored
Michael Carilli authored

- 28 Feb, 2019 1 commit
Michael Carilli authored

- 26 Feb, 2019 1 commit
Michael Carilli authored

- 24 Feb, 2019 1 commit
Michael Carilli authored

- 22 Feb, 2019 1 commit
Michael Carilli authored
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
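A hedged sketch of how the multi-tensor scale/unscale kernel is typically driven; it uses present-day apex names (amp_C, multi_tensor_applier), which may differ from the options as they existed at the time of this commit.

```python
# Hedged sketch of driving the multi-tensor (un)scale kernel.
# Assumes apex built with the amp_C extension; names follow present-day apex
# and may not match the option names as they existed in this commit.
import torch
import amp_C
from apex.multi_tensor_apply import multi_tensor_applier

scale = 1024.0
overflow_buf = torch.zeros(1, dtype=torch.int, device="cuda")

# Unscale fp16 grads into fp32 master grads (out = in * 1/scale).
fp16_grads = [torch.randn(100, device="cuda", dtype=torch.float16)]
fp32_grads = [torch.empty(100, device="cuda", dtype=torch.float32)]
multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf,
                     [fp16_grads, fp32_grads], 1.0 / scale)

# Because the kernel can also write FP16 output, the same op doubles as a
# copy-scatter: scale by 1.0 from fp32 master params into fp16 model params.
fp32_params = [torch.randn(100, device="cuda", dtype=torch.float32)]
fp16_params = [torch.empty(100, device="cuda", dtype=torch.float16)]
multi_tensor_applier(amp_C.multi_tensor_scale, overflow_buf,
                     [fp32_params, fp16_params], 1.0)
```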
- 19 Feb, 2019 1 commit
Michael Carilli authored

- 16 Feb, 2019 3 commits

- 13 Feb, 2019 1 commit
Michael Carilli authored

- 08 Feb, 2019 2 commits
Evgeni Krimer authored
Evgeni Krimer authored

- 06 Feb, 2019 1 commit
Michael Carilli authored

- 05 Feb, 2019 1 commit
Jerry Ma authored
This commit adds an FP16Model class as a successor to network_to_half. The benefits of this class are:
- Preservation of single precision for BatchNorm layers. The models generated by network_to_half() convert BatchNorm moment tensors to half precision, then back to single precision, which hurts the accuracy of the moment estimators and occasionally results in NaNs.
- Support for multi-argument nn.Modules (self-explanatory from the code).
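A minimal sketch of the pattern FP16Model enables, keeping BatchNorm in FP32 while the rest of the network runs in half precision and accepting multiple forward arguments; this is not apex's exact implementation, and the helper names are illustrative.

```python
# Sketch of the pattern described above: run the network in half precision
# while keeping BatchNorm layers (and their running stats) in float32.
# Not apex's exact implementation; helper/class names are illustrative.
import torch
import torch.nn as nn

def convert_bn_to_float(module):
    # Recursively restore BatchNorm layers to float32 after a .half() call,
    # so running_mean/running_var are accumulated in single precision.
    if isinstance(module, nn.modules.batchnorm._BatchNorm):
        module.float()
    for child in module.children():
        convert_bn_to_float(child)
    return module

class SimpleFP16Model(nn.Module):
    def __init__(self, network):
        super().__init__()
        self.network = convert_bn_to_float(network.half())

    def forward(self, *inputs):
        # Accept multiple arguments and cast floating-point inputs to half.
        inputs = tuple(t.half() if torch.is_floating_point(t) else t
                       for t in inputs)
        return self.network(*inputs)
```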
- 03 Feb, 2019 1 commit
Michael Carilli authored

- 01 Feb, 2019 1 commit
Michael Carilli authored

- 29 Jan, 2019 3 commits

- 28 Jan, 2019 1 commit
jiej authored
Test update to resolve https://github.com/NVIDIA/apex/issues/134#issue-403525480: use an identical learning rate for both DDP with sync BN and single-process BN. The previous configuration left the impression that sync BN requires adjusting the learning rate in the script, which is not true.
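A brief sketch of the setup the test exercises, assuming apex.parallel and an already-initialized torch.distributed process group; the point is that the sync BN run reuses the single-process learning rate unchanged.

```python
# Hedged sketch: DDP with apex sync BN uses the same learning rate as the
# single-process BN baseline (no lr adjustment in the script is needed).
# Assumes torch.distributed is already initialized, one GPU per process.
import torch
from apex.parallel import DistributedDataParallel, convert_syncbn_model

lr = 0.1  # identical to the single-process baseline

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.BatchNorm2d(8),
).cuda()

# Replace BatchNorm layers with apex's synchronized BatchNorm, then wrap in DDP.
model = convert_syncbn_model(model)
model = DistributedDataParallel(model)

optimizer = torch.optim.SGD(model.parameters(), lr=lr)
```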
- 25 Jan, 2019 1 commit
Michael Carilli authored

- 15 Jan, 2019 1 commit
Jie authored
Added a kernel to support sync BN for channels-last tensors.
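For context, a small illustration of the channels-last (NHWC) layout the new kernel targets; this shows only the tensor layout, not apex's SyncBatchNorm API.

```python
# Illustration of the "channels-last" (NHWC) layout the kernel targets;
# only the memory layout is shown here, not apex's SyncBatchNorm API.
import torch

x_nchw = torch.randn(8, 64, 32, 32, device="cuda", dtype=torch.float16)
# Rearrange to NHWC so the channel dimension is contiguous in memory,
# which lets per-channel statistics be read with coalesced accesses.
x_nhwc = x_nchw.permute(0, 2, 3, 1).contiguous()
print(x_nhwc.shape)    # torch.Size([8, 32, 32, 64])
print(x_nhwc.stride()) # channel dimension has stride 1
```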
- 15 Dec, 2018 1 commit
Deyu Fu authored

- 01 Nov, 2018 1 commit
Michael Carilli authored

- 30 Oct, 2018 1 commit
ngimel authored
* Add a unit test for FusedAdam
* Fix some bugs
* Set a seed for the Adam test
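A hedged sketch of the kind of comparison such a FusedAdam unit test makes, assuming apex is built with the fused optimizer extension; sizes and tolerances are illustrative.

```python
# Hedged sketch of a FusedAdam check: step FusedAdam and torch.optim.Adam on
# copies of the same parameter with identical gradients and compare results.
# Assumes apex built with the fused optimizer CUDA extension.
import torch
from apex.optimizers import FusedAdam

torch.manual_seed(0)  # seed so the comparison is deterministic

ref_param = torch.randn(1024, device="cuda", requires_grad=True)
tst_param = ref_param.detach().clone().requires_grad_(True)

ref_opt = torch.optim.Adam([ref_param], lr=1e-3)
tst_opt = FusedAdam([tst_param], lr=1e-3)

for _ in range(10):
    grad = torch.randn_like(ref_param)
    ref_param.grad = grad.clone()
    tst_param.grad = grad.clone()
    ref_opt.step()
    tst_opt.step()

# Allow small numerical differences from the fused kernel.
assert torch.allclose(ref_param, tst_param, atol=1e-5)
```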
- 29 Oct, 2018 1 commit
mcarilli authored
* Test passes
* Notes
* Use C++-side flatten and unflatten functions
* Add csrc
* Use a persistent synchronization event so it doesn't need to be created and destroyed each time
* Interop with parameter flattening in SSD
* Add a deterministic option to imagenet main.py
* Add options to split gradient averaging and allreduce in pure fp32
* Fix the allreduce_maybe_retain call
* Fix allreduce_fallback
* Also sync active_i_buckets from rank 0
* Make retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False
* Correct a syntax error; everything now seems to work with SSD
* Make the cpp extension build optional
* Add mixed precision Adam optimizer (#59)
* Add a FusedAdam optimizer to Apex that places all the math into a CUDA kernel
* Add fixes to fused_adam to get it to work with a network
* WIP work on a Python interface for Adam with options
* Fix dispatch for halfs; add Python options to handle optional half gradients and params
* Clean up; get rid of the grid-stride loop
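A minimal sketch of the apex.parallel.DistributedDataParallel options this work touches (delay_allreduce, retain_allreduce_buffers), assuming torch.distributed is initialized with the NCCL backend and one GPU per process.

```python
# Minimal sketch of the apex DDP options named in the commit above.
# Assumes torch.distributed is initialized (NCCL), one GPU per process.
import torch
from apex.parallel import DistributedDataParallel

model = torch.nn.Linear(1024, 1024).cuda()

# delay_allreduce=True waits until the whole backward pass finishes before
# allreducing gradients; retain_allreduce_buffers keeps the flattened
# allreduce buffers alive so downstream code can reuse them.
model = DistributedDataParallel(model,
                                delay_allreduce=True,
                                retain_allreduce_buffers=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

data = torch.randn(32, 1024, device="cuda")
loss = model(data).sum()
loss.backward()   # gradients are allreduced across ranks here
optimizer.step()
```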