Commits · 843cdbe01a63fe2e0eca35e4d910130b1b8a5aad · OpenDAS / apex

10 Apr, 2019 3 commits
- add new tests to run_test.py · 6d40465a
  Lam Dang authored Apr 10, 2019
- quick fix: make FusedLayerNorm compatible with cpu · d130ec1f
  Lam Dang authored Apr 10, 2019
- Kernel + sizes stress test · 1a48b26b
  Michael Carilli authored Apr 09, 2019
04 Apr, 2019 1 commit

WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f

mcarilli authored Apr 03, 2019

* Refactor to allow more flexible treatment of multiple optimizers/models/losses

* Adding _process_optimizers.py

* Created L0 tests (now passing).

* fix: minor print typo (#234)

* make L1 results easier to read

* L0 multiple model/optimizer/loss test fleshed out

* Adding test that master params remain synced across distributed processes

* Docstring updates

* Docstring updates

22 Mar, 2019 1 commit

Check cuda version (#216) · 5b8faa29

mcarilli authored Mar 21, 2019

* Adding Torch + bare-metal nvcc version check and container build tests

* Putting a canary in the coalmine

* canary proved elusive

* Trying direct setup.py install

* this should work

* Removing canary

* hopefully this works

19 Mar, 2019 1 commit
- Multi-tensor axpby kernel for more flexible unscaling (groundwork for #163 and #179 fix) · 5e552004
  Michael Carilli authored Mar 18, 2019
  
  5e552004
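
By the usual BLAS-style convention, "axpby" computes `out = a*x + b*y`; a multi-tensor version applies that update across many tensors at once. The pure-Python sketch below only illustrates the arithmetic, with tensors modeled as flat lists of floats. `multi_tensor_axpby` is a hypothetical name and signature, not the interface of the apex kernel, and modeling unscaling as `a = 1/loss_scale` with `b = 0` (overwrite) or `b = 1` (accumulate into existing gradients) is an assumption about how the flexibility is used.

```python
from typing import List, Sequence


def multi_tensor_axpby(a: float, xs: Sequence[Sequence[float]],
                       b: float, ys: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical reference: out[i] = a * xs[i] + b * ys[i], elementwise,
    for every tensor pair (tensors modeled as flat lists of floats)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of tensors")
    outs = []
    for x, y in zip(xs, ys):
        if len(x) != len(y):
            raise ValueError("paired tensors must have the same number of elements")
        outs.append([a * xi + b * yi for xi, yi in zip(x, y)])
    return outs


loss_scale = 128.0
scaled_grads = [[256.0, -64.0], [32.0]]  # gradients of the scaled loss
master_grads = [[1.0, 1.0], [0.5]]       # previously accumulated gradients

# Plain unscaling: a = 1/loss_scale, b = 0 discards the existing values.
print(multi_tensor_axpby(1.0 / loss_scale, scaled_grads, 0.0, master_grads))
# -> [[2.0, -0.5], [0.25]]

# Unscale and accumulate: b = 1 adds onto the existing gradients.
print(multi_tensor_axpby(1.0 / loss_scale, scaled_grads, 1.0, master_grads))
# -> [[3.0, 0.5], [0.75]]
```

A fused kernel would perform the same per-element update in a single launch over all tensors rather than looping in Python.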
13 Mar, 2019 1 commit
- Casting model output as well as input, for #195 · d1f74a3e
  Michael Carilli authored Mar 12, 2019
12 Mar, 2019 1 commit
- Moving test_groups.py · d27a321a
  Michael Carilli authored Mar 12, 2019
10 Mar, 2019 1 commit
- Removing deprecated scale_check_overflow kernel · 8f53411a
  Michael Carilli authored Mar 10, 2019
08 Mar, 2019 3 commits
- Fix for #188 · a3a09c8c
  Michael Carilli authored Mar 08, 2019
- Repr for import error · 371633d5
  Michael Carilli authored Mar 08, 2019
- Stashing to test on the cluster · 59e992da
  Michael Carilli authored Mar 08, 2019
07 Mar, 2019 1 commit
- Updating error checking on property overrides · 248d7b10
  Michael Carilli authored Mar 06, 2019
02 Mar, 2019 1 commit
- some test cleanup · 484292f0
  Michael Carilli authored Mar 02, 2019
01 Mar, 2019 4 commits
- Cherry picking RNN fix · 2445031d
  Michael Carilli authored Feb 25, 2019
- Moving common test stuff to common · 612d4193
  Michael Carilli authored Feb 28, 2019
- Cleaning up FusedAdam testing · 7c82f221
  Michael Carilli authored Feb 28, 2019
- Adding distributed tests and support for FusedAdam · d8b5d1be
  Michael Carilli authored Feb 28, 2019
28 Feb, 2019 1 commit
- Comprehensive tests for cross product of options · d24c25b9
  Michael Carilli authored Feb 27, 2019
26 Feb, 2019 1 commit
- No need for casts during optimizer step · 613997ea
  Michael Carilli authored Feb 26, 2019
24 Feb, 2019 1 commit
- Stashing work · d137b800
  Michael Carilli authored Feb 24, 2019
22 Feb, 2019 1 commit
- Allow multi-tensor unscale to handle FP16 output, so it can also be used for... · 80a3f3ca
  Michael Carilli authored Feb 21, 2019
```
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
```
19 Feb, 2019 1 commit
- Reworked multi tensor apply, added tests · 6763a8be
  Michael Carilli authored Feb 18, 2019
16 Feb, 2019 3 commits
- moved process group creation into apex so it can be called by users · 37cd5dfd
  root authored Feb 16, 2019
- fixing it to work properly in multi-node environment · e49dca6e
  root authored Feb 16, 2019
- adding test_groups.py to unit_test.sh · 598fbc88
  root authored Feb 16, 2019
13 Feb, 2019 1 commit
- New API tentatively works on resnet50, ready for stress testing. · 889d1712
  Michael Carilli authored Feb 12, 2019
08 Feb, 2019 2 commits
- printout message update · f5725555
  Evgeni Krimer authored Feb 08, 2019
- a test and example for sync (group) bn with group_size<world_size · 18d412a6
  Evgeni Krimer authored Feb 08, 2019
  
  18d412a6
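
With sync BN and `group_size < world_size`, the processes are split into independent groups that each share batch-norm statistics only among themselves. The sketch below computes such a partition under two assumptions: consecutive ranks are grouped together, and `world_size` is divisible by `group_size`. The helpers `syncbn_rank_groups` and `group_of` are hypothetical and only produce rank lists; in an actual distributed run, each list would be handed to `torch.distributed.new_group(ranks=...)`.

```python
from typing import List


def syncbn_rank_groups(world_size: int, group_size: int) -> List[List[int]]:
    """Hypothetical helper: split ranks 0..world_size-1 into consecutive
    groups of group_size ranks, one group per set of processes sharing BN stats."""
    if group_size <= 0 or world_size % group_size != 0:
        raise ValueError("world_size must be a positive multiple of group_size")
    return [list(range(start, start + group_size))
            for start in range(0, world_size, group_size)]


def group_of(rank: int, group_size: int) -> int:
    """Index of the group a rank belongs to under this consecutive layout."""
    return rank // group_size


print(syncbn_rank_groups(world_size=8, group_size=4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(group_of(5, group_size=4))                        # 1
```

Grouping consecutive ranks is a natural choice when consecutive ranks share a node, since the statistics reduction then stays on fast intra-node links; that placement is itself an assumption about the launcher.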
06 Feb, 2019 1 commit
- Tests for the fused downscale kernel · 340e71a4
  Michael Carilli authored Feb 05, 2019
05 Feb, 2019 1 commit

Better FP16 support in pytorch fp16 utils. · 713e0fb8

Jerry Ma authored Feb 01, 2019

This commit adds an FP16Model class as a successor to network_to_half.

The benefits of this class are:

- Preservation of single-precision for BatchNorm layers. The models
  generated by network_to_half() convert BatchNorm moment tensors to
  half-precision, then back to single-precision, which hurts the
  accuracy of the moment estimators and occasionally results in NaNs.
- Support for multi-argument nn.Modules (self-explanatory from code).
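
The accuracy loss described above comes from rounding values to IEEE 754 half precision. The stdlib-only sketch below is an illustration, not code from this commit: it round-trips Python floats through half precision using the `struct` module's `'e'` format, with hypothetical running-statistic values chosen to show coarse rounding and underflow to zero.

```python
import struct


def fp16_round_trip(x: float) -> float:
    """Round x to IEEE 754 half precision and read it back as a Python float.
    struct raises OverflowError if |x| exceeds the fp16 maximum (65504)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical BatchNorm running statistics (illustrative values only).
running_mean = 1000.123  # fp16 spacing between 512 and 1024 is 0.5
running_var = 1.0e-8     # below half the smallest fp16 subnormal (~5.96e-8)

print(fp16_round_trip(running_mean))  # 1000.0: fractional part is lost
print(fp16_round_trip(running_var))   # 0.0: underflows to zero
```

Keeping BatchNorm layers in single precision, as the commit message describes, avoids this round trip for the moment estimates.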

03 Feb, 2019 1 commit
- Lazy imports to reduce error spam · 48299b0d
  Michael Carilli authored Feb 02, 2019
01 Feb, 2019 1 commit
- async->non_blocking, module-specific logging · cc85a2e5
  Michael Carilli authored Feb 01, 2019
29 Jan, 2019 3 commits
- Update two_gpu_unit_test.py · 8b9ce244
  mcarilli authored Jan 28, 2019
- Update two_gpu_unit_test.py · d0624f4f
  mcarilli authored Jan 28, 2019
- adding comment to explain single process gradient averaging · c8d7c9f1
  jiej authored Jan 28, 2019
28 Jan, 2019 1 commit

[syncBN] · 63e47d29

jiej authored Jan 28, 2019

test update to resolve
  https://github.com/NVIDIA/apex/issues/134#issue-403525480

Using identical learning rate for both DDP with sync BN and single process BN.
The previous configuration left the impression that sync BN requires adjusting lr
in the script, which is not true.

25 Jan, 2019 1 commit
- Adding tests, also, don't drop cache during eval. · dfd40f9a
  Michael Carilli authored Jan 24, 2019
15 Jan, 2019 1 commit
- [sync BN nhwc] · 443fa76e
  Jie authored Jan 14, 2019
```
Added kernel to support sync BN for channel last tensor
```
15 Dec, 2018 1 commit
- add unit tests for optimizers/fp16_optimizer · afc8d1b2
  Deyu Fu authored Dec 14, 2018