- 26 Feb, 2019 1 commit
Michael Carilli authored
- 24 Feb, 2019 1 commit
Michael Carilli authored
- 22 Feb, 2019 1 commit
Michael Carilli authored
Allow multi-tensor unscale to handle FP16 output, so it can also be used for copy-scatter. Rename some options.
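"Unscale" here refers to undoing the loss scale applied to gradients. The following is a rough, hypothetical scalar-loop model of that step; it is not apex's multi-tensor CUDA kernel. `unscale_into` and `to_fp16` are illustrative names. The sketch multiplies each gradient by the inverse loss scale and writes the result into a separate output list. The output can optionally be rounded to FP16 to stand in for a half-precision destination. The function also reports whether any inf/NaN appeared.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # finite but larger than the float16 maximum (65504)
        return math.copysign(math.inf, x)

def unscale_into(inputs, outputs, scale, fp16_out=False):
    """Hypothetical sketch: write inputs[t][i] / scale into outputs[t][i].

    Returns True if any written value is inf or NaN (a gradient overflow).
    """
    inv_scale = 1.0 / scale
    found_nonfinite = False
    for src, dst in zip(inputs, outputs):
        for i, g in enumerate(src):
            v = g * inv_scale
            if fp16_out:
                v = to_fp16(v)  # emulate storing into a half-precision destination
            if not math.isfinite(v):
                found_nonfinite = True
            dst[i] = v
    return found_nonfinite
```

With dynamic loss scaling, a caller typically skips the optimizer step and lowers the loss scale when the flag is set. With `scale=1.0` and `fp16_out=True`, the same routine reduces to a plain rounding copy into half-precision storage.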
- 19 Feb, 2019 1 commit
Michael Carilli authored
- 13 Feb, 2019 1 commit
Michael Carilli authored
- 06 Feb, 2019 1 commit
Michael Carilli authored
- 05 Feb, 2019 1 commit
Jerry Ma authored
This commit adds an FP16Model class as a successor to network_to_half. The benefits of this class are:
  - Preservation of single-precision for BatchNorm layers. The models generated by network_to_half() convert BatchNorm moment tensors to half-precision, then back to single-precision, which hurts the accuracy of the moment estimators and occasionally results in NaNs.
  - Support for multi-argument nn.Modules (self-explanatory from code).
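The BatchNorm point can be made concrete with a small Python sketch. This is not apex code. It round-trips example BatchNorm statistics through IEEE 754 half precision using the standard library's `struct` "e" format, which mimics casting to half and back to float. The statistic values are made up for illustration. They only show the kind of rounding and underflow such a round trip introduces; the sketch does not try to reproduce the NaNs mentioned above.

```python
import struct

def fp16_round_trip(x: float) -> float:
    """Convert a float to IEEE 754 binary16 and back, like a half() then float() cast."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical BatchNorm statistics for three channels (illustrative values only).
running_mean = [1000.1, 0.3, -2.7]
running_var = [1e-8, 0.01, 1234.5]

for name, stats in (("mean", running_mean), ("var", running_var)):
    for x in stats:
        y = fp16_round_trip(x)
        print(f"{name}: {x!r} -> {y!r} (abs error {abs(x - y):.3g})")
```

A mean near 1000 loses its fractional part, because float16 values between 512 and 1024 are spaced 0.5 apart. A very small variance such as 1e-8 underflows to exactly zero. Float32 represents all of these values far more closely, which is why keeping BatchNorm in single precision avoids this loss.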
- 03 Feb, 2019 1 commit
Michael Carilli authored
- 01 Feb, 2019 1 commit
Michael Carilli authored
- 29 Jan, 2019 3 commits
- 28 Jan, 2019 1 commit
jiej authored
Test update to resolve https://github.com/NVIDIA/apex/issues/134#issue-403525480
Use an identical learning rate for both DDP with sync BN and single-process BN. The previous configuration left the impression that sync BN requires adjusting the learning rate in the script, which is not true.
- 25 Jan, 2019 1 commit
Michael Carilli authored
- 15 Jan, 2019 1 commit
Jie authored
Added kernel to support sync BN for channels-last tensors
- 15 Dec, 2018 1 commit
Deyu Fu authored
- 01 Nov, 2018 1 commit
Michael Carilli authored
- 30 Oct, 2018 1 commit
ngimel authored
* Add unittest for FusedAdam.
* Fix some bugs.
* set seed for adam test
- 29 Oct, 2018 1 commit
mcarilli authored
* test passes
* notes
* Using C++-side flatten and unflatten functions
* Adding csrc
* Persistent synchronization event so it doesn't need to be created and destroyed each time
* Interop with parameter flattening in SSD
* Added deterministic option to imagenet main.py
* Adding options to split gradient averaging and allreduce in pure fp32
* Fixing allreduce_maybe_retain call
* Fixing allreduce_fallback
* Also sync active_i_buckets from rank 0
* Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False
* Correcting syntax error, now all seems to work with SSD
* Optional cpp extension build
* Add mixed precision adam optimizer (#59)
* Add FusedAdam Optimizer to Apex that places all the math into a cuda kernel.
* Added fixes to fused_adam to get it to work with network.
* wip work on python interface for adam with options
* fix dispatch for halfs, add python options to handle optional half gradients and params
* cleanup, get rid of grid-stride loop
- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] added syncBN in native pure python apex
  added fused cuda kernels used for sync BN. Using welford for mean/var
  optional installation using 'python setup.py install --cuda_ext'
  added unit test with side to side comparison between apex sync BN with PyTorch BN. Notice that for pytorch BN implementation, because of numerical issue for mean/var, the output will be slightly off.
* [syncBN PR] added fp16 support
  addressing review comments on:
  1. updating last pow 2
  2. look for import error when importing syncBN kernel
* [syncBN PR] added convert function to insert SyncBatchNorm
  refactored some kernel code
* fixing type issue (fp16/fp32/fp64)
  added Kahan summation
  editing unit test to use pytorch primitive ops with double, passing reasonable tests now
* updating tensor creation calls
* fixing the all_reduce contiguous tensor
* transposed all reduce results
* [syncBN] support fp16 input & fp32 layer for apex fp16
  partially fixing launch configs
  enabling imagenet example to run with --sync_bn
* [syncBN PR] Documentation added
* adjusting README
* adjusting again
* added some doc to imagenet example
* [syncBN] warp-level reduction bug fix: warp reduction logic updated. check for dummy element to avoid nan. improved launch config for better reduction kernels. Further improvements would be to increase grid size.
* [syncBN] fixing undefined behavior in __shfl_down_sync from divergent threads in warp reduction.
  changing at::native::empty to at::empty (upstream comments)
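The commit uses Welford's algorithm for mean/variance. A minimal pure-Python sketch of that idea follows; it is not the apex CUDA kernel. For illustration, it assumes that each process computes a local (count, mean, M2) triple with Welford updates. It then combines the per-process triples with the standard pairwise merge of Chan et al., one common way to reduce such statistics across workers. `Moments`, `local_moments`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass
class Moments:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Biased (population) variance, the form used to normalize activations.
        return self.m2 / self.count if self.count else 0.0

def local_moments(values) -> Moments:
    """Welford's single-pass update over one process's values."""
    m = Moments()
    for x in values:
        m.count += 1
        delta = x - m.mean
        m.mean += delta / m.count
        m.m2 += delta * (x - m.mean)  # uses the already-updated mean
    return m

def merge(a: Moments, b: Moments) -> Moments:
    """Combine two partial results with the pairwise (Chan et al.) formula."""
    n = a.count + b.count
    if n == 0:
        return Moments()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return Moments(n, mean, m2)

# Usage: two "processes" with unequal local batch sizes.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
total = reduce(merge, (local_moments(s) for s in shards))
print(total.count, total.mean, total.variance)
```

Carrying the count alongside mean and M2 lets the merge weight each process correctly even when local batch sizes differ. Welford's update also avoids the cancellation that a naive sum-of-squares formula suffers in low precision.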
- 29 Sep, 2018 2 commits
Michael Carilli authored
mcarilli authored
* beautiful
* IT'S WORKING
* Hopefully fix race condition for fallback hook
* Updating test
* shared_param -> delayed_allreduce
* Adding a safety check
* One more check
* syntax...
- 13 Sep, 2018 1 commit
Michael Carilli authored
- 23 Jul, 2018 1 commit
Michael Carilli authored
- 06 Jun, 2018 1 commit
Michael Carilli authored
- 26 May, 2018 1 commit
Michael Carilli authored
- 14 May, 2018 1 commit
Michael Carilli authored
- 07 May, 2018 1 commit
Christian Sarofeen authored
- 25 Apr, 2018 2 commits
Michael Carilli authored
Christian Sarofeen authored