- 16 Feb, 2019 2 commits
- 08 Feb, 2019 2 commits
Evgeni Krimer authored
Evgeni Krimer authored
- 29 Jan, 2019 3 commits
- 28 Jan, 2019 1 commit
jiej authored
Test update to resolve https://github.com/NVIDIA/apex/issues/134#issue-403525480: use an identical learning rate for both DDP with sync BN and single-process BN. The previous configuration left the impression that sync BN requires adjusting the learning rate in the script, which is not true.
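A minimal sketch of the point being made, not the actual apex test or example script: the flag names and toy model below are made up for illustration, and `convert_syncbn_model` is the conversion helper described in the sync BN commit further down (its availability depends on the installed apex version). The base learning rate is chosen once and reused unchanged whether or not sync BN is enabled.

```python
import argparse

import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.1)    # hypothetical flag names for illustration
parser.add_argument("--sync_bn", action="store_true")
args = parser.parse_args()

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())

if args.sync_bn:
    # Swap BatchNorm layers for apex SyncBatchNorm; this only changes the BN
    # layers, nothing about the optimizer setup.
    from apex.parallel import convert_syncbn_model
    model = convert_syncbn_model(model)

# The same base learning rate is used in both branches; enabling sync BN does
# not require rescaling it.
optimizer = torch.optim.SGD(model.parameters(), lr=args.lr, momentum=0.9)
```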
- 15 Jan, 2019 1 commit
Jie authored
Added kernel to support sync BN for channel last tensor
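Illustrative sketch only, not code from this commit: it shows what a channels-last (NHWC) activation is in PyTorch terms. Note that the `torch.channels_last` memory-format API shown here is a later PyTorch addition; at the time of this commit a channels-last input was typically just a tensor laid out with the channel dimension innermost, and whether a given apex build routes such a tensor to the kernel added here is an assumption, not something the commit message spells out.

```python
import torch

# NCHW activation as produced by a typical conv stack.
x = torch.randn(8, 64, 32, 32)

# Re-lay the same logical tensor so the channel dimension is innermost
# (NHWC / "channels last"), the layout the new sync BN kernel targets.
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.shape)                                             # still (8, 64, 32, 32)
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
```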
- 23 Oct, 2018 1 commit
jjsjann123 authored
* [syncBN] added syncBN natively in pure Python apex; added fused CUDA kernels for sync BN, using Welford for mean/var; optional installation via 'python setup.py install --cuda_ext'; added a unit test with a side-by-side comparison between apex sync BN and PyTorch BN. Note that because of numerical issues in its mean/var computation, the PyTorch BN output will be slightly off.
* [syncBN PR] added fp16 support; addressed review comments on: 1. updating last pow 2, 2. checking for an import error when importing the syncBN kernel
* [syncBN PR] added a convert function to insert SyncBatchNorm; refactored some kernel code
* fixed a type issue (fp16/fp32/fp64); added Kahan summation; edited the unit test to use PyTorch primitive ops with double; passing reasonable tests now
* updated tensor creation calls
* fixed contiguous tensor handling in all_reduce
* transposed the all_reduce results
* [syncBN] support fp16 input & fp32 layers for apex fp16; partially fixed launch configs; enabled the imagenet example to run with --sync_bn
* [syncBN PR] documentation added
* adjusted the README
* adjusted it again
* added some docs to the imagenet example
* [syncBN] warp-level reduction bug fix: updated the warp reduction logic; check for a dummy element to avoid NaN; improved launch configs for better reduction kernels. A further improvement would be to increase the grid size.
* [syncBN] fixed undefined behavior in __shfl_down_sync from divergent threads in warp reduction; changed at::native::empty to at::empty (upstream comments)
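A minimal sketch of how the features described above are meant to be used, assuming apex was built with the fused kernels enabled (python setup.py install --cuda_ext, as the commit states). convert_syncbn_model and DistributedDataParallel live in apex.parallel; exact signatures may differ between apex versions, and the toy model and process-group setup here are illustrative only.

```python
import torch
import torch.nn as nn
from apex.parallel import DistributedDataParallel, convert_syncbn_model

# One process per GPU; rank/world size are expected in the environment.
torch.distributed.init_process_group(backend="nccl", init_method="env://")

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU()).cuda()

# Replace every BatchNorm*d in the module tree with apex SyncBatchNorm so that
# batch statistics are reduced across all DDP processes.
model = convert_syncbn_model(model)
model = DistributedDataParallel(model)
```

The bundled imagenet example enables the same path through its --sync_bn flag, per the commit message.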
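The commit also names Welford's online algorithm as the way the fused kernels accumulate mean/variance in a single pass. A minimal Python sketch of that technique (not apex's CUDA code) follows; it illustrates why a running mean with incremental squared deviations is numerically steadier than the naive sum-of-squares approach the commit contrasts it with.

```python
# Welford's online mean/variance: one pass, no large intermediate sums.
def welford_mean_var(values):
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses the updated mean
    # Biased (population) variance, as batch norm uses for normalization.
    return mean, m2 / count

mean, var = welford_mean_var([0.1, 0.2, 0.3, 0.4])
print(mean, var)  # approximately 0.25 and 0.0125
```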