"docs/vscode:/vscode.git/clone" did not exist on "e9777adbac856c0b8a0654a122567b0ce442e487"
  1. 23 Oct, 2018 1 commit
    • [syncBN] (#48) · 81eef1ef
      jjsjann123 authored
      * [syncBN]
        added syncBN in native pure Python apex
        added fused CUDA kernels for sync BN, using Welford's algorithm for mean/var
          optional installation using 'python setup.py install --cuda_ext'
        added a unit test with a side-by-side comparison between apex sync BN and
          PyTorch BN. Note that the PyTorch BN output will be slightly off
          because of numerical issues in its mean/var computation.
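
        Welford's algorithm computes the mean and variance in a single pass
        while staying numerically stable, which is why the fused kernels use
        it. Below is a minimal single-block CUDA sketch of the technique; the
        kernel name and layout are illustrative, not the actual apex kernel.

        ```cuda
        #include <cuda_runtime.h>

        // One-pass mean/variance via Welford's update, merged across
        // threads with Chan et al.'s pairwise formula. Single block,
        // n > 0, blockDim.x <= 256 assumed; a sketch, not the apex kernel.
        __global__ void mean_var_welford(const float* __restrict__ x, int n,
                                         float* mean_out, float* var_out) {
            __shared__ float s_mean[256], s_m2[256];
            __shared__ int   s_cnt[256];

            // Per-thread serial Welford accumulation over a strided range.
            float mean = 0.f, m2 = 0.f;
            int cnt = 0;
            for (int i = threadIdx.x; i < n; i += blockDim.x) {
                cnt += 1;
                float d = x[i] - mean;
                mean += d / cnt;
                m2 += d * (x[i] - mean);
            }
            s_mean[threadIdx.x] = mean;
            s_m2[threadIdx.x]   = m2;
            s_cnt[threadIdx.x]  = cnt;
            __syncthreads();

            // Serial merge by thread 0 for clarity; a real kernel would
            // tree-reduce the (mean, m2, count) triples instead.
            if (threadIdx.x == 0) {
                for (int t = 1; t < blockDim.x; ++t) {
                    int nb = s_cnt[t];
                    if (nb == 0) continue;
                    int na = s_cnt[0], nn = na + nb;
                    float delta = s_mean[t] - s_mean[0];
                    s_mean[0] += delta * nb / nn;
                    s_m2[0]   += s_m2[t] + delta * delta * ((float)na * nb / nn);
                    s_cnt[0]   = nn;
                }
                *mean_out = s_mean[0];
                *var_out  = s_m2[0] / s_cnt[0];  // biased (population) variance
            }
        }
        ```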
      
      * [syncBN PR]
        added fp16 support
        addressed review comments on:
          1. updating last pow 2
          2. catching the ImportError when importing the syncBN kernel
      
      * [syncBN PR]
        added convert function to insert SyncBatchNorm
        refactored some kernel code
      
      * fixing type issues (fp16/fp32/fp64)
      added Kahan summation
      editing the unit test to use PyTorch primitive ops in double precision; passing reasonable tests now
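
      Kahan (compensated) summation carries the low-order bits that a plain
      running sum drops, which keeps long fp32 accumulations accurate. A
      minimal device-side sketch of the technique (hypothetical helper names;
      not the apex code):

      ```cuda
      // Compensated (Kahan) accumulation: `c` carries the low-order bits
      // that plain `sum += x` would drop.
      __device__ __forceinline__ void kahan_add(float x, float& sum, float& c) {
          float y = x - c;        // re-apply previously lost low-order bits
          float t = sum + y;      // big + small: low bits of y are dropped
          c = (t - sum) - y;      // recover exactly what was dropped
          sum = t;
      }

      // Example use: one thread's strided accumulation loop.
      __device__ float kahan_strided_sum(const float* data, int n,
                                         int tid, int stride) {
          float sum = 0.f, c = 0.f;
          for (int i = tid; i < n; i += stride)
              kahan_add(data[i], sum, c);
          return sum;
      }
      ```

      Note that compilers which reassociate floating-point arithmetic can
      optimize the compensation term away, so code like this is typically
      built without such flags.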
      
      * updating tensor creation calls
      
      * fixing all_reduce to operate on a contiguous tensor
      
      * transposed all_reduce results
      
      * [syncBN]
      support fp16 input & fp32 layers for apex fp16
      partially fixing launch configs
      enabling the imagenet example to run with --sync_bn
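
      The usual pattern for fp16 input with fp32 batch-norm layers is to
      widen each element on load, do all arithmetic in fp32, and narrow only
      on the final store. A hypothetical elementwise sketch (not the apex
      kernel; names are illustrative):

      ```cuda
      #include <cuda_fp16.h>

      // Apply y = gamma * (x - mean) * inv_std + beta to half-precision
      // input: fp16 -> fp32 on load, fp32 math, fp32 -> fp16 on store.
      __global__ void bn_apply_half(const __half* __restrict__ in,
                                    __half* __restrict__ out,
                                    float mean, float inv_std,
                                    float gamma, float beta, int n) {
          int i = blockIdx.x * blockDim.x + threadIdx.x;
          if (i < n) {
              float x = __half2float(in[i]);                  // widen
              float y = (x - mean) * inv_std * gamma + beta;  // fp32 math
              out[i] = __float2half(y);                       // narrow
          }
      }
      ```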
      
      * [syncBN PR]
      Documentation added
      
      * adjusting README
      
      * adjusting again
      
      * added some docs to the imagenet example
      
      * [syncBN]
        warp-level reduction
        bug fix: updated the warp reduction logic; check for dummy elements to avoid NaN.
        improved launch configs for better reduction kernels. A further
          improvement would be to increase the grid size.
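
        The dummy-element issue arises when lanes past the end of the data
        feed uninitialized registers into the shuffle tree; the fix is to
        give out-of-range lanes an explicit neutral value. A sketch of a
        guarded warp sum (assuming blockDim.x is a multiple of 32; not the
        actual apex kernel):

        ```cuda
        // Warp-level sum using shuffles; every lane participates.
        __device__ float warp_reduce_sum(float v) {
            for (int offset = 16; offset > 0; offset >>= 1)
                v += __shfl_down_sync(0xffffffff, v, offset);
            return v;   // lane 0 of each warp holds the warp's sum
        }

        __global__ void sum_kernel(const float* __restrict__ x, int n,
                                   float* out) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            // Out-of-range ("dummy") lanes contribute an explicit 0 rather
            // than whatever happens to sit in a register (possibly NaN).
            float v = (i < n) ? x[i] : 0.f;
            v = warp_reduce_sum(v);
            if ((threadIdx.x & 31) == 0)
                atomicAdd(out, v);  // a larger grid adds more parallelism
        }
        ```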
      
      * [syncBN]
        fixed undefined behavior in __shfl_down_sync caused by divergent
          threads in the warp reduction
        changed at::native::empty to at::empty (upstream comments)
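
        __shfl_down_sync requires every lane named in its mask to execute
        the same call; issuing it under a divergent branch is undefined
        behavior. The fix is to keep the shuffle uniform and apply any
        validity predicate to the value instead. A minimal before/after
        sketch (illustrative, not the apex code):

        ```cuda
        __device__ float warp_sum_uniform(float v, bool valid) {
            // WRONG (UB): shuffle under a divergent branch, e.g.
            //   if (valid) v += __shfl_down_sync(0xffffffff, v, offset);
            // Lanes named in the mask that skip the call make the result
            // undefined.

            // Right: neutralize invalid lanes' *values*, then let every
            // lane execute the identical shuffle sequence.
            v = valid ? v : 0.f;
            for (int offset = 16; offset > 0; offset >>= 1)
                v += __shfl_down_sync(0xffffffff, v, offset);  // uniform call
            return v;
        }
        ```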
  2. 10 Oct, 2018 2 commits
  3. 08 Oct, 2018 2 commits
  4. 07 Oct, 2018 2 commits
  5. 05 Oct, 2018 1 commit
  6. 03 Oct, 2018 1 commit
  7. 29 Sep, 2018 4 commits
  8. 19 Sep, 2018 1 commit
    • Fix param freezing (#47) · 53e1b61a
      mcarilli authored
      * Fix appears to work in Tomasz's example.
      
      * Somehow shared_param got disabled again?
  9. 18 Sep, 2018 1 commit
  10. 17 Sep, 2018 1 commit
    • Remove some fp16 examples that don't converge (#45) · 0ec8addb
      Christian Sarofeen authored
      * Remove some fp16 examples that don't converge
      
      The default static loss scale of 1.0 doesn't converge for resnet50. Either remove the example or give it a static loss scale of 128, which is known to converge well.
      
      * Update README.md
  11. 14 Sep, 2018 2 commits
  12. 13 Sep, 2018 1 commit
  13. 11 Sep, 2018 1 commit
  14. 10 Sep, 2018 4 commits
  15. 07 Sep, 2018 1 commit
  16. 06 Sep, 2018 2 commits
  17. 05 Sep, 2018 2 commits
  18. 30 Aug, 2018 2 commits
  19. 28 Aug, 2018 7 commits
  20. 27 Aug, 2018 2 commits