1. 07 Jul, 2022 1 commit
  2. 07 Feb, 2022 1 commit
  3. 10 Oct, 2019 1 commit
  4. 27 Aug, 2019 2 commits
    • Enable Checkpointing (#420) · dec4fdd6
      ptrblck authored
      * add state_dict, load_state_dict
      
      * add test_restoring, test_loss_scale_decrease
      
      * disable amp outputs for checkpoint tests
      
      * add test for amp.state_dict, cleanup
      
      * add state_dict patch, add test
      
      * fixed testing, cleanup
      
      * add readme for checkpointing
      
      * add docs to source/amp
      
      * add review changes to doc
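The checkpointing commit above adds `state_dict`/`load_state_dict` so amp's loss-scaling state survives a save/restore cycle. As a minimal pure-Python sketch of the idea (the `LossScaler` class, its fields, and the growth interval below are hypothetical illustrations, not apex's actual implementation):

```python
# Sketch of state_dict-style checkpointing for a dynamic loss scaler.
# The LossScaler class here is a hypothetical stand-in, not apex's real one.

class LossScaler:
    def __init__(self, init_scale=2.0 ** 16):
        self.loss_scale = init_scale
        self.unskipped = 0  # successful steps since the last overflow

    def update(self, overflow):
        # Halve the scale on overflow; double it after 2000 clean steps.
        if overflow:
            self.loss_scale /= 2.0
            self.unskipped = 0
        else:
            self.unskipped += 1
            if self.unskipped >= 2000:
                self.loss_scale *= 2.0
                self.unskipped = 0

    def state_dict(self):
        # Everything needed to resume scaling exactly where we left off.
        return {"loss_scale": self.loss_scale, "unskipped": self.unskipped}

    def load_state_dict(self, state):
        self.loss_scale = state["loss_scale"]
        self.unskipped = state["unskipped"]


scaler = LossScaler()
scaler.update(overflow=True)   # scale halves from 65536.0 to 32768.0
saved = scaler.state_dict()

restored = LossScaler()
restored.load_state_dict(saved)
assert restored.loss_scale == scaler.loss_scale
```

Without persisting this state, a resumed run would restart at the default scale and could overflow (or underflow gradients) until the dynamic scaler re-converges.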
    • Updating docstrings for fused optimizers · 427e82cd
      Michael Carilli authored
  5. 27 Jun, 2019 1 commit
  6. 24 Jun, 2019 2 commits
  7. 04 Apr, 2019 1 commit
    • WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f
      mcarilli authored
      * Refactor to allow more flexible treatment of multiple optimizers/models/losses
      
      * Adding _process_optimizers.py
      
      * Created L0 tests (now passing).
      
      * fix: minor print typo (#234)
      
      * make L1 results easier to read
      
      * L0 multiple model/optimizer/loss test fleshed out
      
      * Adding test that master params remain synced across distributed processes
      
      * Docstring updates
      
      * Docstring updates
  8. 20 Mar, 2019 1 commit
  9. 13 Mar, 2019 1 commit
  10. 12 Mar, 2019 2 commits
  11. 11 Mar, 2019 1 commit
  12. 07 Mar, 2019 4 commits
  13. 06 Mar, 2019 1 commit
  14. 05 Mar, 2019 1 commit
  15. 28 Feb, 2019 1 commit
  16. 06 Feb, 2019 1 commit
  17. 12 Dec, 2018 1 commit
  18. 28 Nov, 2018 2 commits
  19. 30 Oct, 2018 1 commit
  20. 23 Oct, 2018 1 commit
    • [syncBN] (#48) · 81eef1ef
      jjsjann123 authored
      * [syncBN]
        added syncBN to apex in native pure Python
        added fused CUDA kernels for sync BN, using Welford's algorithm for mean/var
          optional installation via 'python setup.py install --cuda_ext'
        added a unit test with a side-by-side comparison between apex sync BN and
          PyTorch BN. Note that because of numerical issues in its mean/var
          computation, the PyTorch BN output will be slightly off.
      
      * [syncBN PR]
        added fp16 support
        addressed review comments on:
          1. updating last pow 2
          2. catching import errors when importing the syncBN kernel
      
      * [syncBN PR]
        added convert function to insert SyncBatchNorm
        refactored some kernel code
      
      * fixed type issues (fp16/fp32/fp64)
        added Kahan summation
        edited the unit test to use PyTorch primitive ops in double precision;
          tests now pass within reasonable tolerance
      
      * updating tensor creation calls
      
      * fixing the all_reduce contiguous tensor
      
      * transposed all reduce results
      
      * [syncBN]
        support fp16 input with an fp32 layer for apex fp16
        partially fixed launch configs
        enabled the imagenet example to run with --sync_bn
      
      * [syncBN PR]
      Documentation added
      
      * adjusting README
      
      * adjusting again
      
      * added some doc to imagenet example
      
      * [syncBN]
        warp-level reduction
        bug fix: updated warp reduction logic; check for a dummy element to avoid NaN
        improved launch config for better reduction kernels; a further improvement
          would be to increase the grid size
      
      * [syncBN]
        fixed undefined behavior in __shfl_down_sync caused by divergent threads
          in warp reduction
        changed at::native::empty to at::empty (per upstream review comments)
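The syncBN history above leans on two numerically stable accumulation techniques: Welford's online algorithm for mean/variance and Kahan compensated summation. A pure-Python sketch of both (the real apex versions are fused CUDA kernels; the function names here are illustrative, not apex's API):

```python
import math


def welford_mean_var(xs):
    """Welford's online algorithm: one pass, numerically stable mean/variance."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # note: uses the *updated* mean
    # Biased (population) variance, as batch norm statistics use.
    return mean, m2 / count


def kahan_sum(xs):
    """Kahan summation: carries the low-order error lost by each addition."""
    total, comp = 0.0, 0.0
    for x in xs:
        y = x - comp            # correct the next addend by the carried error
        t = total + y
        comp = (t - total) - y  # recover what the addition just dropped
        total = t
    return total


mean, var = welford_mean_var([1.0, 2.0, 3.0, 4.0])  # mean 2.5, variance 1.25
```

Both avoid the catastrophic cancellation of the naive E[x^2] - E[x]^2 variance formula in low precision, which is why the commit notes the plain PyTorch BN output being "slightly off" in the side-by-side test.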
  21. 28 Aug, 2018 1 commit
  22. 20 Jun, 2018 1 commit
  23. 16 Jun, 2018 2 commits
  24. 15 Jun, 2018 2 commits
  25. 14 Jun, 2018 1 commit
  26. 08 May, 2018 1 commit
  27. 25 Apr, 2018 2 commits