1. 04 Apr, 2019 1 commit
      WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f
      mcarilli authored
      * Refactor to allow more flexible treatment of multiple optimizers/models/losses
      
      * Adding _process_optimizers.py
      
      * Created L0 tests (now passing).
      
      * fix: minor print typo (#234)
      
      * make L1 results easier to read
      
      * L0 multiple model/optimizer/loss test fleshed out
      
      * Adding test that master params remain synced across distributed processes
      
      * Docstring updates
      
      * Docstring updates
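      A minimal sketch of the usage this refactor is aimed at: two models, two optimizers, and two losses driven through the apex.amp frontend. The num_losses and loss_id arguments are assumptions based on the documented amp API, not taken from this commit.

        # Sketch only: assumes the list-based amp.initialize signature and
        # per-loss scalers selected via loss_id.
        import torch
        from apex import amp

        model_a = torch.nn.Linear(16, 16).cuda()
        model_b = torch.nn.Linear(16, 16).cuda()
        opt_a = torch.optim.SGD(model_a.parameters(), lr=1e-3)
        opt_b = torch.optim.SGD(model_b.parameters(), lr=1e-3)

        # Lists of models and optimizers are initialized together so amp can
        # track every master-param / model-param pairing.
        [model_a, model_b], [opt_a, opt_b] = amp.initialize(
            [model_a, model_b], [opt_a, opt_b], opt_level="O1", num_losses=2)

        x = torch.randn(4, 16).cuda()
        loss_a = model_a(x).sum()
        loss_b = model_b(x).sum()

        # Each loss gets its own loss scaler, selected by loss_id.
        with amp.scale_loss(loss_a, opt_a, loss_id=0) as scaled_a:
            scaled_a.backward()
        with amp.scale_loss(loss_b, opt_b, loss_id=1) as scaled_b:
            scaled_b.backward()

        opt_a.step()
        opt_b.step()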
  2. 22 Mar, 2019 1 commit
      Check cuda version (#216) · 5b8faa29
      mcarilli authored
      * Adding Torch + bare-metal nvcc version check and container build tests
      
      * Putting a canary in the coalmine
      
      * canary proved elusive
      
      * Trying direct setup.py install
      
      * this should work
      
      * Removing canary
      
      * hopefully this works
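      A hypothetical sketch of the Torch vs. bare-metal nvcc check described above; the function name, parsing, and error text are illustrative rather than the literal setup.py code.

        import subprocess

        import torch


        def check_cuda_torch_binary_vs_bare_metal(nvcc="nvcc"):
            # `nvcc --version` contains a line such as:
            #   Cuda compilation tools, release 10.0, V10.0.130
            raw = subprocess.check_output([nvcc, "--version"]).decode()
            release_line = [l for l in raw.splitlines() if "release" in l][0]
            bare_metal = release_line.split("release")[1].split(",")[0].strip()

            torch_cuda = torch.version.cuda  # e.g. "10.0.130"
            if torch_cuda is None or not torch_cuda.startswith(bare_metal):
                raise RuntimeError(
                    "CUDA mismatch: PyTorch was built with CUDA {} but nvcc "
                    "reports {}; apex extensions must be compiled with the same "
                    "major.minor version.".format(torch_cuda, bare_metal))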
  3. 19 Mar, 2019 1 commit
  4. 13 Mar, 2019 1 commit
  5. 12 Mar, 2019 1 commit
  6. 10 Mar, 2019 1 commit
  7. 08 Mar, 2019 3 commits
  8. 07 Mar, 2019 1 commit
  9. 02 Mar, 2019 1 commit
  10. 01 Mar, 2019 4 commits
  11. 28 Feb, 2019 1 commit
  12. 26 Feb, 2019 1 commit
  13. 24 Feb, 2019 1 commit
  14. 22 Feb, 2019 1 commit
  15. 19 Feb, 2019 1 commit
  16. 16 Feb, 2019 3 commits
  17. 13 Feb, 2019 1 commit
  18. 08 Feb, 2019 2 commits
  19. 06 Feb, 2019 1 commit
  20. 05 Feb, 2019 1 commit
      Better FP16 support in PyTorch fp16 utils. · 713e0fb8
      Jerry Ma authored
      This commit adds an FP16Model class as a successor to network_to_half.
      
      The benefits of this class are:
      
      - Preservation of single-precision for BatchNorm layers. The models
        generated by network_to_half() convert BatchNorm moment tensors to
        half-precision, then back to single-precision, which hurts the
        accuracy of the moment estimators and occasionally results in NaNs.
      - Support for multi-argument nn.Modules (self-explanatory from code).
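      A minimal sketch of the FP16Model pattern described above (helper names are illustrative, not the literal apex code): convert the wrapped network to half precision, restore BatchNorm layers to single precision, and cast every positional input on the way in.

        import torch
        import torch.nn as nn


        def convert_network_to_half(network):
            network.half()
            for module in network.modules():
                # Keep BatchNorm weights and running moments in single precision
                # so the moment estimators stay accurate and do not produce NaNs.
                if isinstance(module, nn.modules.batchnorm._BatchNorm):
                    module.float()
            return network


        class FP16Model(nn.Module):
            def __init__(self, network):
                super(FP16Model, self).__init__()
                self.network = convert_network_to_half(network)

            def forward(self, *inputs):
                # Multi-argument support: every positional tensor is cast to half.
                inputs = tuple(t.half() for t in inputs)
                return self.network(*inputs)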
  21. 03 Feb, 2019 1 commit
  22. 01 Feb, 2019 1 commit
  23. 29 Jan, 2019 3 commits
  24. 28 Jan, 2019 1 commit
  25. 25 Jan, 2019 1 commit
  26. 15 Jan, 2019 1 commit
      [sync BN nhwc] · 443fa76e
      Jie authored
      Added a kernel to support sync BN for channels-last tensors
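      A hedged usage sketch for the channels-last path: the channel_last keyword on apex.parallel.SyncBatchNorm is an assumption based on later apex versions, so check the constructor signature in your build, and note that sync BN expects an initialized torch.distributed process group.

        import torch
        from apex.parallel import SyncBatchNorm

        # NHWC layout: channels sit in the last dimension, so the kernel
        # reduces over dims (0, 1, 2) instead of (0, 2, 3).
        bn = SyncBatchNorm(64, channel_last=True).cuda()  # channel_last is an assumption
        x = torch.randn(8, 32, 32, 64, device="cuda")     # N, H, W, C
        y = bn(x)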
  27. 15 Dec, 2018 1 commit
  28. 01 Nov, 2018 1 commit
  29. 30 Oct, 2018 1 commit
      Adam tests (#67) · d594826c
      ngimel authored
      * Add unittest for FusedAdam.
      
      * Fix some bugs.
      
      * set seed for adam test
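      A sketch of the kind of test this commit describes: fix the seed, step FusedAdam and torch.optim.Adam on identical parameters, and compare. Sizes and tolerances are illustrative, not taken from the commit.

        import unittest

        import torch
        from apex.optimizers import FusedAdam


        class TestFusedAdam(unittest.TestCase):
            def test_matches_torch_adam(self):
                torch.manual_seed(9876)  # fixed seed, as the commit notes
                ref = torch.randn(1024, device="cuda", requires_grad=True)
                tst = ref.clone().detach().requires_grad_(True)

                ref_opt = torch.optim.Adam([ref], lr=1e-3)
                tst_opt = FusedAdam([tst], lr=1e-3)

                for _ in range(10):
                    grad = torch.randn_like(ref)
                    ref.grad, tst.grad = grad, grad.clone()
                    ref_opt.step()
                    tst_opt.step()
                    self.assertTrue(torch.allclose(ref, tst, atol=1e-5, rtol=1e-5))


        if __name__ == "__main__":
            unittest.main()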
  30. 29 Oct, 2018 1 commit
      Merging in fused adam optimizer, additional DDP features tested in 18.10 (#60) · e0bc5d62
      mcarilli authored
      * test passes
      
      * notes
      
      * Using C++-side flatten and unflatten functions
      
      * Adding csrc
      
      * Persistent synchronization event so it doesn't need to be created and destroyed each time
      
      * Interop with parameter flattening in SSD
      
      * Added deterministic option to imagenet main.py
      
      * Adding options to split gradient averaging and allreduce in pure fp32
      
      * Fixing allreduce_maybe_retain call
      
      * Fixing allreduce_fallback
      
      * Also sync active_i_buckets from rank 0
      
      * Making retain_allreduce_buffers compatible with/orthogonal to delay_allreduce=True|False
      
      * Correcting syntax error, now all seems to work with SSD
      
      * Optional cpp extension build
      
      * Add mixed precision adam optimizer (#59)
      
      * Add FusedAdam optimizer to Apex that places all the math into a CUDA kernel.
      
      * Added fixes to fused_adam to get it to work with network.
      
      * WIP work on Python interface for Adam with options
      
      * fix dispatch for halves, add Python options to handle optional half gradients and params
      
      * cleanup, get rid of grid-stride loop
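      A hedged sketch combining the two features this merge brings in: the fused Adam optimizer and apex's DistributedDataParallel options. The keyword names (delay_allreduce, allreduce_always_fp32) follow the commit messages above, but the exact defaults and signature may differ in your apex version; launching is assumed to happen with one process per GPU via torch.distributed.launch.

        import torch
        from apex.optimizers import FusedAdam
        from apex.parallel import DistributedDataParallel as DDP

        # Assumes RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT are set by the launcher.
        torch.distributed.init_process_group(backend="nccl", init_method="env://")
        torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())

        model = torch.nn.Linear(1024, 1024).cuda()
        optimizer = FusedAdam(model.parameters(), lr=1e-3)

        # delay_allreduce defers gradient all-reduce until backward finishes;
        # allreduce_always_fp32 performs the all-reduce in single precision.
        model = DDP(model, delay_allreduce=True, allreduce_always_fp32=True)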