  1. 19 Nov, 2021 1 commit
  2. 25 Jan, 2021 1 commit
    • fix bugs in syncbn (#46) · 3f49dbf0
      Jeff Daily authored
      - incorrect use of __shfl_down
      - fix warp size assumptions
      - update unit tests to exit on failure
  3. 21 Jan, 2021 1 commit
  4. 18 Jan, 2021 1 commit
  5. 15 Jan, 2021 1 commit
  6. 31 Dec, 2020 2 commits
  7. 01 Dec, 2020 1 commit
  8. 04 Nov, 2020 1 commit
  9. 05 Aug, 2020 2 commits
  10. 31 Jul, 2020 1 commit
  11. 10 Jul, 2020 1 commit
  12. 07 Jul, 2020 1 commit
  13. 06 Jul, 2020 1 commit
    • [sync BN] (#792) · 1ff54b8f
      jjsjann123 authored
      * [sync BN]
      
      support non-uniform batch size across process group.
      
      TODO: test should be added once cleaned up.
      
      * updating unit tests
      
      * new unit tests for different inputs
      
      * cleaning
  14. 23 Jun, 2020 3 commits
  15. 03 Jun, 2020 1 commit
  16. 26 May, 2020 1 commit
  17. 21 May, 2020 2 commits
  18. 20 May, 2020 2 commits
  19. 19 May, 2020 4 commits
  20. 15 May, 2020 2 commits
  21. 14 May, 2020 1 commit
  22. 13 May, 2020 1 commit
  23. 07 May, 2020 1 commit
    • [Upstream] IFU 05072020 (#4) · e85a1d4b
      Chaitanya Sri Krishna Lolla authored
      
      
      * fix dropout scaling from p to 1/(1-p) (#816)
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      
      * Improvements to apex.mlp (#804)
      
      * update fused bias relu backward kernel
      
      * add support for not requiring first-layer dgrad
      
      * fix bug: wrong layer in requires grad
      
      * add infrastructure for optional bias and activation, currently only supports no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
      
      * enable wider load/store for multi_tensor_apply kernels (#763)
      
      * modify MTA axpby for wider load/store
      
      * Make scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
      
      * Changes to make xentropysoftmax load/store vectorized when possible: (#725)
      
      * Changes to make xentropysoftmax load/store vectorized when possible:
      Increase default ILP so that each thread handles 16 bytes of data in one step
      Make each thread load/store the longest vector possible
      Make the unrolled case handle adjacent data instead of strided data, so it uses the same order as the vector case
      
      * Add shift for the unaligned case. Remove less-than-16-byte-aligned accesses
      Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      Co-authored-by: Deyu Fu <deyuf@nvidia.com>
  24. 30 Apr, 2020 1 commit
    • Improvements to apex.mlp (#804) · 31aceeaa
      Deyu Fu authored
      * update fused bias relu backward kernel
      
      * add support for not requiring first-layer dgrad
      
      * fix bug: wrong layer in requires grad
      
      * add infrastructure for optional bias and activation, currently only supports no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
  25. 22 Apr, 2020 2 commits
    • Fix LARC with mixed precision (#793) · 2ec84ebd
      Vinicius Reis authored
      The LARC optimizer wraps an underlying optimizer and then needs to be passed
      to amp.initialize for mixed precision. There were 3 different crashes happening
      in this situation, fix all of them and add a unit test.
      
      I don't know if the 'LARC' in sys.modules check ever worked. In my setup, the
      entry in sys.modules is 'apex.parallel.LARC'. Checking if the variable is
      defined seems more reliable though.
  26. 31 Mar, 2020 1 commit
  27. 27 Feb, 2020 1 commit
  28. 06 Nov, 2019 1 commit
  29. 03 Oct, 2019 1 commit
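The syncbn fix (#46) mentions an incorrect use of `__shfl_down` and warp-size assumptions. As background, AMD GCN/CDNA GPUs run 64-lane wavefronts, while much CUDA code assumes 32-lane warps. The pure-Python simulation below is only an illustrative sketch under that assumption, not the apex kernel: `shfl_down` and `warp_reduce_sum` are hypothetical helpers that mimic a shuffle-down tree reduction and show how a starting offset derived from a hardcoded 32 silently drops half of a 64-lane wavefront.

```python
from typing import List


def shfl_down(vals: List[float], offset: int) -> List[float]:
    """Simulate a shuffle-down: lane i reads the value held by lane i + offset.

    Lanes whose source would fall past the end of the warp keep their own
    value, mirroring CUDA/HIP behavior for out-of-range source lanes.
    """
    n = len(vals)
    return [vals[i + offset] if i + offset < n else vals[i] for i in range(n)]


def warp_reduce_sum(lanes: List[float], assumed_warp_size: int) -> float:
    """Tree-reduce one value per lane; lane 0 ends up holding the total.

    The first offset is assumed_warp_size // 2, so the result is only
    correct when assumed_warp_size equals the hardware width len(lanes).
    """
    vals = list(lanes)
    offset = assumed_warp_size // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [v + s for v, s in zip(vals, shifted)]
        offset //= 2
    return vals[0]


if __name__ == "__main__":
    wavefront = [float(i) for i in range(64)]  # one value per lane, 64 lanes
    print(warp_reduce_sum(wavefront, 64))  # 2016.0: sum of lanes 0..63
    print(warp_reduce_sum(wavefront, 32))  # 496.0: lanes 32..63 are never added
```

When the offsets start at 16, lane 0 only accumulates lanes 0 through 31, which is why the width is better taken from the hardware (for example the `warpSize` built-in in CUDA/HIP device code) than from a constant.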
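The sync BN change (#792) adds support for non-uniform batch sizes across the process group but does not describe the math. One standard approach, sketched here as an assumption rather than apex's actual code, is to merge per-rank (count, mean, variance) triples weighted by count, since averaging the per-rank means directly is only correct when every rank sees the same batch size. `local_stats` and `merge_stats` are hypothetical names.

```python
from typing import List, Tuple

Stats = Tuple[int, float, float]  # (count, mean, biased variance)


def local_stats(xs: List[float]) -> Stats:
    """Statistics of one rank's local batch (must be non-empty)."""
    n = len(xs)
    if n == 0:
        raise ValueError("local batch must be non-empty")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return n, mean, var


def merge_stats(stats: List[Stats]) -> Stats:
    """Combine per-rank statistics, weighting each rank by its own count."""
    total = sum(n for n, _, _ in stats)
    mean = sum(n * m for n, m, _ in stats) / total
    # Total squared deviation = within-rank part + between-rank part.
    m2 = sum(n * v + n * (m - mean) ** 2 for n, m, v in stats)
    return total, mean, m2 / total


if __name__ == "__main__":
    rank0 = local_stats([1.0, 2.0, 3.0])            # batch of 3
    rank1 = local_stats([4.0, 5.0, 6.0, 7.0, 8.0])  # batch of 5
    print(merge_stats([rank0, rank1]))
```

The merged result should match the statistics of all eight values computed in one pass (mean 4.5, biased variance 5.25), whereas a plain average of the two rank means would give 4.0.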
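The dropout fix (#816, squashed into e85a1d4b) changes the scale factor from p to 1/(1-p). With inverted dropout, each element is kept with probability 1 - p and kept values are multiplied by 1/(1 - p), so the expected output equals the input: (1 - p) * x / (1 - p) = x. The sketch below works on plain Python lists; `dropout` and `dropout_with_mask` are illustrative names, not the apex API.

```python
import random
from typing import List


def dropout_with_mask(xs: List[float], keep: List[bool], p: float) -> List[float]:
    """Inverted dropout: zero dropped elements, scale kept ones by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)
    return [x * scale if k else 0.0 for x, k in zip(xs, keep)]


def dropout(xs: List[float], p: float, rng: random.Random) -> List[float]:
    """Drop each element independently with probability p."""
    keep = [rng.random() >= p for _ in xs]  # True with probability 1 - p
    return dropout_with_mask(xs, keep, p)


if __name__ == "__main__":
    out = dropout([1.0] * 100_000, p=0.2, rng=random.Random(0))
    # With 1/(1-p) scaling the mean stays close to the input mean of 1.0;
    # scaling by p instead would shrink it to roughly (1 - p) * p = 0.16.
    print(sum(out) / len(out))
```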
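The xentropy softmax change (#725) raises the default ILP so that each thread moves 16 bytes per step and adds a shift for the unaligned case. The kernel itself is not shown here, so the following is a hypothetical sketch of the bookkeeping such a change needs: how many elements fit in one 16-byte vector, and how to split a buffer into a scalar head (until the address is 16-byte aligned), a run of full vectors, and a scalar tail. All names are illustrative.

```python
from typing import Tuple

VECTOR_BYTES = 16  # bytes moved per thread per step, per the commit message


def elements_per_vector(itemsize: int) -> int:
    """Elements per 16-byte vector, e.g. 8 for fp16 and 4 for fp32."""
    if VECTOR_BYTES % itemsize:
        raise ValueError("itemsize must divide 16")
    return VECTOR_BYTES // itemsize


def split_for_vector_access(address: int, n: int, itemsize: int) -> Tuple[int, int, int]:
    """Split n elements starting at byte `address` into (head, vector_elems, tail).

    head: leading elements handled one at a time until the address is 16-byte aligned.
    vector_elems: elements covered by full aligned 16-byte loads.
    tail: leftover elements after the last full vector.
    """
    ilp = elements_per_vector(itemsize)
    if address % itemsize:
        raise ValueError("address must be aligned to the element size")
    misalign = address % VECTOR_BYTES
    head = 0 if misalign == 0 else (VECTOR_BYTES - misalign) // itemsize
    head = min(head, n)
    vector_elems = (n - head) // ilp * ilp
    tail = n - head - vector_elems
    return head, vector_elems, tail


if __name__ == "__main__":
    print(elements_per_vector(2))             # 8 (fp16)
    print(split_for_vector_access(4, 10, 4))  # (3, 4, 3) for fp32 starting at byte 4
```

For example, ten fp32 elements starting at byte address 4 split into 3 head elements (bytes 4 to 15), one full vector of 4 starting at byte 16, and 3 tail elements.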
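The LARC fix (#793) notes that the `'LARC' in sys.modules` check did not match in the author's setup because the entry was `'apex.parallel.LARC'`. This follows from how Python keys `sys.modules`: submodules are stored under their fully qualified dotted name. The snippet demonstrates this with the standard-library `json.decoder` module instead of apex; `module_loaded` is a hypothetical helper.

```python
import importlib
import sys


def module_loaded(name: str) -> bool:
    """Return True if `name` is a key in sys.modules.

    Keys are fully qualified module names, so a submodule is only found
    under its dotted path (e.g. "json.decoder"), not its last component.
    """
    return name in sys.modules


if __name__ == "__main__":
    importlib.import_module("json.decoder")
    print(module_loaded("json.decoder"))  # True
    print(module_loaded("decoder"))       # expected False in a plain interpreter
```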