1. 14 Dec, 2021 1 commit
  2. 22 Nov, 2021 1 commit
  3. 27 Oct, 2021 1 commit
  4. 26 Oct, 2021 1 commit
  5. 20 Oct, 2021 1 commit
  6. 19 Oct, 2021 1 commit
  7. 08 Oct, 2021 1 commit
  8. 06 Oct, 2021 1 commit
  9. 02 Oct, 2021 1 commit
  10. 15 Apr, 2021 1 commit
    • Add unit tests for Fused NovoGrad (#1065) · 59d2f7ac
      Sudhakar Singh authored
      * Add unit tests for fused-novograd
      
      * Fix: tensors should reside on the same device
      
      * Fix: the CUDA stream should be obtained on the same device as the tensors. Found while debugging the fused NovoGrad multi-device unit test
      
      * Fix issues raised in the review comments
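The same-device requirement from this commit can be illustrated with a small check. This is a minimal sketch, not the code from the commit: `group_by_device` and `assert_same_device` are hypothetical helpers, and the sketch assumes only that each tensor-like object exposes a `.device` attribute, as PyTorch tensors do.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_device(tensors):
    """Group tensor-like objects by their `.device` attribute (hypothetical helper)."""
    groups = defaultdict(list)
    for t in tensors:
        groups[str(t.device)].append(t)
    return dict(groups)


def assert_same_device(tensors):
    """Raise ValueError if the tensors span more than one device."""
    devices = sorted(group_by_device(tensors))
    if len(devices) > 1:
        raise ValueError(f"tensors span multiple devices: {devices}")


# Stand-ins for tensors so the sketch runs without a GPU.
a = SimpleNamespace(device="cuda:0")
b = SimpleNamespace(device="cuda:1")
```

With real tensors, each per-device group would presumably be processed under that device's context (for example inside `with torch.cuda.device(...)`), so that the stream in use belongs to the device holding the data.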
  11. 21 Jan, 2021 1 commit
  12. 18 Jan, 2021 1 commit
  13. 15 Jan, 2021 1 commit
  14. 31 Dec, 2020 2 commits
  15. 01 Dec, 2020 1 commit
  16. 04 Nov, 2020 1 commit
  17. 05 Aug, 2020 2 commits
  18. 07 Jul, 2020 1 commit
  19. 23 Jun, 2020 3 commits
  20. 26 May, 2020 1 commit
  21. 21 May, 2020 2 commits
  22. 20 May, 2020 2 commits
  23. 19 May, 2020 4 commits
  24. 15 May, 2020 2 commits
  25. 14 May, 2020 1 commit
  26. 13 May, 2020 1 commit
  27. 07 May, 2020 1 commit
    • [Upstream] IFU 05072020 (#4) · e85a1d4b
      Chaitanya Sri Krishna Lolla authored
      
      
      * fix dropout scaling from p to 1/(1-p) (#816)
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      
      * Improvements to apex.mlp (#804)
      
      * update fused bias relu backward kernel
      
      * add support for not requiring first layer dgrad
      
      * fix bug: wrong layer in requires grad
      
      * add infrastructure for optional bias and activation, currently supports only no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
      
      * enable wider load/store for multi_tensor_apply kernels (#763)
      
      * modify MTA axpby for wider load/store
      
      * Make scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
      
      * Changes to make xentropysoftmax load/store vectorized when possible: (#725)
      
      * Changes to make xentropysoftmax load/store vectorized when possible:
      Increase default ILP so that each thread handles 16 bytes of data in one step
      Make each thread load/store the longest vector possible
      Make the unrolled case handle adjacent data instead of strided data, so the order matches the vector case
      
      * Add shift for the unaligned case. Remove accesses with less than 16-byte alignment
      Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      Co-authored-by: Deyu Fu <deyuf@nvidia.com>
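The dropout scaling named in the first item of this commit (1/(1-p) rather than p) is the usual "inverted dropout" convention: surviving values are scaled up so the expected output equals the input. The following is a minimal pure-Python sketch of that arithmetic, not the fused kernel touched by the commit; the function name and the seeded generator are assumptions made for illustration.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    scale = 1.0 / (1.0 - p)  # the corrected factor; scaling by p would shrink the output
    return [0.0 if rng.random() < p else v * scale for v in values]


rng = random.Random(0)
out = dropout([1.0] * 100_000, p=0.25, rng=rng)
# The mean stays close to 1.0 because E[output] = (1 - p) * 1/(1 - p) * input.
mean = sum(out) / len(out)
```

Scaling by `p` instead would give an expected output of `(1 - p) * p` times the input, which is why the factor matters for training and inference to agree.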
  28. 30 Apr, 2020 1 commit
    • Improvements to apex.mlp (#804) · 31aceeaa
      Deyu Fu authored
      * update fused bias relu backward kernel
      
      * add support for not requiring first layer dgrad
      
      * fix bug: wrong layer in requires grad
      
      * add infrastructure for optional bias and activation, currently supports only no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
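The options listed in this commit (bias on or off, and no activation, relu, or sigmoid) can be pictured with a plain reference forward pass for one dense layer. This is only a sketch of the semantics under stated assumptions: `layer_forward` and the `"none"`/`"relu"`/`"sigmoid"` strings are illustrative names, and nothing here reflects how the fused kernels are implemented.

```python
import math


def layer_forward(x, weight, bias=None, activation="none"):
    """Reference forward for one dense layer: activation(W @ x + b)."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:  # bias is optional, independent of the activation choice
        out = [o + b for o, b in zip(out, bias)]
    if activation == "relu":
        out = [max(0.0, o) for o in out]
    elif activation == "sigmoid":
        out = [1.0 / (1.0 + math.exp(-o)) for o in out]
    elif activation != "none":
        raise ValueError(f"unsupported activation: {activation}")
    return out


x = [1.0, -2.0]
w = [[1.0, 1.0]]
plain = layer_forward(x, w)                                  # no bias, no activation
rectified = layer_forward(x, w, activation="relu")           # relu only
squashed = layer_forward(x, w, bias=[1.0], activation="sigmoid")  # bias and sigmoid
```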
  29. 22 Apr, 2020 2 commits
    • Deyu Fu's avatar
    • Fix LARC with mixed precision (#793) · 2ec84ebd
      Vinicius Reis authored
      The LARC optimizer wraps an underlying optimizer and then needs to be passed
      to amp.initialize for mixed precision. Three different crashes occurred in
      this situation; fix all of them and add a unit test.
      
      I don't know if the 'LARC' in sys.modules check ever worked. In my setup, the
      entry in sys.modules is 'apex.parallel.LARC'. Checking whether the variable is
      defined seems more reliable.
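The observation about sys.modules can be reproduced with the standard library alone. In this sketch, `json.decoder` stands in for `apex.parallel.LARC` (an assumption made so the example runs without apex), and the variable names are illustrative; it does not reproduce the check in the commit itself.

```python
import sys
import json.decoder  # registers both 'json' and 'json.decoder' in sys.modules

# sys.modules is keyed by fully qualified name, so the bare submodule name is absent.
bare_name_present = "decoder" in sys.modules
qualified_name_present = "json.decoder" in sys.modules

# Alternative: bind the name at import time and test the variable instead.
try:
    from json import decoder as optional_module
except ImportError:
    optional_module = None

is_available = optional_module is not None
```

Testing the bound variable avoids depending on the exact key under which the module was registered, which is the concern raised in the commit message.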