  1. 12 May, 2020 2 commits
  2. 08 May, 2020 1 commit
  3. 07 May, 2020 2 commits
  4. 06 May, 2020 3 commits
  5. 05 May, 2020 1 commit
  6. 04 May, 2020 1 commit
  7. 01 May, 2020 1 commit
    • Changes to make xentropysoftmax load/store vectorized when possible (#725) · cf50dc7c
      Deyu Fu authored
      * Changes to make xentropysoftmax load/store vectorized when possible:
      Increase the default ILP so that each thread handles 16 bytes of data in one step
      Make each thread load/store the longest vector possible
      Make the unrolled case handle adjacent data instead of strided data, so the order matches the vectorized case
      
      * Add a shift for the unaligned case. Remove accesses aligned to less than 16 bytes
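The arithmetic behind the commit message above can be sketched in a few lines. This is only an illustration under stated assumptions, not the kernel's code: the helper names `ilp_for`, `alignment_shift` and `split_row` are hypothetical, addresses are assumed to be multiples of the element size, and the head/body/tail split is one plausible way to handle a row that does not start on a 16-byte boundary.

```python
VECTOR_BYTES = 16  # assumed width of one vectorized access (128 bits)


def ilp_for(element_size: int) -> int:
    """Elements per thread per step so that one step moves 16 bytes."""
    if element_size <= 0 or VECTOR_BYTES % element_size != 0:
        raise ValueError("element size must divide 16 bytes")
    return VECTOR_BYTES // element_size


def alignment_shift(address: int, element_size: int) -> int:
    """Elements by which `address` lies past the previous 16-byte boundary."""
    return (address % VECTOR_BYTES) // element_size


def split_row(address: int, n: int, element_size: int):
    """Split n elements into (head, full_vectors, tail).

    head: scalar elements before the first 16-byte boundary
    full_vectors: number of aligned 16-byte accesses
    tail: scalar elements left after the last full vector
    """
    ilp = ilp_for(element_size)
    shift = alignment_shift(address, element_size)
    head = min((ilp - shift) % ilp, n)
    remaining = n - head
    return head, remaining // ilp, remaining % ilp
```

For example, ten float32 values starting at address `0x1008` would split into two leading scalars followed by two aligned 16-byte vectors, while half-precision data would use eight elements per step.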
  8. 30 Apr, 2020 5 commits
  9. 29 Apr, 2020 5 commits
  10. 23 Apr, 2020 1 commit
  11. 22 Apr, 2020 2 commits
    • Fix LARC with mixed precision (#793) · 2ec84ebd
      Vinicius Reis authored
      The LARC optimizer wraps an underlying optimizer and then needs to be passed
      to amp.initialize for mixed precision. There were three different crashes in
      this situation; this commit fixes all of them and adds a unit test.
      
      I don't know if the 'LARC' in sys.modules check ever worked. In my setup, the
      entry in sys.modules is 'apex.parallel.LARC'. Checking if the variable is
      defined seems more reliable though.
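The `sys.modules` observation in the message above can be reproduced without installing anything. The sketch below is an assumption-laden illustration, not the code from the commit: `fakepkg.parallel.LARC` is a hypothetical module name, the `LARC` class is a stand-in, and `is_larc` is a hypothetical helper showing one way to check whether a name is defined instead of looking it up in `sys.modules`.

```python
import sys
import types

# Register a stand-in module under a dotted name, the way a package import
# would. "fakepkg.parallel.LARC" is a hypothetical name used for illustration.
_MODULE_NAME = "fakepkg.parallel.LARC"
sys.modules[_MODULE_NAME] = types.ModuleType(_MODULE_NAME)

# sys.modules is keyed by the fully qualified name, so the short name
# alone is not found even though the module is loaded.
short_name_found = "LARC" in sys.modules
full_name_found = _MODULE_NAME in sys.modules


class LARC:
    """Hypothetical stand-in for a wrapper that holds an inner optimizer."""

    def __init__(self, optimizer):
        self.optim = optimizer


def is_larc(obj) -> bool:
    """Check against the class only if the name LARC is actually defined."""
    larc_cls = globals().get("LARC")
    return larc_cls is not None and isinstance(obj, larc_cls)
```

Here `short_name_found` is false while `full_name_found` is true, which matches the behaviour described in the message; checking the defined name directly does not depend on how the module was imported.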
  12. 20 Apr, 2020 3 commits
  13. 16 Apr, 2020 5 commits
  14. 15 Apr, 2020 2 commits
  15. 13 Apr, 2020 1 commit
  16. 10 Apr, 2020 2 commits
  17. 09 Apr, 2020 1 commit
  18. 08 Apr, 2020 1 commit
  19. 07 Apr, 2020 1 commit