1. 25 Jan, 2021 1 commit
    • fix bugs in syncbn (#46) · 3f49dbf0
      Jeff Daily authored
      - incorrect use of __shfl_down
      - fix warp size assumptions
      - update unit tests to exit on failure
  2. 21 Jan, 2021 1 commit
  3. 18 Jan, 2021 1 commit
  4. 15 Jan, 2021 1 commit
  5. 04 Nov, 2020 1 commit
  6. 19 Oct, 2020 1 commit
    • Optimize the sync batchnorm by batching the communication (#980) · 8a1ed9e8
      lly-zero-one authored
      This PR mainly optimizes the performance of SyncBatchNorm and also fixes a potential issue in the welford_parallel kernel implementation.
      
      For the performance improvement, we batch the mean/var/count all_gather communication together and send it once in the forward path (see the sketch after this entry).
      We also batch the all_reduce calls in the backward path.
      We add a contiguous() call on the input of the welford_parallel kernel.
      If there is any standard perf benchmark, I would be happy to run it.
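      A minimal sketch of the batching idea described above, assuming each rank holds per-channel mean/var tensors and a one-element count tensor; gather_stats_batched is a hypothetical helper for illustration, not apex's actual communication code:

```python
import torch
import torch.distributed as dist

def gather_stats_batched(mean, var, count, group=None):
    """Pack mean/var/count into one buffer so a single all_gather replaces
    three separate collectives in the forward path."""
    world_size = dist.get_world_size(group=group)
    packed = torch.cat([mean, var, count.view(1)])            # [mean | var | count]
    gathered = [torch.empty_like(packed) for _ in range(world_size)]
    dist.all_gather(gathered, packed, group=group)
    c = mean.numel()
    means  = torch.stack([t[:c]      for t in gathered])      # (world_size, C)
    vars_  = torch.stack([t[c:2 * c] for t in gathered])      # (world_size, C)
    counts = torch.stack([t[2 * c]   for t in gathered])      # (world_size,)
    return means, vars_, counts
```

      The backward-path all_reduce can be batched the same way: concatenate the partial gradients into one tensor, reduce once, then slice.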
  7. 05 Aug, 2020 2 commits
  8. 10 Jul, 2020 1 commit
  9. 06 Jul, 2020 1 commit
    • [sync BN] (#792) · 1ff54b8f
      jjsjann123 authored
      * [sync BN]
      
      Support non-uniform batch sizes across the process group (see the sketch after this entry).
      
      TODO: tests should be added once cleaned up.
      
      * updating unit tests
      
      * new unit tests for different inputs
      
      * cleaning
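      When batch sizes differ across ranks, the gathered statistics have to be combined with count weighting rather than a plain average. A minimal sketch of that combination, assuming biased per-rank variances and the (world_size, C) layout from the gather above; combine_stats is a hypothetical name:

```python
import torch

def combine_stats(means, vars_, counts):
    """Count-weighted combination of per-rank statistics, so ranks with
    different batch sizes contribute proportionally to the global mean/var.
    means, vars_: (world_size, C); counts: (world_size,) elements per rank."""
    counts = counts.view(-1, 1).to(means.dtype)
    total = counts.sum()
    global_mean = (means * counts).sum(dim=0) / total
    # Per rank, E[x^2] = var + mean^2 (biased variance); recombine these,
    # then subtract the global mean^2 to recover the global biased variance.
    ex2 = vars_ + means.pow(2)
    global_var = (ex2 * counts).sum(dim=0) / total - global_mean.pow(2)
    return global_mean, global_var, total
```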
  10. 22 Jun, 2020 1 commit
  11. 15 Jun, 2020 1 commit
  12. 26 May, 2020 1 commit
  13. 23 May, 2020 1 commit
  14. 22 May, 2020 5 commits
  15. 21 May, 2020 2 commits
  16. 20 May, 2020 1 commit
  17. 14 May, 2020 1 commit
  18. 12 May, 2020 2 commits
  19. 07 May, 2020 2 commits
    • 2d0f9cf2
      Chaitanya Sri Krishna Lolla authored
    • [Upstream] IFU 05072020 (#4) · e85a1d4b
      Chaitanya Sri Krishna Lolla authored
      
      
      * fix dropout scaling from p to 1/(1-p) (#816) (a sketch of the corrected scaling follows this entry)
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      
      * Improvements to apex.mlp (#804)
      
      * update fused bias relu backward kernel
      
      * adding support for not requiring first-layer dgrad
      
      * fix bug: wrong layer in requires_grad
      
      * add infrastructure for optional bias and activation; currently only supports no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
      
      * enable wider load/store for multi_tensor_apply kernels (#763)
      
      * modify MTA axpby for wider load/store
      
      * Make scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
      
      * Changes to make xentropysoftmax load/store vectorized when possible: (#725)
      
      * Changes to make xentropysoftmax load/store vectorized when possible:
      Increase the default ILP so that each thread handles 16 bytes of data per step
      Make each thread load/store the longest vector possible
      Make the unroll case handle adjacent data instead of strided data, so the access order matches the vectorized case
      
      * Add a shift for the unaligned case. Remove accesses aligned to less than 16 bytes.
      Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      Co-authored-by: Deyu Fu <deyuf@nvidia.com>
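      On the dropout item in this entry: with inverted dropout, surviving activations are scaled by 1/(1 - p) at training time (not by p), so no rescaling is needed at inference. A minimal standalone sketch, not the fused apex kernel:

```python
import torch

def inverted_dropout(x, p, training=True):
    """Keep each element with probability (1 - p) and scale the survivors
    by 1/(1 - p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    mask = (torch.rand_like(x) < keep).to(x.dtype)
    return x * mask / keep     # scale by 1/(1 - p), not by p
```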
  20. 30 Apr, 2020 3 commits
  21. 28 Apr, 2020 2 commits
  22. 22 Apr, 2020 1 commit
  23. 10 Apr, 2020 1 commit
  24. 27 Feb, 2020 1 commit
  25. 04 Oct, 2019 1 commit
  26. 06 Sep, 2019 1 commit
    • Fix for #456 (#477) · 325f5a0b
      mcarilli authored
      * Pushing for build tests
      
      * Contrib files
      
      * Removing deprecated checks
  27. 20 Aug, 2019 1 commit
  28. 17 Aug, 2019 1 commit
  29. 16 Aug, 2019 1 commit
    • clean up variance options supported by all fused optimizers: · 18062b69
      Deyu Fu authored
      Correctly do not apply bias correction to epsilon (same as a recent upstream change)
      Correctly do not apply bias correction to weight decay (consistent with upstream AdamW)
      Add adam_w_mode to FusedAdam/LAMB to choose between L2 regularization and decoupled weight decay (Adam vs. AdamW); see the sketch after this entry
      Correctly document reg_inside_moment as distinct from adam_w_mode in FusedNovoGrad
      Remove legacy eps_mode from FusedAdam
      Make the internal math type float across fused optimizers
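      A minimal sketch of what adam_w_mode selects, written as a single plain-PyTorch parameter update under assumed hyperparameter names; this is an illustration, not the fused kernel:

```python
import torch

def adam_step(p, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=1e-2, adam_w_mode=True):
    """One Adam/AdamW update contrasting L2 regularization with decoupled
    weight decay (hypothetical standalone sketch)."""
    if not adam_w_mode and weight_decay != 0.0:
        grad = grad + weight_decay * p                 # L2: decay flows into the moments
    m.mul_(beta1).add_(grad, alpha=1.0 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)
    m_hat = m / (1.0 - beta1 ** step)                  # bias-corrected moments ...
    denom = (v / (1.0 - beta2 ** step)).sqrt() + eps   # ... with no correction applied to eps
    update = m_hat / denom
    if adam_w_mode and weight_decay != 0.0:
        update = update + weight_decay * p             # AdamW: decoupled, uncorrected decay
    p.sub_(lr * update)
    return p, m, v
```

      With adam_w_mode=False this reduces to Adam with L2 regularization; with adam_w_mode=True the decay term bypasses the moment estimates, matching AdamW.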