1. 21 May, 2020 1 commit
  2. 20 May, 2020 1 commit
  3. 14 May, 2020 1 commit
  4. 12 May, 2020 2 commits
  5. 07 May, 2020 2 commits
    • 2d0f9cf2
      Chaitanya Sri Krishna Lolla authored
    • [Upstream] IFU 05072020 (#4) · e85a1d4b
      Chaitanya Sri Krishna Lolla authored
      
      
      * fix dropout scaling from p to 1/(1-p) (#816); see the dropout sketch after this commit
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      
      * Improvements to apex.mlp (#804)
      
      * update fused bias relu backward kernel
      
      * add support for not requiring the first layer dgrad
      
      * fix bug: wrong layer used in the requires_grad check
      
      * add infrastructure for optional bias and activation; currently only supports no bias and no relu
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
      
      * enable wider load/store for multi_tensor_apply kernels (#763)
      
      * modify MTA axpby for wider load/store
      
      * Make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
      
      * Changes to make xentropysoftmax load/store vectorized when possible: (#725)
      
      * Changes to make xentropysoftmax load/store vectorized when possible:
      Increase the default ILP so that each thread handles 16 bytes of data per step
      Make each thread load/store the longest vector possible
      Make the unrolled case handle adjacent data instead of strided data, so the element order matches the vectorized case
      
      * Add a shift for the not-aligned case; remove less-than-16-byte-aligned access (see the vectorization sketch after this commit)
      Co-authored-by: Burc Eryilmaz <sberyilm@gmail.com>
      Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
      Co-authored-by: Deyu Fu <deyuf@nvidia.com>
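      For context on the dropout fix above: inverted dropout keeps each element with
      probability 1-p and scales the survivors by 1/(1-p), not by p, so the expected
      activation is unchanged. A minimal CUDA sketch of the corrected scaling, purely
      illustrative and not apex's actual kernel (all names here are made up):
      
        // Inverted dropout: survivors are scaled by 1/(1-p) so E[out] == E[in].
        // `rand` holds pre-generated uniform random numbers in [0, 1).
        __global__ void dropout_scale(const float* in, const float* rand,
                                      float* out, float p, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) {
                bool keep = rand[i] >= p;           // drop with probability p
                float scale = 1.0f / (1.0f - p);    // the fix: scale by 1/(1-p), not p
                out[i] = keep ? in[i] * scale : 0.0f;
            }
        }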
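      The xentropy change above follows a common CUDA pattern: when the pointers are
      16-byte aligned, each thread moves ILP elements per step with a single vector
      load/store; otherwise it falls back to scalar accesses over the same adjacent
      elements, in the same order (a real kernel also peels a leading "shift" of scalar
      elements until alignment is reached). A hedged sketch of the idea, not the actual
      xentropy kernel:
      
        #include <cstdint>
        
        // Each thread handles ILP = 4 floats (16 bytes) per step.
        __global__ void copy_vectorized(const float* src, float* dst, int n) {
            const int ILP = 4;
            int tid = blockIdx.x * blockDim.x + threadIdx.x;
            int start = tid * ILP;
            bool aligned = (reinterpret_cast<uintptr_t>(src) % 16 == 0) &&
                           (reinterpret_cast<uintptr_t>(dst) % 16 == 0);
            if (aligned && start + ILP <= n) {
                // Vector path: one 16-byte load and one 16-byte store.
                float4 v = reinterpret_cast<const float4*>(src)[tid];
                reinterpret_cast<float4*>(dst)[tid] = v;
            } else {
                // Unrolled fallback: adjacent elements, same order as the vector path.
                for (int j = 0; j < ILP && start + j < n; ++j)
                    dst[start + j] = src[start + j];
            }
        }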
  6. 30 Apr, 2020 2 commits
    • enable wider load/store for multi_tensor_apply kernels (#763) · 17ee854e
      Deyu Fu authored
      * modify MTA axpby for wider load/store
      
      * Make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads (sketch below)
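      How wide a load multi_tensor_apply can use depends on the alignment of every
      pointer in the chunk and on the chunk size. A hedged sketch of the kind of check
      that gates the 16-byte path; the helper name is illustrative, not apex's:
      
        #include <cstdint>
        
        // True if every pointer is 16-byte aligned and the element count is a
        // multiple of 4 floats, so the float4 (16-byte) path can be taken.
        __host__ __device__ inline bool can_use_float4(const float* const* ptrs,
                                                       int num_ptrs, int n) {
            if (n % 4 != 0) return false;
            for (int i = 0; i < num_ptrs; ++i)
                if (reinterpret_cast<uintptr_t>(ptrs[i]) % 16 != 0) return false;
            return true;
        }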
    • Improvements to apex.mlp (#804) · 31aceeaa
      Deyu Fu authored
      * update fused bias relu backward kernel
      
      * add support for not requiring the first layer dgrad
      
      * fix bug: wrong layer used in the requires_grad check
      
      * add infrastructure for optional bias and activation; currently only supports no bias and no relu (sketch below)
      
      * make bias and relu optional separately
      
      * add sigmoid activation option
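      One plausible way to provide "infrastructure for optional bias and activation"
      at the kernel level is to template the epilogue on compile-time flags so unused
      branches cost nothing. This is only an illustration of the technique; apex's
      actual MLP kernels are organized differently:
      
        // Fused epilogue with optional bias and selectable activation.
        // Activation: 0 = none, 1 = ReLU, 2 = sigmoid.
        template <bool HasBias, int Activation>
        __device__ __forceinline__ float epilogue(float acc, float bias) {
            if (HasBias) acc += bias;
            if (Activation == 1) acc = acc > 0.f ? acc : 0.f;       // ReLU
            if (Activation == 2) acc = 1.f / (1.f + expf(-acc));    // sigmoid
            return acc;
        }
        
        // Example instantiation: no bias, no activation (the first combination supported).
        __global__ void apply_epilogue_nobias_noact(float* y, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) y[i] = epilogue<false, 0>(y[i], 0.f);
        }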
  7. 28 Apr, 2020 1 commit
  8. 22 Apr, 2020 1 commit
  9. 10 Apr, 2020 1 commit
  10. 27 Feb, 2020 1 commit
  11. 04 Oct, 2019 1 commit
  12. 06 Sep, 2019 1 commit
    • Fix for #456 (#477) · 325f5a0b
      mcarilli authored
      * Pushing for build tests
      
      * Contrib files
      
      * Removing deprecated checks
  13. 20 Aug, 2019 1 commit
  14. 17 Aug, 2019 1 commit
  15. 16 Aug, 2019 2 commits
    • clean up variance option support across all fused optimizers · 18062b69
      Deyu Fu authored
      Correctly do not apply bias correction to epsilon (same as a recent upstream change)
      Correctly do not apply bias correction to weight decay (consistent with upstream AdamW)
      Add adam_w_mode to FusedAdam/FusedLAMB to choose L2 regularization or decoupled weight decay (Adam vs. AdamW); see the sketch below
      Correctly document how reg_inside_moment differs from adam_w_mode in FusedNovoGrad
      Remove the legacy eps_mode from FusedAdam
      Make the internal math type float across all fused optimizers
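      The items above separate Adam-style L2 regularization from AdamW-style decoupled
      weight decay, and keep bias correction away from both epsilon and the decay term.
      A single-parameter sketch of the update under the usual Adam notation; illustrative
      only, not the fused multi-tensor kernel:
      
        #include <math.h>
        
        // One Adam/AdamW step for a single parameter p with moments m, v at step t.
        // adam_w = false: L2 mode, decay is folded into the gradient.
        // adam_w = true:  AdamW mode, decoupled decay added to the update.
        // Bias correction touches the moments only, NOT eps and NOT the decay term.
        __host__ __device__ inline void adam_step(float& p, float& m, float& v, float g,
                                                  float lr, float beta1, float beta2,
                                                  float eps, float wd, int t, bool adam_w) {
            if (!adam_w) g += wd * p;                       // L2: regularize the gradient
            m = beta1 * m + (1.f - beta1) * g;
            v = beta2 * v + (1.f - beta2) * g * g;
            float m_hat = m / (1.f - powf(beta1, (float)t));
            float v_hat = v / (1.f - powf(beta2, (float)t));
            float update = m_hat / (sqrtf(v_hat) + eps);    // eps is not bias-corrected
            if (adam_w) update += wd * p;                   // decoupled decay, uncorrected
            p -= lr * update;
        }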
    • add fused lamb, put lamb kernels into one file · c8f9cceb
      Deyu Fu authored
  16. 08 Aug, 2019 1 commit
  17. 06 Aug, 2019 1 commit
    • Clean up layer norm tests (#418) · 3ef01fae
      ngimel authored
      * Bug fix for non-affine layer-norm + add backward unit test
      
      * clean up tests and add tests for a large batch
  18. 01 Aug, 2019 1 commit
  19. 26 Jul, 2019 1 commit
  20. 12 Jul, 2019 1 commit
  21. 03 Jul, 2019 4 commits
  22. 28 Jun, 2019 1 commit
  23. 14 Jun, 2019 1 commit
  24. 11 Jun, 2019 1 commit
  25. 31 May, 2019 2 commits
  26. 27 May, 2019 1 commit
  27. 10 May, 2019 1 commit
  28. 03 May, 2019 1 commit
  29. 27 Apr, 2019 1 commit
    • Bnp integration pr (#275) · fedfe0d7
      jjsjann123 authored
      * Persistent group batchnorm added
      
      Added persistent grouped batch norm for performance runs in the strong-scaling case;
      currently only supporting:
      
        1. NHWC layout
        2. fp16
        3. synchronization only within a node
      
      An environment variable is used to tune LAUNCH_MARGIN, which limits the number of
      CTAs used by the persistent kernel (see the sketch after this commit).
      
      Documentation and examples will follow.
      
      * updating type().scalarType() to scalar_type()
      
      * move the launch margin to be defined at layer creation; add a knob to cap max CTAs per SM
      
      * fix the CTA computation
      
      * review comments:
      
      set device_id through cudaGetDevice()
      move cudaMemset to cudaMemsetAsync
      update __threadfence() to __threadfence_system() for inter-device writes
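      The LAUNCH_MARGIN mentioned above reserves a few CTA slots so the persistent
      kernel does not occupy the whole GPU. A hedged sketch of how such a grid-size
      computation can look; the kernel and the environment-variable name here are
      placeholders, not apex's actual ones:
      
        #include <cstdlib>
        #include <cuda_runtime.h>
        
        __global__ void persistent_bn_kernel() { /* placeholder persistent kernel */ }
        
        // Grid size for the persistent kernel: maximum resident CTAs
        // (occupancy per SM times SM count) minus a reserved launch margin.
        int persistent_grid_size(int block_size) {
            int dev = 0, sm_count = 0, ctas_per_sm = 0;
            cudaGetDevice(&dev);
            cudaDeviceGetAttribute(&sm_count, cudaDevAttrMultiProcessorCount, dev);
            cudaOccupancyMaxActiveBlocksPerMultiprocessor(&ctas_per_sm, persistent_bn_kernel,
                                                          block_size, /*dynamic smem*/ 0);
            const char* env = std::getenv("BN_LAUNCH_MARGIN");  // placeholder variable name
            int margin = env ? std::atoi(env) : 0;              // CTAs left free for other work
            int max_ctas = sm_count * ctas_per_sm - margin;
            return max_ctas > 0 ? max_ctas : 1;
        }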
  30. 26 Apr, 2019 3 commits
    • whitespace · c978bda5
      Michael Carilli authored
    • Replace type().ScalarType() with scalar_type() (#272) · 855808f3
      ptrblck authored
      * change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX
      
      * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF
      
      * revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES
      
      * revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu
      
      * revert scalar_type() to type() in layer_norm_cuda_kernel.cu
      
      * revert at::kType to at::ScalarType::Type
      
      * use DISPATCH_FLOAT_AND_HALF to get rid of warnings
      
      * add dispatch mechanisms for double+float and double+float+half
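      The replacement above is mostly mechanical: the deprecated .type().scalarType()
      accessor becomes .scalar_type(), and at::ScalarType::X spellings become the at::kX
      shorthands; as the reverts in the commit note, the dispatch macros of that era
      still expected .type(), whereas current ATen takes a ScalarType. A small
      illustrative snippet of the pattern, written against current ATen and not taken
      from apex:
      
        #include <ATen/ATen.h>
        #include <ATen/Dispatch.h>
        
        void example(const at::Tensor& x) {
            // Before: x.type().scalarType() == at::ScalarType::Half
            // After:
            bool is_half = x.scalar_type() == at::kHalf;
            (void)is_half;
            
            // Dispatch over float/double/half; `scalar_t` is the concrete element
            // type inside the lambda.
            AT_DISPATCH_FLOATING_TYPES_AND_HALF(x.scalar_type(), "example", [&] {
                const scalar_t* data = x.data_ptr<scalar_t>();
                (void)data;
            });
        }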