  1. 19 Sep, 2023 1 commit
  2. 18 Sep, 2023 1 commit
  3. 06 Sep, 2023 1 commit
  4. 11 Aug, 2023 1 commit
  5. 12 Jun, 2023 1 commit
    • 1. Modified the readme · f8b650c8
      flyingdown authored
      2. Added the environment variable APEX_ROCBLAS_GEMM_ALLOW_HALF to control whether fp16r (half-precision rocBLAS GEMM) is used
      3. Added DCU version information

      Renamed the whl package

      Updated the installation steps in the readme
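
      The commit names the variable but not its accepted values; below is a minimal sketch of toggling it, assuming it is read as a 0/1 switch when the apex extensions are loaded.

      """
      import os

      # Assumption: a value of "1" permits fp16r (half-precision rocBLAS GEMM),
      # while "0" forces the full-precision path. Set it before importing apex
      # so the ROCm/DCU extensions see it.
      os.environ["APEX_ROCBLAS_GEMM_ALLOW_HALF"] = "1"

      import apex
      """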
  6. 23 Apr, 2023 5 commits
  7. 30 Mar, 2023 1 commit
  8. 01 Mar, 2023 1 commit
  9. 13 Feb, 2023 1 commit
    • Luise/gbn optimization (#105) · 56c283b6
      luise.chen authored
      * GroupBN: Reduced buffering to better hide calculations in some loops of length OUTER_LOOPS

      * GroupBN: Use C_ELEMENTS_PER_CTA=64 for BN and BN_relu kernels to improve resnet50

      * GroupBN: Use C_ELEMENTS_PER_CTA=64 for BN_add_relu kernels for a ~10% E2E improvement on resnet50
  10. 20 Dec, 2022 1 commit
  11. 09 Dec, 2022 1 commit
  12. 14 Nov, 2022 1 commit
  13. 08 Nov, 2022 1 commit
  14. 21 Sep, 2022 1 commit
  15. 08 Sep, 2022 1 commit
    • Enable --transducer extension for ROCm (#88) · ae5ca671
      Hubert Lu authored
      * Enable --transducer extension for ROCm
      
      * Enable --transducer unit tests for ROCm
      
      * Skip some failing tests in test_transducer_joint.py
      
      * Skip test_transducer_joint_pack for transducer extension
      
      * Keep transducer extension CUDA-compatible
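
      A minimal availability check for the extension this commit enables; the import path follows apex.contrib, while the pip invocation shown is an assumption and varies by apex version.

      """
      # Assumed build command for enabling the extension:
      #   pip install -v --no-cache-dir --global-option="--transducer" ./
      try:
          from apex.contrib.transducer import TransducerJoint, TransducerLoss
          print("transducer extension available")
      except ImportError:
          print("apex was built without --transducer")
      """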
  16. 23 Aug, 2022 2 commits
  17. 22 Aug, 2022 2 commits
  18. 08 Aug, 2022 1 commit
  19. 29 Jul, 2022 1 commit
  20. 26 Jul, 2022 1 commit
    • Improvements in distributed Adam optimizer for Megatron (#1432) · 2e025ab5
      Tim Moon authored
      * Improvements in distributed Adam optimizer for Megatron
      
      Add option to allocate gradient buckets out of one large buffer. Add option to initialize params in user-provided order. Perform communication when saving optimizer state. Support param sync with any dtype.
      
      * Style fixes in distributed Adam helper classes
      
      Review suggestions from @crcrpar
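
      A minimal sketch of constructing the optimizer this commit extends. The import path is apex.contrib.optimizers.distributed_fused_adam; the exact keyword names for the new bucket and ordering options are version-dependent, so none are passed here.

      """
      import torch
      import torch.distributed as dist
      from apex.contrib.optimizers.distributed_fused_adam import DistributedFusedAdam

      dist.init_process_group(backend="nccl")  # one process per GPU, e.g. via torchrun
      torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

      model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a Megatron model

      # Standard Adam hyperparameters; the options added in this commit
      # (gradient buckets carved from one large buffer, user-provided param
      # ordering) are extra constructor kwargs documented in the class itself.
      opt = DistributedFusedAdam(model.parameters(), lr=1e-4, betas=(0.9, 0.95), weight_decay=0.01)
      """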
  21. 21 Jul, 2022 1 commit
  22. 14 Jul, 2022 1 commit
  23. 05 Jul, 2022 1 commit
    • Add features to distributed Adam for Megatron support (#1414) · cd499737
      Tim Moon authored
      * Add features to distributed Adam for Megatron support
      
      Support gradient clipping, gradient scaling, FP32 grad accumulation, and multiple dtypes and devices.
      
      * Restore closure arg to distributed Adam
      
      Review suggestion from @crcrpar
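
      The clipping, scaling, and FP32 accumulation live inside the optimizer, so as a hedged illustration the sketch below wraps it in the standard torch.cuda.amp loss-scaling loop rather than apex's exact Megatron integration.

      """
      import torch

      scaler = torch.cuda.amp.GradScaler()

      def train_step(model, batch, opt):
          opt.zero_grad()
          with torch.cuda.amp.autocast():
              loss = model(batch)        # assumes the model returns a scalar loss
          scaler.scale(loss).backward()  # scaled FP16 grads; optimizer accumulates in FP32
          scaler.unscale_(opt)           # restore true gradient values
          torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
          scaler.step(opt)               # skipped automatically if inf/nan grads are found
          scaler.update()
      """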
  24. 23 Jun, 2022 1 commit
    • Move distributed Adam unit test to contrib dir (#1406) · 57f890a7
      Tim Moon authored
      * Increase default bucket size in distributed Adam
      
      * Move distributed Adam unit test to contrib tests
      
      Integrate into unit testing framework
      
      * Tweak hyperparameters for dist Adam optimizer test
      
      Improves numerical stability so we can keep tight tolerances. Adopting suggestions from @crcrpar.
      
      * Use distributed test infrastructure in distributed Adam unit test
      
      Suggestion from @crcrpar.
  25. 22 Jun, 2022 1 commit
    • Gradient clipping with fused kernels (#1405) · dcb02fcf
      Tim Moon authored
      * Gradient clipping routine with fused kernels
      
      Identical API to PyTorch's; falls back to the PyTorch implementation when not computing the L2 norm.
      
      * Add unit test for gradient clipping
      
      * Add fp16 case to gradient clipping unit test
      
      * Tweaks to grad clipping unit test
      
      Review suggestions from @crcrpar
      
      * Debug gradient clipping tests
      
      When checking that incorrect results produce assertion errors, make sure to generate a discrepancy outside the range of numerical error.
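
      A minimal usage sketch, assuming the fused routine is exposed as apex.contrib.clip_grad.clip_grad_norm_ and mirrors the torch.nn.utils API as the commit states.

      """
      import torch
      from apex.contrib.clip_grad import clip_grad_norm_

      model = torch.nn.Linear(16, 16).cuda()
      model(torch.randn(4, 16, device="cuda")).sum().backward()

      # Drop-in for torch.nn.utils.clip_grad_norm_: norm_type=2 uses the fused
      # multi-tensor kernels; other norms fall back to the PyTorch implementation.
      total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0, norm_type=2)
      """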
  26. 16 Jun, 2022 1 commit
  27. 14 Jun, 2022 1 commit
  28. 13 Jun, 2022 1 commit
  29. 31 May, 2022 1 commit
  30. 29 Apr, 2022 1 commit
  31. 21 Apr, 2022 1 commit
  32. 19 Apr, 2022 1 commit
  33. 14 Apr, 2022 2 commits
    • Added support for memory format API (torch.channels_last) in GBN (#72) · dd584a59
      mahathis authored
      * Added support for memory format API (torch.channels_last) in GBN
      
      Group Batch Norm (GBN) is an NHWC operation.  It assumes that the
      underlying memory format of an input tensor is NHWC.  It originally did
      not support PyTorch's memory_format API.
      
      To support PyTorch's memory_format API, i.e., .to(memory_format=...) or
      .contiguous(memory_format=...), we add the torch_channels_last
      flag to indicate whether the workload adopts the PyTorch memory_format
      API by setting memory_format=torch.channels_last.  This flag allows GBN
      to handle memory formats of input tensors properly.
      
      An example of using memory_format in GBN:
      
      """
      from apex.contrib.groupbn.batch_norm import BatchNorm2d_NHWC
      
      GBN = BatchNorm2d_NHWC(planes, fuse_relu=True, bn_group=1, torch_channels_last=True)
      
      """
      
      The cases that GBN handles are as follows:
      
      1. If torch_channels_last=True and the input tensor's
      memory_format is torch.channels_last, GBN generates a
      torch.channels_last output tensor.

      2. If torch_channels_last=True and the input tensor's
      memory_format is torch.contiguous_format, GBN converts the input tensor
      to torch.channels_last and generates a torch.channels_last output
      tensor.

      3. If torch_channels_last=False and the input tensor's
      memory_format is torch.contiguous_format, GBN generates a
      torch.contiguous_format output tensor.
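
      A hedged sketch of case 1, building on the BatchNorm2d_NHWC example above; the channel count, tensor shape, and fp16 dtype are placeholder assumptions.

      """
      import torch
      from apex.contrib.groupbn.batch_norm import BatchNorm2d_NHWC

      planes = 64
      gbn = BatchNorm2d_NHWC(planes, fuse_relu=True, bn_group=1, torch_channels_last=True).cuda()

      # Case 1: the input is already torch.channels_last, so the output keeps that format.
      x = torch.randn(8, planes, 56, 56, device="cuda", dtype=torch.half)
      x = x.to(memory_format=torch.channels_last)
      y = gbn(x)
      assert y.is_contiguous(memory_format=torch.channels_last)
      """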
      
      * Add GBN unit tests for channel_last memory format
      Co-authored-by: hubertlu-tw <hubertlu@amd.com>
    • Bit faster · 5698eeeb
      Thor Johnsen authored