Commits · cb8b7a889cef0dc8318ddff4e9b461ed78489f6e · OpenDAS / apex

05 Aug, 2022 1 commit

Hubert Lu authored Aug 05, 2022



* FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274)

* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>

* fix and generate docs for FusedRMSNorm (#1285)

* [FusedRMSNorm doc] document where epsilon is added (#1295)

* [FusedRMSNorm doc] add epsilon to formula

* correct

* better wording

* Fix some bugs

* Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs

* Fix NaN issues in FusedRMSNorm

* Update test_fused_layer_norm.py

* Skip test_fused_layer_norm.TestAutocastFusedRMSNorm on ROCm

* Use at::cuda::warp_size() instead of at::cuda::getCurrentDeviceProperties()->warpSize
Co-authored-by: eqy <eddiey@nvidia.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

c97ebfab

29 Jul, 2022 1 commit
- Fix some compiling errors · 038ed999
  hubertlu-tw authored Jul 29, 2022
  
  038ed999
15 Apr, 2022 4 commits

Fix NaN issues in FusedRMSNorm · 8df1b6b8
hubertlu-tw authored Apr 15, 2022

8df1b6b8
Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs · 28c5638d
hubertlu-tw authored Apr 15, 2022

28c5638d
Fix some bugs · d755f1f1
hubertlu-tw authored Apr 15, 2022

d755f1f1

FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274) · c14cfb10

eqy authored Feb 03, 2022



* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>

c14cfb10

04 Feb, 2022 1 commit

FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274) · 684c4733

eqy authored Feb 03, 2022



* FusedRMSNorm based on FusedLayerNorm

* refactor duplicated kernels

* delete comments

* delete comments

* cleanup

* cleanup

* cleanup, fixed clobbering forward_affine_mixed_dtypes

* fix pybind naming and add MixedFused test

* undo skipping

* check elementwise_affine

* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py

Oof, nice catch, thanks
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>

684c4733

25 Jan, 2022 1 commit

Optimize layer normalization for AMD GPUs (#66) · 1cb3da87

Hubert Lu authored Jan 25, 2022

* Optimize fused layer normalization for MI100

* Optimize cuComputePartGradGammaBeta for AMD GPUs

1cb3da87

17 Nov, 2021 1 commit
- THCDeviceUtils.cuh -> ATen/cuda/DeviceUtils.cuh (#1173) · 5c79a278
  Masaki Kozuki authored Sep 24, 2021
  
  5c79a278
08 Oct, 2021 1 commit
- check in (#1187) · 3ad9db2a
  eqy authored Oct 07, 2021
  
  3ad9db2a
07 Oct, 2021 1 commit
- Update layer_norm_cuda_kernel.cu (#1184) · 5adf7bc2
  eqy authored Oct 06, 2021
  
  5adf7bc2
02 Oct, 2021 1 commit

transformer utils (#1181) · 365fdc18

Masaki Kozuki authored Oct 02, 2021


Co-authored-by: Piotr Bialecki <pbialecki@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>

365fdc18

24 Sep, 2021 1 commit
- THCDeviceUtils.cuh -> ATen/cuda/DeviceUtils.cuh (#1173) · 76daa454
  Masaki Kozuki authored Sep 24, 2021
  
  76daa454
04 Nov, 2020 1 commit

Fix LayerNorm op on ROCm (#36) · 7eed38aa

Ashish Farmer authored Nov 04, 2020

* fix warp size in WARP_SHFL* in layernorm

* enable fused_layer_norm tests on ROCm

7eed38aa

15 Jun, 2020 1 commit
- fix bf16 layernorm bug · c9d35a49
  rohithkrn authored Jun 15, 2020
  
  c9d35a49
26 May, 2020 1 commit
- enable bfloat16 for optimizers · 85549903
  rohithkrn authored May 26, 2020
  
  85549903
07 May, 2020 1 commit
- Enable fusedlayernorm extension (#3) · 2d0f9cf2
  Chaitanya Sri Krishna Lolla authored May 07, 2020
  
  2d0f9cf2
06 Sep, 2019 1 commit

Fix for #456 (#477) · 325f5a0b

mcarilli authored Sep 05, 2019

* Pushing for build tests

* Contrib files

* Removing deprecated checks

325f5a0b

06 Aug, 2019 1 commit

Clean up layer norm tests (#418) · 3ef01fae

ngimel authored Aug 06, 2019

* Bug fix for non-affine layer-norm + add backward unit test

* clean up tests and add tests for a large batch

3ef01fae

01 Aug, 2019 1 commit
- fix fused layer norm for >65535 batch · 4a8e1a87
  Natalia Gimelshein authored Aug 01, 2019
  
  4a8e1a87
26 Apr, 2019 1 commit

Replace type().ScalarType() with scalar_type() (#272) · 855808f3

ptrblck authored Apr 26, 2019

* change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF

* revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES

* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu

* revert scalar_type() to type() in layer_norm_cuda_kernel.cu

* revert at::kType  to at::ScalarType::Type

* use DISPATCH_FLOAT_AND_HALF to get rid of warnings

* add dispatch mechanisms for double+float and double+float+half

855808f3

08 Apr, 2019 1 commit
- Fix for #246 · 03100f46
  Michael Carilli authored Apr 08, 2019
  
  03100f46
21 Mar, 2019 2 commits
- Use build macro for backward compat · 0f5e3fe0
  Syed Tousif Ahmed authored Mar 07, 2019
  
  0f5e3fe0
- Rename IntList to IntArrayRef · 2a467090
  Syed Tousif Ahmed authored Feb 22, 2019
  
  2a467090
15 Mar, 2019 1 commit
- Anticipating upstream #17996 · 2c8e1c86
  Michael Carilli authored Mar 15, 2019
  
  2c8e1c86
12 Mar, 2019 1 commit
- Forward/backward compatibility around pytorch 3aeb78, to fix #191 · 42180bd9
  Michael Carilli authored Mar 11, 2019
  
  42180bd9
31 Oct, 2018 1 commit

[WIP] Fused layer norm cuda (#69) · 1b9b65ca

Thor Johnsen authored Oct 31, 2018

* Pre-release of fused layer norm apex extension

* Remove half and __half2 specializations

* Code changes from review

1b9b65ca