- 05 Aug, 2022 1 commit
Hubert Lu authored
FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274)
* FusedRMSNorm based on FusedLayerNorm
* refactor duplicated kernels
* delete comments
* cleanup, fixed clobbering forward_affine_mixed_dtypes
* fix pybind naming and add MixedFused test
* undo skipping
* check elementwise_affine
* Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py
* fix and generate docs for FusedRMSNorm (#1285)
* [FusedRMSNorm doc] document where epsilon is added (#1295): add epsilon to the formula, wording corrections
* Fix some bugs
* Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs
* Fix NaN issues in FusedRMSNorm
* Update test_fused_layer_norm.py
* Skip test_fused_layer_norm.TestAutocastFusedRMSNorm on ROCm
* Use at::cuda::warp_size() instead of at::cuda::getCurrentDeviceProperties()->warpSize
Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
Co-authored-by: eqy <eddiey@nvidia.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
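For reference, a sketch of the normalization that #1274 and the doc fix in #1295 concern, in the standard RMSNorm formulation; per the clarification above, epsilon is added inside the root mean square, and g is the optional elementwise_affine weight:

$$
y_i \;=\; \frac{x_i}{\sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 \;+\; \epsilon}}\; g_i
$$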
- 02 Oct, 2021 1 commit
Masaki Kozuki authored
Co-authored-by: Piotr Bialecki <pbialecki@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Sangkug Lym <slym@nvidia.com>
- 15 Jun, 2020 1 commit
rohithkrn authored
- 03 Jul, 2019 2 commits
Michael Carilli authored
Michael Carilli authored
- 26 Apr, 2019 1 commit
ptrblck authored
* change .type().ScalarType() to .scalar_type() + at::ScalarType::X to at::kX
* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF
* revert scalar_type() to type() in AT_DISPATCH_FLOATING_TYPES
* revert scalar_type() to type() for AT_DISPATCH_FLOATING_TYPES_AND_HALF in welford.cu
* revert scalar_type() to type() in layer_norm_cuda_kernel.cu
* revert at::kType to at::ScalarType::Type
* use DISPATCH_FLOAT_AND_HALF to get rid of warnings
* add dispatch mechanisms for double+float and double+float+half
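For context, a minimal hypothetical sketch (not apex's actual kernel; the names below are invented) of the ATen dispatch pattern these bullets touch. In current ATen the AT_DISPATCH_* macros accept a c10::ScalarType, so call sites pass tensor.scalar_type(); at the time of this commit some of the macros still expected tensor.type(), which is why several call sites were reverted.

```cpp
#include <ATen/ATen.h>
#include <ATen/Dispatch.h>
#include <ATen/cuda/CUDAContext.h>

// Hypothetical elementwise kernel, templated on the tensor's scalar type.
template <typename scalar_t>
__global__ void scale_kernel(const scalar_t* in, scalar_t* out,
                             float alpha, int64_t n) {
  const int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
  if (i < n) {
    out[i] = static_cast<scalar_t>(static_cast<float>(in[i]) * alpha);
  }
}

void scale_cuda(const at::Tensor& in, at::Tensor& out, float alpha) {
  const int64_t n = in.numel();
  const int threads = 256;
  const int blocks = static_cast<int>((n + threads - 1) / threads);
  auto stream = at::cuda::getCurrentCUDAStream();
  // The macro instantiates the lambda for float, double and half;
  // the call site passes a ScalarType (in.scalar_type()) rather than in.type().
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(in.scalar_type(), "scale_cuda", [&] {
    scale_kernel<scalar_t><<<blocks, threads, 0, stream>>>(
        in.data_ptr<scalar_t>(), out.data_ptr<scalar_t>(), alpha, n);
  });
}
```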
- 21 Mar, 2019 2 commits
Syed Tousif Ahmed authored
Syed Tousif Ahmed authored
- 12 Mar, 2019 1 commit
Michael Carilli authored
- 31 Oct, 2018 1 commit
Thor Johnsen authored
* Pre-release of fused layer norm apex extension
* Remove half and __half2 specializations
* Code changes from review
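For reference, the operation this pre-released extension fuses is standard layer normalization; a sketch in the usual formulation, with mean and variance taken over the normalized dimensions and gamma, beta the affine parameters:

$$
y \;=\; \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}}\;\gamma + \beta
$$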