    Enable FusedRMSNorm (#78) · c97ebfab
    Hubert Lu authored
    
    
    * FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274)
    
    * FusedRMSNorm based on FusedLayerNorm
    
    * refactor duplicated kernels
    
    * delete comments
    
    * delete comments
    
    * cleanup
    
    * cleanup
    
    * cleanup, fixed clobbering forward_affine_mixed_dtypes
    
    * fix pybind naming and add MixedFused test
    
    * undo skipping
    
    * check elementwise_affine
    
    * Update tests/L0/run_fused_layer_norm/test_fused_layer_norm.py
    
    Oof, nice catch, thanks
    Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
    
    * fix and generate docs for FusedRMSNorm (#1285)
    
    * [FusedRMSNorm doc] document where epsilon is added (#1295)
    
    * [FusedRMSNorm doc] add epsilon to formula
    
    * correct
    
    * better wording
    
    * Fix some bugs
    
    * Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs
    
    * Fix NaN issues in FusedRMSNorm
    
    * Update test_fused_layer_norm.py
    
    * Skip test_fused_layer_norm.TestAutocastFusedRMSNorm on ROCm
    
    * Use at::cuda::warp_size() instead of at::cuda::getCurrentDeviceProperties()->warpSize
    Co-authored-by: eqy <eddiey@nvidia.com>
    Co-authored-by: Masaki Kozuki <masaki.kozuki.2014@gmail.com>
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
layer_norm_cuda.cpp 12.6 KB