• update layernorm (#1570) · 0394f8a7
    ltqin authored
    * port layernorm
    
    * change warp_welford.hpp
    
    * Update warpshuffle
    
    * 1. Add save mean and save std back
    2. Move construction of tensor_view and tile_window to operator()
    
    * refine welford max count calculation
    
    * unify layernorm api
    
    * Rename file
    
    * Remove save mean and inv std
    
    * Revert "refine welford max count calculation"
    
    This reverts commit 02236580.
    
    * Fix order of parameter
    
    * refine welford max count calculation again
    
    * Remove fp32 instances
    
    * Fix bug of padding
    
    * refactor api
    
    * Support bf16
    
    * Extract common function
    
    * Refine arg of operator()
    
    * Add kMThreadPerBlock to template parameter
    
    * clang format
    
    * Refine variable name
    
    * Refine file name
    
    * remove redundant line
    
    * refactor layernorm2d pipeline and add block-per-block utility
    
    * fix name
    
    * rename more
    
    * add more block-per-tile instance
    
    * remove duplicated define
    
    * update instance for 2048, 1024 case
    
    * support up to 2048 now
    
    * opt loading
    
    * add n1536
    
    * Add two pass pipeline
    
    * format
    
    * Fix incorrect type
    
    * parallel compilation
    
    * Use smaller N
    
    * fix 2p pass
    
    * Support Repeat_M in distribution
    
    * Refine naming
    
    * Add reduce example
    
    ---------
    Co-authored-by: letaoqin <letaoqin@amd.com>
    Co-authored-by: aska-0096 <haocwang@amd.com>
    Co-authored-by: rocking <ChunYu.Lai@amd.com>
    Co-authored-by: carlushuang <carlus.huang@amd.com>
host.hpp 1.29 KB