1. 22 Oct, 2024 1 commit
    • update layernorm (#1570) · 0394f8a7
      ltqin authored
      * port layernorm
      
      * change warp_welford.hpp
      
      * Update warpshuffle
      
      * 1. Add save mean and save std back
      2. Move construction of tensor_view and tile_window to operator()
      
      * refine welford max count calculation
      
      * unify layernorm api
      
      * Rename file
      
      * Remove save mean and inv std
      
      * Revert "refine welford max count calculation"
      
      This reverts commit 02236580.
      
      * Fix order of parameter
      
      * refine welford max count calculation again
      
      * Remove fp32 instances
      
      * Fix bug of padding
      
      * refactor api
      
      * Support bf16
      
      * Extract common function
      
      * Refine arg of operator()
      
      * Add kMThreadPerBlock to template parameter
      
      * clang format
      
      * Refine variable name
      
      * Refine file name
      
      * remove redundant line
      
      * refactor layernorm2d pipeline and add block-per-block utility
      
      * fix name
      
      * rename more
      
      * add more block-per-tile instance
      
      * remove duplicated define
      
      * update instance for 2048, 1024 case
      
      * support up to 2048 now
      
      * opt loading
      
      * add n1536
      
      * Add two pass pipeline
      
      * format
      
      * Fix incorrect type
      
      * parallel compilation
      
      * Use smaller N
      
      * fix 2p pass
      
      * Support Repeat_M in distribution
      
      * Refine naming
      
      * Add reduce example
      
      ---------
      Co-authored-by: letaoqin <letaoqin@amd.com>
      Co-authored-by: aska-0096 <haocwang@amd.com>
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>