    Single-kernel GEMM + layernorm (#263) · 63fd5da6
    Anthony Chang authored
    
    
    * dump lds content in appropriate precision type
    
    * add squared-add reduction op; enables squared-sum reduction
    
    * initial stub from regular gemm impl
    
    * layernorm example code & host verification
    
    * initial layernorm implementation
    
    * tidy up
    
    * make C0 precision type consistent with C
    
    * clang-tidy and additional comments
    
    * tighten up example code
    
    * account for extra flops/bytes from normalization
    
    * clang-format
    
    * C0 bias/beta/gamma now have their own precision types
    
    * AccElemOp for gemm outputs prior to feeding to layernorm
    
    * update workgroup mapping
    
    * rename kernel template param to reflect its dual use
    
    * use LDS mem pool for reduction workspace
    
    * change cshuffle precision type to f16; clean up
    
    * clang-format
    
    * correct naming
    
    * explicit cast
    
    * fully implemented gemm + bias + activation + add + norm
    
    * activation in correct order
    
    * reflect reduction API's recent change
    
    * amend
    
    * clean up; add comment
    
    * keep up with recent changes in reduction API
    
    * format
    
    * resolve merge conflicts
    Co-authored-by: Chao Liu <chao.liu2@amd.com>