include/ck/utility/reduction_operator.hpp · fa2d894be1b3c0213da06d58af0df2de2c5308ad · yangql / composable_kernel-1

Single-kernel GEMM + layernorm (#263) · 63fd5da6

Anthony Chang authored Jul 01, 2022



* dump lds content in appropriate precision type

* add squared add reduction op; allows sq sum

* initial stub from regular gemm impl

* layernorm example code & host verification

* initial layernorm implementation

* tidy up

* make C0 precision type consistent with C

* clang-tidy and additional comments

* tighten up example code

* account for extra flops/bytes from normalization

* clang-format

* c0 bias/beta/gamma now have their own precision type

* AccElemOp for gemm outputs prior to feeding to layernorm

* update workgroup mapping

* rename kernel template param to reflect its dual use

* use LDS mem pool for reduction workspace

* change cshuffle precision type to f16; clean up

* clang-format

* correct naming

* explicit cast

* fully implemented gemm + bias + activation + add + norm

* activation in correct order

* reflect reduction API's recent change

* amend

* clean up; add comment

* keep up with recent changes in reduction API

* format

* resolve merge conflicts

Co-authored-by: Chao Liu <chao.liu2@amd.com>
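
The "squared add reduction op; allows sq sum" item above is what makes a single-pass layernorm possible: with both the row sum and the row sum of squares available, the variance follows from E[x^2] - mean^2. A minimal host-side sketch of that idea is given below. The names `AddSketch`, `SquaredAddSketch`, `reduce_row`, and `layernorm_row` are hypothetical and chosen for illustration; they are not the definitions in reduction_operator.hpp, and the gamma/beta scaling is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a plain add reduction (row sum).
struct AddSketch
{
    // Identity element: combining with it leaves the accumulator unchanged.
    static constexpr float identity() { return 0.0f; }
    void operator()(float& a, float b) const { a += b; }
};

// Hypothetical stand-in for a "squared add" reduction (row sum of squares).
struct SquaredAddSketch
{
    static constexpr float identity() { return 0.0f; }
    void operator()(float& a, float b) const { a += b * b; }
};

// Fold one row with the given reduction op, starting from its identity.
template <typename Op>
float reduce_row(const std::vector<float>& row, Op op)
{
    float acc = Op::identity();
    for(float v : row)
        op(acc, v);
    return acc;
}

// Normalize one row: y = (x - mean) / sqrt(var + eps),
// where var = E[x^2] - mean^2 comes from the two reductions.
std::vector<float> layernorm_row(const std::vector<float>& row, float eps = 1e-5f)
{
    assert(!row.empty());
    const float n       = static_cast<float>(row.size());
    const float mean    = reduce_row(row, AddSketch{}) / n;
    const float mean_sq = reduce_row(row, SquaredAddSketch{}) / n;
    const float var     = mean_sq - mean * mean;

    std::vector<float> out(row.size());
    for(std::size_t i = 0; i < row.size(); ++i)
        out[i] = (row[i] - mean) / std::sqrt(var + eps);
    return out;
}
```

Computing the variance as E[x^2] - mean^2 lets both reductions run over the same data in one pass, at the cost of some numerical cancellation when the mean is large relative to the spread.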


reduction_operator.hpp 9.58 KB
