[Ck tile] layernorm2d fwd optimize (#1637)
* optimize small N case using vectorized I/O and reciprocal (rcp) division
* [Ck_tile] layernorm: add param to control fastdiv; update code generation; tests pass
* [Ck_tile] fix blockSize compute in Generic2dBlockShape
* [Ck_tile] fix kfastfdiv template style
* [Ck_tile] layernorm: fix style issues raised in review
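
For reference, a minimal host-side sketch of what a compile-time fastdiv switch could look like for one layernorm row. This is not the ck_tile kernel: `layernorm_row` and the `kFastDiv` template parameter are hypothetical names chosen for illustration, and the real change operates on GPU tiles with vectorized loads. The idea shown is only that the reciprocal of N is computed once and each reduction result is multiplied by it instead of being divided by N.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the ck_tile API): normalizes one row of length N.
// kFastDiv = true  -> multiply reduction results by a precomputed 1/N.
// kFastDiv = false -> divide each reduction result by N.
template <bool kFastDiv>
std::vector<float> layernorm_row(const std::vector<float>& x, float eps)
{
    assert(!x.empty());
    const float n     = static_cast<float>(x.size());
    const float inv_n = 1.0f / n; // reciprocal computed once per row

    // Mean of the row.
    float sum = 0.0f;
    for (float v : x) sum += v;
    const float mean = kFastDiv ? sum * inv_n : sum / n;

    // Biased variance of the row.
    float sq = 0.0f;
    for (float v : x) sq += (v - mean) * (v - mean);
    const float var = kFastDiv ? sq * inv_n : sq / n;

    // Normalize: y = (x - mean) / sqrt(var + eps).
    const float inv_std = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = (x[i] - mean) * inv_std;
    return y;
}
```

Both instantiations should agree to within floating-point rounding; the reciprocal path may differ in the last bits, which is presumably why it is exposed as a parameter rather than always enabled.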
---------
Co-authored-by: dummycoderfe <noplydummmycoder@163.com>