src/targets/gpu/fuse_ops.cpp · e2cbb01eb182c4bbf67fff88078b6aaac8e2d091 · gaoqiong / MIGraphX

Improve layernorm performance (#613) · 56b3bf58

Paul Fultz II authored Aug 25, 2020

* Use increment instead of division to compute register offset

* Formatting

* Limit layernorm to 1024 elements

* Formatting

* Add verification to driver

* Formatting

* Remove early return

* Use block_size 256

* Vectorize the kernel

* Formatting

* Convert to vector type

* Add layernorm tests

* Formatting

* Formatting

* Refactor layernorm to run both algos

* Formatting

* Fix compile error

* Fix tidy warnings

* Formatting

* Add layernorm function

* Formatting
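
The first item above ("Use increment instead of division to compute register offset") can be illustrated with a small host-side sketch. This is not the kernel code from this commit. The function names and the strided access pattern (a thread with local index `tid` visiting elements `tid`, `tid + block_size`, ...) are assumptions made for illustration. Under that pattern, the per-thread register slot `i / block_size` equals a counter that is bumped once per iteration, so the division can be dropped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the register slot is recomputed with a division
// on every iteration. Assumes tid < block_size.
std::vector<std::size_t>
offsets_by_division(std::size_t tid, std::size_t block_size, std::size_t n)
{
    std::vector<std::size_t> slots;
    for(std::size_t i = tid; i < n; i += block_size)
        slots.push_back(i / block_size);
    return slots;
}

// Hypothetical helper: the same slots, produced by incrementing a counter
// once per iteration instead of dividing.
std::vector<std::size_t>
offsets_by_increment(std::size_t tid, std::size_t block_size, std::size_t n)
{
    std::vector<std::size_t> slots;
    std::size_t slot = 0;
    for(std::size_t i = tid; i < n; i += block_size)
        slots.push_back(slot++);
    return slots;
}
```

With `block_size` 256 and 1024 elements (the values named in the list above), each thread visits four elements, so both helpers return four slots per thread.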

fuse_ops.cpp 23.6 KB