Commits · 56b3bf580fdec853f83a1932d0c0773ef3ce6ba4 · gaoqiong / MIGraphX

25 Aug, 2020 1 commit

Improve layernorm performance (#613) · 56b3bf58

Paul Fultz II authored Aug 25, 2020

* Use increment instead of division to compute register offset

* Formatting

* Limit layernorm to 1024 elements

* Formatting

* Add verification to driver

* Formatting

* Remove early return

* Use block_size 256

* Vectorize the kernel

* Formatting

* Convert to vector type

* Add layernorm tests

* Formatting

* Formatting

* Refactor layernorm to run both algos

* Formatting

* Fix compile error

* Fix tidy warnings

* Formatting

* Add layernorm function

* Formatting
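The first bullet above, "Use increment instead of division to compute register offset", can be illustrated with a small host-side sketch. This is not the MIGraphX kernel: it only assumes a block-strided loop in which thread `tid` visits elements `tid`, `tid + block_size`, `tid + 2 * block_size`, and so on, storing each value in a per-thread register slot. The function names are hypothetical; the sizes 1024 and 256 come from the bullets "Limit layernorm to 1024 elements" and "Use block_size 256".

```python
def offsets_by_division(n, block_size, tid):
    """Register slot for each element a thread visits, derived with a division."""
    # Assumes tid < block_size, so i // block_size is the iteration count.
    return [i // block_size for i in range(tid, n, block_size)]


def offsets_by_increment(n, block_size, tid):
    """The same slots, obtained by bumping a counter once per loop iteration."""
    slots = []
    slot = 0
    for _ in range(tid, n, block_size):
        slots.append(slot)
        slot += 1  # replaces the per-iteration integer division
    return slots


if __name__ == "__main__":
    n, block_size = 1024, 256
    for tid in range(block_size):
        assert offsets_by_division(n, block_size, tid) == offsets_by_increment(n, block_size, tid)
    print(offsets_by_increment(n, block_size, 0))
```

With 1024 elements and a block size of 256, each thread handles four elements, so four register slots per thread are enough under these assumptions. Integer division is comparatively expensive on GPUs, which is the likely motivation for a change of this kind.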


14 Aug, 2020 1 commit

Layernorm onnx support (#599) · 2c5d5fee

kahmed10 authored Aug 14, 2020

* fix pad calc

* bert tf passes correctness

* formatting

* add test

* formatting

* remove comment

* add inline

* formatting

* fix order for literal

* formatting

* test no mul_add

* formatting

* debug layernorm

* debug layernorm

* manual merge

* more progress

* formatting

* remove miopen batchnorm

* remove headers

* Fix compile error with no dpp reductions

* fix indices

* formatting

* change matcher

* formatting

* remove binds

* formatting

* disable tf matcher

* formatting

* use fast div

* formatting

* fix matcher

* formatting

* remove comment

* move find_matches

* add assert

* formatting

* fix deepcode issue

Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
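For context on the matcher bullets above, here is a rough sketch of what recognizing layernorm in an imported graph amounts to. It assumes the common exported form in which layer normalization appears as a chain of elementwise and reduction steps (mean, subtract, square, mean, add epsilon, square root, divide), and it checks that this chain agrees with a single fused computation. The function names and the `eps` default are illustrative assumptions, not values taken from MIGraphX, and scale and bias are omitted.

```python
import math


def layernorm_decomposed(x, eps=1e-12):
    """Layer normalization written as the separate steps a graph might contain."""
    mean = sum(x) / len(x)                   # reduce mean
    centered = [v - mean for v in x]         # subtract
    squared = [c ** 2 for c in centered]     # square
    variance = sum(squared) / len(squared)   # reduce mean
    denom = math.sqrt(variance + eps)        # add epsilon, square root
    return [c / denom for c in centered]     # divide


def layernorm_fused(x, eps=1e-12):
    """The same result computed as one operation over the row."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(variance + eps)
    return [(v - mean) * inv_std for v in x]


if __name__ == "__main__":
    row = [1.0, 2.0, 3.0, 4.0]
    a = layernorm_decomposed(row)
    b = layernorm_fused(row)
    assert all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(a, b))
    print(b)
```

Replacing the multi-step chain with one fused operation avoids writing intermediate results between steps, which is the usual reason for adding such a matcher.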
