Commits · 580673a0af2e6f7fb7764450f7b3a7887fc9f896 · gaoqiong / MIGraphX

28 Mar, 2022 · 3 commits
- clang format · 580673a0
  Shucai Xiao authored Mar 28, 2022
- backup code changes · 80a6ca93
  Shucai Xiao authored Mar 28, 2022
- layernorm kernel optimization · a5181cd0
  Shucai Xiao authored Mar 28, 2022
04 Mar, 2022 · 2 commits
- clang format · ea656c84
  Shucai Xiao authored Mar 04, 2022
- remove unnecessary code · efac0323
  Shucai Xiao authored Mar 04, 2022
27 Sep, 2021 · 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks the wavefront size, then changes the implementation and the number of threads used for the DPP reduction.
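
This commit makes the reduction depend on the wavefront size (32 or 64 lanes). A minimal CPU sketch of why, assuming a butterfly (xor-partner) reduction pattern: the hypothetical `wave_reduce_sum` below is not the MIGraphX code, and it models each hardware lane as a vector element instead of using DPP instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU model of a butterfly (xor-partner) sum across one wavefront. Each vector
// element stands in for one hardware lane; on the GPU the exchange would use
// DPP or shuffle instructions instead of memory.
// Hypothetical helper for illustration only.
std::vector<float> wave_reduce_sum(std::vector<float> lanes, std::size_t wave_size)
{
    // The wavefront size is checked at run time, so one routine serves both widths
    assert(wave_size == 32 || wave_size == 64);
    assert(lanes.size() == wave_size);
    // log2(wave_size) exchange steps: 5 for wave32, 6 for wave64
    for(std::size_t offset = wave_size / 2; offset > 0; offset /= 2)
    {
        std::vector<float> next(wave_size);
        for(std::size_t lane = 0; lane < wave_size; ++lane)
            next[lane] = lanes[lane] + lanes[lane ^ offset]; // combine with partner lane
        lanes = next;
    }
    return lanes; // every lane now holds the total
}
```

With 32 lanes the loop runs 5 exchange steps instead of 6. A kernel written around 64-lane wavefronts therefore may need a different reduction schedule and thread count on wave32 hardware.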

20 Nov, 2020 · 1 commit

Fuse skip layernorm (#683) · 1bfb147d

Paul Fultz II authored Nov 20, 2020

* Unify the vectorized and non-vectorized path

* Formatting

* Make fusion easily extendable

* Add skip layernorm fusion

* Formatting

* Call correct layernorm function

* Fix compile errors

* Add DCE

* Add test for skip layernorm

* Formatting

* Remove unused typedef

* Formatting

* Fix tidy issues

* Formatting
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
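
Skip layernorm, as commonly defined in BERT-style models, applies layer normalization to the sum of a layer's input and its residual ("skip") connection, optionally plus a bias. The CPU reference below assumes that definition. The function name, parameter names and the epsilon default are illustrative and do not come from the MIGraphX operator. It shows the three steps a fused kernel can perform without writing `x + skip` back to memory.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference (CPU) skip layernorm over one row, assuming the common definition
//   y = (s - mean(s)) / sqrt(var(s) + eps) * gamma + beta,  with s = x + skip (+ bias).
// Hypothetical names; not the MIGraphX interface.
std::vector<float> skip_layernorm(const std::vector<float>& x,
                                  const std::vector<float>& skip,
                                  const std::vector<float>& gamma,
                                  const std::vector<float>& beta,
                                  const std::vector<float>& bias = {},
                                  float eps = 1e-12f)
{
    const std::size_t n = x.size();
    // Step 1: residual ("skip") add, plus optional bias
    std::vector<float> s(n);
    for(std::size_t i = 0; i < n; ++i)
        s[i] = x[i] + skip[i] + (bias.empty() ? 0.0f : bias[i]);
    // Step 2: mean and (biased) variance of the row
    float mean = 0.0f;
    for(float v : s)
        mean += v;
    mean /= static_cast<float>(n);
    float var = 0.0f;
    for(float v : s)
        var += (v - mean) * (v - mean);
    var /= static_cast<float>(n);
    const float inv_std = 1.0f / std::sqrt(var + eps);
    // Step 3: normalize, then scale by gamma and shift by beta
    std::vector<float> y(n);
    for(std::size_t i = 0; i < n; ++i)
        y[i] = (s[i] - mean) * inv_std * gamma[i] + beta[i];
    return y;
}
```

For `x = skip = {1, 2, 3, 4}` with unit `gamma` and zero `beta`, the summed row is `{2, 4, 6, 8}` with mean 5 and variance 5, so the output is `{-3, -1, 1, 3} / sqrt(5)`.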

25 Aug, 2020 · 1 commit

Improve layernorm performance (#613) · 56b3bf58

Paul Fultz II authored Aug 25, 2020

* Use increment instead of division to compute register offset

* Formatting

* Limit layernorm to 1024 elements

* Formatting

* Add verification to driver

* Formatting

* Remove early return

* Use block_size 256

* Vectorize the kernel

* Formatting

* Convert to vector type

* Add layernorm tests

* Formatting

* Formatting

* Refactor layernorm to run both algos

* Formatting

* Fix compile error

* Fix tidy warnings

* Formatting

* Add layernorm function

* Formatting
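
Three items in this commit fit together: "Limit layernorm to 1024 elements", "Use block_size 256" and "Use increment instead of division to compute register offset". With at most 1024 elements and 256 threads, each thread needs at most four register slots. The sketch below is one reading of that, under stated assumptions. It is a CPU model in which a `std::array` stands in for per-thread registers, and all names are hypothetical rather than taken from the MIGraphX kernel.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed limits: 256 threads per block and rows of at most 1024 elements,
// so each thread stages at most 1024 / 256 = 4 values.
constexpr std::size_t block_size   = 256;
constexpr std::size_t max_elements = 1024;
constexpr std::size_t max_regs     = max_elements / block_size;

// Division-based: each element's register slot is recomputed as i / block_size.
std::array<float, max_regs> load_with_division(const std::vector<float>& row, std::size_t tid)
{
    assert(row.size() <= max_elements && tid < block_size);
    std::array<float, max_regs> regs{};
    for(std::size_t i = tid; i < row.size(); i += block_size)
        regs[i / block_size] = row[i];
    return regs;
}

// Increment-based: since tid < block_size, the first slot is 0 and the slot
// advances by one per iteration, so no integer division is needed.
std::array<float, max_regs> load_with_increment(const std::vector<float>& row, std::size_t tid)
{
    assert(row.size() <= max_elements && tid < block_size);
    std::array<float, max_regs> regs{};
    std::size_t j = 0;
    for(std::size_t i = tid; i < row.size(); i += block_size, ++j)
        regs[j] = row[i];
    return regs;
}
```

Both loaders fill the same slots. The second one only replaces the per-element division with a counter, which appears to be the saving the commit message refers to.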

14 Aug, 2020 · 1 commit

Layernorm onnx support (#599) · 2c5d5fee

kahmed10 authored Aug 14, 2020

* fix pad calc

* BERT (TensorFlow) passes correctness check

* formatting

* add test

* formatting

* remove comment

* add inline

* formatting

* fix order for literal

* formatting

* test no mul_add

* formatting

* debug layernorm

* debug layernorm

* manual merge

* more progress

* formatting

* remove miopen batchnorm

* remove headers

* Fix compile error with no dpp reductions

* fix indices

* formatting

* change matcher

* formatting

* remove binds

* formatting

* disable tf matcher

* formatting

* use fast div

* formatting

* fix matcher

* formatting

* remove comment

* move find_matches

* add assert

* formatting

* fix deepcode issue
Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Shucai Xiao <shucai.xiao@amd.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>
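
One item in this commit is "use fast div". The general technique replaces division by a run-time-invariant divisor with a precomputed multiplier and a shift. The sketch below is a generic 32-bit version of that idea under a stated range restriction. The names `fast_divisor`, `encode`, `fast_div` and `fast_mod` are hypothetical, and MIGraphX's own encoding may differ.

```cpp
#include <cassert>
#include <cstdint>

// Division by an invariant divisor via multiply-and-shift. Illustrative sketch;
// names and the 32-bit encoding are assumptions, not the MIGraphX implementation.
struct fast_divisor
{
    std::uint32_t d;     // original divisor, kept for the range check and fast_mod
    std::uint64_t magic; // ceil(2^32 / d)
};

fast_divisor encode(std::uint32_t d)
{
    assert(d > 0);
    const std::uint64_t two32 = std::uint64_t{1} << 32;
    return {d, (two32 + d - 1) / d};
}

// Exact whenever n * d <= 2^32 (e.g. both below 2^16). Writing magic = (2^32 + e) / d
// with 0 <= e < d, the extra term n * e / (d * 2^32) is below 1/d, so truncating
// still yields floor(n / d).
std::uint32_t fast_div(std::uint32_t n, fast_divisor fd)
{
    assert(std::uint64_t{n} * fd.d <= (std::uint64_t{1} << 32));
    return static_cast<std::uint32_t>((n * fd.magic) >> 32); // one multiply + shift
}

std::uint32_t fast_mod(std::uint32_t n, fast_divisor fd)
{
    return n - fast_div(n, fd) * fd.d;
}
```

The multiplier is computed once per divisor. A kernel that repeatedly splits flat indices into (row, column) pairs with the same row length then pays one multiply and one shift per index instead of an integer division, which is comparatively slow on GPUs.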