Commits · c393f233ea359c722c61dde9b8a3bfade5798765 · gaoqiong / MIGraphX

21 Sep, 2023 2 commits
- Formatting · c393f233
  Alan Turner authored Sep 21, 2023
  
  c393f233
- Move fuse_gsg to fuse_ck and fix bugs · 37939805
  Alan Turner authored Sep 21, 2023
  
  37939805
30 Aug, 2023 2 commits
- Formatting · 0a463c1e
  Alan Turner authored Aug 29, 2023
  
  0a463c1e
- Add gemm_softmax_gemm · 8ab0b22e
  Alan Turner authored Aug 29, 2023
  
  8ab0b22e
10 Aug, 2023 1 commit

[MLIR] Changes for tuning API v2, MLIR grid layout changes. (#1961) · 065d06af

Krzysztof Drewniak authored Aug 09, 2023

This PR contains the MIGraphX-side changes needed to keep the build working in the presence of
ROCmSoftwarePlatform/rocMLIR#1136, and updates what data is sent to MLIR during the kernel generation and tuning process.

065d06af

08 Aug, 2023 1 commit
- Update to Cppcheck 2.11 (#1914) · a359d2c8
  Paul Fultz II authored Aug 08, 2023
  
  a359d2c8
30 Jul, 2023 1 commit

Enable tuning for MLIR (#1965) · be6ecff6

Paul Fultz II authored Jul 30, 2023

* Add initial tuning support

* Format

* Add extra param

* Format

* Use exhaustive flag

* Format

* Set expected shapes

* Format

* Format

* Fix missing symbol

* Format

* Add missing license header

* Format

* Update src/targets/gpu/include/migraphx/gpu/mlir.hpp

be6ecff6

28 Jul, 2023 1 commit

Improve performance of pointwise/reduction kernels when using NHWC layouts (#1955) · f33f2298

Paul Fultz II authored Jul 28, 2023

* Improve performance of pointwise/reduction kernels when using NHWC layouts

* Format

* Add nhwc test

* Format

* Remove inline namespace

* Add reduce test

f33f2298

06 Jul, 2023 1 commit

Use MIGRAPHX_GLOBAL (#1918) · c45b34c3

Paul Fultz II authored Jul 06, 2023

This will also annotate the function with the block size so the compiler can do a better job of optimizing.

c45b34c3

02 Jul, 2023 1 commit

Improvement to ck integration (#1859) · 3c9df3b4

Paul Fultz II authored Jul 02, 2023

Add a CI job to test CK
Add MIGRAPHX_TUNE_CK env variable to only do tuning for CK
Continue tuning even when there are invalid configs
Fix a bug with parallel compilation not using all available threads
Add additional test for gemms using half types
Removed int32 as supported type since it doesn't pass our test suite

3c9df3b4

17 Jun, 2023 1 commit

Update CK commit hash and add gfx940 to supported archs (#1842) · b8898d7e

turneram authored Jun 17, 2023

* Add initial ck_gemm code

* Format

* Add additional src files

* Format

* Add include

* Simplify fuse_ck

* Format

* Rename var

* Enable pass

* Update ck version

* Fix include

* Add group stride

* Disable warnings for ck headers

* Format

* Add unpack array

* Add interface to enable tuning

* Format

* Update compile_ops to handle tuning config

* Format

* Add some comments

* Move time_op to migraphx_gpu

* Add benchmarking

* Refactor

* Format

* Add lift class macro

* Use device name

* Format

* Generate configs

* Format

* Pass tuning parameter

* Move data type to is_ck_gemm matcher

* Format

* Add problem_cache to avoid retuning same configs

* Format

* Format

* Mark the problems

* Format

* Use is_null

* Format

* Resize vector

* Only tune with exhaustive tuning

* Format

* Use assert

* Format

* Tidy fixes

* More tidy fixes

* Format

* Add license to missing files

* Format

* Use transform

* Format

* Fix tidy

* Format

* Fix cppcheck issues

* Format

* Add static_assert

* Add ops header

* Add assertion in batcher

* Format

* Improve the batch fold check

* Format

* Add where op workaround for CK

* Skip if any input is not a supported ck type

* Format

* Check batch is standard

* Format

* Remove redundant static keyword

* Update commit hash

* Fix error when running without --exhaustive-tune

* Formatting

* Formatting

* Remove fuse_ck_gemm_softmax_gemm

* Update ck hash

* Correct spelling mistake

* Remove commented out logic from fuse_ck

* Remove unused include and add comment

* Formatting

* Remove redundant get_shape and remove ck_gemm from names

* Formatting

* Allow for mixed types with int8 gemms

* Formatting

* Add back find_package from merge

* Update CK commit hash and add gfx940 to fuse_ops supported archs

* Formatting

* Update CK hash

b8898d7e

09 Jun, 2023 1 commit

Add missing specialization for the `nullptr` for the hash function (#1824) · 26aabd2a

Umang Yadav authored Jun 09, 2023

#1791 added a hash function for the value class. It uses the visit function and has specializations for bool_type and the <vector> type, but was missing a specialization for nullptr. The missing nullptr case caused compilation issues on RHEL, SLES and CentOS.
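
The shape of this fix can be sketched with the standard library. The sketch below is an illustration only: `toy_value`, `toy_hasher` and `hash_value` are hypothetical names, and `std::variant` with `std::visit` stands in for MIGraphX's own value class and visit function. It shows a visitor-based hash where the null alternative gets its own overload next to the bool and vector ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <variant>
#include <vector>

// Hypothetical stand-in for a dynamically typed value (not MIGraphX's value class).
using toy_value = std::variant<std::nullptr_t, bool, std::int64_t, std::vector<std::int64_t>>;

struct toy_hasher
{
    // Dedicated overload for the null alternative: hash it to a fixed constant.
    std::size_t operator()(std::nullptr_t) const { return 0; }

    // Dedicated overload for bool.
    std::size_t operator()(bool b) const { return b ? 1 : 2; }

    // Dedicated overload for vectors: combine the element hashes.
    std::size_t operator()(const std::vector<std::int64_t>& v) const
    {
        std::size_t seed = v.size();
        for(auto x : v)
            seed ^= std::hash<std::int64_t>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }

    // Generic fallback for every other alternative (here only std::int64_t).
    template <class T>
    std::size_t operator()(const T& x) const
    {
        return std::hash<T>{}(x);
    }
};

// Dispatch on whichever alternative the value currently holds.
std::size_t hash_value(const toy_value& v) { return std::visit(toy_hasher{}, v); }
```

With the explicit overload in place, hashing a null value does not depend on whether the standard library in use provides a usable `std::hash` for `std::nullptr_t`.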

26aabd2a

08 Jun, 2023 1 commit
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
```
  25af8710
24 May, 2023 1 commit
- Change compiler_replace to a class that stores the code objects directly (#1739) · 37f5df20
  Paul Fultz II authored May 24, 2023
```
Enable retrieving the code object to do tuning in the future.
```
  37f5df20
06 Apr, 2023 1 commit
- Add reduction fusion (#1614) · f201285c
  Paul Fultz II authored Apr 05, 2023
```
Automatically fuse multiple reductions and pointwise operations.
```
  f201285c
29 Mar, 2023 1 commit
- Fix bug when concatting with the vectorization axis (#1653) · b1506c73
  Paul Fultz II authored Mar 29, 2023
  
  b1506c73
27 Mar, 2023 1 commit

[MLIR] add dot offloads with manual tuning support (#1631) · 7c4dc99a

Manupa Karunaratne authored Mar 27, 2023

* [MLIR] add dot offloads with manual tuning support
* This commit adds dot + pointwise fusion support along with manual tuning using rocMLIR.

7c4dc99a

10 Mar, 2023 1 commit
- Fix static_assert in large reduction (#1604) · 206b9a51
  Paul Fultz II authored Mar 09, 2023
  
  206b9a51
16 Feb, 2023 1 commit

Copy into registers first when doing reductions with layernorm and softmax (#1489) · ac531d99

Paul Fultz II authored Feb 16, 2023

Avoids double global loads. Strided loops are unrolled, which lets the results be stored in an array that the compiler will keep in registers, since the index access is constant. Updated to handle large reductions, which gives a better Stable Diffusion result.
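
The idea can be sketched on the host in plain C++. This is not the kernel code from the commit: `load_strided`, `softmax_row`, `NThreads` and `PerThread` are hypothetical names, and the "threads" are simulated with a loop. Each simulated thread reads its strided elements once into a fixed-size array whose loop bound is a compile-time constant, and both reduction passes of a softmax then work from that local copy instead of reading the input again.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One simulated thread copies elements tid, tid + NThreads, tid + 2 * NThreads, ...
// The trip count is a compile-time constant, so the loop can be fully unrolled and
// every regs[i] access then uses a constant index.
template <std::size_t NThreads, std::size_t PerThread>
std::array<float, PerThread> load_strided(const float* row, std::size_t tid)
{
    std::array<float, PerThread> regs{};
    for(std::size_t i = 0; i < PerThread; i++)
        regs[i] = row[tid + i * NThreads];
    return regs;
}

template <std::size_t NThreads, std::size_t PerThread>
std::vector<float> softmax_row(const std::vector<float>& row)
{
    assert(row.size() == NThreads * PerThread);

    // The only read of the input: every simulated thread loads its elements once.
    std::array<std::array<float, PerThread>, NThreads> regs{};
    for(std::size_t tid = 0; tid < NThreads; tid++)
        regs[tid] = load_strided<NThreads, PerThread>(row.data(), tid);

    // First reduction (max) works on the local copies.
    float max_val = regs[0][0];
    for(const auto& r : regs)
        for(float x : r)
            max_val = std::max(max_val, x);

    // Second reduction (sum of exponentials) reuses the same copies.
    float sum = 0.0f;
    for(const auto& r : regs)
        for(float x : r)
            sum += std::exp(x - max_val);

    // Write the normalized result back in the original strided order.
    std::vector<float> out(row.size());
    for(std::size_t tid = 0; tid < NThreads; tid++)
        for(std::size_t i = 0; i < PerThread; i++)
            out[tid + i * NThreads] = std::exp(regs[tid][i] - max_val) / sum;
    return out;
}
```

On a GPU the per-thread array would ideally live in registers; whether it does is up to the compiler, which is why the constant loop bound matters.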

ac531d99

17 Jan, 2023 1 commit
- Use float accumulator when reduction size is too large for half (#1515) · 3af50e07
  Paul Fultz II authored Jan 17, 2023
  
  3af50e07
09 Jan, 2023 1 commit

Add JIT Gather Operator (#1492) · 054364cd

Ted Themistokleous authored Jan 09, 2023

JIT implementation of the gather operator
Added a few more unit tests to this one as well since I saw some odd behavior during bring up.

054364cd

06 Dec, 2022 1 commit

Update MLIR integration (#1451) · be70702d

jungpark-mlir authored Dec 06, 2022

Update dialect registration interface
Update 2nd build pipeline call and use full arch name

be70702d

02 Nov, 2022 2 commits
- Add nhwc layout to gpu backend (#1391) · 1820198e
  Paul Fultz II authored Nov 02, 2022
```
Can be enabled via environment variable MIGRAPHX_ENABLE_NHWC
```
  1820198e
- Concat pointwise fusions (#1388) · 2f48b11a
  Paul Fultz II authored Nov 02, 2022
  
  2f48b11a
27 Oct, 2022 1 commit

Add JIT pad (#1411) · 0d841ded

kahmed10 authored Oct 27, 2022

Updated GPU pad to use the JIT version.
Added range functions for JIT kernels.

0d841ded

19 Oct, 2022 1 commit

Refactor dynamic compute; Dynamic ref unary functions (#1407) · 693cb5d8

Charlie Lin authored Oct 19, 2022

Refactor dynamic compute
- Add a compute_output_shape object that implicitly converts to a new dyn_output or shape object
- The dyn_output object can handle computing the static output shape of an operator given the input arguments' shapes
- Change an operator's compute function to `argument compute(const dyn_output& dyn_out, std::vector<argument> args)` to use the dyn_output object
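
A minimal sketch of the implicit-conversion pattern described above, using simplified hypothetical types. The names `shape`, `argument`, `dyn_output` and `compute_output_shape` follow the commit message, but their members here, and the helpers `make_output_shape`, `dynamic_identity` and `static_identity`, are assumptions made for illustration and not the MIGraphX definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Simplified, hypothetical stand-ins for the real classes.
struct shape
{
    std::vector<std::size_t> lens;
};

struct argument
{
    shape s;
};

struct dyn_output
{
    shape computed_shape; // static output shape derived from the actual inputs
};

// Wraps a deferred shape computation and converts to whichever type the
// operator's compute function asks for.
struct compute_output_shape
{
    std::function<shape()> compute;

    operator dyn_output() const { return dyn_output{compute()}; }
    operator shape() const { return compute(); }
};

// Operator written against the new signature.
argument dynamic_identity(const dyn_output& dyn_out, std::vector<argument> args)
{
    assert(!args.empty());
    return argument{dyn_out.computed_shape};
}

// Operator that still takes a plain output shape.
argument static_identity(const shape& output_shape, std::vector<argument>)
{
    return argument{output_shape};
}

// Hypothetical caller-side helper: the output shape is computed from the
// shapes of the arguments that were actually passed in.
compute_output_shape make_output_shape(const std::vector<argument>& args)
{
    std::vector<std::size_t> lens = args.front().s.lens;
    return compute_output_shape{[lens] { return shape{lens}; }};
}
```

Because the conversion happens at the call site, the caller can hand the same `compute_output_shape` object to either kind of operator, for example `dynamic_identity(make_output_shape(args), args)` or `static_identity(make_output_shape(args), args)`.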

Dynamic ref unary functions
- Included these changes to have an example of the refactored dynamic compute being used
- Changes to unary base class to handle dynamic shapes
- Changed elu and leaky_relu to use unary base class and pointwise JIT

693cb5d8

18 Oct, 2022 1 commit

Add support in mlir for transposed and broadcasted shapes (#1378) · c3e02b18

Paul Fultz II authored Oct 18, 2022



* Enable non-standard shape
* Use perfdb for non xdlops
* Fix transpose+broadcast strides

Co-authored-by: jungpark-mlir <jungwook.park@amd.com>

c3e02b18

04 Oct, 2022 1 commit
- Fast softmax (#1290) · a9a47402
  Paul Fultz II authored Oct 04, 2022
```
optimize the softmax operator
```
  a9a47402
29 Sep, 2022 1 commit

Use find_2.0 API for the convolution (#1346) · e19f78ae

Umang Yadav authored Sep 29, 2022

Improvements/additions still to be made:

- Changes for quant_convolution
- Changes for deconvolution
- Macros for MIOpen status checks

e19f78ae

26 Sep, 2022 1 commit
- Use larger vector size instead of preloading for broadcasted inputs (#1389) · 492c4a6c
  Paul Fultz II authored Sep 26, 2022
  
  492c4a6c
21 Sep, 2022 1 commit

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.
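
For reference, a minimal host-side sketch of where epsilon enters the layernorm calculation. The function name `layer_norm` and its signature are assumptions for illustration and not the kernel from this PR; the point is that epsilon is a parameter added to the variance before the square root, not a hard-coded constant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize x to zero mean and unit variance; epsilon is supplied by the caller.
std::vector<float> layer_norm(const std::vector<float>& x, float epsilon)
{
    assert(!x.empty());
    const float n = static_cast<float>(x.size());

    float mean = 0.0f;
    for(float v : x)
        mean += v;
    mean /= n;

    // Population variance (divide by n).
    float var = 0.0f;
    for(float v : x)
        var += (v - mean) * (v - mean);
    var /= n;

    // epsilon keeps the denominator away from zero when the variance is tiny.
    const float scale = 1.0f / std::sqrt(var + epsilon);

    std::vector<float> out(x.size());
    for(std::size_t i = 0; i < x.size(); i++)
        out[i] = (x[i] - mean) * scale;
    return out;
}
```

For the input {1, 3} the mean is 2 and the variance is 1, so an epsilon of 0 gives {-1, 1} while an epsilon of 3 gives {-0.5, 0.5}.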

d9578ba6

14 Sep, 2022 1 commit
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
  7662d9c0
08 Sep, 2022 1 commit
- Remove unused headers (#1363) · ed2c73ac
  Paul Fultz II authored Sep 08, 2022
```
* Remove unused headers
```
  ed2c73ac
17 Aug, 2022 1 commit
- Add jit layernorm fusion (#1301) · 1784584e
  Paul Fultz II authored Aug 16, 2022
  
  1784584e
25 Jul, 2022 1 commit

Add onnx mod operator (#1302) · 77e80b8e

Ted Themistokleous authored Jul 25, 2022

* Add in changes for onnx Mod operator

Initial implementation of the mod operator, with test cases for integer and floating-point types.

Floating-point types need fmod from the standard library. Thankfully, half_float::half is specced to use the existing std::fmod() call, judging by the half.hpp implementation.

fmod_flag should mirror the onnx fmod attribute. Right now, using a floating-point type without setting it to true on the user side results in an exception.

Ref ticket #1283
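
A rough sketch of the semantics described in this message, assuming the ONNX Mod rules: with fmod unset, the integer result takes the sign of the divisor; with fmod set, the result follows C's fmod and takes the sign of the dividend. The function `onnx_mod` is a hypothetical helper, not the MIGraphX operator.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <type_traits>

template <class T>
T onnx_mod(T a, T b, bool fmod_flag)
{
    if constexpr(std::is_floating_point_v<T>)
    {
        // Floating-point inputs are only accepted when the fmod flag is set.
        if(!fmod_flag)
            throw std::invalid_argument("fmod must be set for floating-point inputs");
        return std::fmod(a, b);
    }
    else
    {
        assert(b != 0);
        // The built-in % truncates toward zero, so r has the sign of the dividend.
        T r = static_cast<T>(a % b);
        if(fmod_flag)
            return r;
        // Without the flag, shift a nonzero remainder so it has the sign of the divisor.
        if(r != 0 && ((r < 0) != (b < 0)))
            r = static_cast<T>(r + b);
        return r;
    }
}
```

For example, `onnx_mod(-7, 3, false)` yields 2, while `onnx_mod(-7, 3, true)` yields -1.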

77e80b8e

05 Jul, 2022 1 commit
- Add jit softmax (#1243) · 8520e0b8
  Paul Fultz II authored Jul 05, 2022
```
* Add softmax kernel
```
  8520e0b8
03 Jul, 2022 1 commit

Add mlir fusion (#1251) · ca8a54fe

Paul Fultz II authored Jul 03, 2022

* Add mlir c api

* Formatting

* Create a type attribute

* Formatting

* Parse module

* Formatting

* Add mlir dump function

* Add test case

* Formatting

* Fix tidy issues

* Update mlir version

* Update to newer mlir

* Format

* Move mlir to the gpu and update the test

* Formatting

* Fix bug when appending module

* Format

* Remove old cmake flag

* Update message

* Add return

* Format

* Add mlir_compile

* Format

* Register dialect

* Handle unsigned integers

* Don't provide output for return instruction

* Format

* Add code to insert memrefs

* Format

* Add mlir verification

* Formatting

* Enable pointwise_fusion

* Disable eliminate_data_type

* Set kernel name

* Format

* Fix device name

* Formatting

* Fix output arg

* Format

* Updates

* Update hash

* Add fuse_mlir pass

* Format

* Add fuse mlir

* Format

* Update mlir

* Sort parameter names

* Format

* Reenable disabled passes

* Remove old mlir conv

* Remove asym default padding

* Add more verbose tracing

* Format

* Fix compilation errors

* Format

* Whitelist operators

* Format

* Add namespace

* Format

* Update triple

* Format

* Use func dialect

* Format

* Use func.return

* Format

* Upgrade mlir version

* Add comment

* Handle symmetrical padding

* Format

* Cleanup debug output

* Format

* List failed tests

* Move mlir compile to jit pipeline

* Format

* Update version

* Add source locations

* Format

* Correctly add module

* Format

* Update failed tests

* Fix failures when mlir is disabled

* Format

* Update mlir version

* Check type for fp32

* Format

* Remove failed test

* Update mlir in driver

* Tidy fixes

* Format

* Tidy fixes

* Format

* Fix const

* Remove from requirements

* Fix cmake version

* Fix tidy warning

* Use another ifdef

* Fix tidy

* Other tidy fix

* Format

* Update hash

* Add missing license files

* Format

* Format

* Fix function name

ca8a54fe

25 Jun, 2022 1 commit
- Use jit for contiguous operator (#1217) · b75c83d8
  Paul Fultz II authored Jun 24, 2022
```
* Jit contiguous
```
  b75c83d8
22 Jun, 2022 1 commit
- Update license files (#1248) · e44cecbc
  Ted Themistokleous authored Jun 22, 2022
```
Updated each source file in the repo with the existing license.
```
  e44cecbc
10 Jun, 2022 1 commit

Add vectorized reduce (#1202) · aa7ff911

Paul Fultz II authored Jun 09, 2022



Consolidate the vectorize and preload
Add vectorization to reduction

Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

aa7ff911