Commits · db63cc7720d35efb4aa2d7f655b95249f53fc2e1 · gaoqiong / MIGraphX

17 Jun, 2023 1 commit

Update CK commit hash and add gfx940 to supported archs (#1842) · b8898d7e

turneram authored Jun 17, 2023

* Add initial ck_gemm code

* Format

* Add additional src files

* Format

* Add include

* Simplify fuse_ck

* Format

* Rename var

* Enable pass

* Update ck version

* Fix include

* Add group stride

* Disable warnings for ck headers

* Format

* Add unpack array

* Add interface to enable tuning

* Format

* Update compile_ops to handle tuning config

* Format

* Add some comments

* Move time_op to migraphx_gpu

* Add benchmarking

* Refactor

* Format

* Add lift class macro

* Use device name

* Format

* Generate configs

* Format

* Pass tuning parameter

* Move data type to is_ck_gemm matcher

* Format

* Add problem_cache to avoid retuning same configs

* Format

* Format

* Mark the problems

* Format

* Use is_null

* Format

* Resize vector

* Only tune with exhaustive tuning

* Format

* Use assert

* Format

* Tidy fixes

* More tidy fixes

* Format

* Add license to missing files

* Format

* Use transform

* Format

* Fix tidy

* Format

* Fix cppcheck issues

* Format

* Add static_assert

* Add ops header

* Add assertion in batcher

* Format

* Improve the batch fold check

* Format

* Add where op workaround for CK

* Skip if any input is not a supported ck type

* Format

* Check batch is standard

* Format

* Remove redundant static keyword

* Update commit hash

* Fix error when running without --exhaustive-tune

* Formatting

* Formatting

* Remove fuse_ck_gemm_softmax_gemm

* Update ck hash

* Correct spelling mistake

* Remove commented out logic from fuse_ck

* Remove unused include and add comment

* Formatting

* Remove redundant get_shape and remove ck_gemm from names

* Formatting

* Allow for mixed types with int8 gemms

* Formatting

* Add back find_package from merge

* Update CK commit hash and add gfx940 to fuse_ops supported archs

* Formatting

* Update CK hash

b8898d7e

15 Jun, 2023 1 commit
- use __hmax, __hmin (#1813) · d208adfc
  Umang Yadav authored Jun 15, 2023
  
  d208adfc
14 Jun, 2023 1 commit

Fix TRACE_EVAL > 1 (#1835) · 5bf067ed

Umang Yadav authored Jun 14, 2023



* add fix for the trace_eval

* Add throw for the debug builds

* Formatting

---------
Co-authored-by: Chris Austen <causten@users.noreply.github.com>

5bf067ed

09 Jun, 2023 2 commits

Enable hipRTC (#1827) · c900e382
Chris Austen authored Jun 09, 2023

c900e382

Add missing specialization for the `nullptr` for the hash function (#1824) · 26aabd2a

Umang Yadav authored Jun 09, 2023

#1791 added a hash function for the value class. It uses the Visit function and has specializations for the bool_type and <vector> types, but was missing a specialization for nullptr. The missing nullptr specialization caused compilation failures on RHEL, SLES and CentOS.

26aabd2a

08 Jun, 2023 2 commits
- Add initial CK integration plus auto-tuning for kernels (#1791) · 25af8710
  Paul Fultz II authored Jun 08, 2023
```
Enable with MIGRAPHX_ENABLE_CK=1 and the --exhaustive-tune flag
```
  25af8710
- disable hipRTC temporarily (#1817) · e5a33aad
  Chris Austen authored Jun 07, 2023
  
  e5a33aad
06 Jun, 2023 2 commits

re-enable hiprtc (#1812) · 85ff4f85
Umang Yadav authored Jun 06, 2023

85ff4f85

Conditionally enable GeLU approximation (#1810) · c5d0c5b6

Umang Yadav authored Jun 05, 2023

The sigmoid approximation for GeLU was introduced in #1299 for fp16. It is known to give better performance at the cost of lower accuracy. https://arxiv.org/pdf/1606.08415.pdf
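To make the accuracy trade-off concrete, here is a minimal sketch comparing the erf-based GeLU with the sigmoid form x * sigmoid(1.702 * x) given in the linked paper. The function names and the sampled range are illustrative assumptions, and this is not the MIGraphX kernel code.

```python
import math


def gelu_exact(x: float) -> float:
    """GeLU defined through the Gaussian CDF: x * Phi(x)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_sigmoid(x: float) -> float:
    """Sigmoid approximation from the paper: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))


# Sample [-6, 6] in steps of 0.01 (an arbitrary, illustrative range)
# and record the largest absolute difference between the two forms.
samples = [i / 100.0 for i in range(-600, 601)]
max_abs_error = max(abs(gelu_exact(x) - gelu_sigmoid(x)) for x in samples)

print(f"max |exact - sigmoid| on [-6, 6]: {max_abs_error:.4f}")
```

The approximation replaces erf with a single exponential, which is where the performance gain would come from; the printed error shows what is given up in exchange.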

c5d0c5b6

31 May, 2023 1 commit
- Update pass manager to handle multi-target compilation (#1672) · 9473e3a2
  Umang Yadav authored May 31, 2023
```
partially solves #1656
This PR only handles compilation part of multitarget.
```
  9473e3a2
24 May, 2023 2 commits
- Change compiler_replace to a class that stores the code objects directly (#1739) · 37f5df20
  Paul Fultz II authored May 24, 2023
```
Enable retrieving the code object to do tuning in the future.
```
  37f5df20
- Update xdlops/rocblas fp32 arch (#1752) · 77042e30
  kahmed10 authored May 24, 2023
```
Refactor supported gfx archs
```
  77042e30
23 May, 2023 1 commit
- Backout fp16 max/min HIP API change (#1771) · 42772fd6
  Umang Yadav authored May 23, 2023
```
back out changes for rocm-5.5
```
  42772fd6
20 May, 2023 1 commit
- Use half HIP APIs to compute max and min (#1764) · 88fb551c
  Umang Yadav authored May 19, 2023
```
* use half hip functions to compute max and min
* add verify test for min and max
```
  88fb551c
19 May, 2023 1 commit
- Enabling native int32 type support (#1721) · 8d9d5d1c
  Zhuoran Yin authored May 19, 2023
```
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
  8d9d5d1c
17 May, 2023 1 commit
- adjust docker files to support new rocm 5.5 (#1729) · 5e35957b
  Chris Austen authored May 17, 2023
```
Move CI to support the rocm5.5 release
```
  5e35957b
08 May, 2023 1 commit
- Remove workaround for Sin (#1701) · 89f7ac0d
  Umang Yadav authored May 08, 2023
  
  89f7ac0d
05 May, 2023 1 commit
- [MLIR][5.7] add input fusion support for view ops (#1705) · 4996c6d7
  Manupa Karunaratne authored May 05, 2023
```
Adds support for slice, transpose, contiguous and reshape fusions into input tensors for a fused mlir kernel.
```
  4996c6d7
04 May, 2023 1 commit

[mlir] Adding quant convolution fusion as anchor op (#1683) · 7f105952

Zhuoran Yin authored May 03, 2023

Exposed the mlir_enabled() call to decide whether the lowering pipeline is enabled
Disabled the rewrite quantization pipeline in mlir compilation
Added quant convolution as anchor ops
Fixed the return type expectations
Added the fall back hip implementation for quantizelinear and dequantizelinear
Will need advice to improve the implementation for quantizelinear

7f105952

28 Apr, 2023 1 commit
- Removed split_single_dyn_dim compile flag (#1711) · bcc1f64a
  Charlie Lin authored Apr 28, 2023
  
  bcc1f64a
25 Apr, 2023 2 commits
- update rocBLAS version check to support 3.0 and above (#1716) · ed6542ee
  kahmed10 authored Apr 25, 2023
```
update rocBLAS version check to support 3.0 and above with simplified logic
```
  ed6542ee
- Disable hipRTC revert to hipClang (#1714) · eb69b36c
  Chris Austen authored Apr 24, 2023
  
  eb69b36c
24 Apr, 2023 3 commits
- Dynamic shape hip::copy_to_gpu and hip::copy_from_gpu (#1694) · 84acaea0
  Charlie Lin authored Apr 24, 2023
```
Updates the hip::copy_to_gpu and hip::copy_from_gpu operators to work with dynamic shapes

Allows for offload_copy to be used with dynamic batch

Changed assert in select_module because the argument might now be smaller with how offload_copy will work with dynamic batch. (maximum buffer size will be used)
```
  84acaea0
- Fix compile failure in reduction fusion of instance norm (#1702) · 08360e83
  Paul Fultz II authored Apr 24, 2023
```
This fixes #1700
```
  08360e83
- Fix incorrect assertion in vec_packed_at (#1704) · 4339af75
  Paul Fultz II authored Apr 23, 2023
  
  4339af75
21 Apr, 2023 1 commit
- disable fusion only but create pointwise modules (#1706) · 2a44dfe9
  Umang Yadav authored Apr 21, 2023
  
  2a44dfe9
13 Apr, 2023 1 commit
- [mlir] Adding quantizelinear, dequantizelinear and quant_convolution support (#1675) · 7b2a5ccf
  Zhuoran Yin authored Apr 13, 2023
  
  7b2a5ccf
11 Apr, 2023 1 commit
- Enable tidy on gpu driver (#1659) · 3385dcc8
  Paul Fultz II authored Apr 11, 2023
  
  3385dcc8
09 Apr, 2023 1 commit
- Enable hiprtc by default (#1658) · db6c75e7
  Paul Fultz II authored Apr 09, 2023
```
* Enable hiprtc by default
```
  db6c75e7
06 Apr, 2023 2 commits

Driver dynamic batch update (#1652) · adccec52

Charlie Lin authored Apr 06, 2023

Examples:

bin/driver verify /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --batch 3 --dyn-input-dim @data "[{min:1, max:4}, 3, 224, 224]"

bin/driver compile /codes/onnx_models/resnet50-v1-7/resnet50-v1-7.onnx --split-single-dyn-dim --default-dyn-dim "{min:1, max:10}" --output resnet50_batch1-10.mxr

bin/driver perf resnet50_batch1-10.mxr --batch 4

adccec52

Add reduction fusion (#1614) · f201285c
Paul Fultz II authored Apr 05, 2023
```
Automatically fuse multiple reductions and pointwise operations.
```
f201285c

05 Apr, 2023 1 commit

Optimize add convolution (#1549) · df32040d

Paul Fultz II authored Apr 05, 2023

This will replace conv(x+a, w) with conv(x, w) + conv(a, w) where a is a constant so conv(a, w) can be replaced with a constant.
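The rewrite relies on convolution being linear in its input. A minimal sketch, using a hypothetical pure-Python 1-D convolution without padding as a stand-in for the real operator, checks the identity on small made-up data:

```python
import math


def conv1d(x, w):
    """Valid (no padding) 1-D cross-correlation of signal x with kernel w."""
    n_out = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n_out)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]    # runtime input
a = [0.5, -1.0, 2.0, 0.0, 1.5]   # constant added to the input
w = [1.0, 0.0, -1.0]             # constant weights

# Original form: add first, then convolve.
original = conv1d([xi + ai for xi, ai in zip(x, a)], w)

# Rewritten form: conv(a, w) depends only on constants, so it can be
# computed once ahead of time and added to conv(x, w) afterwards.
folded_constant = conv1d(a, w)
rewritten = [c + f for c, f in zip(conv1d(x, w), folded_constant)]

assert all(math.isclose(p, q) for p, q in zip(original, rewritten))
```

Only `conv1d(x, w)` remains as runtime work in the rewritten form; the extra addition operates on the convolution output rather than its input.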

df32040d

03 Apr, 2023 1 commit

promote_literals pass (#1593) · e3fb3a0d

Charlie Lin authored Apr 03, 2023

Adds the promote_literals compiler pass that moves literals from the submodules to the main module.
With the eliminate_common_subexpression pass, it will remove copies of literals created during split_single_dyn_dim.
Pass is enabled with the split_single_dyn_dim compile option.

e3fb3a0d

01 Apr, 2023 1 commit
- Enable header tests for FPGA and CPU backend (#1634) · 6a0a5ffe
  Umang Yadav authored Apr 01, 2023
  
  6a0a5ffe
31 Mar, 2023 1 commit

Split single dynamic dimension compiler pass (#1580) · e9e3eacc

Charlie Lin authored Mar 30, 2023

Adds a new GPU compiler pass split_single_dyn_dim that handles the case where one input parameter has a single non-fixed dynamic_dimension.
This commonly occurs for dynamic batch or BERT sequence length.
Splits the dynamic shape into several submodules with static input parameters to handle all of the cases in the dynamic_dimension range.
Essentially does what I manually did for the select_module verify tests
Adds a compile option split_single_dyn_dim that toggles the pass on/off. Defaults to false.
Updates verify_program.hpp and run_verify.cpp to allow for the tests to change the compile_options
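The splitting idea can be sketched as follows. This is a conceptual illustration only: the function name `enumerate_static_shapes`, the tuple encoding of a dynamic dimension, and the assumption of one static shape per value in the range are all hypothetical and are not taken from the MIGraphX source.

```python
def enumerate_static_shapes(shape):
    """Expand a shape with exactly one (min, max) dimension into static shapes.

    `shape` is a list whose entries are ints (fixed dimensions) or a
    (min, max) tuple (a dynamic dimension). Returns a dict mapping each
    value in the dynamic range to the corresponding fully static shape,
    or None when the shape does not have exactly one dynamic dimension.
    """
    dynamic = [i for i, dim in enumerate(shape) if isinstance(dim, tuple)]
    if len(dynamic) != 1:
        return None  # the sketch only covers the single-dynamic-dimension case
    idx = dynamic[0]
    lo, hi = shape[idx]
    return {v: shape[:idx] + [v] + shape[idx + 1:] for v in range(lo, hi + 1)}


# Dynamic batch of 1..4 on an image input, mirroring the
# "[{min:1, max:4}, 3, 224, 224]" style of shape used with the driver.
static_shapes = enumerate_static_shapes([(1, 4), 3, 224, 224])
for batch, dims in static_shapes.items():
    print(batch, dims)
```

Each entry would correspond to one submodule with static input parameters, selected at run time by the actual size of the dynamic dimension.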

e9e3eacc

30 Mar, 2023 1 commit
- Enable parallel compilation with hiprtc (#1647) · 32b9fd08
  Paul Fultz II authored Mar 30, 2023
```
* Add hiprtc driver
```
  32b9fd08
29 Mar, 2023 1 commit
- Fix bug when concatting with the vectorization axis (#1653) · b1506c73
  Paul Fultz II authored Mar 29, 2023
  
  b1506c73
28 Mar, 2023 1 commit
- Remove version name from check_context (#1639) · 49fc6138
  Umang Yadav authored Mar 28, 2023
```
* Remove version from check_context and bump program version
```
  49fc6138
27 Mar, 2023 1 commit

[MLIR] add dot offloads with manual tuning support (#1631) · 7c4dc99a

Manupa Karunaratne authored Mar 27, 2023

* [MLIR] add dot offloads with manual tuning support
* This commit adds dot + pointwise fusion support
along with manual tuning using rocMLIR.

7c4dc99a

25 Mar, 2023 1 commit
- remove /opt/rocm (#1623) · 018e5318
  Umang Yadav authored Mar 24, 2023
```
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
```
  018e5318