Commits · f791188a44442b746b7bc93d151ae25a7767e12a · gaoqiong / MIGraphX

17 Jun, 2022 11 commits

Format · f791188a
Paul authored Jun 17, 2022

f791188a
Tidy fixes · 33423f8c
Paul authored Jun 17, 2022

33423f8c
Format · 390586c5
Paul authored Jun 17, 2022

390586c5
Tidy fixes · f374143f
Paul authored Jun 17, 2022

f374143f
Format · bf3e958d
Paul authored Jun 17, 2022

bf3e958d
Check type for fp32 · 2bba1c7c
Paul authored Jun 17, 2022

2bba1c7c
Format · d97b3111
Paul authored Jun 17, 2022

d97b3111
Fix failures when mlir is disabled · 6f768f82
Paul authored Jun 17, 2022

6f768f82

Update lowering of Dot operator (#1247) · c99be32c

Umang Yadav authored Jun 17, 2022



* remove code for allocation of C param in dot lowering

* formatting
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

c99be32c

Update tf_parser to have add_common_op() for parse_relu6 (#1241) · 421a5621

Ted Themistokleous authored Jun 17, 2022



* [#935] Update tf_parser to have add_common_op() for parse_relu6

Similar to onnx_parser.cpp, add an add_common_op template and functionality to support clip-based operations. This is done so clip operations can be guaranteed to have the same dimensions.

* fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* fixup! fixup! fixup! fixup! [#935] Update tf_parser to have add_common_op() for parse_relu6

* Formatting

* fixup! Formatting
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

421a5621

Create allocate op and replace_allocate pass (#1183) · add6fb3b

kahmed10 authored Jun 17, 2022



* add allocate op header

* formatting

* add replace_allocate pass

* formatting

* move output param to remove_allocate pass

* formatting

* fix bugs in replace_allocate pass

* formatting

* fix verify if tests

* formatting

* move if op logic

* formatting

* cleanup lowering

* cleanup lowering

* formatting

* fix tidy

* formatting

* fix tidy

* add cpu allocate check

* formatting

* change cpu allocate in pass

* formatting

* add some tests for replace_allocate pass

* formatting

* pass by ref

* fix run_pass

* formatting

* update variable name for module

* update dce to use contains() and fix tidy

* formatting

* update cppcheck

* add if test

* formatting

* add if test

* rename var to mod_output_names

* formatting

* remove conditional

* update allocate op and tests

* formatting

* update replace_allocate tests

* update create_output_names() and conditional in replace_allocate

* formatting

* remove extra variable in replace_allocate

* update tools script for allocation_model
Co-authored-by: Umang Yadav <29876643+umangyadav@users.noreply.github.com>
Co-authored-by: Chris Austen <causten@users.noreply.github.com>
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

add6fb3b

16 Jun, 2022 1 commit

Instruction distance check fix (#1237) · f5980619

Charlie Lin authored Jun 16, 2022



* Use custom distance function

* Pass module, skip order check if other module

* Change other valid()

* Remove unnecessary declaration

* test multiple module dependency

* Refactor to make more clear

* Code cleanup

* Simplify fix

* Test EXPECT
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

f5980619

13 Jun, 2022 4 commits
- Format · 1770a342
  Paul authored Jun 13, 2022
  
  1770a342
- Correctly add module · aeb60bce
  Paul authored Jun 13, 2022
  
  aeb60bce
- Format · f75c5a38
  Paul authored Jun 12, 2022
  
  f75c5a38
- Add source locations · af09c35f
  Paul authored Jun 12, 2022
  
  af09c35f
10 Jun, 2022 1 commit

Add vectorized reduce (#1202) · aa7ff911

Paul Fultz II authored Jun 09, 2022



Consolidate the vectorize and preload
Add vectorization to reduction
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

aa7ff911

09 Jun, 2022 2 commits
- Format · 6b5c64ff
  Paul authored Jun 09, 2022
  
  6b5c64ff
- Move mlir compile to jit pipeline · 02b0095c
  Paul authored Jun 09, 2022
  
  02b0095c
07 Jun, 2022 1 commit

Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504

Zhuoran Yin authored Jun 07, 2022



prioritizing int8 over int8x4 when it is applicable
Amend return to continue in apply loop
Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

37c47504

03 Jun, 2022 1 commit

Group code objects by kernel name in perf report summary (#1234) · 7271ddbc

Paul Fultz II authored Jun 02, 2022

Break up the gpu::code_object print to show the actual kernels...

gpu::code_object::add_kernel: 0.646121ms, 5%
gpu::code_object::mul_kernel: 0.623822ms, 5%
gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
gpu::code_object::mul_add_kernel: 0.478352ms, 4%
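
The grouping described above can be sketched roughly as follows. This is a minimal illustration, not MIGraphX's perf-report code: the `summarize` function, the `(op, kernel, ms)` tuple layout, and the sample timings are all hypothetical, and the output format only imitates the lines shown above.

```python
from collections import defaultdict


def summarize(timings):
    """Group timings by operator and, when present, by kernel name.

    `timings` is an iterable of (op_name, kernel_name_or_None, milliseconds).
    This tuple layout is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    for op, kernel, ms in timings:
        # Code objects are keyed by their kernel name instead of being
        # lumped together under a single operator entry.
        key = f"{op}::{kernel}" if kernel else op
        totals[key] += ms

    grand_total = sum(totals.values())
    # Largest contributors first.
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{name}: {ms:g}ms, {int(100 * ms / grand_total)}%" for name, ms in rows]


# Made-up timings, for illustration only.
sample = [
    ("gpu::code_object", "add_kernel", 3.0),
    ("gpu::code_object", "mul_kernel", 1.0),
    ("gpu::code_object", "add_kernel", 1.0),
    ("some_other_op", None, 5.0),
]
report = summarize(sample)
for line in report:
    print(line)
```

Keying on the kernel name means two launches of the same kernel are summed into one row, while different kernels that share the `gpu::code_object` operator get separate rows.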

7271ddbc

02 Jun, 2022 1 commit
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
  
  1339ba35
30 May, 2022 1 commit

Improve eliminate contiguous pass (#1223) · 86061b4d

shivadbhavsar authored May 29, 2022

Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent in the eliminate_contiguous pass.

86061b4d

26 May, 2022 2 commits

Parallelize evaluations in propagate_constant (#1220) · bf603a76

shivadbhavsar authored May 26, 2022

Addresses issue #1166: the propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and performs the replacement in the same call.

New approach:

1. Perform a single pass through the instructions in the module to determine which instructions can be evaluated
2. Evaluate the selected instructions in parallel
3. Replace the selected instructions with the corresponding literals
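
The three phases can be sketched on a toy instruction list. This is only an illustration under stated assumptions: the dictionary-based module, the `fold_constants` name, and the two-operator table are hypothetical and are not the MIGraphX data structures, and a thread pool stands in for whatever parallel mechanism the real pass uses.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy operators; the real pass handles arbitrary instructions.
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(module):
    """Fold constant instructions in a toy module.

    `module` maps a name to ("literal", value), ("param",) or (op, lhs, rhs),
    listed in topological order. This layout is an assumption for the sketch.
    """
    # Phase 1: one pass to mark instructions whose inputs are all constant.
    constant = set()
    for name, ins in module.items():
        is_literal = ins[0] == "literal"
        is_foldable = ins[0] in OPS and all(arg in constant for arg in ins[1:])
        if is_literal or is_foldable:
            constant.add(name)
    selected = [n for n in module if n in constant and module[n][0] != "literal"]

    def evaluate(name):
        # Each evaluation only reads the module, so calls are independent.
        ins = module[name]
        if ins[0] == "literal":
            return ins[1]
        return OPS[ins[0]](*(evaluate(arg) for arg in ins[1:]))

    # Phase 2: evaluate the selected instructions concurrently.
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(evaluate, selected))

    # Phase 3: replace each selected instruction with its literal.
    folded = dict(module)
    for name, value in zip(selected, values):
        folded[name] = ("literal", value)
    return folded


module = {
    "a": ("literal", 2),
    "b": ("literal", 3),
    "x": ("param",),
    "c": ("add", "a", "b"),
    "d": ("mul", "c", "a"),
    "e": ("add", "d", "x"),  # depends on a runtime parameter, so it is kept
}
folded = fold_constants(module)
```

Separating selection from evaluation is what makes the middle phase safe to run concurrently: nothing in the module is modified until every value has been computed.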

bf603a76

Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
a401e72a

25 May, 2022 2 commits
- Format · 79ffac9f
  Paul authored May 24, 2022
  
  79ffac9f
- Cleanup debug output · b7f31df5
  Paul authored May 24, 2022
  
  b7f31df5
24 May, 2022 6 commits
- Format · 9dcbd52b
  Paul authored May 24, 2022
  
  9dcbd52b
- Handle symmetrical padding · 4272fff1
  Paul authored May 24, 2022
  
  4272fff1
- Improve applicable batched gemms (#1214) · bf0a4713
  Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
  bf0a4713
- Remove std references in runtime compilation (#1186) · 150d6d20
  Paul Fultz II authored May 24, 2022
```
Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system
```
  150d6d20
- Fuse gemm add with pointwise fusions (#1213) · a500620e
  Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```
  a500620e
- Fix onnx mean parsing for integral inputs (#1209) · d895104a
  shivadbhavsar authored May 23, 2022
```
As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
```
  d895104a
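
Why the order of summation and division matters for integral types (#1209) can be seen in a small sketch. It assumes the problem is integer-division truncation, which the commit message does not state explicitly; the function names are hypothetical and plain Python lists stand in for tensors.

```python
def mean_divide_first(inputs):
    """Divide every element by the input count, then sum (lossy for integers)."""
    n = len(inputs)
    return [sum(v // n for v in column) for column in zip(*inputs)]


def mean_sum_first(inputs):
    """Sum element-wise first, then divide once, as the fix describes."""
    n = len(inputs)
    return [sum(column) // n for column in zip(*inputs)]


# Three identical integer inputs, so the exact mean is [1, 2, 3].
inputs = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
lossy = mean_divide_first(inputs)
exact = mean_sum_first(inputs)
print(lossy, exact)
```

Python's `//` floors while C++ integer division truncates toward zero; the two agree for the non-negative values used here.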
20 May, 2022 2 commits
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
```
Renamed for clarity of the kernel names seen when profiling. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
```
  4a312201
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
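
The naming rule from "Rename pointwise ops (#1145)" above can be illustrated with a one-line sketch. The `kernel_name` helper is hypothetical and only mirrors the stated example (add + relu = add_relu_kernel); it is not the function MIGraphX uses.

```python
def kernel_name(op_names):
    """Join op names in compilation order and append the `_kernel` suffix."""
    return "_".join(op_names) + "_kernel"


print(kernel_name(["add", "relu"]))
```
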
18 May, 2022 3 commits
- Use func.return · 56a6b232
  Paul authored May 18, 2022
  
  56a6b232
- Format · 516779cb
  Paul authored May 18, 2022
  
  516779cb
- Use func dialect · a4d40fd0
  Paul authored May 18, 2022
  
  a4d40fd0
17 May, 2022 1 commit
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
11 May, 2022 1 commit

Prefuse layernorm for gpu (#1190) · 671f24be

Paul Fultz II authored May 11, 2022

Fuse layernorm and add a triadd_layernorm fusion. This is a prep performance booster

671f24be