Commits · 319a0cf4be121e868d30a4d10bf831db74cfb6da · gaoqiong / MIGraphX

10 Jun, 2022 1 commit

Add vectorized reduce (#1202) · aa7ff911

Paul Fultz II authored Jun 09, 2022



Consolidate the vectorize and preload
Add vectorization to reduction
Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>

aa7ff911

07 Jun, 2022 1 commit

Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504

Zhuoran Yin authored Jun 07, 2022



prioritizing int8 over int8x4 when it is applicable
Amend return to continue in apply loop
Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

37c47504

03 Jun, 2022 1 commit

Group code objects by kernel name in perf report summary (#1234) · 7271ddbc

Paul Fultz II authored Jun 02, 2022

Break up the gpu::code_object print to show the actual kernels...

gpu::code_object::add_kernel: 0.646121ms, 5%
gpu::code_object::mul_kernel: 0.623822ms, 5%
gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
gpu::code_object::mul_add_kernel: 0.478352ms, 4%
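
The grouping shown in this report can be sketched as follows. This is a minimal illustration, not MIGraphX's actual implementation: the `summarize` helper, the tuple layout, and the sample timings are all hypothetical.

```python
from collections import defaultdict

def summarize(timings):
    """Sum per-instruction times under 'op::kernel' keys.

    `timings` is a list of (op_name, kernel_name, milliseconds) tuples.
    This layout is an assumption made for illustration only.
    """
    totals = defaultdict(float)
    for op, kernel, ms in timings:
        # Ops without a kernel name keep their plain operator name
        key = f"{op}::{kernel}" if kernel else op
        totals[key] += ms
    grand_total = sum(totals.values())
    # Largest contributors first, each with its share of the total time
    rows = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, ms, 100.0 * ms / grand_total if grand_total else 0.0)
        for name, ms in rows
    ]

# Made-up sample data, not measurements from the commit
sample = [
    ("gpu::code_object", "add_kernel", 0.4),
    ("gpu::code_object", "mul_kernel", 0.3),
    ("gpu::code_object", "add_kernel", 0.2),
    ("gpu::gemm", None, 0.1),
]
report = summarize(sample)
for name, ms, pct in report:
    print(f"{name}: {ms:.6f}ms, {pct:.0f}%")
```

Keying on the kernel name rather than on the operator alone is what keeps every compiled kernel from collapsing into a single `gpu::code_object` row.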

7271ddbc

02 Jun, 2022 3 commits
- Format · 84327e69
  Paul authored Jun 02, 2022
  
  84327e69
- Fix div by zero · 86b49567
  Paul authored Jun 02, 2022
  
  86b49567
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
  
  1339ba35
30 May, 2022 1 commit

Improve eliminate contiguous pass (#1223) · 86061b4d

shivadbhavsar authored May 29, 2022

Following up on issue #1166 and PR #1220. Using the same approach as in #1220 for parallelizing the eval calls, we can significantly reduce the time spent on the eliminate_contiguous pass.

86061b4d

26 May, 2022 2 commits

Parallelize evaluations in propagate_constant (#1220) · bf603a76

shivadbhavsar authored May 26, 2022

Addressing issue #1166: the propagate_constant pass currently uses a recursive approach to find all instructions in a module that can be evaluated to a literal, and performs the replacement in the same call.

New approach:

Perform a single pass through the instructions in the module to determine which instructions can be evaluated
Evaluate selected instructions in parallel
Replace the selected instructions with the corresponding literal
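
The three steps above can be sketched on a toy instruction list. This is a hedged illustration only: the tuple-based instructions and the names `propagate_constants`, `can_eval`, and `evaluate` are hypothetical, and the real pass works on a module's instruction graph rather than a flat list.

```python
import operator
from concurrent.futures import ThreadPoolExecutor

# Toy instruction format (assumed): (op_name, arg, arg)
OPS = {"add": operator.add, "mul": operator.mul}

def can_eval(ins):
    """An instruction is evaluable if its op is known and all args are numbers."""
    return ins[0] in OPS and all(isinstance(a, (int, float)) for a in ins[1:])

def evaluate(ins):
    """Compute the constant value of an evaluable instruction."""
    return OPS[ins[0]](ins[1], ins[2])

def propagate_constants(instructions):
    # 1. Single pass: record which instructions can be evaluated
    selected = [i for i, ins in enumerate(instructions) if can_eval(ins)]
    # 2. Evaluate the selected instructions in parallel
    with ThreadPoolExecutor() as pool:
        literals = list(pool.map(lambda i: evaluate(instructions[i]), selected))
    # 3. Replace each selected instruction with its literal
    result = list(instructions)
    for i, value in zip(selected, literals):
        result[i] = ("literal", value)
    return result

program = [("add", 2, 3), ("param", "x", None), ("mul", 4, 5)]
folded = propagate_constants(program)
print(folded)
```

Separating selection from evaluation means the evaluations no longer depend on each other's replacement, which is what makes them safe to run concurrently in this sketch.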

bf603a76

Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
a401e72a

25 May, 2022 3 commits
- Add missing header · 124ed38d
  Paul authored May 25, 2022
  
  124ed38d
- Format · 49e1e618
  Paul authored May 25, 2022
  
  49e1e618
- Set kernel name · 85f22ffd
  Paul authored May 25, 2022
  
  85f22ffd
24 May, 2022 4 commits

Improve applicable batched gemms (#1214) · bf0a4713
Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
bf0a4713

Remove std references in runtime compilation (#1186) · 150d6d20

Paul Fultz II authored May 24, 2022

Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system

150d6d20

Fuse gemm add with pointwise fusions (#1213) · a500620e
Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```
a500620e

Fix onnx mean parsing for integral inputs (#1209) · d895104a

shivadbhavsar authored May 23, 2022

As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
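
A small sketch of why the order matters for integral types. It assumes the earlier behaviour divided each input before summing (the commit message only implies this), and it uses scalars in place of tensors; both function names are hypothetical.

```python
def mean_divide_first(values):
    """Divide each input by the count, then sum (loses remainders for ints)."""
    n = len(values)
    return sum(v // n for v in values)

def mean_sum_first(values):
    """Sum all inputs, then divide once (the order described in the fix)."""
    return sum(values) // len(values)

inputs = [1, 2, 3]  # non-negative, so floor and truncating division agree
print(mean_divide_first(inputs))  # 0 + 0 + 1 = 1
print(mean_sum_first(inputs))     # 6 // 3 = 2
```

With integer division each per-input quotient discards its remainder, so dividing first can underestimate the mean, while summing first discards at most one remainder.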

d895104a

23 May, 2022 2 commits
- Format · 92324d57
  Paul authored May 23, 2022
  
  92324d57
- Vectorize softmax · 4d66f031
  Paul authored May 23, 2022
  
  4d66f031
20 May, 2022 2 commits
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
```
For clarity on kernel names found when profiling. The new names are set to the order of the ops being compiled. For example: add + relu = add_relu_kernel.
```
  4a312201
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
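
The naming rule from #1145 above (op names joined in compile order, followed by a `_kernel` suffix) can be sketched as below. The helper name is hypothetical and this is not the library's actual code.

```python
def pointwise_kernel_name(op_names):
    """Join op names in compile order and append the '_kernel' suffix."""
    return "_".join(op_names) + "_kernel"

print(pointwise_kernel_name(["add", "relu"]))  # add_relu_kernel
```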
19 May, 2022 1 commit
- Fix perf regression · c84154b8
  Paul authored May 19, 2022
  
  c84154b8
18 May, 2022 1 commit
- Fix tidy issue · 7133eee6
  Paul authored May 18, 2022
  
  7133eee6
17 May, 2022 3 commits
- Format · d0b7fc9a
  Paul authored May 17, 2022
  
  d0b7fc9a
- Fix wrong global size · 5515c9a5
  Paul authored May 17, 2022
  
  5515c9a5
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
12 May, 2022 3 commits
- Fix vec_reduce · b4c4234d
  Paul authored May 12, 2022
  
  b4c4234d
- Fix div by zero · 172f47f5
  Paul authored May 12, 2022
  
  172f47f5
- Fix tidy · 8344791c
  Paul authored May 12, 2022
  
  8344791c
11 May, 2022 5 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuse layernorm and add triadd_layernorm fusion. This is a prep performance booster
```
  671f24be
- Format · db2def39
  Paul authored May 10, 2022
  
  db2def39
- Fix vec issues · f1f60be1
  Paul authored May 10, 2022
  
  f1f60be1
- Format · c13780c2
  Paul authored May 10, 2022
  
  c13780c2
- Add vectorization to reduction · 15fd8205
  Paul authored May 10, 2022
  
  15fd8205
10 May, 2022 3 commits
- Format · 8a6ae079
  Paul authored May 10, 2022
  
  8a6ae079
- Consolidate the vectorize and preload · d60364a3
  Paul authored May 10, 2022
  
  d60364a3
- Expose `add_literal` in C and Python API (#1173) · 5e5ed37a
  Umang Yadav authored May 10, 2022
```
Expose add_literal method in C/C++ API
```
  5e5ed37a
09 May, 2022 1 commit

Refactor vectorization and preloading for pointwise fusions (#1184) · ddbbe54b

Paul Fultz II authored May 09, 2022

Improves performance for add_gelu. In bert it is 4x faster and for mul_add it is 50% faster than what we currently have.

ddbbe54b

06 May, 2022 1 commit

upgrade docker images to ROCm 5.0.2 (#1133) · f55d7c24

Chris Austen authored May 06, 2022

Move CI containers to ROCm 5.0.2
upgrade to 20.04
free up some more file space in github action environments

f55d7c24

05 May, 2022 1 commit

Cppcheck fixes (#1195) · d582425b

Paul Fultz II authored May 05, 2022

Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This fixes the cppcheck errors that were already there.

d582425b

03 May, 2022 1 commit
- Format · bb0fff52
  Paul authored May 03, 2022
  
  bb0fff52