Commits · c06d254a7cd4f279396d155ca0a1ab80e024761f · gaoqiong / MIGraphX

10 Jun, 2022 1 commit
- Add vectorized reduce (#1202) · aa7ff911
  Paul Fultz II authored Jun 09, 2022
  
  Consolidate the vectorize and preload
  Add vectorization to reduction
  Co-authored-by: kahmed10 <15948690+kahmed10@users.noreply.github.com>
  
  aa7ff911

07 Jun, 2022 3 commits
- Prioritizing int8 over int8x4 when it is applicable (#1218) · 37c47504
  Zhuoran Yin authored Jun 07, 2022
```
prioritizing int8 over int8x4 when it is applicable
Amend return to continue in apply loop
Adding error handling in case int8x4 compilation failed
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
```
  37c47504
- Format · 1dda8943
  Paul authored Jun 06, 2022
  
  1dda8943
- Improve gemm fusion · 297572f5
  Paul authored Jun 06, 2022
  
  297572f5
06 Jun, 2022 1 commit
- Ensure standard shape for triadd layernorm · 4ac2919f
  Paul authored Jun 06, 2022
  
  4ac2919f
03 Jun, 2022 1 commit
- Group code objects by kernel name in perf report summary (#1234) · 7271ddbc
  Paul Fultz II authored Jun 02, 2022
  
  Break up the gpu::code_object print to show the actual kernels...
  
      gpu::code_object::add_kernel: 0.646121ms, 5%
      gpu::code_object::mul_kernel: 0.623822ms, 5%
      gpu::code_object::add_mul_erf_add_mul_mul_kernel: 0.498902ms, 4%
      gpu::code_object::mul_add_kernel: 0.478352ms, 4%
  
  7271ddbc
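The grouping described in this commit can be pictured with a minimal Python sketch. It is not the MIGraphX implementation: the function name `summarize`, the record layout `(op_name, kernel_name, ms)`, and the sample values are all assumptions made for illustration. The idea is only that a code object is keyed by its kernel name, so each kernel gets its own line in the summary instead of one combined `gpu::code_object` entry.

```python
from collections import defaultdict


def summarize(timings):
    """Group (op_name, kernel_name, ms) records and report each group's share.

    kernel_name is None for operators that are not code objects.
    This layout is hypothetical, chosen only for the sketch.
    """
    totals = defaultdict(float)
    for op_name, kernel_name, ms in timings:
        # Code objects get a per-kernel key; other ops keep their own name.
        key = f"{op_name}::{kernel_name}" if kernel_name else op_name
        totals[key] += ms

    grand_total = sum(totals.values())
    # Largest total first, as a perf summary would usually be read.
    rows = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, ms, 100.0 * ms / grand_total if grand_total else 0.0)
        for name, ms in rows
    ]


# Hypothetical sample records, not measurements from the commit.
sample = [
    ("gpu::code_object", "add_kernel", 1.0),
    ("gpu::code_object", "mul_kernel", 1.0),
    ("gpu::code_object", "add_kernel", 1.0),
    ("gpu::gemm", None, 1.0),
]

for name, ms, percent in summarize(sample):
    print(f"{name}: {ms:.6f}ms, {percent:.0f}%")
```

Keying on the kernel name rather than the operator name is what separates `add_kernel` from `mul_kernel` in the report lines quoted above.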

02 Jun, 2022 3 commits
- Format · 84327e69
  Paul authored Jun 02, 2022
  
  84327e69
- Fix div by zero · 86b49567
  Paul authored Jun 02, 2022
  
  86b49567
- Fix dangling reference with gemm add fusion (#1233) · 1339ba35
  Paul Fultz II authored Jun 01, 2022
  
  1339ba35
26 May, 2022 1 commit
- Upgrade to cppcheck 2.8 and fix new issues found (#1225) · a401e72a
  Paul Fultz II authored May 26, 2022
```
* Upgrade to cppcheck 2.8
```
  a401e72a
25 May, 2022 5 commits
- Format · 8136ac3e
  Paul authored May 25, 2022
  
  8136ac3e
- Remove unused variable · cac1d8aa
  Paul authored May 25, 2022
  
  cac1d8aa
- Add missing header · 124ed38d
  Paul authored May 25, 2022
  
  124ed38d
- Format · 49e1e618
  Paul authored May 25, 2022
  
  49e1e618
- Set kernel name · 85f22ffd
  Paul authored May 25, 2022
  
  85f22ffd
24 May, 2022 3 commits
- Improve applicable batched gemms (#1214) · bf0a4713
  Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
  bf0a4713
- Remove std references in runtime compilation (#1186) · 150d6d20
  Paul Fultz II authored May 24, 2022
```
Remove std references in runtime compilation since these are not available when using hiprtc and the headers may not be available on the system
```
  150d6d20
- Fuse gemm add with pointwise fusions (#1213) · a500620e
  Paul Fultz II authored May 24, 2022
```
* Fuse gemm add with pointwise fusions
```
  a500620e
23 May, 2022 2 commits
- Format · 92324d57
  Paul authored May 23, 2022
  
  92324d57
- Vectorize softmax · 4d66f031
  Paul authored May 23, 2022
  
  4d66f031
20 May, 2022 2 commits
- Format · dc296a73
  Paul authored May 20, 2022
  
  dc296a73
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
  
  For clarity on kernel names found when profiling. The new names are set to the order of the ops being compiled. For example: add + relu = add_relu_kernel.
  
  4a312201
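The naming rule in this commit message (op names joined in compile order, then a `_kernel` suffix) can be sketched in a few lines of Python. The helper `kernel_name` is hypothetical and only mirrors the example given in the message; the actual MIGraphX code may treat op names differently.

```python
def kernel_name(op_names):
    """Build a kernel name from op names listed in the order they are compiled.

    Hypothetical helper: joins the names with underscores and appends
    the "_kernel" suffix, following the add + relu example above.
    """
    return "_".join(op_names) + "_kernel"


print(kernel_name(["add", "relu"]))
print(kernel_name(["mul", "add"]))
```

Under this rule, a fused pointwise kernel's name lists its constituent ops, which is why names such as `add_mul_erf_add_mul_mul_kernel` show up in the profiling output elsewhere in this log.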

19 May, 2022 1 commit
- Fix perf regression · c84154b8
  Paul authored May 19, 2022
  
  c84154b8
18 May, 2022 1 commit
- Fix tidy issue · 7133eee6
  Paul authored May 18, 2022
  
  7133eee6
17 May, 2022 9 commits
- Format · 835cc1e2
  Paul authored May 17, 2022
  
  835cc1e2
- Fuse contiguous · 77be2528
  Paul authored May 17, 2022
  
  77be2528
- Format · 9426aae5
  Paul authored May 17, 2022
  
  9426aae5
- Don't hinder eliminate_contiguous · e83dc134
  Paul authored May 17, 2022
  
  e83dc134
- Format · 8e49a9f2
  Paul authored May 17, 2022
  
  8e49a9f2
- Jit contiguous · 407acb7d
  Paul authored May 17, 2022
  
  407acb7d
- Format · d0b7fc9a
  Paul authored May 17, 2022
  
  d0b7fc9a
- Fix wrong global size · 5515c9a5
  Paul authored May 17, 2022
  
  5515c9a5
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
12 May, 2022 3 commits
- Fix vec_reduce · b4c4234d
  Paul authored May 12, 2022
  
  b4c4234d
- Fix div by zero · 172f47f5
  Paul authored May 12, 2022
  
  172f47f5
- Fix tidy · 8344791c
  Paul authored May 12, 2022
  
  8344791c
11 May, 2022 4 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuse layernorm and add triadd_layernorm fusion. This is a prep performance booster
```
  671f24be
- Format · db2def39
  Paul authored May 10, 2022
  
  db2def39
- Fix vec issues · f1f60be1
  Paul authored May 10, 2022
  
  f1f60be1
- Format · c13780c2
  Paul authored May 10, 2022
  
  c13780c2