Commits · 9bf7ed8bbc3df1bc2c5458e1365aa19eec84e910 · gaoqiong / MIGraphX

20 May, 2022 8 commits
- Fix · 9bf7ed8b
  Paul authored May 20, 2022
  
  9bf7ed8b
- Format · dc296a73
  Paul authored May 20, 2022
  
  dc296a73
- Rename pointwise ops (#1145) · 4a312201
  kahmed10 authored May 20, 2022
```
Renames the pointwise kernels so they are easier to identify when profiling. The new names follow the order of the ops being compiled. For example: add + relu = add_relu_kernel.
```
  4a312201
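
As a rough illustration of the naming scheme described in #1145, the sketch below joins op names in compilation order and appends `_kernel`. The helper name `fused_kernel_name` is hypothetical and is not taken from MIGraphX; only the `add + relu = add_relu_kernel` example comes from the commit message.

```python
def fused_kernel_name(op_names):
    """Hypothetical helper: build a kernel name from ops in compile order."""
    # Join the op names in the order they are compiled, then add the suffix.
    return "_".join(op_names) + "_kernel"


print(fused_kernel_name(["add", "relu"]))  # add_relu_kernel
```
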
- Format · 4001eef7
  Paul authored May 19, 2022
  
  4001eef7
- Try to calculate local · 2aa25de2
  Paul authored May 19, 2022
  
  2aa25de2
- Improve matching with has_value when there are convert operators (#1212) · 27af0170
  Paul Fultz II authored May 19, 2022
  
  27af0170
- Format · dfc7bbac
  Paul authored May 19, 2022
  
  dfc7bbac
- Fix contiguous after splits · 3fb1b7f3
  Paul authored May 19, 2022
  
  3fb1b7f3
19 May, 2022 1 commit
- Fix perf regression · c84154b8
  Paul authored May 19, 2022
  
  c84154b8
18 May, 2022 1 commit
- Fix tidy issue · 7133eee6
  Paul authored May 18, 2022
  
  7133eee6
17 May, 2022 11 commits
- Format · 39bbf87c
  Paul authored May 17, 2022
  
  39bbf87c
- Horizontally fuse contiguous · dcd3d04b
  Paul authored May 17, 2022
  
  dcd3d04b
- Format · 835cc1e2
  Paul authored May 17, 2022
  
  835cc1e2
- Fuse contiguous · 77be2528
  Paul authored May 17, 2022
  
  77be2528
- Format · 9426aae5
  Paul authored May 17, 2022
  
  9426aae5
- Don't hinder eliminate_contiguous · e83dc134
  Paul authored May 17, 2022
  
  e83dc134
- Format · 8e49a9f2
  Paul authored May 17, 2022
  
  8e49a9f2
- Jit contiguous · 407acb7d
  Paul authored May 17, 2022
  
  407acb7d
- Format · d0b7fc9a
  Paul authored May 17, 2022
  
  d0b7fc9a
- Fix wrong global size · 5515c9a5
  Paul authored May 17, 2022
  
  5515c9a5
- renamed variables for module from p to m (#1204) · a27dd28c
  shivadbhavsar authored May 17, 2022
```
Updated variable names according to #1193
```
  a27dd28c
12 May, 2022 3 commits
- Fix vec_reduce · b4c4234d
  Paul authored May 12, 2022
  
  b4c4234d
- Fix div by zero · 172f47f5
  Paul authored May 12, 2022
  
  172f47f5
- Fix tidy · 8344791c
  Paul authored May 12, 2022
  
  8344791c
11 May, 2022 5 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuses layernorm and adds a triadd_layernorm fusion. This is a prep performance booster.
```
  671f24be
- Format · db2def39
  Paul authored May 10, 2022
  
  db2def39
- Fix vec issues · f1f60be1
  Paul authored May 10, 2022
  
  f1f60be1
- Format · c13780c2
  Paul authored May 10, 2022
  
  c13780c2
- Add vectorization to reduction · 15fd8205
  Paul authored May 10, 2022
  
  15fd8205
10 May, 2022 3 commits
- Format · 8a6ae079
  Paul authored May 10, 2022
  
  8a6ae079
- Consolidate the vectorize and preload · d60364a3
  Paul authored May 10, 2022
  
  d60364a3
- Expose `add_literal` in C and Python API (#1173) · 5e5ed37a
  Umang Yadav authored May 10, 2022
```
Expose add_literal method in C/C++ api
```
  5e5ed37a
09 May, 2022 1 commit

Refactor vectorization and preloading for pointwise fusions (#1184) · ddbbe54b

Paul Fultz II authored May 09, 2022

Improves performance for add_gelu. In bert it is 4x faster, and for mul_add it is 50% faster than what we currently have.

ddbbe54b

06 May, 2022 1 commit

upgrade docker images to ROCm 5.0.2 (#1133) · f55d7c24

Chris Austen authored May 06, 2022

Move CI containers to ROCm 5.0.2
Upgrade to 20.04
Free up some more file space in GitHub Actions environments

f55d7c24

05 May, 2022 1 commit

Cppcheck fixes (#1195) · d582425b

Paul Fultz II authored May 05, 2022

Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This also fixes the cppcheck errors that were already there.

d582425b

03 May, 2022 1 commit

Extend lifetimes in C++ API (#1139) · 4a5a23a4

Paul Fultz II authored May 02, 2022

Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous without one.

4a5a23a4

29 Apr, 2022 1 commit
- Add GatherND operator (#1089) · 4ec35e5f
  turneram authored Apr 28, 2022
```
Add ref and gpu implementations for ONNX op GatherND

Resolves #1032
```
  4ec35e5f
27 Apr, 2022 1 commit

Add lane reduction (#1180) · 4c72cc95

Paul Fultz II authored Apr 27, 2022

With reductions such as {2048, 2, 1456} on axis 1, this is 23x faster than using our new block_reduce, and it's even over 100x faster than our original reduce_sum:

# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms

There is some basic logic to pick between lane and block reduce automatically.

4c72cc95
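
To make the idea concrete, here is a minimal CPU-side sketch, assuming that "lane reduction" means each thread (lane) serially sums the elements for one output position instead of a whole workgroup cooperating on it. That reading is an assumption and not confirmed by the commit message, although the product of the non-reduced dimensions, 2048 × 1456 = 2981888, does equal the `global` size in the lane profile line. The function name `lane_reduce_sum` and the flat row-major layout are illustrative choices, not MIGraphX code.

```python
def lane_reduce_sum(data, outer, reduce_len, inner):
    """Sum a flat row-major {outer, reduce_len, inner} tensor over axis 1.

    Each output element is handled by one simulated "lane" that walks its
    reduction elements serially, with no cooperation between lanes.
    """
    out = [0.0] * (outer * inner)
    for lane in range(outer * inner):  # one lane per output element
        o, i = divmod(lane, inner)
        acc = 0.0
        for r in range(reduce_len):  # serial loop over the reduced axis
            acc += data[(o * reduce_len + r) * inner + i]
        out[lane] = acc
    return out


# Small stand-in shape {2, 2, 3} instead of {2048, 2, 1456}.
data = [float(x) for x in range(12)]
result = lane_reduce_sum(data, 2, 2, 3)
print(result)

# Number of lanes the real shape would need under this assumption.
lanes_needed = 2048 * 1456
print(lanes_needed)
```

With a reduction length of only 2, a serial loop per lane does very little work, which is one plausible reason a per-lane scheme could beat a per-block scheme for this shape.
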

26 Apr, 2022 1 commit
- Expose get_queue method for context in API (#1161) · 36656030
  Umang Yadav authored Apr 26, 2022
```
* expose get_queue method
```
  36656030
23 Apr, 2022 1 commit

ReverseSequence op (#1177) · 31906785

Charlie Lin authored Apr 22, 2022

Implements the ReverseSequence ONNX operator as a parser.

This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell.
We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
The ONNX backend tests are disabled because this does not handle variable sequence_lens.

31906785
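
For readers unfamiliar with the operator, the sketch below shows the ReverseSequence semantics on plain Python lists with a constant `sequence_lens`, the case the parser above handles. It is simplified to 2-D input with the batch on axis 0 and time on axis 1; the function name `reverse_sequence` is illustrative and this is not the MIGraphX parser code.

```python
def reverse_sequence(batch, sequence_lens):
    """Reverse the first sequence_lens[b] items of each row batch[b].

    Simplified to 2-D input with batch_axis=0 and time_axis=1.
    """
    if len(batch) != len(sequence_lens):
        raise ValueError("need one length per batch entry")
    out = []
    for row, n in zip(batch, sequence_lens):
        # Reverse the leading n items; the tail keeps its original order.
        out.append(row[:n][::-1] + row[n:])
    return out


x = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
y = reverse_sequence(x, [1, 2, 3, 4])
print(y)
```

Because `sequence_lens` is known up front, the slice boundaries are fixed before any data is seen, which is what allows the reversal to be expressed at parse time.
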