Commits · umang_ir_dump · gaoqiong / MIGraphX

13 May, 2022 4 commits
- Merge branch 'develop' into umang_ir_dump · d4663624
  Umang Yadav authored May 13, 2022
  
  d4663624
- Update install_prereqs.sh for individual use (#1197) · 8c94ad07
  Chris Austen authored May 13, 2022
```
Our documentation indicates a user with sudo can run the install_prereqs.sh file. Turns out that the file is not complete enough to run on Ubuntu 18.04/20.04 independently. I updated the file to resolve the failures.

resolves #1191
```
  8c94ad07
- formatting · 883312be
  umangyadav authored May 13, 2022
  
  883312be
- fix tidy cppcheck · d0fec7fd
  umangyadav authored May 13, 2022
  
  d0fec7fd
12 May, 2022 20 commits
- fix cppcheck · 2d27b694
  umangyadav authored May 12, 2022
  
  2d27b694
- cppcheck fix · ddaa22b3
  umangyadav authored May 12, 2022
  
  ddaa22b3
- formatting · d7033279
  umangyadav authored May 12, 2022
  
  d7033279
- fix errors and add another environment variable for dump_passes · 884d3eb1
  umangyadav authored May 12, 2022
  
  884d3eb1
- formatting · 751fd21a
  umangyadav authored Apr 13, 2022
  
  751fd21a
- remove directory after each use · f6fc4eb9
  umangyadav authored Apr 13, 2022
  
  f6fc4eb9
- formatting · f47a39ea
  umangyadav authored Apr 13, 2022
  
  f47a39ea
- change mod passes trace · 3768c23e
  umangyadav authored Apr 13, 2022
  
  3768c23e
- formatting · 6c34bc60
  umangyadav authored Apr 13, 2022
  
  6c34bc60
- enabled module trace only when trace is enabled · d9bbe8ac
  umangyadav authored Apr 13, 2022
  
  d9bbe8ac
- formatting · d1c4fb1a
  umangyadav authored Apr 07, 2022
  
  d1c4fb1a
- fix tidy · 1535e7b2
  umangyadav authored Apr 07, 2022
  
  1535e7b2
- enable trace on mod passes · 5ad42259
  umangyadav authored Apr 07, 2022
  
  5ad42259
- mod_passes · 09cb3e48
  umangyadav authored Apr 07, 2022
  
  09cb3e48
- change lowering names · e137db54
  umangyadav authored Apr 07, 2022
  
  e137db54
- formatting · e05243e0
  umangyadav authored Apr 07, 2022
  
  e05243e0
- fix tidy · 187495fe
  umangyadav authored Apr 07, 2022
  
  187495fe
- undo cmake changes · 7b878710
  umangyadav authored Apr 06, 2022
  
  7b878710
- formatting · e45e0216
  umangyadav authored Apr 06, 2022
  
  e45e0216
- dump IR passes into separate files, dump module passes into separate directories · aff10174
  umangyadav authored Apr 06, 2022
  
  aff10174
11 May, 2022 2 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuse layernorm and add triadd_layernorm fusion. This is a preparatory performance improvement.
```
  671f24be
- Updated a path to the bert-squad onnx file after upstream changed path (#1201) · 4ec8209f
  Chris Austen authored May 10, 2022
```
ONNX Models changed its branch from master to main. Change the path to reflect the proper location.
```
  4ec8209f
10 May, 2022 1 commit
- Expose `add_literal` in C and Python API (#1173) · 5e5ed37a
  Umang Yadav authored May 10, 2022
```
Expose the add_literal method in the C/C++ API
```
  5e5ed37a
09 May, 2022 1 commit

Refactor vectorization and preloading for pointwise fusions (#1184) · ddbbe54b

Paul Fultz II authored May 09, 2022

Improves performance for add_gelu. In BERT it is 4x faster, and for mul_add it is 50% faster than what we currently have.

ddbbe54b

06 May, 2022 2 commits
- upgrade docker images to ROCm 5.0.2 (#1133) · f55d7c24
  Chris Austen authored May 06, 2022
```
Move CI containers to ROCm 5.0.2
Upgrade to Ubuntu 20.04
Free up more disk space in GitHub Actions environments
```
  f55d7c24
- Add compile tests for gpu math functions (#1182) · 6a5cda96
  Paul Fultz II authored May 06, 2022
```
Add compile tests for gpu math functions
```
  6a5cda96
05 May, 2022 1 commit

Cppcheck fixes (#1195) · d582425b

Paul Fultz II authored May 05, 2022

Fixes the #error when using cppcheck. This no longer suppresses cppcheck errors when including those errors. This also fixes the cppcheck errors that were already there.

d582425b

03 May, 2022 1 commit

Extend lifetimes in C++ API (#1139) · 4a5a23a4

Paul Fultz II authored May 02, 2022

Helps avoid dangling references. This also deprecates the constructors that didn't take a lifetime annotation, since the lifetime is ambiguous.

4a5a23a4

02 May, 2022 1 commit
- Bumping version to support next ROCm release (#1192) · 8b4c417c
  Chris Austen authored May 02, 2022
```
Release branch created for ROCm 5.2 so moving develop branch to 2.3
```
  8b4c417c
29 Apr, 2022 1 commit
- Add GatherND operator (#1089) · 4ec35e5f
  turneram authored Apr 28, 2022
```
Add ref and gpu implementations for ONNX op GatherND

Resolves #1032
```
  4ec35e5f
27 Apr, 2022 1 commit

Add lane reduction (#1180) · 4c72cc95

Paul Fultz II authored Apr 27, 2022

With reductions such as {2048, 2, 1456} on axis 1, this is about 22x faster than using our new block_reduce, and it is even over 100x faster than our original reduce_sum:

# lane
gpu::code_object[code_object=13736,symbol_name=kernel,global=2981888,local=1024,]: 0.0672928ms
# block
gpu::code_object[code_object=13800,symbol_name=kernel,global=39321600,local=64,]: 1.46072ms
# original
gpu::reduce_sum[axes={1}]: 6.73456ms

There is some basic logic to pick between lane and block reduce automatically.

4c72cc95
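
The choice between the two strategies can be illustrated with a minimal sketch. It assumes that a lane reduce assigns one thread to each output element, which is consistent with the `global=2981888` launch size quoted for the lane kernel (2048 × 1456). The `pick_strategy` function and its `threshold` value are hypothetical and are not taken from MIGraphX; the commit only says that some basic selection logic exists.

```python
from math import prod


def reduce_geometry(lens, axis):
    """Return (number of outputs, elements reduced per output) for a
    reduction over a single axis of a tensor with dimensions `lens`."""
    reduce_len = lens[axis]
    outputs = prod(lens) // reduce_len
    return outputs, reduce_len


def pick_strategy(lens, axis, threshold=64):
    """Hypothetical heuristic (not MIGraphX's actual logic): use a lane
    reduce when each output needs only a few input elements, otherwise
    let a whole block of threads cooperate on each output."""
    _, reduce_len = reduce_geometry(lens, axis)
    return "lane" if reduce_len < threshold else "block"


# Shape from the commit message: many outputs, only 2 elements per output.
print(reduce_geometry([2048, 2, 1456], axis=1))
print(pick_strategy([2048, 2, 1456], axis=1))
print(pick_strategy([1024, 384, 768], axis=2))
```

With only two elements per output, a cooperative block reduce would leave most threads in a block idle, which is one plausible reading of why the lane variant wins for this shape.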

26 Apr, 2022 1 commit
- Expose get_queue method for context in API (#1161) · 36656030
  Umang Yadav authored Apr 26, 2022
```
* expose get_queue method
```
  36656030
23 Apr, 2022 1 commit

ReverseSequence op (#1177) · 31906785

Charlie Lin authored Apr 22, 2022

Implements the ReverseSequence ONNX operator as a parser.

This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell.
We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
The ONNX backend tests are disabled because this does not handle variable sequence_lens.

31906785
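
As a rough illustration of the operator's semantics with a constant sequence_lens, here is a pure-Python sketch for a 2-D input where each row is one batch entry (that is, batch_axis=0 and time_axis=1). The function name and the list-of-lists layout are assumptions made for this example; this is not the MIGraphX parser code.

```python
def reverse_sequence(rows, sequence_lens):
    """Reverse the first sequence_lens[i] entries of rows[i] and leave
    the rest of each row unchanged."""
    if len(rows) != len(sequence_lens):
        raise ValueError("need one sequence length per batch entry")
    out = []
    for row, n in zip(rows, sequence_lens):
        if not 0 <= n <= len(row):
            raise ValueError("sequence length out of range")
        # Reversed prefix followed by the untouched tail.
        out.append(row[:n][::-1] + row[n:])
    return out


data = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
print(reverse_sequence(data, [1, 2, 3, 4]))
```

Because the lengths are known when the model is parsed, each row's reversal is a fixed rearrangement, which is why a constant sequence_lens does not need a dedicated runtime operator.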

19 Apr, 2022 1 commit

Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4

Charlie Lin authored Apr 18, 2022

Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
Removed cpu_pooling, instead using reference pooling in pooling.hpp
Added reference implementation of Lp Norm pooling and the global version
Added tests for the Lp Norm Pooling

764273e4
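
For readers unfamiliar with Lp norm pooling, the following is a minimal 1-D sketch: each window is reduced to the p-th root of the sum of absolute values raised to the power p, and the global version uses a single window covering the whole input. The function names, the 1-D layout, and the absence of padding are simplifying assumptions; this is not the code in pooling.hpp.

```python
def lp_pool_1d(xs, kernel, stride=1, p=2):
    """Lp norm pooling over a 1-D sequence with no padding."""
    if kernel <= 0 or stride <= 0 or p <= 0:
        raise ValueError("kernel, stride and p must be positive")
    return [
        sum(abs(v) ** p for v in xs[i:i + kernel]) ** (1.0 / p)
        for i in range(0, len(xs) - kernel + 1, stride)
    ]


def global_lp_pool_1d(xs, p=2):
    """Global Lp pooling: one window spanning the entire input."""
    return lp_pool_1d(xs, kernel=len(xs), p=p)[0]


print(lp_pool_1d([3.0, 4.0, 0.0, 5.0], kernel=2, stride=2))
print(global_lp_pool_1d([1.0, -2.0, 3.0], p=1))
```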

17 Apr, 2022 1 commit

Reduce with runtime compilation (#1150) · f9a5b81e

Paul Fultz II authored Apr 17, 2022

There is a significant improvement on larger tensors; with half it is almost 50% faster:

lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms

Also, for non-trivial layouts this can sometimes be over 2x faster:

lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms

Of course, if the stride becomes larger, this speed improvement diminishes due to poor memory access patterns. Such kernels need a lane_reduce instead of a block_reduce. I plan to address that in a future PR.

Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM environment variable, which prints the assembly when the kernel compiles.

f9a5b81e
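
The speedup figures can be checked directly from the timings quoted in the commit message. The helper below is only a small arithmetic sketch; the function name is made up for this example.

```python
def speedup(baseline_ms, new_ms):
    """How many times faster the new kernel is than the baseline."""
    return baseline_ms / new_ms


# lens: [1024, 384, 768], reduce_sum over axis 2
print(round(speedup(1.73126, 1.16685), 2))
# lens: [64, 1024, 768, 4], reduce_sum over axis 1
print(round(speedup(2.63375, 1.1706), 2))
```

The first ratio is just under 1.5, which matches "almost 50% faster", and the second is a little above 2, which matches "over 2x faster".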

14 Apr, 2022 1 commit

Half2 overloads (#1157) · 12007dba

bpickrel authored Apr 14, 2022

Issue #1127: Updates the math.hpp header file to provide overloads of various standard functions (ops) for the HIP half2 type. The half2 type is two 16-bit floats packed into a 32-bit number, so the overloads act on vectors whose sizes are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type.

Defined a new template, instantiated it for those math operations that the HIP library contains, and added verify tests for the sqrt operator for three cases:

tensor size not divisible by 2
tensor size divisible by 2 but not by 4
tensor size divisible by 4

12007dba
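
The packing described above can be sketched in plain Python with the standard `struct` module, whose `e` format is an IEEE 754 half-precision float. The `widest_vector` helper is a hypothetical illustration of why the three test sizes exercise different paths; it is not MIGraphX's vectorization logic.

```python
import struct


def pack_half2(a, b):
    """Pack two values as 16-bit floats into one 32-bit unsigned integer
    (little-endian, `a` in the low half)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_half2(word):
    """Recover the two 16-bit floats from a packed 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


def widest_vector(n):
    """Hypothetical helper: the widest vector width (4, 2 or 1) that
    evenly divides a tensor of n elements."""
    if n % 4 == 0:
        return 4
    if n % 2 == 0:
        return 2
    return 1


word = pack_half2(1.0, 2.0)
print(hex(word), unpack_half2(word))
print([widest_vector(n) for n in (5, 6, 8)])
```

A size of 5 cannot be covered by half2 pairs alone, 6 can be covered by pairs but not by groups of four, and 8 can be covered by either, which corresponds to the three verify cases listed above.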