Commits · a8939c5b5e2feb39a6fe1446ffa4acf20e163c44 · gaoqiong / MIGraphX

25 May, 2022 1 commit
- Add to conv dynamic batch test · 421769c0
  charlie authored May 24, 2022
  
  421769c0
24 May, 2022 5 commits
- Improve applicable batched gemms (#1214) · bf0a4713
  Paul Fultz II authored May 24, 2022
```
* Improve applicable batched gemms for bert
```
  bf0a4713
- Coverage update · 97a40ac3
  charlie authored May 24, 2022
  
  97a40ac3
- Fix onnx mean parsing for integral inputs (#1209) · d895104a
  shivadbhavsar authored May 23, 2022
```
As described in #1196, the ONNX mean parser does not work correctly for integral types. This update fixes the issue by handling integral types separately, where summation is performed before division. Additional test cases have also been added for handling integral types.
```
  d895104a
- Add operator<< tests for coverage · a8d7c888
  charlie authored May 23, 2022
  
  a8d7c888
- Dynamic conv draft progress · a465fc9d
  charlie authored May 23, 2022
  
  a465fc9d
19 May, 2022 2 commits
- Formatting · 79e27dac
  charlie authored May 19, 2022
  
  79e27dac
- progress · 53c4b899
  charlie authored May 19, 2022
  
  53c4b899
13 May, 2022 1 commit
- Test the dynamic lens functions · eb9fe865
  charlie authored May 13, 2022
  
  eb9fe865
11 May, 2022 4 commits
- Prefuse layernorm for gpu (#1190) · 671f24be
  Paul Fultz II authored May 11, 2022
```
Fuse layernorm and added triadd_layernorm fusion.  This is a prep performance booster
```
  671f24be
- dynamic bytes() test · 38b5c752
  charlie authored May 11, 2022
  
  38b5c752
- formatting · 8f76125c
  charlie authored May 11, 2022
  
  8f76125c
- element_space, min,max,opt _lens change · c497c12d
  charlie authored May 11, 2022
  
  c497c12d
10 May, 2022 2 commits
- Use std::initializer_list in constructor · ac0224a9
  charlie authored May 10, 2022
```
Reverts the dyn_data struct change
Should get around the ambiguous braced initialization list error
```
  ac0224a9
- Expose `add_literal` in C and Python API (#1173) · 5e5ed37a
  Umang Yadav authored May 10, 2022
```
Expose add_literal method in C/C++ api
```
  5e5ed37a
09 May, 2022 1 commit
- Add dyn_data struct to avoid ambiguous constructor · de4c1b44
  charlie authored May 09, 2022
  
  de4c1b44
06 May, 2022 3 commits
- Fix serialize errors · b31735e8
  charlie authored May 06, 2022
  
  b31735e8
- Dynamic shape tests · 7c63b13b
  charlie authored May 06, 2022
  
  7c63b13b
- Add compile tests for gpu math functions (#1182) · 6a5cda96
  Paul Fultz II authored May 06, 2022
```
Add compile tests for gpu math functions
```
  6a5cda96
03 May, 2022 3 commits
- rewrite empty lens multibroadcast test · 4680518a
  charlie authored May 03, 2022
  
  4680518a
- Dynamic shape handling in shape object · c0e18e78
  charlie authored May 03, 2022
  
  c0e18e78
- Extend lifetimes in C++ API (#1139) · 4a5a23a4
  Paul Fultz II authored May 02, 2022
```
Helps avoid dangling references. This also deprecates the constructors that didnt take a lifetime annotation since its ambiguous the lifetime.
```
  4a5a23a4
29 Apr, 2022 1 commit
- Add GatherND operator (#1089) · 4ec35e5f
  turneram authored Apr 28, 2022
```
Add ref and gpu implementations for ONNX op GatherND

Resolves #1032
```
  4ec35e5f
26 Apr, 2022 1 commit
- Expose get_queue method for context in API (#1161) · 36656030
  Umang Yadav authored Apr 26, 2022
```
* expose get_queue method
```
  36656030
23 Apr, 2022 1 commit

ReverseSequence op (#1177) · 31906785

Charlie Lin authored Apr 22, 2022

Implements the ReverseSequence ONNX operator as a parser.

This parser can only handle a constant sequence_lens input. This is the same as what is handled for TensorRT as far as I can tell.
We could handle a variable sequence_lens input; that would require ref and GPU implementations of the operator.
The ONNX backend tests are disabled because this does not handle variable sequence_lens.

31906785

19 Apr, 2022 1 commit

Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152) · 764273e4

Charlie Lin authored Apr 18, 2022

Refactored the reference implementation of pooling to something like what was done for roialign. Moved the reference implementation of pooling from targets/ref/lowering.cpp to pooling.hpp.
Removed cpu_pooling, instead using reference pooling in pooling.hpp
Added reference implementation of Lp Norm pooling and the global version
Added tests for the Lp Norm Pooling

764273e4

17 Apr, 2022 1 commit

Reduce with runtime compilation (#1150) · f9a5b81e

Paul Fultz II authored Apr 17, 2022

There is significant improvement on larger tensors with half almost 50% faster:

lens: [1024, 384, 768]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.16685ms
gpu::reduce_sum[axes={2}]: 1.73126ms
Also for non-trivial layouts this can sometimes be over 2x faster:

lens: [64, 1024, 768, 4]
gpu::code_object[code_object=13832,symbol_name=kernel,global=39321600,local=256,]: 1.1706ms
gpu::reduce_sum[axes={1}]: 2.63375ms
Of course if the stride becomes larger this speed improvement diminishes due to poor memory access patterns. A lane_reduce instead of a block_reduce is needed for such type of kernels. I plan to address that in a future PR.

Finally, this also includes a MIGRAPHX_GPU_DUMP_ASM env variable which will print out the assembly when the kernel compiles.

f9a5b81e

14 Apr, 2022 1 commit

Half2 overloads (#1157) · 12007dba

bpickrel authored Apr 14, 2022

Issue 1127 Updates the math.hpp header file to perform overloads of various standard functions (ops) for the hip half2 type. The half2 type is two 16-bit floats packed into a 32-bit number and therefore the overloads act on vectors of sizes that are multiples of 2. They are invoked in runtime compilation any time one of the ops is called on a tensor declared with the data type shape::half_type.

Defined new template, made instances of the template for those math operations that the hip library contains, added verify tests for the sqrt operator for three cases:

tensor size not divisible by 2
tensor size divisible by 2 but not by 4
tensor size divisible by 4

12007dba

11 Apr, 2022 1 commit

scatter operator refactoring to include reduction (#1124) · 701c2014

bpickrel authored Apr 11, 2022

Change the "scatter" struct and op to a base/child set of three: scatter_none, scatter_add, scatter_mul to mirror Onnx' ScatterElements op. and its three reduction options. (Onnx Scatter op is deprecated and is equivalent to scatter_none.)

Provides both a reference op. and update to Onnx parsing. Tests updated and new test case added.

701c2014

08 Apr, 2022 1 commit
- Fix comparisons in migraphx::value class (#1146) · 1e0bbd78
  Paul Fultz II authored Apr 08, 2022
```
* Fix comparisons in migraphx::value class
```
  1e0bbd78
06 Apr, 2022 1 commit

Python Binding for the Manual Graph Buidling (#1143) · c4b6469a

Umang Yadav authored Apr 06, 2022

Adds following API binding and tests to python :

add_return
add_instruction
add_parameter
create_module.

c4b6469a

01 Apr, 2022 1 commit

Update developer overview, fix doc CMakeLists (#1140) · 0295965d

Charlie Lin authored Apr 01, 2022

* Fix and change doc CMakeLists
1. Fix include directory location with hange from #1088
2. Create a DoxygenWarningLog.txt file in <build_dir>/doc/doxygen
3. Move compiled html or pdf files to <build_dir>/doc/[pdf, html]

0295965d

31 Mar, 2022 1 commit
- Change the doc to mention only gpu or ref as targets (#1153) · c59f4079
  Umang Yadav authored Mar 31, 2022
```
Documentation update for valid targets
```
  c59f4079
29 Mar, 2022 2 commits

Python binding for shape : Fix constructor for the shape and enable tests (#1135) · b5c96d34
Umang Yadav authored Mar 29, 2022
```
Follow up to #1128
```
b5c96d34

Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6

Paul Fultz II authored Mar 29, 2022

This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign are using this infrastructure. Scatternd is not since it does require standard shape.

This also makes it easier to add new runtime compiled kernels in the future.

661046c6

25 Mar, 2022 1 commit
- Improve handling of string literals in value class (#1141) · c73c0dae
  Paul Fultz II authored Mar 25, 2022
```
* Handle string literal in construction
* Improve get_default with vector
```
  c73c0dae
24 Mar, 2022 1 commit
- Add initial experimental custom op (#1109) · 251cdd74
  Paul Fultz II authored Mar 24, 2022
```
This creates a custom op which has name() and compute_shape() methods. 
```
  251cdd74
21 Mar, 2022 1 commit
- Lp normalization op (#1129) · 03225b57
  Charlie Lin authored Mar 21, 2022
```
* LpNormalization ONNX parser
```
  03225b57
18 Mar, 2022 2 commits

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum

548783c8

Make get_context experimental (#1137) · e521fa3f

Paul Fultz II authored Mar 18, 2022

The get_context may change in the future(when we support multi-targets) so make this experimental for now.

e521fa3f