Commits · 701c20147e6f4c5cf28aa3c8f2a6e0eae4d3ba77 · gaoqiong / MIGraphX

11 Apr, 2022 2 commits

scatter operator refactoring to include reduction (#1124) · 701c2014

bpickrel authored Apr 11, 2022

Change the "scatter" struct and op to a base/child set of three ops (scatter_none, scatter_add, scatter_mul) to mirror the ONNX ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)

Provides both a reference op and an update to the ONNX parsing. Tests were updated and a new test case was added.

701c2014
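
As a rough illustration of the three reduction options named in #1124, here is a minimal 1-D sketch in Python. It is not the MIGraphX implementation: the function name `scatter_elements` and its signature are hypothetical, and the real ops work on multi-dimensional tensors along an axis.

```python
def scatter_elements(data, indices, updates, reduction="none"):
    """1-D analogue of ScatterElements with an optional reduction (illustrative only)."""
    out = list(data)  # work on a copy so the input is left untouched
    for idx, upd in zip(indices, updates):
        if reduction == "none":
            out[idx] = upd       # scatter_none: overwrite
        elif reduction == "add":
            out[idx] += upd      # scatter_add: accumulate by addition
        elif reduction == "mul":
            out[idx] *= upd      # scatter_mul: accumulate by multiplication
        else:
            raise ValueError(f"unknown reduction: {reduction}")
    return out
```

With `data = [1, 2, 3, 4]`, `indices = [1, 3, 1]` and `updates = [10, 20, 30]`, the "add" reduction gives `[1, 42, 3, 24]` and the "mul" reduction gives `[1, 600, 3, 80]`. The reductions matter mainly when an index appears more than once, as index 1 does here.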

fix a bug in create tensor_view with vec data type (#1155) · 3c301efa

Shucai Xiao authored Apr 11, 2022

When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by the vec_size.

3c301efa
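
The shape arithmetic described in #1155 can be sketched as follows. This is only one reading of the fix, written in Python rather than the kernel code; `vectorized_lens` is a hypothetical helper name, and the divisibility check is an assumption added for the example.

```python
def vectorized_lens(lens, vec_size):
    """Dimensions of a view whose elements are vectors of `vec_size` scalars."""
    if lens[-1] % vec_size != 0:
        # Assumed precondition: the last dimension packs evenly into vectors.
        raise ValueError("last dimension must be a multiple of vec_size")
    return list(lens[:-1]) + [lens[-1] // vec_size]
```

For example, a scalar shape of `[2, 3, 8]` viewed with `vec_size = 4` becomes `[2, 3, 2]`: each group of four scalars in the last dimension counts as one element.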

29 Mar, 2022 1 commit

Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6

Paul Fultz II authored Mar 29, 2022

This adds the infrastructure to compile everything in parallel; previously only pointwise kernels were compiled in parallel. It also integrates directly with lowering and the gpu-driver. The pointwise and roialign kernels use this infrastructure. Scatternd does not, since it requires a standard shape.

This also makes it easier to add new runtime compiled kernels in the future.

661046c6

28 Mar, 2022 1 commit
- Use ccache for runtime compilation (#1131) · ad056b1f
  Paul Fultz II authored Mar 28, 2022
```
* Use ccache for runtime compilation
```
  ad056b1f
18 Mar, 2022 1 commit

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to the GPU implementation of prefix_scan_sum, which completes support for the ONNX op CumSum.

548783c8
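
To make the exclusive and reverse modes from #1094 concrete, here is a small sequential sketch in Python. It shows the semantics on a 1-D list only and says nothing about how the GPU prefix scan is implemented; the function name `cumsum` and its keyword arguments are chosen for the example.

```python
def cumsum(xs, exclusive=False, reverse=False):
    """Running sum of a 1-D sequence with optional exclusive/reverse modes."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out, running = [], 0
    for x in seq:
        if exclusive:
            out.append(running)  # emit the sum of the elements before x
            running += x
        else:
            running += x
            out.append(running)  # emit the sum up to and including x
    # Undo the reversal so outputs line up with the original positions.
    return list(reversed(out)) if reverse else out
```

For the input `[1, 2, 3]` the four mode combinations give `[1, 3, 6]` (default), `[0, 1, 3]` (exclusive), `[6, 5, 3]` (reverse) and `[5, 3, 0]` (exclusive and reverse).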

15 Mar, 2022 1 commit

Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991

Paul Fultz II authored Mar 15, 2022

This adds iterators to tensor_view, which allows kernels such as roialign to work with non-standard shapes.

To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.

Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi-index. This gives about a 4% improvement in some cases.

31e63991
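
The single-index calculation mentioned in #1126 can be illustrated with a short Python sketch that maps a linear element index to a memory offset for arbitrary strides. This is a generic formulation, not the MIGraphX kernel code; `offset_of` is a hypothetical name.

```python
def offset_of(index, lens, strides):
    """Map a linear element index to a memory offset for the given lens/strides."""
    offset = 0
    # Peel off coordinates from the innermost dimension outwards.
    for length, stride in zip(reversed(lens), reversed(strides)):
        offset += (index % length) * stride
        index //= length
    return offset
```

For a standard shape with lens `[2, 3]` and strides `[3, 1]` the offset equals the index. For a transposed view with the same lens and strides `[1, 2]`, indices 0 through 5 map to offsets `[0, 2, 4, 1, 3, 5]`, which is why a kernel that assumes a standard shape would read the wrong elements.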

14 Mar, 2022 1 commit
- Increase max groups in kernel (#1120) · d353641d
  Shucai Xiao authored Mar 14, 2022
```
change max number of groups in a kernel to 1B for greater performance
```
  d353641d
04 Mar, 2022 1 commit

Mode as enum for pooling and roi_align (#1091) · a2e90b5d

bpickrel authored Mar 04, 2022

Changed the pooling mode values for two structures from strings to specialized enum classes. This required many test and operator parsing changes. Introduces one new source file, op_enums.cpp.

a2e90b5d

03 Mar, 2022 3 commits
- Boost the max number of workgroups for pointwise ops (#1113) · d9d17a11
  Paul Fultz II authored Mar 03, 2022
```
Boost the max number of workgroups for pointwise ops by matching what we are doing in launch.hpp
```
  d9d17a11
- Use fp32 compute_type when calling rocBLAS API (#1085) · 36b01ba5
  kahmed10 authored Mar 03, 2022
```
better performance doing it this way
```
  36b01ba5
- Add ScatterND operator (#1074) · 832f28c6
  turneram authored Mar 02, 2022
```
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
```
  832f28c6
02 Mar, 2022 2 commits
- isnan operator (#1100) · bfedcd45
  Charlie Lin authored Mar 02, 2022
```
Implements the IsNaN operator, ref, gpu, and onnx parser.
```
  bfedcd45
- Clang format ver10 (#1106) · 9852aaef
  bpickrel authored Mar 02, 2022
```
Update the base version of clang-format from 5.0 to 10.0
```
  9852aaef
25 Feb, 2022 1 commit
- Add get_queue to context to get the current stream (#1097) · e5242676
  Paul Fultz II authored Feb 24, 2022
```
The queue is wrapped in an any_ptr class so the type can be checked at runtime for a mismatch.
```
  e5242676
24 Feb, 2022 1 commit

Some cmake fixes and updates (#1088) · cd0a4aa5

Paul Fultz II authored Feb 23, 2022

Make doc/CMakeLists.txt standalone
Switch to use rocm-cmake modules for document generation
Add CONFIGURE_DEPENDS to file(GLOB) so it will update without an explicit cmake run
Add STRINGS property for build type to make it easier to switch build types with ccmake
Various fixes and improvements

cd0a4aa5

09 Feb, 2022 1 commit
- Enable pointwise fusion by default (#1082) · c7419a9c
  Paul Fultz II authored Feb 09, 2022
```
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION to disable it
```
  c7419a9c
08 Feb, 2022 2 commits

Add missing output_alias to miopen_fusion op (#1076) · b304d97d

Paul Fultz II authored Feb 08, 2022

The missing output_alias caused incorrect memory coloring, which led to the accuracy failures in the vision models when the pointwise fusions were enabled. Resnet50, inceptionv3 and inceptionv4 now verify in the driver.

b304d97d

Enforce types to avoid compilation error in pointwise fusions (#1077) · 73b8a773
Paul Fultz II authored Feb 08, 2022
```
Enforce types to avoid compilation error in pointwise fusions
This fixes compile failure: gpt-2, fp16 on Navi
```
73b8a773

28 Jan, 2022 1 commit

Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7

Paul Fultz II authored Jan 28, 2022

* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload

78a3c9b7

27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow non-standard shapes for the arg ops. Non-standard shapes include broadcast, slice and transpose.
```
  332cb710
21 Jan, 2022 1 commit
- Improve handling of generator expressions when getting the flags for hip (#1055) · 3f392a3b
  Paul Fultz II authored Jan 20, 2022
```
* Improve handling of generator expressions when getting the flags for hip
```
  3f392a3b
10 Jan, 2022 1 commit
- Handle miopen fusions when using pointwise fusions (#1019) · 534a05c1
  Paul Fultz II authored Jan 10, 2022
```
* Add matcher for conv_bias pointwise
* Add fusion op
```
  534a05c1
09 Dec, 2021 1 commit

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128
Increased the max number of blocks in the kernel from 256 to 1M.
For the case that the axis is the last dimension, we removed the computation of index since it is not required.

With these changes, we get about a 2x speedup compared to the develop branch for the softmax op used in the BertSquad model.

2e337c7f

08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
07 Dec, 2021 1 commit
- Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
  Paul Fultz II authored Dec 07, 2021
```
simple variable rename
```
  1793cc54
02 Dec, 2021 1 commit
- Fix pointwise compile error with half sqrt (#1010) · 7b3e58a0
  Paul Fultz II authored Dec 02, 2021
```
Fix pointwise compile error with half sqrt 
```
  7b3e58a0
30 Nov, 2021 2 commits
- Fix fusable_conv whitespace bug (#1008) · 9270ebaf
  turneram authored Nov 30, 2021
```
Fix whitespace bug in fusable_conv matcher and add unit test
```
  9270ebaf
- Fix vectorization of broadcasted inputs in pointwise fusions (#1011) · 5dfafd00
  Paul Fultz II authored Nov 30, 2021
  
  5dfafd00
24 Nov, 2021 1 commit
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
```
* Check jit kernels files with clang-tidy
```
  a33d6fa2
18 Nov, 2021 1 commit
- Parallel compilation (#1007) · b0bc71cd
  Paul Fultz II authored Nov 18, 2021
```
Do compilation in parallel
```
  b0bc71cd
11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.

157935ff

09 Nov, 2021 1 commit
- Failing fusion plan workaround (#995) · fb39e5e4
  turneram authored Nov 09, 2021
```
* Add workaround for devices that do not support miopen conv fusions
```
  fb39e5e4
28 Oct, 2021 2 commits

NonMaxSuppression op ref implementation (#968) · c98b22d8

Shucai Xiao authored Oct 28, 2021

This PR is the ref implementation of the nonmaxsuppression operator. It always returns the max possible output shape, which is the problem tracked in issue #948.

c98b22d8

Roialign gpu impl (#972) · 912c8d22

Shucai Xiao authored Oct 28, 2021

GPU implementation of the roialign operator, using the jit approach to reduce the lib size.

912c8d22

20 Oct, 2021 1 commit

Roialign (#952) · d7653732

Shucai Xiao authored Oct 20, 2021

Implementation of the roialign operator. For now, only the ref implementation is available. When a model runs on the GPU, execution of this operator falls back to the ref implementation.

d7653732

08 Oct, 2021 2 commits

Nonzero op extension (#870) · 0879b5f1

Shucai Xiao authored Oct 08, 2021

This PR is for the nonzero operator with static output shape.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: mvermeulen <5479696+mvermeulen@users.noreply.github.com>

0879b5f1

Remove alpha and beta from `dot` and `quant_dot` (#961) · 21193e87

Umang Yadav authored Oct 08, 2021

Previously, the dot operator was defined as C = alpha * A . B + beta * C, where * is scalar multiplication and . is the dot product or matrix multiplication, depending on the dimensions of the inputs.

The aim is to define the dot operator as C = A . B, without alpha or beta.

To achieve the same effect as alpha and beta: (1) one of the inputs to the dot operator is multiplied by the alpha value; (2) if beta is present, C is multiplied by beta and then added to the output from step 1.

21193e87
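
The two-step decomposition described in #961 can be sketched in plain Python on lists of rows. The helper names (`dot`, `scale`, `add`, `dot_alpha_beta`) are made up for this example, and the sketch scales A in step 1 although the commit message only says "one of the inputs".

```python
def dot(a, b):
    """Plain matrix product C = A . B on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scale(m, s):
    """Multiply every element of matrix m by the scalar s."""
    return [[s * x for x in row] for row in m]

def add(m, n):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m, n)]

def dot_alpha_beta(a, b, c, alpha=1, beta=0):
    """Rebuild alpha * A . B + beta * C from an alpha/beta-free dot."""
    out = dot(scale(a, alpha), b)       # step 1: fold alpha into one input
    if beta != 0:
        out = add(out, scale(c, beta))  # step 2: add beta * C to the result
    return out
```

With `a = [[1, 2], [3, 4]]`, `b = [[5, 6], [7, 8]]`, `c = [[1, 1], [1, 1]]`, `alpha = 2` and `beta = 3`, the result is `[[41, 47], [89, 103]]`, the same as computing 2 * (A . B) + 3 * C directly.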

01 Oct, 2021 1 commit

Add multinomial op (#954) · 0b7672d7

turneram authored Oct 01, 2021

Add multinomial op to onnx parser with ref and GPU implementations.

The onnx parser inserts a literal of shape {batch_size, sample_size} with random values in the range [0, 1) and inserts existing ops to compute the cumulative distribution function (CDF). The multinomial operator multiplies the random values by the sum of the CDF and returns the index of the first element of the CDF that is greater than the result, representing samples randomly drawn from [0, class_size) that follow the log-probability distribution.

Resolves #821
Co-authored-by: Shucai Xiao <shucai@gmail.com>

0b7672d7
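
The sampling scheme described in #954 can be sketched for a single batch row in Python. Two assumptions are made here: "the sum of the CDF" is read as the final (total) value of the cumulative sum, and the uniform values are passed in explicitly instead of being generated. The name `multinomial_sample` is hypothetical.

```python
import math
from itertools import accumulate

def multinomial_sample(log_probs, uniforms):
    """Draw one class index per uniform value in [0, 1) via CDF lookup."""
    weights = [math.exp(lp) for lp in log_probs]  # unnormalized probabilities
    cdf = list(accumulate(weights))               # running (cumulative) sum
    samples = []
    for u in uniforms:
        threshold = u * cdf[-1]  # rescale u to the total mass (assumed meaning)
        # Index of the first CDF entry strictly greater than the threshold;
        # fall back to the last class if rounding pushes the threshold to the top.
        idx = next((i for i, c in enumerate(cdf) if c > threshold), len(cdf) - 1)
        samples.append(idx)
    return samples
```

With weights proportional to 1, 1 and 2, the CDF is roughly `[1, 2, 4]`, so uniform values of 0.0, 0.3, 0.6 and 0.99 select classes 0, 1, 2 and 2. Because the threshold is rescaled by the total, the input does not need to be normalized first.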

27 Sep, 2021 1 commit

Dpp opts for wavefront 32 (#951) · 6e2df9de

kahmed10 authored Sep 27, 2021

Checks wavefront size, then changes implementation and number of threads for DPP reduce

6e2df9de

17 Sep, 2021 1 commit
- Revert "Remove alpha and beta attributes from dot operator (#945)" (#957) · 985f58b0
  Paul Fultz II authored Sep 17, 2021
```
This reverts commit 9e43cb8b.
```
  985f58b0