Commits · 4f07b8f1d933bc78d7664fb30cd6bbc750dd772b · gaoqiong / MIGraphX

31 Mar, 2022 1 commit
- clang format · af110526
  Shucai Xiao authored Mar 31, 2022
  
  af110526
29 Mar, 2022 5 commits
- clang format · 48b39e06
  Shucai Xiao authored Mar 29, 2022
  
  48b39e06
- simplify the layernorm kernel arguments · de99db23
  Shucai Xiao authored Mar 29, 2022
  
  de99db23
- Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
  Paul Fultz II authored Mar 29, 2022
```
This adds the infrastructure so we can compile everything in parallel, whereas before only pointwise kernels were compiled in parallel. This will also directly integrate with lowering and the gpu-driver. The kernels for pointwise and roialign now use this infrastructure. ScatterND does not yet, since it requires a standard shape.

This also makes it easier to add new runtime compiled kernels in the future.
```
  661046c6
- clang format · 780fffc8
  Shucai Xiao authored Mar 28, 2022
  
  780fffc8
- also rewrite layernorm kernel using half2 datatype · fc48a1d3
  Shucai Xiao authored Mar 28, 2022
  
  fc48a1d3
28 Mar, 2022 6 commits
- clang format · c6700632
  Shucai Xiao authored Mar 28, 2022
  
  c6700632
- half and half2 have the same results · 69c94135
  Shucai Xiao authored Mar 28, 2022
  
  69c94135
- clang format · 580673a0
  Shucai Xiao authored Mar 28, 2022
  
  580673a0
- backup code changes · 80a6ca93
  Shucai Xiao authored Mar 28, 2022
  
  80a6ca93
- layernorm kernel optimization · a5181cd0
  Shucai Xiao authored Mar 28, 2022
  
  a5181cd0
- Use ccache for runtime compilation (#1131) · ad056b1f
  Paul Fultz II authored Mar 28, 2022
```
* Use ccache for runtime compilation
```
  ad056b1f
18 Mar, 2022 1 commit
- Complete GPU implementation of CumSum op (#1094) · 548783c8
  turneram authored Mar 18, 2022

  Add exclusive and reverse modes to gpu implementation of prefix_scan_sum, which completes support for ONNX op CumSum

  548783c8
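
The exclusive and reverse modes mentioned in this commit can be illustrated with a small sequential reference. This is only a sketch of the ONNX CumSum semantics on a 1-D vector; the function name `cumsum` and its signature are hypothetical and are not the `prefix_scan_sum` GPU implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not MIGraphX code): sequential reference for the
// four combinations of the CumSum `exclusive` and `reverse` flags.
std::vector<int> cumsum(const std::vector<int>& x, bool exclusive, bool reverse)
{
    std::vector<int> out(x.size());
    int running = 0;
    for(std::size_t k = 0; k < x.size(); ++k)
    {
        // Walk from the back of the input when reverse is set.
        const std::size_t i = reverse ? x.size() - 1 - k : k;
        if(exclusive)
        {
            out[i] = running; // sum of the elements visited before i
            running += x[i];
        }
        else
        {
            running += x[i];
            out[i] = running; // sum including element i
        }
    }
    return out;
}
```

For the input `{1, 2, 3}`, the default mode gives `{1, 3, 6}`, exclusive gives `{0, 1, 3}`, reverse gives `{6, 5, 3}`, and both together give `{5, 3, 0}`.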

15 Mar, 2022 1 commit
- Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991
  Paul Fultz II authored Mar 15, 2022

  This adds iterators to tensor_view, which can allow kernels to work with non-standard shapes like for roialign.

  To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.

  Finally, since index calculations with single integers are improved, I also updated pointwise to use single index rather than multi index. There is about 4% improvement in some cases.

  31e63991
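
The idea of carrying shape extents as `integral_constant`s can be sketched as follows. This is a minimal host-side illustration under assumed names (`static_shape`, `index`, `row_of`, `col_of` are all hypothetical); it is not the MIGraphX kernel shape class, only a demonstration that extents stored in the type are constants the compiler cannot fail to see.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical row-major 2-D shape; the names are illustrative, not MIGraphX's.
template <std::size_t Rows, std::size_t Cols>
struct static_shape
{
    using rows = std::integral_constant<std::size_t, Rows>;
    using cols = std::integral_constant<std::size_t, Cols>;

    // Total element count, known at compile time.
    static constexpr std::size_t elements() { return rows::value * cols::value; }

    // Multi index (i, j) to a single linear index.
    static constexpr std::size_t index(std::size_t i, std::size_t j)
    {
        return i * cols::value + j;
    }

    // Single linear index back to the row and the column.
    static constexpr std::size_t row_of(std::size_t idx) { return idx / cols::value; }
    static constexpr std::size_t col_of(std::size_t idx) { return idx % cols::value; }
};

// The extents are part of the type, so this is checked during compilation.
static_assert(static_shape<2, 3>::index(1, 2) == 5, "row-major offset");
```

Because the divisor and multiplier are template constants rather than runtime members, the index arithmetic does not depend on the optimizer proving that a `const` value never changes.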

14 Mar, 2022 1 commit
- Increase max groups in kernel (#1120) · d353641d
  Shucai Xiao authored Mar 14, 2022
```
change max number of groups in a kernel to 1B for greater performance
```
  d353641d
10 Mar, 2022 4 commits
- clang format · abe2a889
  Shucai Xiao authored Mar 09, 2022
  
  abe2a889
- backup latest changes for the layernorm_half2 branch · 65386ce5
  Shucai Xiao authored Mar 09, 2022
  
  65386ce5
- clang format · ec205c54
  Shucai Xiao authored Mar 09, 2022
  
  ec205c54
- backup softmax changes · 1da02b0f
  Shucai Xiao authored Mar 09, 2022
  
  1da02b0f
09 Mar, 2022 1 commit
- remove unnecessary data copy related to rocblas api call · ae59a3b1
  Shucai Xiao authored Mar 09, 2022
  
  ae59a3b1
08 Mar, 2022 5 commits
- final version of softmax that works · 9f06859b
  Shucai Xiao authored Mar 08, 2022
  
  9f06859b
- version that softmax half2 works · bc9eac75
  Shucai Xiao authored Mar 08, 2022
  
  bc9eac75
- fix bugs in softmax half2 implementation · 23a18b2b
  Shucai Xiao authored Mar 08, 2022
  
  23a18b2b
- clang format · 08818705
  Shucai Xiao authored Mar 08, 2022
  
  08818705
- comment out changes in contiguous implementation · 1d9d1e49
  Shucai Xiao authored Mar 08, 2022
  
  1d9d1e49
07 Mar, 2022 2 commits
- clang format · 37f63907
  Shucai Xiao authored Mar 07, 2022
  
  37f63907
- backup code changes related to softmax · 45da3115
  Shucai Xiao authored Mar 07, 2022
  
  45da3115
04 Mar, 2022 6 commits
- clang format · ea656c84
  Shucai Xiao authored Mar 04, 2022
  
  ea656c84
- remove unnecessary code · efac0323
  Shucai Xiao authored Mar 04, 2022
  
  efac0323
- Mode as enum for pooling and roi_align (#1091) · a2e90b5d
  bpickrel authored Mar 04, 2022
```
Changed the pooling values for two structures from strings to specialized enum classes. Many test and operator parsing changes to support this. Introduces one new source file, op_enums.cpp.
```
  a2e90b5d
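
The move from string modes to enum classes might look roughly like the sketch below. The enumerator names and the helpers `to_string` and `parse_pooling_mode` are assumptions made for illustration; they are not taken from op_enums.cpp.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical enum replacing a free-form string mode (names are assumed).
enum class pooling_mode
{
    average,
    max
};

// Convert the enum to text, e.g. for printing or serialization.
inline std::string to_string(pooling_mode m)
{
    switch(m)
    {
    case pooling_mode::average: return "average";
    case pooling_mode::max: return "max";
    }
    throw std::logic_error("unknown pooling_mode");
}

// Parse text into the enum once, at the boundary, and reject anything else.
inline pooling_mode parse_pooling_mode(const std::string& s)
{
    if(s == "average")
        return pooling_mode::average;
    if(s == "max")
        return pooling_mode::max;
    throw std::invalid_argument("unknown pooling mode: " + s);
}
```

With an enum, a misspelled mode is rejected when it is parsed instead of being compared as a string deep inside an operator.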
- more refinement to use fma for mul_add · 8a7e325c
  Shucai Xiao authored Mar 03, 2022
  
  8a7e325c
- clang format · fdc0ae82
  Shucai Xiao authored Mar 03, 2022
  
  fdc0ae82
- use fma for the mul_add and refine add_gelu implementation · 6c834296
  Shucai Xiao authored Mar 03, 2022
  
  6c834296
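
As a host-side illustration of the fma change, a fused multiply-add computes `a * x + b` with a single rounding step. The sketch below uses the standard `std::fma`; the function name `mul_add` mirrors the commit title but its signature here is an assumption, and the actual GPU kernels are not shown in this log.

```cpp
#include <cmath>

// Hypothetical scalar version of mul_add: a * x + b as one fused operation.
inline float mul_add(float a, float x, float b)
{
    return std::fma(a, x, b);
}
```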
03 Mar, 2022 5 commits
- clang format · 9e5c56da
  Shucai Xiao authored Mar 03, 2022
  
  9e5c56da
- fix kernels related to add, mul, and mul_add · 1ce84cf5
  Shucai Xiao authored Mar 03, 2022
  
  1ce84cf5
- Boost the max number of workgroups for pointwise ops (#1113) · d9d17a11
  Paul Fultz II authored Mar 03, 2022
```
Boost the max number of workgroups for pointwise ops by matching what we are doing in launch.hpp
```
  d9d17a11
- Use fp32 compute_type when calling rocBLAS API (#1085) · 36b01ba5
  kahmed10 authored Mar 03, 2022
```
Using fp32 as the compute_type gives better performance.
```
  36b01ba5
- Add ScatterND operator (#1074) · 832f28c6
  turneram authored Mar 02, 2022
```
Add onnx parser and ref and gpu implementations of ONNX op ScatterND
```
  832f28c6
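
For readers unfamiliar with ScatterND, the sketch below shows its behavior in the simplest case: rank-1 data with one index per update. The function `scatter_nd_1d` is hypothetical and much narrower than the operator added in this commit, which handles general N-dimensional indices.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical rank-1 special case of ScatterND (not the MIGraphX operator):
// copy the data, then overwrite data[indices[k]] with updates[k].
std::vector<float> scatter_nd_1d(std::vector<float> data,
                                 const std::vector<std::size_t>& indices,
                                 const std::vector<float>& updates)
{
    for(std::size_t k = 0; k < indices.size(); ++k)
    {
        // at() throws std::out_of_range on a bad index or a short updates vector.
        data.at(indices[k]) = updates.at(k);
    }
    return data;
}
```

With data `{1, 2, 3, 4, 5, 6, 7, 8}`, indices `{4, 3, 1, 7}` and updates `{9, 10, 11, 12}`, the result is `{1, 11, 3, 10, 9, 6, 7, 12}`.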
02 Mar, 2022 2 commits
- isnan operator (#1100) · bfedcd45
  Charlie Lin authored Mar 02, 2022
```
Implements the IsNaN operator, ref, gpu, and onnx parser.
```
  bfedcd45
- Clang format ver10 (#1106) · 9852aaef
  bpickrel authored Mar 02, 2022
```
Update the base version of clang-format from 5.0 to 10.0
```
  9852aaef