Commits · 063ba0c45de1d20bc3c7595a8a79dea4bd2e3402 · gaoqiong / MIGraphX

19 Apr, 2022 2 commits
- Hacked fixes for pointwise · 063ba0c4
  Paul authored Apr 19, 2022
  
  063ba0c4
- Fix headers · f449cd1d
  Paul authored Apr 19, 2022
  
  f449cd1d
12 Apr, 2022 1 commit
- Fix out-of-bounds access when generate uses nonpacked tensors (#1160) · 262ba721
  Paul Fultz II authored Apr 12, 2022
```
Fix out-of-bounds access when generate uses nonpacked tensors, and add some additional asserts for GPU memory.
```
  262ba721
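The commit message above does not show the fix itself, so the following is only a minimal sketch of why nonpacked tensors invite out-of-bounds accesses: the number of logical elements and the number of buffer slots differ, and every write has to go through the strides. The `shape_sketch` struct and `generate_sketch` function are hypothetical names for illustration, not MIGraphX's actual classes, and the sketch assumes every dimension length is at least 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tensor shape; not MIGraphX's actual class.
struct shape_sketch
{
    std::vector<std::size_t> lens;    // extent of each dimension (assumed >= 1)
    std::vector<std::size_t> strides; // step, in elements, for each dimension

    // Number of logical elements: the product of the lengths.
    std::size_t elements() const
    {
        std::size_t n = 1;
        for(std::size_t len : lens)
            n *= len;
        return n;
    }

    // Number of slots the buffer must hold: offset of the last element + 1.
    std::size_t element_space() const
    {
        std::size_t last = 0;
        for(std::size_t d = 0; d < lens.size(); d++)
            last += (lens[d] - 1) * strides[d];
        return last + 1;
    }

    // In this sketch a shape counts as packed when both counts agree.
    bool packed() const { return elements() == element_space(); }

    // Map a row-major linear index to a buffer offset using the strides.
    std::size_t offset(std::size_t i) const
    {
        std::size_t result = 0;
        for(std::size_t d = lens.size(); d > 0; d--)
        {
            result += (i % lens[d - 1]) * strides[d - 1];
            i /= lens[d - 1];
        }
        return result;
    }
};

// Fill a tensor with 0, 1, 2, ... while staying inside the allocation:
// the buffer is sized by element_space() and indexed through offset().
std::vector<float> generate_sketch(const shape_sketch& s)
{
    std::vector<float> buffer(s.element_space(), 0.0f);
    for(std::size_t i = 0; i < s.elements(); i++)
        buffer[s.offset(i)] = static_cast<float>(i);
    return buffer;
}
```

For a 2x3 tensor with strides {4, 1}, the shape has 6 elements but needs 7 buffer slots, so sizing or indexing the buffer by the element count alone would be wrong.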
11 Apr, 2022 4 commits

scatter operator refactoring to include reduction (#1124) · 701c2014

bpickrel authored Apr 11, 2022

Change the "scatter" struct and op to a base/child set of three ops: scatter_none, scatter_add, and scatter_mul, mirroring ONNX's ScatterElements op and its three reduction options. (The ONNX Scatter op is deprecated and is equivalent to scatter_none.)

Provides both a reference op and an update to ONNX parsing. Tests were updated and a new test case was added.
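To make the three reduction options concrete, here is a minimal one-dimensional sketch of the scatter semantics described above. It is not the MIGraphX reference op; the names `reduction` and `scatter_1d` are hypothetical, and the wrap-around of negative indices follows the ONNX ScatterElements convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical enum mirroring the three reduction options.
enum class reduction
{
    none, // overwrite the target element
    add,  // accumulate into the target element
    mul   // multiply into the target element
};

// Scatter `updates` into a copy of `data` at the positions given by `indices`.
std::vector<float> scatter_1d(std::vector<float> data,
                              const std::vector<std::int64_t>& indices,
                              const std::vector<float>& updates,
                              reduction mode)
{
    if(indices.size() != updates.size())
        throw std::invalid_argument("indices and updates must have the same size");
    const auto n = static_cast<std::int64_t>(data.size());
    for(std::size_t i = 0; i < indices.size(); i++)
    {
        // Negative indices count from the end.
        const std::int64_t idx = indices[i] < 0 ? indices[i] + n : indices[i];
        if(idx < 0 || idx >= n)
            throw std::out_of_range("scatter index out of range");
        float& out = data[static_cast<std::size_t>(idx)];
        switch(mode)
        {
        case reduction::none: out = updates[i]; break;
        case reduction::add: out += updates[i]; break;
        case reduction::mul: out *= updates[i]; break;
        }
    }
    return data;
}
```

With data {1, 2, 3, 4}, indices {1, 3}, and updates {10, 20}, the three modes yield {1, 10, 3, 20}, {1, 12, 3, 24}, and {1, 20, 3, 80} respectively.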

701c2014

fix a bug in create tensor_view with vec data type (#1155) · 3c301efa

Shucai Xiao authored Apr 11, 2022

When creating a tensor_view with a vector data type, the last dimension of the shape should be divided by vec_size.
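The rule above can be sketched as a small helper. This is an illustration only: `vectorize_lens` is a hypothetical name, not a MIGraphX function, and the sketch assumes the last dimension must be an exact multiple of `vec_size`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Return the dimensions of a view whose elements are vectors of `vec_size`
// scalars: only the last (fastest-varying) dimension shrinks.
std::vector<std::size_t> vectorize_lens(std::vector<std::size_t> lens, std::size_t vec_size)
{
    if(lens.empty() || vec_size == 0)
        throw std::invalid_argument("need at least one dimension and a nonzero vec_size");
    // Assumption of this sketch: partial vectors are not allowed.
    if(lens.back() % vec_size != 0)
        throw std::invalid_argument("last dimension must be divisible by vec_size");
    lens.back() /= vec_size;
    return lens;
}
```

For example, a {2, 3, 8} tensor of scalar halves viewed with a two-element vector type (vec_size of 2) becomes a {2, 3, 4} view.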

3c301efa

clang format · 401d0f68
Shucai Xiao authored Apr 11, 2022

401d0f68
backup changes · 992f57ba
Shucai Xiao authored Apr 11, 2022

992f57ba

04 Apr, 2022 4 commits
- clang format · 789f86fb
  Shucai Xiao authored Apr 04, 2022
  
  789f86fb
- some additional code cleanup · 8e485cc8
  Shucai Xiao authored Apr 04, 2022
  
  8e485cc8
- clang format · a6477298
  Shucai Xiao authored Apr 04, 2022
  
  a6477298
- refactor of the layernorm code · fe849702
  Shucai Xiao authored Apr 04, 2022
  
  fe849702
31 Mar, 2022 1 commit
- clang format · af110526
  Shucai Xiao authored Mar 31, 2022
  
  af110526
29 Mar, 2022 5 commits
- clang format · 48b39e06
  Shucai Xiao authored Mar 29, 2022
  
  48b39e06
- simplify the layernorm kernel arguments · de99db23
  Shucai Xiao authored Mar 29, 2022
  
  de99db23
- Refactor runtime compiled kernels to use the same compile_ops pipeline (#1125) · 661046c6
  Paul Fultz II authored Mar 29, 2022
```
This adds the infrastructure to compile everything in parallel; previously, only pointwise kernels were compiled in parallel. It also integrates directly with lowering and the gpu-driver. The pointwise and roialign kernels use this infrastructure. Scatternd does not, since it does require standard shape.

This also makes it easier to add new runtime compiled kernels in the future.
```
  661046c6
- clang format · 780fffc8
  Shucai Xiao authored Mar 28, 2022
  
  780fffc8
- also rewrite layernorm kernel using half2 datatype · fc48a1d3
  Shucai Xiao authored Mar 28, 2022
  
  fc48a1d3
28 Mar, 2022 6 commits
- clang format · c6700632
  Shucai Xiao authored Mar 28, 2022
  
  c6700632
- half and half2 have the same results · 69c94135
  Shucai Xiao authored Mar 28, 2022
  
  69c94135
- clang format · 580673a0
  Shucai Xiao authored Mar 28, 2022
  
  580673a0
- backup code changes · 80a6ca93
  Shucai Xiao authored Mar 28, 2022
  
  80a6ca93
- layernorm kernel optimization · a5181cd0
  Shucai Xiao authored Mar 28, 2022
  
  a5181cd0
- Use ccache for runtime compilation (#1131) · ad056b1f
  Paul Fultz II authored Mar 28, 2022
```
* Use ccache for runtime compilation
```
  ad056b1f
18 Mar, 2022 1 commit

Complete GPU implementation of CumSum op (#1094) · 548783c8

turneram authored Mar 18, 2022

Add exclusive and reverse modes to the GPU implementation of prefix_scan_sum, which completes support for the ONNX op CumSum.
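As an illustration of what the two modes mean, here is a sequential, one-dimensional CPU sketch following the ONNX CumSum definition. It is not the GPU prefix_scan_sum kernel from this commit, and `cumsum_1d` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Cumulative sum over a 1-D input.
// exclusive: each output excludes its own input element.
// reverse:   the sum runs from the last element toward the first.
std::vector<float> cumsum_1d(const std::vector<float>& input, bool exclusive, bool reverse)
{
    const std::size_t n = input.size();
    std::vector<float> output(n);
    float running = 0.0f;
    for(std::size_t k = 0; k < n; k++)
    {
        // Visit elements back to front when reversing.
        const std::size_t i = reverse ? n - 1 - k : k;
        if(exclusive)
        {
            output[i] = running; // store the sum before adding this element
            running += input[i];
        }
        else
        {
            running += input[i]; // include this element in its own output
            output[i] = running;
        }
    }
    return output;
}
```

For the input {1, 2, 3}, the default mode gives {1, 3, 6}, exclusive gives {0, 1, 3}, reverse gives {6, 5, 3}, and exclusive with reverse gives {5, 3, 0}.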

548783c8

15 Mar, 2022 1 commit

Add iterators to kernels tensor_view and fix roialign to work with non-standard shape (#1126) · 31e63991

Paul Fultz II authored Mar 15, 2022

This adds iterators to tensor_view, which can allow kernels to work with non-standard shapes like for roialign.

To improve the performance of indexing when using the iterators, the shape class was updated to use integral_constants since the compiler doesn't always fold the const values. An integral_constant will at least enforce that in the AST.

Finally, since index calculations with single integers are improved, I also updated pointwise to use a single index rather than a multi-index. There is about a 4% improvement in some cases.
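The idea of encoding the shape in the type, so that index arithmetic works on compile-time constants, can be sketched as follows. This is a simplified illustration and not the actual kernel code: `dims`, `static_shape`, and `transposed_2x3` are hypothetical names, and plain non-type template parameters stand in for the integral_constants mentioned above.

```cpp
#include <cstddef>

// Compile-time list of dimension values (hypothetical helper).
template <std::size_t... Ns>
struct dims
{
    static constexpr std::size_t size() { return sizeof...(Ns); }
    static constexpr std::size_t at(std::size_t i)
    {
        const std::size_t values[] = {Ns...};
        return values[i];
    }
};

// A shape whose lengths and strides are part of the type, so the compiler
// sees them as constants when it evaluates the index math.
template <class Lens, class Strides>
struct static_shape
{
    static_assert(Lens::size() == Strides::size(), "rank mismatch");

    static constexpr std::size_t elements()
    {
        std::size_t n = 1;
        for(std::size_t d = 0; d < Lens::size(); d++)
            n *= Lens::at(d);
        return n;
    }

    // Map a single row-major linear index to a memory offset via the strides.
    static constexpr std::size_t offset(std::size_t i)
    {
        std::size_t result = 0;
        for(std::size_t d = Lens::size(); d > 0; d--)
        {
            result += (i % Lens::at(d - 1)) * Strides::at(d - 1);
            i /= Lens::at(d - 1);
        }
        return result;
    }
};

// A 2x3 view over data stored as 3x2 row-major, i.e. a transposed,
// non-standard shape.
using transposed_2x3 = static_shape<dims<2, 3>, dims<1, 2>>;

// The offset can be evaluated entirely at compile time.
static_assert(transposed_2x3::offset(1) == 2, "element (0, 1) lives at offset 2");
```

A kernel can then loop over a single index from 0 to `transposed_2x3::elements()` and call `offset` for each one, instead of carrying a multi-index through the loop.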

31e63991

14 Mar, 2022 1 commit
- Increase max groups in kernel (#1120) · d353641d
  Shucai Xiao authored Mar 14, 2022
```
change max number of groups in a kernel to 1B for greater performance
```
  d353641d
10 Mar, 2022 4 commits
- clang format · abe2a889
  Shucai Xiao authored Mar 09, 2022
  
  abe2a889
- backup latest changes for the layernorm_half2 branch · 65386ce5
  Shucai Xiao authored Mar 09, 2022
  
  65386ce5
- clang format · ec205c54
  Shucai Xiao authored Mar 09, 2022
  
  ec205c54
- backup softmax changes · 1da02b0f
  Shucai Xiao authored Mar 09, 2022
  
  1da02b0f
09 Mar, 2022 1 commit
- remove unnecessary data copy related to rocblas api call · ae59a3b1
  Shucai Xiao authored Mar 09, 2022
  
  ae59a3b1
08 Mar, 2022 5 commits
- final version of softmax that works · 9f06859b
  Shucai Xiao authored Mar 08, 2022
  
  9f06859b
- version that softmax half2 works · bc9eac75
  Shucai Xiao authored Mar 08, 2022
  
  bc9eac75
- fix bugs in softmax half2 implementation · 23a18b2b
  Shucai Xiao authored Mar 08, 2022
  
  23a18b2b
- clang format · 08818705
  Shucai Xiao authored Mar 08, 2022
  
  08818705
- comment out changes in contiguous implementation · 1d9d1e49
  Shucai Xiao authored Mar 08, 2022
  
  1d9d1e49
07 Mar, 2022 2 commits
- clang format · 37f63907
  Shucai Xiao authored Mar 07, 2022
  
  37f63907
- backup code changes related to softmax · 45da3115
  Shucai Xiao authored Mar 07, 2022
  
  45da3115
04 Mar, 2022 2 commits
- clang format · ea656c84
  Shucai Xiao authored Mar 04, 2022
  
  ea656c84
- remove unnecessary code · efac0323
  Shucai Xiao authored Mar 04, 2022
  
  efac0323