Commits · 287f7e9f95f481f26eb643cf80ae06bbeff2036f · gaoqiong / MIGraphX

28 Feb, 2022 2 commits
- clang format · 287f7e9f
  Shucai Xiao authored Feb 27, 2022
- backup temp changes · b7d1ff95
  Shucai Xiao authored Feb 27, 2022
26 Feb, 2022 2 commits
- change mul_add gpu implementation to use half2 for fp16 data type · 5f4e8561
  Shucai Xiao authored Feb 26, 2022
- reimplement mul_add kernel in a simple way · 9e610129
  Shucai Xiao authored Feb 26, 2022
25 Feb, 2022 3 commits
- formatting · 6af36ea4
  Khalique Ahmed authored Feb 25, 2022
- use void pointer to select alpha beta · ad3c4c1d
  Khalique Ahmed authored Feb 25, 2022
- Add get_queue to context to get the current stream (#1097) · e5242676
  Paul Fultz II authored Feb 24, 2022
```
wrapped in an any_ptr class so the type can be checked at runtime for a mismatch.
```
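The commit message describes wrapping a pointer in an `any_ptr` class so a type mismatch is caught at runtime. Below is a minimal C++ sketch of that idea. It assumes a design where the wrapper stores a `void*` together with a `std::type_index` of the original type. The class name follows the commit message, but its members and the `get<T>()` method are hypothetical and are not MIGraphX's actual implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>

// Hypothetical type-checked void pointer illustrating the commit's idea;
// not MIGraphX's real any_ptr.
class any_ptr
{
public:
    any_ptr() = default;

    // Store the pointer and remember the type it pointed to.
    template <class T>
    explicit any_ptr(T* p) : ptr_(p), type_(typeid(T))
    {
    }

    // Recover the pointer as T*, throwing if T is not the stored type.
    template <class T>
    T* get() const
    {
        if(type_ != std::type_index(typeid(T)))
            throw std::runtime_error("any_ptr: type mismatch");
        return static_cast<T*>(ptr_);
    }

private:
    void* ptr_ = nullptr;
    std::type_index type_ = typeid(std::nullptr_t);
};
```

Compared with passing a raw `void*`, each access pays for one `type_index` comparison. In return, a silent reinterpretation becomes an exception.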
24 Feb, 2022 1 commit
- Some cmake fixes and updates (#1088) · cd0a4aa5
  Paul Fultz II authored Feb 23, 2022
```
* Make doc/CMakeLists.txt standalone
* Switch to the rocm-cmake modules for documentation generation
* Add CONFIGURE_DEPENDS to file(GLOB) so the globbed file list updates without an explicit cmake run
* Add a STRINGS property for the build type so build types are easier to switch with ccmake
* Various fixes and improvements
```
09 Feb, 2022 5 commits
- Enable pointwise fusion by default (#1082) · c7419a9c
  Paul Fultz II authored Feb 09, 2022
```
There is now a MIGRAPHX_DISABLE_POINTWISE_FUSION environment variable to disable it.
```
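Below is a minimal sketch of how an opt-out switch such as MIGRAPHX_DISABLE_POINTWISE_FUSION can gate a pass that now runs by default. It assumes the switch is read from the environment, as the MIGRAPHX_ENABLE_POINTWISE_FUSION variable was. The `env_flag_enabled` and `should_fuse_pointwise` helpers are hypothetical, and so is their rule of treating any value other than empty, `0`, or `false` as "set". MIGraphX's own parsing and pass-registration code is not shown and may differ.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true when the variable is set to something other
// than "", "0", or "false".
bool env_flag_enabled(const char* name)
{
    const char* value = std::getenv(name);
    if(value == nullptr)
        return false;
    const std::string v{value};
    return !(v.empty() || v == "0" || v == "false");
}

// Pointwise fusion is on by default; the opt-out variable turns it off.
bool should_fuse_pointwise()
{
    return !env_flag_enabled("MIGRAPHX_DISABLE_POINTWISE_FUSION");
}
```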
- formatting · d45bd3ba
  Khalique Ahmed authored Feb 08, 2022
- change type for alpha beta · 783e9474
  Khalique Ahmed authored Feb 08, 2022
- formatting · 297bfdd0
  Khalique Ahmed authored Feb 08, 2022
- change type for alpha beta · adcf522c
  Khalique Ahmed authored Feb 08, 2022
08 Feb, 2022 5 commits
- Add missing output_alias to miopen_fusion op (#1076) · b304d97d
  Paul Fultz II authored Feb 08, 2022
```
The missing output_alias caused incorrect memory coloring, which led to the accuracy failures in the vision models when the pointwise fusions were enabled. Resnet50, inceptionv3 and inceptionv4 now verify in the driver.
```
- Enforce types to avoid compilation error in pointwise fusions (#1077) · 73b8a773
  Paul Fultz II authored Feb 08, 2022
```
This fixes a compile failure for gpt-2 in fp16 on Navi.
```
- revert nary · 9f755219
  Khalique Ahmed authored Feb 07, 2022
- formatting · 96c82f21
  Khalique Ahmed authored Feb 07, 2022
- use other device name function · cb965031
  Khalique Ahmed authored Feb 07, 2022
04 Feb, 2022 2 commits
- formatting · 2ec8ba6a
  Khalique Ahmed authored Feb 04, 2022
- update device checking · 59c09196
  Khalique Ahmed authored Feb 04, 2022
31 Jan, 2022 1 commit
- formatting · 8d21ccdf
  Khalique Ahmed authored Jan 31, 2022
28 Jan, 2022 1 commit
- Add auto-vectorization of pointwise operators (#1047) · 78a3c9b7
  Paul Fultz II authored Jan 28, 2022
```
* Enable auto vectorization
* Handle vector types with convert function
* Don't vectorize when it will cause problems with preload
```
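The core idea of vectorizing a pointwise operator can be illustrated with a CPU analogue. Pick the widest vector width that evenly divides the element count, then load, compute, and store that many elements per step. If no width divides the count, fall back to scalar code. This is a hypothetical sketch: the `pick_vector_size`, `add_in_chunks` and `vectorized_add` names and the {4, 2} width candidates are assumptions. The real change operates on GPU kernel code, and its preload and type-conversion checks are not modeled here.

```cpp
#include <array>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical rule: the widest width in {4, 2} that evenly divides n, else 1.
std::size_t pick_vector_size(std::size_t n)
{
    for(std::size_t width : {std::size_t{4}, std::size_t{2}})
    {
        if(n % width == 0)
            return width;
    }
    return 1;
}

// Processes Width elements per step: load a small fixed-size vector from
// each input, apply the operation lane by lane, then store the result.
// Requires n % Width == 0, which the dispatch below guarantees.
template <std::size_t Width>
void add_in_chunks(const float* a, const float* b, float* out, std::size_t n)
{
    for(std::size_t i = 0; i < n; i += Width)
    {
        std::array<float, Width> va{};
        std::array<float, Width> vb{};
        for(std::size_t j = 0; j < Width; ++j)
        {
            va[j] = a[i + j];
            vb[j] = b[i + j];
        }
        for(std::size_t j = 0; j < Width; ++j)
            out[i + j] = va[j] + vb[j];
    }
}

// Pointwise add dispatched on the chosen width; assumes a.size() == b.size().
std::vector<float> vectorized_add(const std::vector<float>& a, const std::vector<float>& b)
{
    std::vector<float> out(a.size());
    switch(pick_vector_size(a.size()))
    {
    case 4: add_in_chunks<4>(a.data(), b.data(), out.data(), a.size()); break;
    case 2: add_in_chunks<2>(a.data(), b.data(), out.data(), a.size()); break;
    default: add_in_chunks<1>(a.data(), b.data(), out.data(), a.size()); break;
    }
    return out;
}
```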
27 Jan, 2022 1 commit
- Remove Standard Shape requirement for ArgOps (#1042) · 332cb710
  Umang Yadav authored Jan 27, 2022
```
Allow non-standard shapes for the arg ops; non-standard shapes include broadcast, slice and transpose.
```
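With non-standard shapes allowed, a kernel can no longer assume that elements along the reduction axis are contiguous. Instead, it computes each element's offset from the shape's strides. Below is a minimal sketch of an argmax along one axis of a 2-D view, driven only by lengths and strides. The `view2d` struct and `argmax_axis` function are hypothetical. A transposed view is expressed by swapping strides, a broadcast by a zero stride, and a slice by offsetting the base pointer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2-D view: lengths plus per-dimension strides (in elements).
struct view2d
{
    std::size_t lens[2];
    std::size_t strides[2];
};

// Argmax along `axis` (0 or 1) for every position of the other axis.
// Offsets come only from strides, so transposed, sliced, or broadcast views
// work without first copying into a packed row-major layout.
// Assumes v.lens[axis] > 0.
std::vector<std::size_t> argmax_axis(const float* data, const view2d& v, int axis)
{
    const int other = 1 - axis;
    std::vector<std::size_t> result(v.lens[other]);
    for(std::size_t o = 0; o < v.lens[other]; ++o)
    {
        const std::size_t base = o * v.strides[other];
        std::size_t best = 0;
        float best_val = data[base];
        for(std::size_t k = 1; k < v.lens[axis]; ++k)
        {
            const float val = data[base + k * v.strides[axis]];
            if(val > best_val)
            {
                best_val = val;
                best = k;
            }
        }
        result[o] = best;
    }
    return result;
}
```

For example, the 2x3 row-major buffer `{1, 5, 2, 7, 0, 3}` viewed as its 3x2 transpose uses lens `{3, 2}` and strides `{1, 3}`, with no data movement.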
21 Jan, 2022 1 commit
- Improve handling of generator expressions when getting the flags for hip (#1055) · 3f392a3b
  Paul Fultz II authored Jan 20, 2022
10 Jan, 2022 1 commit
- Handle miopen fusions when using pointwise fusions (#1019) · 534a05c1
  Paul Fultz II authored Jan 10, 2022
```
* Add matcher for conv_bias pointwise
* Add fusion op
```
09 Dec, 2021 1 commit
- Softmax perf optimization (#1014) · 2e337c7f
  Shucai Xiao authored Dec 09, 2021
```
* Changed the number of threads per block from 256 to 128.
* Increased the maximum number of blocks in the kernel from 256 to 1M.
* When the softmax axis is the last dimension, removed the index computation, since it is not required.

With these changes, the softmax op used in the BertSquad model runs about 2x faster than on the develop branch.
```
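When the softmax axis is the innermost dimension of a packed layout, each row is one contiguous block. The code can therefore address element `j` of a row as `row * axis_len + j` instead of decoding a multi-dimensional index per element. Below is a CPU sketch of that fast path, using the usual max-subtraction for numerical stability. The `softmax_last_axis` name is hypothetical, and the sketch does not reproduce the GPU kernel's block and thread mapping.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over the last axis of a packed row-major tensor.
// Each row of `axis_len` elements is contiguous, so no per-element
// multi-dimensional index computation is needed. Assumes axis_len > 0.
std::vector<float> softmax_last_axis(const std::vector<float>& x, std::size_t axis_len)
{
    std::vector<float> y(x.size());
    for(std::size_t row = 0; row < x.size() / axis_len; ++row)
    {
        const float* in = x.data() + row * axis_len;
        float* out = y.data() + row * axis_len;
        // Subtract the row maximum before exponentiating to avoid overflow.
        const float m = *std::max_element(in, in + axis_len);
        float sum = 0.0f;
        for(std::size_t j = 0; j < axis_len; ++j)
        {
            out[j] = std::exp(in[j] - m);
            sum += out[j];
        }
        for(std::size_t j = 0; j < axis_len; ++j)
            out[j] /= sum;
    }
    return y;
}
```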
08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
07 Dec, 2021 1 commit
- Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
  Paul Fultz II authored Dec 07, 2021
```
simple variable rename
```
02 Dec, 2021 1 commit
- Fix pointwise compile error with half sqrt (#1010) · 7b3e58a0
  Paul Fultz II authored Dec 02, 2021
30 Nov, 2021 2 commits
- Fix fusable_conv whitespace bug (#1008) · 9270ebaf
  turneram authored Nov 30, 2021
```
Fix whitespace bug in fusable_conv matcher and add unit test
```
- Fix vectorization of broadcasted inputs in pointwise fusions (#1011) · 5dfafd00
  Paul Fultz II authored Nov 30, 2021
24 Nov, 2021 1 commit
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
18 Nov, 2021 1 commit
- Parallel compilation (#1007) · b0bc71cd
  Paul Fultz II authored Nov 18, 2021
```
Do compilation in parallel
```
11 Nov, 2021 1 commit
- Conditionally enable pointwise fusion (#992) · 157935ff
  Paul Fultz II authored Nov 10, 2021
```
This enables the pointwise fusions when the MIGRAPHX_ENABLE_POINTWISE_FUSION environment variable is set. They are disabled by default since the MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
```
09 Nov, 2021 1 commit
- Failing fusion plan workaround (#995) · fb39e5e4
  turneram authored Nov 09, 2021
```
* Add workaround for devices that do not support miopen conv fusions
```
05 Nov, 2021 1 commit
- Update Docker to ROCm 4.5 and support Navi on Jenkins (#994) · 04e17804
  kahmed10 authored Nov 05, 2021
```
Move our Dockerfile from ROCm 4.3 to 4.5
Add Navi-based GPUs to the CI infrastructure
```
28 Oct, 2021 2 commits
- NonMaxSuppression op ref implementation (#968) · c98b22d8
  Shucai Xiao authored Oct 28, 2021
```
This PR adds the ref implementation of the nonmaxsuppression operator. It always returns the maximum possible output shape, a limitation tracked in issue #948.
```
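Non-maximum suppression greedily keeps the highest-scoring box and discards later boxes whose intersection-over-union (IoU) with an already kept box exceeds a threshold. The number of kept boxes depends on the data. That is presumably why a fixed-shape reference implementation reports the maximum possible output shape, as the commit message notes. Below is a simplified single-class sketch of the greedy procedure. The `box`, `iou` and `nms` names, the corner box format and the strict `>` comparison are assumptions. The score threshold and the batch and class handling of the full operator are omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Box in corner format: (x1, y1) top-left, (x2, y2) bottom-right.
struct box
{
    float x1, y1, x2, y2;
};

// Intersection-over-union of two boxes; 0 when the union is empty.
float iou(const box& a, const box& b)
{
    const float iw = std::max(0.0f, std::min(a.x2, b.x2) - std::max(a.x1, b.x1));
    const float ih = std::max(0.0f, std::min(a.y2, b.y2) - std::max(a.y1, b.y1));
    const float inter = iw * ih;
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: indices of kept boxes in descending score order, at most max_out.
std::vector<std::size_t> nms(const std::vector<box>& boxes,
                             const std::vector<float>& scores,
                             float iou_threshold,
                             std::size_t max_out)
{
    std::vector<std::size_t> order(boxes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t i, std::size_t j) { return scores[i] > scores[j]; });
    std::vector<std::size_t> kept;
    for(std::size_t idx : order)
    {
        if(kept.size() == max_out)
            break;
        const bool suppressed = std::any_of(kept.begin(), kept.end(), [&](std::size_t k) {
            return iou(boxes[idx], boxes[k]) > iou_threshold;
        });
        if(!suppressed)
            kept.push_back(idx);
    }
    return kept;
}
```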
- Roialign gpu impl (#972) · 912c8d22
  Shucai Xiao authored Oct 28, 2021
```
GPU implementation of the roialign operator, using the JIT approach to reduce the library size.
```
20 Oct, 2021 1 commit
- Roialign (#952) · d7653732
  Shucai Xiao authored Oct 20, 2021
```
Implementation of the roialign operator. For now, only the ref implementation is available; when a model runs on the GPU, execution of this operator falls back to the ref implementation.
```
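The core of RoiAlign is bilinear interpolation. Each sampling point inside a region of interest has fractional coordinates, and its value is a weighted blend of the four surrounding feature-map cells. The pooled value of an output bin is then the average of a small grid of such samples. Below is a minimal sketch of that step on a single-channel, row-major feature map. The `bilinear_sample` and `roi_align_bin` names are hypothetical, and out-of-range points are simply clamped to the map. The operator's exact boundary rules, coordinate transformation modes and sampling-ratio selection are not reproduced.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinearly sample a height x width row-major feature map at (y, x).
// Assumes height > 0 and width > 0; clamping is a simplified boundary rule.
float bilinear_sample(const std::vector<float>& fmap, std::size_t height, std::size_t width,
                      float y, float x)
{
    y = std::min(std::max(y, 0.0f), static_cast<float>(height - 1));
    x = std::min(std::max(x, 0.0f), static_cast<float>(width - 1));
    const auto y0 = static_cast<std::size_t>(std::floor(y));
    const auto x0 = static_cast<std::size_t>(std::floor(x));
    const std::size_t y1 = std::min(y0 + 1, height - 1);
    const std::size_t x1 = std::min(x0 + 1, width - 1);
    // Fractional distances from the top-left neighbour act as weights.
    const float ly = y - static_cast<float>(y0);
    const float lx = x - static_cast<float>(x0);
    const float hy = 1.0f - ly;
    const float hx = 1.0f - lx;
    return hy * hx * fmap[y0 * width + x0] + hy * lx * fmap[y0 * width + x1] +
           ly * hx * fmap[y1 * width + x0] + ly * lx * fmap[y1 * width + x1];
}

// Average an n x n grid of samples taken at sub-cell centres of one bin
// spanning [y_start, y_start + bin_h) x [x_start, x_start + bin_w).
float roi_align_bin(const std::vector<float>& fmap, std::size_t height, std::size_t width,
                    float y_start, float x_start, float bin_h, float bin_w, int n)
{
    float sum = 0.0f;
    for(int iy = 0; iy < n; ++iy)
    {
        for(int ix = 0; ix < n; ++ix)
        {
            const float y = y_start + (static_cast<float>(iy) + 0.5f) * bin_h / static_cast<float>(n);
            const float x = x_start + (static_cast<float>(ix) + 0.5f) * bin_w / static_cast<float>(n);
            sum += bilinear_sample(fmap, height, width, y, x);
        }
    }
    return sum / static_cast<float>(n * n);
}
```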
19 Oct, 2021 1 commit
- Link with pthreads in core migraphx library since we use threads there (#975) · 4d82d761
  Paul Fultz II authored Oct 19, 2021
```
Fixes pthread linking errors on SLES.
```