Commits · 1cc6c88cdce4a9284bccb8aed973f61b20b3e436 · gaoqiong / MIGraphX


26 Jan, 2022 1 commit
- Updates · 1cc6c88c
  Paul authored Jan 25, 2022
  
  1cc6c88c
10 Jan, 2022 2 commits
- Format · 467a7cb8
  Paul authored Jan 09, 2022
  
  467a7cb8
- Fix output arg · 88f549e2
  Paul authored Jan 09, 2022
  
  88f549e2
07 Jan, 2022 2 commits
- Formatting · b7aa8f2a
  Paul authored Jan 06, 2022
  
  b7aa8f2a
- Fix device name · a652e90c
  Paul authored Jan 06, 2022
  
  a652e90c
06 Jan, 2022 3 commits
- Format · 13418e23
  Paul authored Jan 05, 2022
  
  13418e23
- Set kernel name · 4ba8706f
  Paul authored Jan 05, 2022
  
  4ba8706f
- Disable eliminate_data_type · eda8df70
  Paul authored Jan 05, 2022
  
  eda8df70
11 Dec, 2021 7 commits
- Enable pointwise_fusion · 8a251fec
  Paul authored Dec 10, 2021
  
  8a251fec
- Formatting · d0feb6b4
  Paul authored Dec 10, 2021
  
  d0feb6b4
- Add mlir verification · c83ee9f8
  Paul authored Dec 10, 2021
  
  c83ee9f8
- Format · e2967e04
  Paul authored Dec 10, 2021
  
  e2967e04
- Add code to insert memrefs · df3749cd
  Paul authored Dec 10, 2021
  
  df3749cd
- Format · 60ab44c7
  Paul authored Dec 10, 2021
  
  60ab44c7
- Don't provide output for return instruction · 2c952efd
  Paul authored Dec 10, 2021
  
  2c952efd
09 Dec, 2021 2 commits

Softmax perf optimization (#1014) · 2e337c7f

Shucai Xiao authored Dec 09, 2021

Changed the number of threads in a block from 256 to 128.
Increased the maximum number of blocks in the kernel from 256 to 1M.
When the axis is the last dimension, removed the index computation, since it is not required.

With these changes, the softmax op used in the BertSquad model runs about 2x faster than on the develop branch.
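The last-axis shortcut can be illustrated with a minimal Python sketch. It is not the actual GPU kernel: it assumes a packed row-major layout, and the helper names (`packed_strides`, `offsets_general`, `offsets_last_axis`) are hypothetical. In the general case, the flat batch index has to be decomposed into a multi-index to locate the elements along the softmax axis; when the axis is the last dimension, those elements are simply a contiguous run.

```python
def packed_strides(lens):
    """Row-major strides for a packed tensor with dimensions `lens`."""
    strides = [1] * len(lens)
    for d in range(len(lens) - 2, -1, -1):
        strides[d] = strides[d + 1] * lens[d + 1]
    return strides


def offsets_general(lens, axis, batch_idx):
    """Memory offsets of the elements along `axis` for one batch item.

    `batch_idx` enumerates every dimension except `axis`, so it must be
    decomposed into a multi-index (innermost dimension first).
    """
    strides = packed_strides(lens)
    base = 0
    rem = batch_idx
    for d in range(len(lens) - 1, -1, -1):
        if d == axis:
            continue
        base += (rem % lens[d]) * strides[d]
        rem //= lens[d]
    return [base + j * strides[axis] for j in range(lens[axis])]


def offsets_last_axis(lens, batch_idx):
    """Shortcut for axis == last dimension: no index decomposition needed."""
    n = lens[-1]
    return list(range(batch_idx * n, (batch_idx + 1) * n))
```

For a `[2, 3, 4]` tensor with the softmax axis last, both functions return the same offsets, but the shortcut avoids the per-dimension modulo and division.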

2e337c7f

Fuse last instruction in fuse_pointwise (#1015) · e758d457
Paul Fultz II authored Dec 09, 2021
```
Fuse last instruction in fuse_pointwise
This also fixes a bug with using an invalid iterator.
```
e758d457

08 Dec, 2021 1 commit
- Fuse convert ops (#1020) · 00bfed4d
  Paul Fultz II authored Dec 08, 2021
  
  00bfed4d
07 Dec, 2021 1 commit
- Rename reduce_inputs to virtual_inputs (#1021) · 1793cc54
  Paul Fultz II authored Dec 07, 2021
```
simple variable rename
```
  1793cc54
02 Dec, 2021 1 commit
- Fix pointwise compile error with half sqrt (#1010) · 7b3e58a0
  Paul Fultz II authored Dec 02, 2021
```
Fix pointwise compile error with half sqrt 
```
  7b3e58a0
01 Dec, 2021 4 commits
- Handle unsigned integers · b406a418
  Paul authored Dec 01, 2021
  
  b406a418
- Register dialect · 1851e975
  Paul authored Dec 01, 2021
  
  1851e975
- Format · e6f8a2cf
  Paul authored Dec 01, 2021
  
  e6f8a2cf
- Add mlir_compile · 812cd5c8
  Paul authored Dec 01, 2021
  
  812cd5c8
30 Nov, 2021 2 commits
- Fix fusable_conv whitespace bug (#1008) · 9270ebaf
  turneram authored Nov 30, 2021
```
Fix whitespace bug in fusable_conv matcher and add unit test
```
  9270ebaf
- Fix vectorization of broadcasted inputs in pointwise fusions (#1011) · 5dfafd00
  Paul Fultz II authored Nov 30, 2021
  
  5dfafd00
25 Nov, 2021 1 commit

Non std shape auto contiguous (#1001) · 2d4dcc47

Shucai Xiao authored Nov 25, 2021

Resolves a problem in parsing the ssd-10 model.

The problem is that, after a contiguous is inserted in the auto_contiguous pass, the standard output shape of some operators becomes non-standard. If the next operator requires a standard input shape, an exception is thrown.

For example, if we pass the following model:
Input (standard shape) -> transpose (transposed) -> softmax (transposed) -> transpose (standard) -> gather.
It works fine, and no contiguous is required.

In the auto_contiguous pass, a contiguous is inserted after the first transpose. Then we need to replace the first transpose with the contiguous and recompute all shapes. When it comes to the gather operator, its input is a transposed shape, and an exception is thrown.

The solution is in the recompute_shape() function. If it is called by the auto_contiguous pass, and the shape of an instruction changes and becomes non-standard, we do not recompute the shapes of its outputs. The reason is that, because its output shape is non-standard, a contiguous op will be added after the instruction, and that op will recompute the shapes of the later operators.
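The shape change described above can be reproduced with a small Python model. This is a simplified sketch, not MIGraphX code: it treats a shape as a pair of `lens` and `strides`, assumes "standard" means packed row-major strides, and assumes softmax leaves the shape unchanged. The helper names are hypothetical.

```python
def packed_strides(lens):
    """Row-major strides for a packed tensor with dimensions `lens`."""
    strides = [1] * len(lens)
    for d in range(len(lens) - 2, -1, -1):
        strides[d] = strides[d + 1] * lens[d + 1]
    return strides


def is_standard(lens, strides):
    """Simplified check: a shape is standard if its strides are packed row-major."""
    return list(strides) == packed_strides(lens)


def transpose(lens, strides, perm):
    """Transpose only permutes lens and strides; no data is moved."""
    return [lens[p] for p in perm], [strides[p] for p in perm]


def contiguous(lens):
    """Contiguous keeps the dimensions but repacks the strides."""
    return list(lens), packed_strides(lens)


perm = [0, 2, 1]
lens, strides = [2, 3, 4], packed_strides([2, 3, 4])

# Original model: transpose -> softmax -> transpose yields a standard shape.
t1 = transpose(lens, strides, perm)
original_out = transpose(*t1, perm)

# After auto_contiguous inserts a contiguous following the first transpose,
# the second transpose now produces a non-standard shape for gather.
c1 = contiguous(t1[0])
rewritten_out = transpose(*c1, perm)
```

Here `original_out` is `([2, 3, 4], [12, 4, 1])`, which is standard, while `rewritten_out` is `([2, 3, 4], [12, 1, 3])`, which is not. That is the shape gather would see if all shapes were recomputed before the second contiguous is inserted.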

2d4dcc47

24 Nov, 2021 3 commits
- Format · ee382ad9
  Paul authored Nov 24, 2021
  
  ee382ad9
- Add return · 2a0ff223
  Paul authored Nov 24, 2021
  
  2a0ff223
- Check jit kernels files with clang-tidy (#1012) · a33d6fa2
  Paul Fultz II authored Nov 24, 2021
```
* Check jit kernels files with clang-tidy
```
  a33d6fa2
22 Nov, 2021 1 commit

Add fp16 verify to driver (#988) · 3c1e91dc

kahmed10 authored Nov 22, 2021

Allows --fp16 to be used in the driver to compare the target fp16 result and the ref fp32 result.
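As a rough illustration of what comparing an fp16 result against an fp32 reference involves, the sketch below rounds values to half precision with Python's `struct` module and checks them against the reference within a relative tolerance. The tolerance and the comparison rule are assumptions for illustration; the commit does not state how the driver compares results.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def verify(ref, target, rtol):
    """Return True if every target value is within `rtol` of the reference.

    `rtol` is a hypothetical relative tolerance, not the driver's value.
    """
    return all(abs(t - r) <= rtol * abs(r) for r, t in zip(ref, target))


ref = [0.1, 0.2, 0.3]                 # stand-in for the fp32 reference output
target = [to_fp16(v) for v in ref]    # stand-in for the fp16 target output
```

With these inputs, `verify(ref, target, rtol=1e-3)` holds, while a tolerance suited to fp32, such as `1e-6`, would reject the fp16 result. This is why an fp16 comparison needs a looser threshold.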

3c1e91dc

18 Nov, 2021 1 commit
- Parallel compilation (#1007) · b0bc71cd
  Paul Fultz II authored Nov 18, 2021
```
Do compilation in parallel
```
  b0bc71cd
17 Nov, 2021 1 commit

Handle removing contiguous on operators that use modules (#1005) · 785307c3

Paul Fultz II authored Nov 17, 2021

Currently, eliminate_contiguous never removes contiguous for operators that use module inputs, because it doesn't pass the module inputs to compute_shape.

- Update to pass the module inputs correctly to compute_shape
- Fix the overloads of compute_shape so that when passed an empty vector of module inputs it will call the overload without module inputs
- Add tests with contiguous and pointwise module function.
- Move add_pointwise function to a separate header to reuse across different tests

785307c3

16 Nov, 2021 4 commits
- Update message · 1ac17a13
  Paul authored Nov 16, 2021
  
  1ac17a13
- Remove old cmake flag · c59d175c
  Paul authored Nov 16, 2021
  
  c59d175c
- Format · 15177ac0
  Paul authored Nov 16, 2021
  
  15177ac0
- Fix bug when appending module · f7f61d7a
  Paul authored Nov 16, 2021
  
  f7f61d7a
15 Nov, 2021 1 commit

Update driver's perf report to account for batch size (#1000) · 19f65e7e

kahmed10 authored Nov 15, 2021

Currently we have the option of passing in --batch to the driver to change the batch size when the model has a dynamic dim value. We can use this flag to adjust the perf report's rate.
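A minimal sketch of the adjustment, assuming the rate is reported in inferences per second and derived from an average latency in milliseconds (the function name and the exact formula are assumptions, not taken from the driver source):

```python
def inferences_per_second(avg_ms, batch=1):
    """Throughput for a run whose average latency is `avg_ms` milliseconds.

    Each run processes `batch` inputs, so the rate scales with the batch size.
    """
    if avg_ms <= 0:
        raise ValueError("avg_ms must be positive")
    return batch * 1000.0 / avg_ms
```

For example, a 2 ms run gives 500 inferences per second at batch size 1, and 2000 at batch size 4.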

19f65e7e

11 Nov, 2021 1 commit

Conditionally enable pointwise fusion (#992) · 157935ff

Paul Fultz II authored Nov 10, 2021

This enables the pointwise fusions using the MIGRAPHX_ENABLE_POINTWISE_FUSION env variable. It's disabled by default since MIOpen fusions need to be refactored.

This also adds a compile_ops pass to compile the pointwise modules. All tests except test_gpu_fast_math pass with MIGRAPHX_ENABLE_POINTWISE_FUSION=1 set.
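The gating pattern can be sketched in Python as follows. Only the environment variable name and the pass names `fuse_pointwise` and `compile_ops` come from the commit log; how the value is parsed, the other passes in the list, and the pass ordering are assumptions made for illustration.

```python
import os

ENV_VAR = "MIGRAPHX_ENABLE_POINTWISE_FUSION"


def pointwise_fusion_enabled(env=None):
    """Return True if the env variable is set to an enabling value.

    Assumption: "1", "true", "yes" and "on" enable the feature; anything
    else, including an unset variable, leaves it disabled (the default).
    """
    env = os.environ if env is None else env
    return env.get(ENV_VAR, "").strip().lower() in {"1", "true", "yes", "on"}


def build_passes(env=None):
    """Illustrative pass list; fuse_pointwise is added only when enabled."""
    passes = ["auto_contiguous"]
    if pointwise_fusion_enabled(env):
        passes.append("fuse_pointwise")
    passes.append("compile_ops")
    return passes
```

Passing a plain dictionary as `env` makes the behaviour easy to check without touching the real environment, for example `build_passes({"MIGRAPHX_ENABLE_POINTWISE_FUSION": "1"})`.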

157935ff

09 Nov, 2021 1 commit
- Move mlir to the gpu and update the test · 0ad547aa
  Paul authored Nov 09, 2021
  
  0ad547aa