Commits · e4ecf265b2c81ca06342aea597b12cab72c660b6 · gaoqiong / MIGraphX

19 Sep, 2022 1 commit

- Improve layernorm and reductions performance (#1348) · 97a1ed2d
  Paul Fultz II authored Sep 19, 2022
  * Compute mean and variance in the same reduction
  * Set block size to numbers divisible by 32 instead of powers of 2
  * Global is also set exactly instead of being a multiple of the block size
  * More exact matching of global/local can help get rid of branching/loops
  * Reduce vectors first before doing dpp_reduce
  * Explicitly vectorize array operators, since the compiler doesn't always vectorize them
  * Still uses the old for loop when computing at compile time, since neither reinterpret_cast nor all of the vector types are supported there
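
Computing mean and variance in one reduction works because both statistics follow from two running sums, Σx and Σx², which can be accumulated together: mean = Σx / n and variance = Σx² / n − mean². Below is a minimal CPU sketch of that idea. It assumes a population variance and double-precision accumulation. `moments`, `combine`, and `mean_variance` are hypothetical names, and this is not MIGraphX's GPU kernel.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Partial result carried through the single reduction (hypothetical type).
struct moments
{
    double sum;    // running sum of x
    double sum_sq; // running sum of x * x
};

// Associative merge of two partial results, so the same operator could
// drive a tree- or warp-level parallel reduction.
moments combine(const moments& a, const moments& b)
{
    return {a.sum + b.sum, a.sum_sq + b.sum_sq};
}

// One pass over x yields both statistics: {mean, population variance}.
std::pair<double, double> mean_variance(const std::vector<double>& x)
{
    assert(!x.empty());
    moments m{0.0, 0.0};
    for(double v : x)
        m = combine(m, moments{v, v * v});
    const double n    = static_cast<double>(x.size());
    const double mean = m.sum / n;
    const double var  = m.sum_sq / n - mean * mean; // E[x^2] - E[x]^2
    return {mean, var};
}
```

For example, `mean_variance({1.0, 2.0, 3.0, 4.0})` gives a mean of 2.5 and a variance of 1.25. The E[x²] − E[x]² form can lose precision when the mean is large relative to the spread. Welford's algorithm is the usual, more stable alternative, at the cost of extra arithmetic per element.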

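The block-size change amounts to rounding the per-block work up to a multiple of 32 instead of up to the next power of two, which leaves fewer idle lanes. The sketch below uses the 32-lane granularity stated in the commit and an assumed cap of 256 threads per block. `round_up`, `next_pow2`, and `choose_block_size` are illustrative helpers, not MIGraphX functions.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of m.
std::size_t round_up(std::size_t n, std::size_t m)
{
    assert(m > 0);
    return (n + m - 1) / m * m;
}

// Old-style policy, kept for comparison: smallest power of two >= n.
std::size_t next_pow2(std::size_t n)
{
    std::size_t p = 1;
    while(p < n)
        p *= 2;
    return p;
}

// Hypothetical policy: a multiple of 32 just large enough to cover n
// elements, capped at max_block (itself assumed to be a multiple of 32).
std::size_t choose_block_size(std::size_t n, std::size_t max_block = 256)
{
    assert(n > 0);
    assert(max_block > 0 && max_block % 32 == 0);
    const std::size_t block = round_up(n, 32);
    return block < max_block ? block : max_block;
}
```

For a reduction over 96 elements, `next_pow2(96)` is 128, leaving 32 lanes idle, while `choose_block_size(96)` is 96. For 33 elements both policies give 64, and for 1000 elements the cap applies and the block is 256.
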
18 Sep, 2022 2 commits
- Format · 647d5dc5
  Paul authored Sep 18, 2022
- Add env variable to enable fast softmax · dd93e13c
  Paul authored Sep 18, 2022
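
The fast-softmax commit above gates a code path behind an environment variable. That lets an experimental kernel be switched on at run time without rebuilding. The commit title does not name the variable or say how its value is parsed, so the sketch below is a generic illustration only. `env_enabled`, `softmax_variant`, and `HYPOTHETICAL_FAST_SOFTMAX` are made-up names, and treating empty, `0`, `false`, or `off` as disabled is an assumed convention.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: interpret an environment variable as an on/off switch.
// Unset, empty, "0", "false", or "off" mean disabled; anything else enables.
bool env_enabled(const char* name)
{
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if(value == nullptr)
        return false;
    const std::string v{value};
    return !(v.empty() || v == "0" || v == "false" || v == "off");
}

// Illustrative dispatch point: choose which softmax variant to use.
const char* softmax_variant()
{
    return env_enabled("HYPOTHETICAL_FAST_SOFTMAX") ? "fast" : "default";
}
```

Defaulting to the existing kernel when the variable is unset keeps the new path opt-in.
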
17 Sep, 2022 4 commits
- Remove print · f3eb708b
  Paul authored Sep 17, 2022
- Format · e60a7d5e
  Paul authored Sep 17, 2022
- Fix tidy warnings · 966c4c5b
  Paul authored Sep 17, 2022
- Remove enum_params · 8fb6eedb
  Paul authored Sep 17, 2022
16 Sep, 2022 2 commits
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
- Update deprecated Pybind constructor (#1382) · 255fb11a
  Umang Yadav authored Sep 16, 2022
```
* remove deprecated constructor
```
15 Sep, 2022 6 commits
- [mlir] Replaced `find_library` with `find_package` to locate MLIR static library (#1373) · e1e36cdc
  Lixun Zhang authored Sep 15, 2022
```
* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and removed backward compatibility
* Embedded the external/include dir into the exported library
```
- Use improved vectorizer · 96ff131b
  Paul authored Sep 15, 2022
- Format · 2c4b9f64
  Paul authored Sep 15, 2022
- Adjust base on broadcast · 8320b11e
  Paul authored Sep 15, 2022
- Format · c5d87f8f
  Paul authored Sep 15, 2022
- Use larger vec sizes when possible · 9466f4c0
  Paul authored Sep 15, 2022
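
"Use larger vec sizes when possible" points at choosing the widest vector load/store the data layout allows. One simple rule, assumed here for illustration and not taken from MIGraphX's vectorizer, is to pick the widest candidate width that evenly divides the length of the contiguous innermost dimension, so no scalar tail is left over. `pick_vec_size` and the candidate list {8, 4, 2} are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical policy: return the widest candidate that divides inner_len,
// so every thread handles whole vectors; fall back to 1 (scalar) otherwise.
std::size_t pick_vec_size(std::size_t inner_len,
                          const std::vector<std::size_t>& candidates = {8, 4, 2})
{
    // Candidates must be listed widest first so the first match is the largest.
    assert(std::is_sorted(candidates.rbegin(), candidates.rend()));
    for(std::size_t v : candidates)
    {
        if(v > 0 && inner_len % v == 0)
            return v;
    }
    return 1; // no candidate divides evenly: scalar access
}
```

With these candidates, an innermost length of 64 gets width 8, 12 gets 4, 6 gets 2, and 7 falls back to 1.
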
14 Sep, 2022 3 commits
- Fix split_reshape for slice len of 1 (#1379) · 4b76dd0d
  Umang Yadav authored Sep 14, 2022
```
* fix slice_dim1 for case
```
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
- expose underlying migraphx::argument data pointer in pybind (#1376) · 827baeec
  shivadbhavsar authored Sep 13, 2022
```
Expose underlying data pointer for migraphx.argument
Update Python API documentation
```
13 Sep, 2022 3 commits

- Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1
  turneram authored Sep 13, 2022
  Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size > 1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
  The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.
  Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
- Format · 63db86bd
  Paul authored Sep 13, 2022
- Skip group convolution in mlir · 056318a0
  Paul authored Sep 13, 2022
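
The rocblas_gemm_ex change (#1354) rests on a simple identity. When B is broadcast across the batch and A is packed row-major, the batch of [M×K]·[K×N] products equals one [(batch·M)×K]·[K×N] product. The sketch below checks that identity with naive CPU loops. It does not call rocBLAS, and `gemm`, `batched_gemm_broadcast_b`, and `folded_gemm` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive row-major GEMM: C[m x n] = A[m x k] * B[k x n].
std::vector<float> gemm(const std::vector<float>& a, const std::vector<float>& b,
                        std::size_t m, std::size_t k, std::size_t n)
{
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t p = 0; p < k; ++p)
            for(std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Batched form: each m x k slice of A is multiplied by the same B,
// i.e. B's batch dimension is broadcast.
std::vector<float> batched_gemm_broadcast_b(const std::vector<float>& a,
                                            const std::vector<float>& b,
                                            std::size_t batch, std::size_t m,
                                            std::size_t k, std::size_t n)
{
    assert(a.size() == batch * m * k);
    std::vector<float> c;
    c.reserve(batch * m * n);
    const auto slice_len = static_cast<std::ptrdiff_t>(m * k);
    for(std::size_t i = 0; i < batch; ++i)
    {
        const auto first = a.begin() + static_cast<std::ptrdiff_t>(i) * slice_len;
        const std::vector<float> slice(first, first + slice_len);
        const std::vector<float> ci = gemm(slice, b, m, k, n); // same B every time
        c.insert(c.end(), ci.begin(), ci.end());
    }
    return c;
}

// Folded form: the packed A is viewed as one (batch * m) x k matrix,
// so a single non-batched GEMM produces the same output.
std::vector<float> folded_gemm(const std::vector<float>& a, const std::vector<float>& b,
                               std::size_t batch, std::size_t m, std::size_t k,
                               std::size_t n)
{
    return gemm(a, b, batch * m, k, n);
}
```

With batch = 2, M = 2, K = 3, N = 2, both paths return the same 8 values. The fold depends on B being shared and A's batch and row dimensions being contiguous. If B differed per batch, a batched call would still be needed.
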

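The simplify_reshapes matcher mentioned under #1354 relies on concat commuting with broadcast when the concat axis is not a broadcast axis. In that case concat(broadcast(a), broadcast(b)) equals broadcast(concat(a, b)), and the right-hand side concatenates the small inputs before they are expanded. The CPU check below is restricted to 2-D tensors, broadcasting along rows and concatenating along columns. `tensor2d`, `broadcast_rows`, and `concat_cols` are hypothetical, and this is not MIGraphX's matcher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal row-major 2-D tensor (hypothetical, for illustration only).
struct tensor2d
{
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data;
};

// multibroadcast analogue: expand a 1 x cols tensor to rows x cols.
tensor2d broadcast_rows(const tensor2d& t, std::size_t rows)
{
    assert(t.rows == 1 && t.data.size() == t.cols);
    tensor2d out{rows, t.cols, {}};
    out.data.reserve(rows * t.cols);
    for(std::size_t r = 0; r < rows; ++r)
        out.data.insert(out.data.end(), t.data.begin(), t.data.end());
    return out;
}

// concat along axis 1 (columns); both inputs must have the same row count.
tensor2d concat_cols(const tensor2d& a, const tensor2d& b)
{
    assert(a.rows == b.rows);
    tensor2d out{a.rows, a.cols + b.cols, {}};
    const auto ac = static_cast<std::ptrdiff_t>(a.cols);
    const auto bc = static_cast<std::ptrdiff_t>(b.cols);
    for(std::size_t r = 0; r < a.rows; ++r)
    {
        const auto row = static_cast<std::ptrdiff_t>(r);
        out.data.insert(out.data.end(), a.data.begin() + row * ac, a.data.begin() + (row + 1) * ac);
        out.data.insert(out.data.end(), b.data.begin() + row * bc, b.data.begin() + (row + 1) * bc);
    }
    return out;
}
```

`concat_cols(broadcast_rows(a, R), broadcast_rows(b, R))` and `broadcast_rows(concat_cols(a, b), R)` produce the same tensor, but the second concatenates a single row. That saving is the motivation for moving the broadcast after the concat.
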
12 Sep, 2022 5 commits
- Format · 84cb0598
  Paul authored Sep 12, 2022
- Fix tidy · fd877fab
  Paul authored Sep 12, 2022
- Simplify calculation · 30cde6df
  Paul authored Sep 11, 2022
- Format · fc9b2a7d
  Paul authored Sep 11, 2022
- Don't use the output · d90d2137
  Paul authored Sep 11, 2022
10 Sep, 2022 2 commits
- Format · a01835fb
  Paul authored Sep 09, 2022
- Add more asserts · 6d4311cd
  Paul authored Sep 09, 2022
09 Sep, 2022 6 commits
- Add some comments · 7ffd56a8
  Paul authored Sep 09, 2022
- Format · 5794fe3c
  Paul authored Sep 09, 2022
- Add more asserts · 53870e3b
  Paul authored Sep 09, 2022
- Fix zero global · b7902fa7
  Paul authored Sep 09, 2022
- Format · 28649b6a
  Paul authored Sep 08, 2022
- Handle non-const local · 351fde4d
  Paul authored Sep 08, 2022
08 Sep, 2022 2 commits
- Remove unused headers (#1363) · ed2c73ac
  Paul Fultz II authored Sep 08, 2022
```
* Remove unused headers
```
- Fix TF literal parsing for relu6 (#1370) · f2667056
  Charlie Lin authored Sep 08, 2022
```
Fixes TF literal parsing for relu6. Previously it always created a float-type literal, which breaks for other types such as float16.
```
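
relu6 is the clamp min(max(x, 0), 6). Assuming the TF parser materializes the bounds 0 and 6 as literals, the fix above means those literals must take the input's element type rather than always being float. The C++ template below mirrors that idea on the CPU. It is an analogy, not the parser code, and since standard C++ has no portable half type it is exercised with float, double, and int.

```cpp
#include <algorithm>
#include <cassert>

// relu6(x) = min(max(x, 0), 6). Both bounds are created in the input's own
// type T, so no float constant is mixed into a double, int, or half clamp.
// T only needs construction from int and operator<.
template <class T>
T relu6(T x)
{
    const T zero = static_cast<T>(0);
    const T six  = static_cast<T>(6);
    return std::min(std::max(x, zero), six);
}
```

Writing `std::max(x, 0.0f)` instead would fail to compile for a `double` argument, because the template arguments disagree. That is a compile-time analogue of the type mismatch a hard-coded float literal causes in a float16 graph.
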
07 Sep, 2022 4 commits
- Fix bugs in passing value · a839ade9
  Paul authored Sep 07, 2022
- Fix size · 50141f2a
  Paul authored Sep 07, 2022
- Format · 2f25e1d9
  Paul authored Sep 06, 2022
- Fix invalid program from find_splits · 68238cc9
  Paul authored Sep 06, 2022