Commits · dea0555fda6324c76b8a76eadb2c89c575bd4d19 · gaoqiong / MIGraphX

23 Sep, 2022 3 commits
- Make shared_block char · dea0555f
  turneram authored Sep 23, 2022
  
- Use find_path · b10116e5
  turneram authored Sep 23, 2022
  
- Remove const from p_t param · b1d25f8f
  turneram authored Sep 23, 2022
  
22 Sep, 2022 2 commits
- Formatting · 1dd11890
  turneram authored Sep 22, 2022
  
- Add xdl fp16 gemm · 07167910
  turneram authored Sep 22, 2022
  
21 Sep, 2022 2 commits

Parameterize epsilon for layernorm kernel (#1367) · d9578ba6

kahmed10 authored Sep 21, 2022

This PR allows for other values of epsilon to be matched when finding layernorm. Similarly, the calculation now uses the variable for epsilon.

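To illustrate what a parameterized epsilon means for the layernorm calculation in #1367, here is a minimal sketch in plain Python. It is not the MIGraphX kernel: the function name `layernorm`, the default value of `epsilon`, and the omission of scale and bias terms are all assumptions made for the example.

```python
import math


def layernorm(xs, epsilon=1e-12):
    """Normalize xs to zero mean and (near) unit variance.

    epsilon is added to the variance before the square root so the
    division stays finite when all inputs are equal. The default value
    here is a placeholder chosen for this sketch.
    """
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    scale = 1.0 / math.sqrt(variance + epsilon)
    return [(x - mean) * scale for x in xs]
```

Calling `layernorm(values, epsilon=1e-5)` and `layernorm(values, epsilon=1e-12)` gives slightly different results, which is why a matcher that only recognizes one hard-coded epsilon would miss models that use another value.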

Multibroadcast find_mul_conv (#1384) · 9a70050b

Charlie Lin authored Sep 21, 2022

Change find_mul_conv to work with multibroadcast also. Checks the strides instead of the broadcast axis.

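One way to read "checks the strides instead of the broadcast axis" in #1384 is that a broadcast dimension can be recognized by its stride of 0, regardless of which operator produced it. The sketch below shows that idea in plain Python. Both helper names are hypothetical, and treating every size-1 input dimension as stride 0 is a simplification made for this example, not a statement about how MIGraphX computes strides.

```python
def multibroadcast_strides(in_lens, out_lens):
    """Strides of in_lens viewed as out_lens (right-aligned broadcasting).

    Simplification for this sketch: every input dimension of size 1,
    including the implicit leading ones, gets stride 0.
    """
    pad = len(out_lens) - len(in_lens)
    lens = [1] * pad + list(in_lens)
    strides = [0] * len(lens)
    step = 1
    for i in reversed(range(len(lens))):
        if lens[i] != 1:
            if lens[i] != out_lens[i]:
                raise ValueError("dimension %d cannot be broadcast" % i)
            strides[i] = step
        step *= lens[i]
    return strides


def varies_only_along(strides, axis):
    """True when the data changes along `axis` and nowhere else."""
    others = [s for i, s in enumerate(strides) if i != axis]
    return strides[axis] != 0 and all(s == 0 for s in others)
```

For a per-channel multiplier of shape `[1, 8, 1, 1]` broadcast to `[2, 8, 4, 4]`, the strides come out as `[0, 1, 0, 0]`, so `varies_only_along(strides, 1)` holds without needing to know a broadcast axis.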

19 Sep, 2022 1 commit

Improve layernorm and reductions performance (#1348) · 97a1ed2d

Paul Fultz II authored Sep 19, 2022

* Compute mean and variance in the same reduction
* Set block size to numbers divisible by 32 instead of powers of 2
* Global is also set exactly instead of being divisible by the block size
* More exact matching of global/local can help get rid of branching/loops
* Reduce vectors first before doing dpp_reduce
* Explicitly vectorize array operators since the compiler doesn't always vectorize them
* Still uses the old for loop when it's computing at compile time, since neither reinterpret_cast nor all the vector types are supported

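The first item in #1348, computing mean and variance in the same reduction, can be sketched as a single pass that accumulates a pair of sums. This is only an illustration in plain Python under the assumption that the variance is derived as E[x²] − mean²; the commit message does not say which formula the kernel uses, and `mean_variance` is a name invented for this example.

```python
from functools import reduce


def mean_variance(xs):
    """Return (mean, variance) of xs using one reduction over the data."""
    n = len(xs)
    # Each element contributes the pair (x, x*x); pairs are summed
    # component-wise, so the data is traversed only once.
    total, total_sq = reduce(
        lambda acc, x: (acc[0] + x, acc[1] + x * x),
        xs,
        (0.0, 0.0),
    )
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance
```

The E[x²] − mean² form can lose precision when the mean is large relative to the spread of the data, so a real kernel may accumulate in a wider type or use a different formulation.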

16 Sep, 2022 5 commits
- Fix typo for add_sigmoid (#1385) · 10f37f49
  Umang Yadav authored Sep 16, 2022
```
* fix typo for add_sigmoid
```
- Formatting · d143ed6c
  turneram authored Sep 16, 2022
  
- Formatting · c42aded1
  turneram authored Sep 16, 2022
  
- Remove ck from cmakelists · 961cf059
  turneram authored Sep 16, 2022
  
- Update deprecated Pybind constructor (#1382) · 255fb11a
  Umang Yadav authored Sep 16, 2022
```
* remove deprecated constructor
```
15 Sep, 2022 1 commit

[mlir] Replaced `find_library` with `find_package` to locate MLIR static library (#1373) · e1e36cdc

Lixun Zhang authored Sep 15, 2022

* Replaced `find_library` with `find_package` to locate MLIR static library
* Unified the include dir for headers and removed backward compatibility
* Embedded the external/include dir into the exported library


14 Sep, 2022 4 commits
- Fix split_reshape for slice len of 1 (#1379) · 4b76dd0d
  Umang Yadav authored Sep 14, 2022
```
* fix slice_dim1 for case
```
- Update using ck · a51f40f8
  Paul authored Sep 14, 2022
  
- Implement concat using jit compilation (#1356) · 7662d9c0
  Paul Fultz II authored Sep 14, 2022
```
* Implement concat using jit compilation
```
- expose underlying migraphx::argument data pointer in pybind (#1376) · 827baeec
  shivadbhavsar authored Sep 13, 2022
```
expose underlying data pointer for migraphx.argument
Update python api documentation
```
13 Sep, 2022 6 commits
- Formatting · 2593dd60
  turneram authored Sep 13, 2022
  
- Move ck includes to own header file · d1e27426
  turneram authored Sep 13, 2022
  
- Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354) · a10a8ef1
  turneram authored Sep 13, 2022
```
Improves performance for 4 of the 6 GEMMs used by Hugging Face BERT models with batch_size > 1 by using a non-batched rocBLAS call for GEMMs where the B input has a broadcasted batch dimension.
The four verify tests added reflect the actual configurations used by bert-base-cased, with varied batch sizes.

Also adds a matcher to simplify_reshapes to move multibroadcasts after concats.
```
- Add gemm test · 6fb1706a
  turneram authored Sep 13, 2022
  
- Formatting · 0e237605
  turneram authored Sep 13, 2022
  
- Add n-dimensional inputs · 985fb0dd
  turneram authored Sep 13, 2022
  
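The rocblas_gemm_ex commit above (#1354, a10a8ef1) relies on the fact that a batched GEMM whose B operand is the same for every batch can be computed as one ordinary GEMM. The sketch below shows that equivalence in plain Python, assuming the batch dimension of A is folded into its row dimension. It does not call rocBLAS, all function names are invented for the example, and whether the commit performs exactly this reshaping is an assumption.

```python
def matmul(a, b):
    """Plain (M x K) times (K x N) matrix product on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def batched_gemm(a_batches, b):
    """Reference: one GEMM per batch, with the same B each time."""
    return [matmul(a, b) for a in a_batches]


def flattened_gemm(a_batches, b):
    """Same result from a single GEMM.

    A of shape [batch, M, K] is stacked into [batch * M, K], multiplied
    by B once, and the result is split back into `batch` blocks of M rows.
    """
    m = len(a_batches[0])
    rows = [row for a in a_batches for row in a]
    out = matmul(rows, b)
    return [out[i:i + m] for i in range(0, len(out), m)]
```

Both functions return the same nested list for any A of shape `[batch, M, K]` and B of shape `[K, N]`; the flattened version replaces `batch` small multiplications with one larger one.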
12 Sep, 2022 4 commits
- Formatting · 953da942
  turneram authored Sep 12, 2022
  
- Create half_t test · 9a7bb6d2
  turneram authored Sep 12, 2022
  
- Formatting · fea58a7b
  turneram authored Sep 12, 2022
  
- Call from global function · 8c1ad9e6
  turneram authored Sep 12, 2022
  
09 Sep, 2022 2 commits
- Formatting · ddb0c230
  turneram authored Sep 09, 2022
  
- Call gemm from kernel · 127393f4
  turneram authored Sep 09, 2022
  
08 Sep, 2022 4 commits
- Remove unused headers (#1363) · ed2c73ac
  Paul Fultz II authored Sep 08, 2022
```
* Remove unused headers
```
- Formatting · 9d12476e
  turneram authored Sep 08, 2022
  
- Merge elementwise · cc2535e0
  turneram authored Sep 08, 2022
  
- Fix TF literal parsing for relu6 (#1370) · f2667056
  Charlie Lin authored Sep 08, 2022
```
Fixes TF literal parsing for relu6. Previously the parser always created a float-type literal, which breaks for other input types such as float16.
```
07 Sep, 2022 6 commits
- Formatting · bf523dbe
  turneram authored Sep 07, 2022
  
- Rough draft working · 44a12304
  turneram authored Sep 07, 2022
  
- Formatting · cfbe4da6
  turneram authored Sep 07, 2022
  
- Formatting · 1196b676
  turneram authored Sep 07, 2022
  
- Almost working · e4737e2f
  turneram authored Sep 07, 2022
  
- Fix accuracy bug when vectorizing slices (#1364) · 60aa0e48
  Paul Fultz II authored Sep 06, 2022
```
* Fix accuracy bug when vectorizing slices
```