Commits · 78f6584aa9f85baa3f77c0e99bf3652e0685ba6c · gaoqiong / composable_kernel

04 Jun, 2021 1 commit
- debugging · 78f6584a
  Jing Zhang authored Jun 04, 2021
  
  78f6584a
03 Jun, 2021 1 commit
- debugging · 17daf766
  Jing Zhang authored Jun 03, 2021
  
  17daf766
02 Jun, 2021 1 commit
- add kpack with incorrect results · 95710403
  Jing Zhang authored Jun 02, 2021
  
  95710403
01 Jun, 2021 7 commits
- clean code · 44078dba
  Jing Zhang authored Jun 01, 2021
  
  44078dba
- clean code · cc77ab57
  Jing Zhang authored Jun 01, 2021
  
  cc77ab57
- add fp16 mfma · e610402f
  Jing Zhang authored Jun 01, 2021
  
  e610402f
- add 32x32x8fp16 · 4ea89209
  Jing Zhang authored Jun 01, 2021
  
  4ea89209
- rename kperwave to kpack · 822856e1
  Jing Zhang authored Jun 01, 2021
  
  822856e1
- clean code · 5ac70ce0
  Jing Zhang authored Jun 01, 2021
  
  5ac70ce0
- pack half4_t · e1a0fb94
  Jing Zhang authored Jun 01, 2021
  
  e1a0fb94
31 May, 2021 1 commit
- adding fp16 mfma · 3bbd5988
  Jing Zhang authored May 31, 2021
  
  3bbd5988
26 May, 2021 2 commits
- add fp32 mfma instructions · 5c27dcd5
  Jing Zhang authored May 26, 2021
  
  5c27dcd5
- abroadcast only · 21755b5d
  Jing Zhang authored May 26, 2021
  
  21755b5d
25 May, 2021 1 commit
- add kpack into xdlops_gemm and blockwise_gemm · de9f5bed
  Jing Zhang authored May 25, 2021
  
  de9f5bed
21 May, 2021 3 commits
- tweak · 776721ab
  Jing Zhang authored May 21, 2021
  
  776721ab
- clean · 0e5848a4
  Jing Zhang authored May 21, 2021
  
  0e5848a4
- tweak · 4fdee96b
  Jing Zhang authored May 21, 2021
  
  4fdee96b
20 May, 2021 2 commits
- break vector type to blk_size · 3399ddaf
  Jing Zhang authored May 20, 2021
  
  3399ddaf
- use StaticBuffer of vector_type · 59462dca
  Jing Zhang authored May 20, 2021
  
  59462dca
19 May, 2021 2 commits
- tuning · 2cf1757e
  Jing Zhang authored May 19, 2021
  
  2cf1757e
- added 128x128 wavegemm · 90ec6a19
  Jing Zhang authored May 19, 2021
  
  90ec6a19
18 May, 2021 5 commits
- clean code · 1d48b521
  Jing Zhang authored May 18, 2021
  
  1d48b521
- add 2x2 pipeline · c0ffe379
  Jing Zhang authored May 18, 2021
  
  c0ffe379
- add m/n repeats · 40016f20
  Jing Zhang authored May 18, 2021
  
  40016f20
- add KReduction · 8c84c0b1
  Jing Zhang authored May 18, 2021
  
  8c84c0b1
- clean code · 02bf2be0
  Jing Zhang authored May 18, 2021
  
  02bf2be0
17 May, 2021 1 commit
- added tuning params · dfbe7e20
  Jing Zhang authored May 17, 2021
  
  dfbe7e20
16 May, 2021 2 commits
- fixed output · b3a4d179
  Jing Zhang authored May 16, 2021
  
  b3a4d179
- debugging · 9bdad55b
  Jing Zhang authored May 16, 2021
  
  9bdad55b
13 May, 2021 1 commit
- working on blockwise_gemm_xdlops · 7084b152
  Jing Zhang authored May 13, 2021
  
  7084b152
12 May, 2021 3 commits
- merge master · be49a8c5
  Jing Zhang authored May 12, 2021
  
  be49a8c5
- reorganize some files (#33) · 71d6b19d
  Chao Liu authored May 12, 2021
  
  71d6b19d
- Use DynamicBuffer instead of raw pointer (#32) · 78b987fb
  Chao Liu authored May 12, 2021
```
* Use DynamicBuffer to hold raw pointer (to global and LDS memory)

* add workaround for compiler issue (inefficient ISA) of ds_write for int8x4, int8x8, int8x16
```
  78b987fb
11 May, 2021 2 commits
- create files for xdlops · bcdc330d
  Jing Zhang authored May 11, 2021
  
  bcdc330d
- No raw index calculation (#31) · 01055d95
  Chao Liu authored May 11, 2021
  
  * Replace most raw index calculation with coordinate transformation
  * Overhaul blockwise and threadwise GEMM
  * Overhaul driver for gridwise GEMM kernel
  
  Co-authored-by: Jing Zhang <jizhan@amd.com>
  
  01055d95
28 Apr, 2021 1 commit
- Use Tuple and vector_type instead of Array for holding tensor data (#30) · d075adf1
  Chao Liu authored Apr 28, 2021
```
* replacing array with tuple and vector for tensor data
```
  d075adf1
13 Apr, 2021 2 commits
- Overhaul vector_type and use real vector for int8x4_t instead of aliasing from int32_t (#29) · e4790c25
  Chao Liu authored Apr 12, 2021
```
* overhaul vector_type, make int8x4_t real vector instead of aliasing from int32_t
```
  e4790c25
- Initial implementation of magic number division and "Merge" transformation that uses it (#28) · 3bf52e60
  Chao Liu authored Apr 12, 2021
```
* initial implementation for magic number division and DynamicMerge_v2_magic_division that uses it

* turn off DynamicMerge_v2_magic_division (which uses magic number division) by default
```
  3bf52e60
07 Apr, 2021 1 commit
- Hybrid direct + implicit GEMM forward convolution NCHWc v5r1 (#25) · 792a20fa
  zjing14 authored Apr 07, 2021
```
* Hybrid direct + implicit GEMM forward convolution NCHWc v5r1. Input tensor bypasses LDS. Supports fp32/fp16/int8
```
  792a20fa
06 Apr, 2021 1 commit
- Fix performance issue when passing tensor descriptor from host to kernel by void pointers (#27) · d2217f30
  Chao Liu authored Apr 06, 2021
  
  * use address_space(4) in kernel signature to fix performance issue when passing tensor descriptor from host to kernel by (void) pointers
  
  * remove passing by pointer* option (only use pass by value or void*)
  
  d2217f30