Commits · 0374f8deaae9e80c5adf8c80fd33b7a1576335f9 · gaoqiong / composable_kernel

29 Apr, 2021 1 commit
- blockwise gemm does 3d*3d=4d · 0374f8de
  Chao Liu authored Apr 29, 2021
28 Apr, 2021 2 commits
- updated block-cluster in gridwise gemm and thread-cluster in blockwise copy to... · 4a661578
  Chao Liu authored Apr 28, 2021
```
updated block-cluster in gridwise gemm and thread-cluster in blockwise copy to use cluster descriptor
```
- added TensorAdaptor class and use it to implement cluster descriptor · 8b306478
  Chao Liu authored Apr 28, 2021
26 Apr, 2021 1 commit
- Merge remote-tracking branch 'origin/no_array' into no_raw_index_calculation · 2178d1d8
  Chao Liu authored Apr 26, 2021
24 Apr, 2021 1 commit
- update v5r1 · 7484a103
  Chao Liu authored Apr 24, 2021
23 Apr, 2021 6 commits
- clean up · ba31eb3e
  Chao Liu authored Apr 23, 2021
- clean up · 08bb4372
  Chao Liu authored Apr 23, 2021
- updating v5r1 · 905f5a3f
  Chao Liu authored Apr 23, 2021
- updating v5r1 · 474733b5
  Chao Liu authored Apr 23, 2021
- updating v5r1 · 415a4a5b
  Chao Liu authored Apr 23, 2021
- refactor DynamicBuffer · 32d485dd
  Chao Liu authored Apr 23, 2021
22 Apr, 2021 5 commits
- bug fix · b6e43b25
  Chao Liu authored Apr 22, 2021
- added back amd_assembly_outer_product_1x2 and amd_assembly_outer_product_1x4 · f5654649
  Chao Liu authored Apr 22, 2021
- updating v5r1 · 9d5d6afa
  Chao Liu authored Apr 22, 2021
- using tuple (instead of vector) for holding C thread matrix data to solve... · dcee43fe
  Chao Liu authored Apr 22, 2021
```
using tuple (instead of vector) for holding C thread matrix data to solve register over-allocation issue
```
- use vector type for holding C thread matrix data, but it cause register over-allocation · aeb05cc4
  Chao Liu authored Apr 22, 2021
21 Apr, 2021 6 commits
- clean · d990eff6
  Chao Liu authored Apr 21, 2021
- use StaticBuffer for thread matrix A/B in blockwise GEMM · 437c996a
  Chao Liu authored Apr 21, 2021
- fix bug · 36de63ff
  Chao Liu authored Apr 21, 2021
- replace raw pointer with DynamicBuffer in blockwise and threadwise gemm · 888f1d68
  Chao Liu authored Apr 21, 2021
- replacing array with vector for tensor data · 35d68cf8
  Chao Liu authored Apr 21, 2021
- replacing array with vector for tensor data · 712babe4
  Chao Liu authored Apr 21, 2021
20 Apr, 2021 2 commits
- replacing array with vector for tensor data · 03f7892a
  Chao Liu authored Apr 20, 2021
- replacing array with vector for tensor data · e8421cca
  Chao Liu authored Apr 20, 2021
19 Apr, 2021 1 commit
- replacing array with vector for tensor data · 4978c9e7
  Chao Liu authored Apr 19, 2021
17 Apr, 2021 3 commits
- replacing array with vector for tensor data · e38c1b73
  Chao Liu authored Apr 17, 2021
- replacing array with vector for tensor data · 841b1480
  Chao Liu authored Apr 17, 2021
- replacing raw index calculation with coordinate transformation · fa163f3b
  Chao Liu authored Apr 16, 2021
14 Apr, 2021 1 commit
- replacing raw index calculation with coordinate transformation · 82696a73
  Chao Liu authored Apr 14, 2021
13 Apr, 2021 2 commits
- Overhaul vector_type and use real vector for int8x4_t instead of aliasing from int32_t (#29) · e4790c25
  Chao Liu authored Apr 12, 2021
```
* overhaul vector_type, make int8x4_t real vector instead of aliasing from int32_t
```
- Initial implementation of magic number division and "Merge" transformation that use it (#28) · 3bf52e60
  Chao Liu authored Apr 12, 2021
```
* initial implementation for magic number division and DynamicMerge_v2_magic_division that uses it

* turn off DynamicMerge_v2_magic_division that use magic number division by default
```
07 Apr, 2021 1 commit
- Hybrid direct + implicit GEMM forward convolution NCHWc v5r1 (#25) · 792a20fa
  zjing14 authored Apr 07, 2021
```
* Hybrid direct + implicit GEMM forward convolution NCHWc v5r1. Input tensor bypass LDS. Support fp32/fp16/int8
```
06 Apr, 2021 2 commits
- Fix performance issue when passing tensor descriptor from host to kernel by void pointers (#27) · d2217f30
  Chao Liu authored Apr 06, 2021
```
* use address_space(4) in kernel signature to fix performance issue when passing tensor descriptor from host to kernel by (void) pointers

* remove passing by pointer* option (only use pass by value or void*)
```
- bug fix for buffer resource setting (#26) · 6a5ea493
  zjing14 authored Apr 06, 2021
25 Mar, 2021 1 commit
- Dynamic tensor descriptor (#24) · fcbb9788
  Chao Liu authored Mar 25, 2021

  * support dynamic tensor descriptor
  * use buffer load OOB feature for padding case
  * add navi support
  * add int8x4 inference kernel

  Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
  Co-authored-by: Jing Zhang <jizhan@amd.com>
06 Aug, 2020 1 commit
- Bwd Data NHWC (#22) · bbcb67d0
  Chao Liu authored Aug 06, 2020

  * fix buffer_store bug
  * remove obsolete kernels
  * add bwd-data-v5r1-nhwc
29 Jul, 2020 1 commit
- Improve buffer address for out of bound check (#21) · ac62d13e
  Chao Liu authored Jul 29, 2020

  * Use buffer load built-in OOB check. buffer size is limited to 2GB.
  * buffer APIs use combined wave and thread offset
  * use uint32_t for addr shift in buffer addressing
24 Jun, 2020 1 commit
- Code clean up (#20) · 5c7cec11
  Chao Liu authored Jun 23, 2020

  * tuning para,
  * testing on v100
  * add fp16
  * remove deprecated tensor descriptor
  * sync with miopen
  * update build script

  Co-authored-by: Jing Zhang <jizhan@amd.com>
18 Feb, 2020 1 commit
- MIOpen integration (#15) · 7d09790a
  Chao Liu authored Feb 18, 2020
```
* renaming
```
17 Feb, 2020 1 commit
- MIopen integration (#13) · 1a66e35b
  Chao Liu authored Feb 17, 2020
```
* update for miopen integration: cosmetic refactor
```