Commits · 80120f0a0c524d1efc0249926a73d5020f0efd67 · yangql / composable_kernel-1

09 Aug, 2021 1 commit
- tidy · 80120f0a
  Chao Liu authored Aug 09, 2021
27 Jul, 2021 1 commit

[MIOpen Downstream] Initial MIOpen integration (#52) · f63a23ac

Chao Liu authored Jul 27, 2021

* update online kernel wrapper to bundle all descriptors in a tuple

* change __CONSTANT__ to CONSTANT

* rename

* adding tuning

* added IsValidCompileParameter

* reorganize

* adding tunable for fp16 and int8

* fix kernel compile warnings and other bugs

* suppress warning about casting a CONSTANT (address space 4) pointer

* fix building issue

17 Jul, 2021 1 commit

Add xdlops v4r4r4 into online compilation (#48) · fbdf4332

zjing14 authored Jul 16, 2021



* init for v4r4 xdlops olc

* refactor wrap

* init impl of v4r4 nchw xdlops olc

* tuning

* test perf

* fixed v4r4 nhwc

* tuned v4r4 nhwc

* use gridwise_gemm_xdlops_v2r3

* swap a/b

* add pointer support into offline v2r3

* debugging v4r4r4 transform for olc

* change timer of olc

* refactor v4r4 xdlops nchw olc

* remove transform function in v4r4 xdlops nhwc olc

Co-authored-by: Chao Liu <chao.liu2@amd.com>

08 Jul, 2021 1 commit
- Deprecate static kernel (#42) · 81c942cd
  Chao Liu authored Jul 08, 2021
  * deprecate static kernels
05 Jul, 2021 1 commit

DL GEMM fp32/fp16/int8 (#41) · b8b2d0a6

Chao Liu authored Jul 04, 2021

* add threadwise copy that copies a tensor in one copy; add kpack to DL GEMM

* add kpack into fwd v4r5 nchw fp32

10 Jun, 2021 1 commit

Restructure gridwise and blockwise GEMM, add tensor contraction and FWD-v4r5 (#36) · 30072aec

Chao Liu authored Jun 09, 2021

* experimenting magic number division

* overhauling fwd-v4r4 to clearly reflect transformation graph

* added fwd-v4r5

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* bug fix and added sanity-check in transform_dynamic_tensor_descriptor

* added conv_driver_v2

12 May, 2021 1 commit

Use DynamicBuffer instead of raw pointer (#32) · 78b987fb

Chao Liu authored May 12, 2021

* Use DynamicBuffer to hold raw pointer (to global and LDS memory)

* add workaround for compiler issue (inefficient ISA) of ds_write for int8x4, int8x8, int8x16

11 May, 2021 1 commit

No raw index calculation (#31) · 01055d95

Chao Liu authored May 11, 2021



* Replace most raw index calculations with coordinate transformations
* Overhaul blockwise and threadwise GEMM
* Overhaul driver for gridwise GEMM kernel

Co-authored-by: Jing Zhang <jizhan@amd.com>

28 Apr, 2021 1 commit
- Use Tuple and vector_type instead of Array for holding tensor data (#30) · d075adf1
  Chao Liu authored Apr 28, 2021
  * replacing array with tuple and vector for tensor data
13 Apr, 2021 1 commit

Initial implementation of magic number division and "Merge" transformation that uses it (#28) · 3bf52e60

Chao Liu authored Apr 12, 2021

* initial implementation for magic number division and DynamicMerge_v2_magic_division that uses it

* turn off DynamicMerge_v2_magic_division, which uses magic number division, by default

25 Mar, 2021 1 commit

Dynamic tensor descriptor (#24) · fcbb9788

Chao Liu authored Mar 25, 2021



* support dynamic tensor descriptor

* use buffer load OOB feature for padding case

* add navi support

* add int8x4 inference kernel

Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>

06 Aug, 2020 1 commit

Bwd Data NHWC (#22) · bbcb67d0

Chao Liu authored Aug 06, 2020

* fix buffer_store bug
* remove obsolete kernels
* add bwd-data-v5r1-nhwc

24 Jun, 2020 1 commit

Code clean up (#20) · 5c7cec11

Chao Liu authored Jun 23, 2020



* tuning parameters

* testing on v100

* add fp16

* remove deprecated tensor descriptor

* sync with miopen

* update build script

Co-authored-by: Jing Zhang <jizhan@amd.com>

03 Dec, 2019 1 commit

backward data (#7) · 8f5f6496

Chao Liu authored Dec 03, 2019

* enabled atomic add in tensor copy
* added gridwise GEMM
* added backward data conv using GEMM + atomic
* added backward data conv using GEMM, no atomic

04 Nov, 2019 1 commit
- MIOpen integration: recent bug fixes from MIOpen (#5) · 562e1e27
  Chao Liu authored Nov 04, 2019
11 Oct, 2019 1 commit
- Refactor for MIOpen integration (#4) · 52c3fe05
  Chao Liu authored Oct 11, 2019
  Refactor so that multi-index transformation and padding support can be brought into MIOpen
22 Sep, 2019 1 commit
- done: explicitly separate offset component into compile-time, block-invariant... · 6c2c50b0
  Chao Liu authored Sep 22, 2019
  done: explicitly separate offset component into compile-time, block-invariant and per-thread components. Experimenting
21 Sep, 2019 1 commit
- nvidia build · 184c6e7d
  Chao Liu authored Sep 20, 2019
10 Sep, 2019 1 commit
- adding merge transform · ca42e910
  Chao Liu authored Sep 10, 2019
09 Sep, 2019 1 commit
- more utility code · 7a7fe160
  Chao Liu authored Sep 09, 2019
02 Sep, 2019 1 commit
- adding dimension transformation · bd44e639
  Chao Liu authored Sep 02, 2019
19 Jun, 2019 2 commits
- fixed amd build · 1f2cfceb
  Chao Liu authored Jun 19, 2019
- refactor · 21f7e9f1
  Chao Liu authored Jun 19, 2019
13 Jun, 2019 1 commit
- reorganized files · 1566b317
  Chao Liu authored Jun 13, 2019