Commits · 3835318cc32cac3155060c9614013f2e988de40c · yangql / composable_kernel-1

01 Jul, 2021 1 commit

xdlops_v4r4_fwd fp32/fp16 (#34) · 3835318c

zjing14 authored Jul 01, 2021



* create files for xdlops

* working on blockwise_gemm_xdlops

* add KReduction

* add m/n repeats

* add 2x2 pipeline

* added 128x128 wavegemm

* use StaticBuffer of vector_type

* break vector type to blk_size

* add kpack into xdlops_gemm and blockwise_gemm

* abroadcast only

* add fp32 mfma instructions

* adding fp16 mfma

* pack half4_t

* rename kperwave to kpack

* add 32x32x8fp16

* add fp16 mfma

* clean code

* clean code

* V4r4 xdlops kpack (#35)

* add kpack with incorrect results

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* add 1x1 kernel

* add gridwise_gemm_v2 - single_buffer

* enabled dwordx4 for fp16
Co-authored-by: Chao Liu <chao.liu2@amd.com>

* refactor fwd-v4r4-xdlops

* add v4r4-nhwc-xdlop

* improve some perf of nhwc and nchw by tuning parameters, and change scheduling in gridwise-gemm loop

* tweak scheduling in gridwise gemm

* add v4r3 with a single output copy

* init commit: output with slice win

* adding sliceWin

* add multiple repeats pattern

* start adding bwd-v4r1-xdlops

* use tuple as SrcBuffer

* adding bwd-data v4r1 nhwc xdlops

* fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2()

* fix bug in host bwd-data conv

* initial implementation of bwd-data v4r1 nhwc xdlops

* add launch bound flags

* enable launch bound

* add m/nrepeat=4

* tweak bwd-data v4r1 nhwc xdlops

* added bwd-data v4r1 nhwc xdlops with output A and weight B

* add fwd-v4r4 nhwc xdlops, A input, B weight, C output
Co-authored-by: Chao Liu <chao.liu2@amd.com>

10 Jun, 2021 1 commit

Restructure gridwise and blockwise GEMM, add tensor contraction and FWD-v4r5 (#36) · 30072aec

Chao Liu authored Jun 09, 2021

* experimenting magic number division

* overhauling fwd-v4r4 to clearly reflect transformation graph

* added fwd-v4r5

* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2

* bug fix and added sanity-check in transform_dynamic_tensor_descriptor

* added conv_driver_v2

11 May, 2021 1 commit

No raw index calculation (#31) · 01055d95

Chao Liu authored May 11, 2021



* Replace most raw index calculation with coordinate transformation
* Overhaul blockwise and threadwise GEMM
* Overhaul driver for gridwise GEMM kernel
Co-authored-by: Jing Zhang <jizhan@amd.com>

25 Mar, 2021 1 commit

Dynamic tensor descriptor (#24) · fcbb9788

Chao Liu authored Mar 25, 2021



* support dynamic tensor descriptor

* use buffer load OOB feature for padding case

* add navi support

* add int8x4 inference kernel
Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>

24 Jun, 2020 1 commit

Code clean up (#20) · 5c7cec11

Chao Liu authored Jun 23, 2020



* tuning parameters

* testing on v100

* add fp16

* remove deprecated tensor descriptor

* sync with miopen

* update build script
Co-authored-by: Jing Zhang <jizhan@amd.com>

17 Feb, 2020 1 commit
- MIOpen integration (#13) · 1a66e35b
  Chao Liu authored Feb 17, 2020
  * update for miopen integration: cosmetic refactor
27 Jan, 2020 1 commit
- Update for recent MIOpen integration (#11) · 3406a114
  Chao Liu authored Jan 27, 2020
  * update for MIOpen integration
20 Jan, 2020 1 commit

Added bwd data v3r1 v4r1, tweaking v1 (#10) · c5da0377

Chao Liu authored Jan 20, 2020

* Added bwd data v3r1: break down compute into a series of load-balanced GEMMs, and launch them in a single kernel
* Added bwd data v4r1: like v3r1, but launch the GEMMs in multiple kernels
* Tweaked v1r1 and v1r2 (atomic) on AMD GPU

03 Dec, 2019 1 commit

backward data (#7) · 8f5f6496

Chao Liu authored Dec 03, 2019

* enabled atomic add in tensor copy
* added gridwise GEMM
* added backward data conv using GEMM + atomic
* added backward data conv using GEMM, no atomic

10 Sep, 2019 1 commit
- adding merge transform · ca42e910
  Chao Liu authored Sep 10, 2019
09 Sep, 2019 1 commit
- more utility code · 7a7fe160
  Chao Liu authored Sep 09, 2019
05 Sep, 2019 1 commit
- adding dimension transformation · 0c05f427
  Chao Liu authored Sep 05, 2019
20 Jun, 2019 1 commit
- refactor · 37b82b7e
  Chao Liu authored Jun 19, 2019
19 Jun, 2019 2 commits
- fixed amd build · 1f2cfceb
  Chao Liu authored Jun 19, 2019
- refactor · 21f7e9f1
  Chao Liu authored Jun 19, 2019
18 Jun, 2019 2 commits
- refactor · 9de63930
  Chao Liu authored Jun 18, 2019
- clean up for miopen · 23f633cd
  Chao Liu authored Jun 17, 2019
17 Jun, 2019 2 commits
- refactoring · 9d59a39a
  Chao Liu authored Jun 17, 2019
- refactoring for miopen · 33d1e0e2
  Chao Liu authored Jun 17, 2019
13 Jun, 2019 1 commit
- reorganized files · 1566b317
  Chao Liu authored Jun 13, 2019
12 Jun, 2019 1 commit
- reorganize files · 81497a93
  Chao Liu authored Jun 11, 2019
11 Jun, 2019 2 commits
- rename files, added header guard, added namespace · 88b77181
  Chao Liu authored Jun 11, 2019
- remove .hip extension · 05e04665
  Chao Liu authored Jun 11, 2019
07 Jun, 2019 1 commit
- use more constexpr for Array · 0a386c46
  Chao Liu authored Jun 06, 2019
06 Jun, 2019 1 commit
- refactor · 7a89684f
  Chao Liu authored Jun 06, 2019
05 Jun, 2019 1 commit
- use more constexpr · 709f13a6
  Chao Liu authored Jun 04, 2019
03 Jun, 2019 1 commit
- use vectorized read and write for threadwise generic tensor copy · 917d7a2b
  Chao Liu authored Jun 03, 2019
30 May, 2019 1 commit
- adding implicit gemm v4 (nchw, kcyx) · b2439ec9
  Chao Liu authored May 30, 2019
23 May, 2019 1 commit
- adding implicit gemm v3 · 8a4b5978
  Chao Liu authored May 22, 2019
16 May, 2019 1 commit
- adding implicit gemm v3 · 5e5c27a6
  Chao Liu authored May 16, 2019
02 May, 2019 1 commit
- refactored · 4957d5a3
  Chao Liu authored May 02, 2019
23 Apr, 2019 1 commit
- added implicit gemm v1r3 lds_double_buffer NCHW * CYXK = KNHW, reworked static functionals · 569ad66e
  Chao Liu authored Apr 23, 2019
18 Apr, 2019 1 commit
- implicit gemm v1r2: adding support for nchw · 19f17df4
  Chao Liu authored Apr 18, 2019
06 Apr, 2019 2 commits
- clean up · 5245a016
  Chao Liu authored Apr 06, 2019
- debugging · f6cb5b84
  Chao Liu authored Apr 06, 2019
03 Apr, 2019 2 commits
- tidy up · e2313c9e
  Chao Liu authored Apr 02, 2019
- putting gridwise convolution into its own class · 6290e0b0
  Chao Liu authored Apr 02, 2019
01 Apr, 2019 1 commit
- refactor · e43d7bc6
  Chao Liu authored Apr 01, 2019
29 Mar, 2019 1 commit
- Jing's ds_read inline asm · d6d9a8e4
  Chao Liu authored Mar 28, 2019
24 Mar, 2019 1 commit
- experimenting · 766b0a9e
  Chao Liu authored Mar 24, 2019