Commits · 8c4e33f19efe33c2c2107ff2702d9d8fc7dcd094 · gaoqiong / composable_kernel

15 Nov, 2021 6 commits

Merge remote-tracking branch 'origin/develop' into v5r1_add · 8c4e33f1
Chao Liu authored Nov 15, 2021

Add bfp16/int8 support into XDL GEMM operator (#50) · 3737bb03

zjing14 authored Nov 15, 2021



* init StaticBufferV2

* clean

* adopt old output stage for staticBufferV2

* clean

* remove hack

* clean

* clean

* add parameters

* clean code

* move c_buffer alloc into blockwise gemm

* add adaptors for m/n_thread_data_on_grid

* tweak gemm

* adjust blockwise_gemm_xdlops

* tweak

* update conv

* update script

* adding bwd 1x1

* update script

* adding 1x1 bwd

* debugging bwd 1x1 failure

* update script

* update script

* test

* test v100

* add bf16_1k

* clang-format

* clean

* add bfp16 for gfx908

* add verification

* clean up

* clean code

* restore bfl16

* clean

* add bfp16 support into gemm_driver

* apply new generator to other drivers

* add int8 support

* clean

* clean

* clean

* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: root <root@hayabusa6111.amd.com>

clean · 5aed38d4
Jing Zhang authored Nov 15, 2021

merge develop · e5f7ded6
Jing Zhang authored Nov 15, 2021

FP16 data in-register transpose (#41) · b491ebf3

Chao Liu authored Nov 15, 2021

* start fixing 16bit data packing

* adding StaticTensor

* adding StaticTensor

* adding StaticTensor

* add missing constexpr

* adding static tensor

* adding static tensor

* adding transpose

* add inline asm for transpose 2x2 of half_t

* add general transpose_vectors(), but have unnecessary register initialization using v_mov

* fix unnecessary register initialization in transpose_vector by using more pass-by-reference

* add hardcoded logic for NHWC wrw

* improve asm for v_pack

* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor

* tweak

* reorganize file

merged develop · ed068043
Jing Zhang authored Nov 15, 2021

14 Nov, 2021 1 commit

ckProfiler and device-level XDL GEMM operator (#48) · e823d518

Chao Liu authored Nov 14, 2021

* add DeviceGemmXdl

* update script

* fix naming issue

* fix comment

* output HostTensorDescriptor

* rename

* padded GEMM for fwd v4r4r4 nhwc

* refactor

* refactor

* refactor

* adding ckProfiler

* adding ckProfiler

* refactor

* fix tuning parameter bug

* add more gemm instances

* add more fp16 GEMM instances

* fix profiler driver

* fix bug in tuning parameter

* add fp32 gemm instances

* small fix

* refactor

* rename

* refactor gemm profiler; adding DeviceConv and conv profiler

* refactor

* fix

* add conv profiler

* refactor

* adding more GEMM and Conv instance

* Create README.md

Add build instruction for ckProfiler

* Create README.md

Add Readme for gemm_xdl example

* Update README.md

Remove build instruction from top most folder

* Update README.md

* clean up

01 Nov, 2021 1 commit
- add activ_type as arguments · 41852668
  Jing Zhang authored Nov 01, 2021
29 Oct, 2021 6 commits
- add pass by point · e5c9f039
  Jing Zhang authored Oct 29, 2021
- add dynamic mode of maxpool · 27bad50b
  Jing Zhang authored Oct 29, 2021
- enable dynamic mode of conv and conv+resize_add · 982c3b60
  Jing Zhang authored Oct 29, 2021
- create separate fusion fun · 1b79fce9
  Jing Zhang authored Oct 29, 2021
- add gridwise_gemm_v3 · 8e897da7
  Jing Zhang authored Oct 29, 2021
- modularize ops of fusion · baac64e4
  Jing Zhang authored Oct 29, 2021
28 Oct, 2021 2 commits
- clean code · fa5e7aef
  Jing Zhang authored Oct 28, 2021
- workaround with offset trick · c19beaa9
  Jing Zhang authored Oct 28, 2021
27 Oct, 2021 6 commits
- debugging maxpool · f9560180
  Jing Zhang authored Oct 27, 2021
- clean · b5bc31bd
  Jing Zhang authored Oct 27, 2021
- [Bug Fix] GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4 loop issue (#44) · 6014185a
  ltqin authored Oct 27, 2021
```
* change method of computing kpad

* remove unused variable: batchlen

* change KPerBlock to K0PerBlock

* fix bug for k0 == k0perblock

* fix bug for get k0 index

* use math::integer_divide_ceil

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
- Merge pull request #46 from ROCmSoftwarePlatform/miopen_downstream_all · 3e911370
  Chao Liu authored Oct 27, 2021
```
update ck from miopen ck_upstream
```
- Merge branch 'develop' into miopen_downstream_all · 211dae82
  ltqin authored Oct 27, 2021
- fixed incorrect results due to typo · 1fb77ae6
  Jing Zhang authored Oct 27, 2021
26 Oct, 2021 1 commit
- [Composable Kernel] update develop branch code to ck_upstream · 5890e300
  Jun Liu authored Oct 25, 2021
```
Merge pull request #1236 from ROCmSoftwarePlatform/develop
```
22 Oct, 2021 1 commit
- for binary dumps · 64705e7d
  Jing Zhang authored Oct 22, 2021
21 Oct, 2021 1 commit
- fix bug in gridwise gemm xdlops v2r3 (#45) · d5297aba
  Chao Liu authored Oct 21, 2021
19 Oct, 2021 2 commits

bug fix (#39) · c3018794
Chao Liu authored Oct 19, 2021

add nchw atomic, nhwc and nhwc atomic methods for backward weight (#30) · fd49ff80

ltqin authored Oct 20, 2021



* add new algorithm from v4r4r2

* program once issue

* add split k function

* redefine code

* add a matrix unmerge

* add b matrix unmerge k0

* trans a and b to gridegemm

* nhwc init

* no hacks and vector load

* add hacks

* modify some parameter

* fix tuning parameter for fp32

* fix tuning parameter for fp16

* start change gridwise k split

* init ok

* remove a b matrix k0mk1 desc in grid

* rewrite calculate gridsize

* add kbatch to CalculateBottomIndex

* remove some unused function

* add clear data function before call kernel

* out hacks

* in hacks

* rename device convolution file and function name

* modify kBatch value

* fix some tuning code

* start from v4r4 nhwc

* nhwc atomic is able to run

* just for fp32

* enable nchw atomic

* tweak

* tweak

* re-arrange gridwise gemm hot loop for wrw

* add wrw v4r5

* v4r4r5 fp16

* v4r4r4 fp16

* v4r4r2 fp16

* V4R4R4XDLNHWC fp16

* V4R4R2XDLATOMICNCHW fp16

* adjust for fp16

* input gridsize

* change kbatch to gridsize

* testing wrw

* clean up

* k_batch to gridsize

* fix bug

* wrw v4r4r4 kbatch change to grid size

* wrw v4r4r2 kbatch change to grid size

* after merge, change gridwise gemm v2r4

* change MakeCBlockClusterAdaptor

* other method use new gridwise gemm

* clean up

* change pad method to make_right_pad_transform

* kbatch out from transform function

* clean up and fix bug

* fix bug

* using function type reduce template parameters

* using auto replace define function type

* clean up

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>

15 Oct, 2021 3 commits
- test · e9575251
  Jing Zhang authored Oct 15, 2021
- test · da207144
  Jing Zhang authored Oct 15, 2021
- conv-only use v5r1_add · 26c42b94
  Jing Zhang authored Oct 15, 2021
14 Oct, 2021 5 commits
- enable static desc · 4eb9a7a4
  Jing Zhang authored Oct 14, 2021
- add maxpool host for validation · a69937d3
  Jing Zhang authored Oct 14, 2021
- add maxpool fusion · ec381569
  Jing Zhang authored Oct 14, 2021
- add configurable makeddesc · 0f276ac2
  Jing Zhang authored Oct 14, 2021
- add conv_out · 35a57947
  Jing Zhang authored Oct 14, 2021
13 Oct, 2021 1 commit
- add bias · 3e298e42
  Jing Zhang authored Oct 13, 2021
12 Oct, 2021 2 commits
- refactor conv_add for InMem::add · 1e6d6782
  Jing Zhang authored Oct 12, 2021
- make static · f66a71c7
  Jing Zhang authored Oct 12, 2021
11 Oct, 2021 1 commit
- add bias · 4e5e68a1
  Jing Zhang authored Oct 11, 2021
10 Oct, 2021 1 commit
- use activ_enum · af84fba3
  Jing Zhang authored Oct 10, 2021