Commits · fd49ff8080b90687108c46f92321ce10ecc743dc · gaoqiong / composable_kernel

19 Oct, 2021 1 commit

add nchw atomic, nhwc and nhwc atomic methods for backward weight (#30) · fd49ff80

ltqin authored Oct 20, 2021



* add new algorithm from v4r4r2

* program once issue

* add split k function

* redefine code

* add a matrix unmerge

* add b matrix unmerge k0

* trans a and b to gridgemm

* nhwc init

* no hacks and vector load

* add hacks

* modify some parameter

* fix tuning parameter for fp32

* fix tuning parameter for fp16

* start change gridwise k split

* init ok

* remove a b matrix k0mk1 desc in grid

* rewrite calculate gridsize

* add kbatch to CalculateBottomIndex

* remove some unused function

* add clear data function before call kernel

* out hacks

* in hacks

* rename device convolution file and function name

* modify kBatch value

* fix some tuning code

* start from v4r4 nhwc

* nhwc atomic is able to run

* just for fp32

* enable nchw atomic

* tweak

* tweak

* re-arrange gridwise gemm hot loop for wrw

* add wrw v4r5

* v4r4r5 fp16

* v4r4r4 fp16

* v4r4r2 fp16

* V4R4R4XDLNHWC fp16

* V4R4R2XDLATOMICNCHW fp16

* adjust for fp16

* input gridsize

* change kbatch to gridsize

* testing wrw

* clean up

* k_batch to gridsize

* fix bug

* wrw v4r4r4 kbatch change to grid size

* wrw v4r4r2 kbatch change to grid size

* after merge, change gridwise gemm v2r4

* change MakeCBlockClusterAdaptor

* other method use new gridwise gemm

* clean up

* change pad method to make_right_pad_transform

* kbatch out from transform function

* clean up and fix bug

* fix bug

* using function type reduce template parameters

* using auto replace define function type

* clean up
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>


06 Oct, 2021 1 commit

Tweak GEMM kernel (#38) · b3e8d57d

Chao Liu authored Oct 06, 2021

* add parameters

* tweak gemm

* tweak

* update conv

* update script

* adding bwd 1x1

* update script

* adding 1x1 bwd

* debugging bwd 1x1 failure

* update script

* update script

* test

* test v100

* clean up


19 Aug, 2021 1 commit

Composable kernel init integration v3 (#1097) · 6fe3627a

Chao Liu authored Aug 19, 2021

* Squashed 'src/composable_kernel/' content from commit f6edda61

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda61

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5

5781adf5 Update develop (#5) (#6)
97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e refactor
49c33aae refactor
54b3e73d rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5



* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert changes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make inner product compatible on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix
Co-authored-by: JD <Jehandad.Khan@amd.com>


09 Aug, 2021 1 commit
- tidy · f885c131
  Chao Liu authored Aug 09, 2021
  
18 Jul, 2021 1 commit

reorganize files to prepare for MIOpen integration (#51) · 12649254

Chao Liu authored Jul 18, 2021

* change olc cmake

* adding online compile to fwd-v4r5r2

* update scripts

* rename fwd-v4r5r2 to fwd-v6r1

* clean up


05 Jul, 2021 1 commit

DL GEMM fp32/fp16/int8 (#41) · b8b2d0a6

Chao Liu authored Jul 04, 2021

* add threadwise copy that copies a tensor in one copy; add kpack to DL GEMM

* add kpack into fwd v4r5 nchw fp32


11 May, 2021 1 commit

No raw index calculation (#31) · 01055d95

Chao Liu authored May 11, 2021



* Replace most raw index calculations with coordinate transformations
* Overhaul blockwise and threadwise GEMM
* Overhaul driver for gridwise GEMM kernel
Co-authored-by: Jing Zhang <jizhan@amd.com>


24 Jun, 2020 1 commit

Code clean up (#20) · 5c7cec11

Chao Liu authored Jun 23, 2020



* tuning parameters

* testing on v100

* add fp16

* remove deprecated tensor descriptor

* sync with miopen

* update build script
Co-authored-by: Jing Zhang <jizhan@amd.com>


20 Jan, 2020 1 commit

Added bwd data v3r1 v4r1, tweaking v1 (#10) · c5da0377

Chao Liu authored Jan 20, 2020

* Added bwd data v3r1: breaks the computation down into a series of load-balanced GEMMs and launches them in a single kernel
* Added bwd data v4r1: like v3r1, but launches the GEMMs in multiple kernels
* Tweaked v1r1 and v1r2 (atomic) on AMD GPU


05 Jul, 2019 1 commit
- update build · 3276a5e9
  Chao Liu authored Jul 05, 2019
  
13 Jun, 2019 1 commit
- reorganized files · 1566b317
  Chao Liu authored Jun 13, 2019
  
12 Jun, 2019 2 commits
- change build · c82b833d
  Chao Liu authored Jun 12, 2019
  
- reorganize files · 81497a93
  Chao Liu authored Jun 11, 2019
  
11 Jun, 2019 1 commit
- rename files, added header guard, added namespace · 88b77181
  Chao Liu authored Jun 11, 2019
  
01 Apr, 2019 1 commit
- changed to dynamic LDS allocation · 23c626a9
  Chao Liu authored Apr 01, 2019
  
15 Feb, 2019 3 commits
- change file extension to hip.hpp and hip.cpp · b2888adf
  Chao Liu authored Feb 15, 2019
  
- update build · a414e3fd
  Chao Liu authored Feb 15, 2019
  
- hip build · 67c6f73f
  Chao Liu authored Feb 15, 2019
  
14 Feb, 2019 1 commit
- refactor build, clean up · e80fbbdd
  Chao Liu authored Feb 14, 2019
  