Commits · 646fcc268ede841a16cdaafb68aa64803d8390e1 · gaoqiong / composable_kernel

27 Oct, 2021 4 commits
- Merge pull request #47 from ROCmSoftwarePlatform/develop · 646fcc26
  Chao Liu authored Oct 27, 2021
```
Merge develop into master
```
- [Bug Fix] GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4 loop issue (#44) · 6014185a
  ltqin authored Oct 27, 2021
```
* change method computering kpad

* remove unusing variable: batchlen

* change KPerBlock to K0PerBlock

* fix bug for k0 == k0perblock

* fix bug for get k0 index

* use math::integer_divide_ceil
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
- Merge pull request #46 from ROCmSoftwarePlatform/miopen_downstream_all · 3e911370
  Chao Liu authored Oct 27, 2021
```
update ck from miopen ck_upstream
```
- Merge branch 'develop' into miopen_downstream_all · 211dae82
  ltqin authored Oct 27, 2021
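The v2r4 loop fix (#44) moves the K-padding computation onto `math::integer_divide_ceil`. As a hedged illustration of why ceiling division suits this job, the sketch below pads K so that K0 = K / K1 becomes a whole multiple of K0PerBlock. `integer_divide_ceil` and `padded_k` are local stand-ins written for this example; they are not CK's actual signatures, and the padding rule is an assumption rather than the exact logic of GridwiseGemm_bk0mk1_bk0nk1_mn_xdlops_v2r4.

```cpp
#include <cstdint>

// Local stand-in for a ceiling-division helper such as math::integer_divide_ceil
// (the real CK signature may differ). Assumes x >= 0 and y > 0.
constexpr int32_t integer_divide_ceil(int32_t x, int32_t y) { return (x + y - 1) / y; }

// Hypothetical helper: smallest padded K for which K0 = K / K1 is a multiple of K0PerBlock.
constexpr int32_t padded_k(int32_t k, int32_t k1, int32_t k0_per_block)
{
    // K0 before block-level padding, then rounded up to whole K0PerBlock tiles.
    return integer_divide_ceil(integer_divide_ceil(k, k1), k0_per_block) * k0_per_block * k1;
}

// K = 100, K1 = 8 -> K0 = 13 -> padded K0 = 16 -> padded K = 128.
static_assert(padded_k(100, 8, 4) == 128, "K is rounded up to whole K0PerBlock * K1 tiles");
// When K0 already equals K0PerBlock, no extra tile is added.
static_assert(padded_k(32, 8, 4) == 32, "no over-padding when K0 == K0PerBlock");
```

Ceiling division only rounds up when there is a remainder, so an exact K0 is left untouched; a formula such as `(k0 / k0_per_block + 1) * k0_per_block` would add a spurious tile in that case.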
26 Oct, 2021 1 commit
- [Composable Kernel] update develop branch code to ck_upstream · 5890e300
  Jun Liu authored Oct 25, 2021
```
Merge pull request #1236 from ROCmSoftwarePlatform/develop
```
21 Oct, 2021 2 commits
- fix bug in gridwise gemm xdlops v2r3 (#45) · d5297aba
  Chao Liu authored Oct 21, 2021
- Merge pull request #43 from ROCmSoftwarePlatform/develop · 38a90b6e
  Chao Liu authored Oct 20, 2021
```
Merge develop into master
```
19 Oct, 2021 2 commits
- bug fix (#39) · c3018794
  Chao Liu authored Oct 19, 2021
- add nchw atomic , nhwc and nhwc atomic method for backward weight (#30) · fd49ff80
  ltqin authored Oct 20, 2021
```
* add add new algorithm from v4r4r2

* program once issue

* add split k functiion

* redefine code

* add a matrix unmerge

* add b matrix unmerge k0

* trans a and b to gridegemm

* nhwc init

* no hacks and vector load

* add hacks

* modify some parameter

* fix tuning prometer for fp32

* fix tuning prometer for fp16

* start change gridwise k split

* init ok

* revome a b matrix k0mk1 desc in grid

* carewrite lculate gridsize

* add kbatch to CalculateBottomIndex

* remove some unused funtion

* add clear data function before call kernel

* out hacks

* in hacks

* rename device convolution file and function name

* modify kBatch value

* fix some tuning code

* start from v4r4 nhwc

* nhwc atomic is able to run

* just for fp32

* enable nchw atomic

* tweak

* tweak

* re-arrange gridwise gemm hot loop for wrw

* add wrw v4r5

* v4r4r5 fp16

* v4r4r4 fp16

* v4r4r2 fp16

* V4R4R4XDLNHWC fp16

* V4R4R2XDLATOMICNCHW fp16

* adjust for fp16

* input gridsize

* change kbatch to gridsize

* testing wrw

* clean up

* k_batch to gridsize

* fix bug

* wrw v4r4r4 kbatch change to gride size

* wrw v4r4r2 kbatch change to gride size

* after merge , change gridwise gemm v2r4

* change MakeCBlockClusterAdaptor

* other method use new gridwise gemm

* clean up

* chapad method nge to make_right_pad_transform

* kbatch out from transform function

* clean up and fix bug

* fix bug

* using function type reduce template parameters

* using auto replace define fuction type

* clean up
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
```
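Several bullets in #30 ("add split k functiion", "add kbatch to CalculateBottomIndex", "add clear data function before call kernel", the atomic NCHW/NHWC variants) revolve around split-K. Below is a minimal host-side sketch of the idea under stated assumptions: row-major float matrices, K cut into `k_batch` contiguous slices, and slices executed one after another. On a GPU each slice would presumably be handled by its own workgroups, with the `+=` into C done as an atomic add, which is why C has to be cleared before the kernel runs. `split_k_gemm` is a hypothetical name, not part of CK.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side illustration of split-K GEMM: C = A * B with
// A (M x K), B (K x N), C (M x N), all row-major. K is cut into k_batch
// contiguous slices and each slice adds its partial product into C.
std::vector<float> split_k_gemm(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t M, std::size_t N, std::size_t K,
                                std::size_t k_batch)
{
    assert(k_batch > 0 && a.size() == M * K && b.size() == K * N);

    // C is zeroed first because every slice accumulates into it.
    std::vector<float> c(M * N, 0.0f);
    const std::size_t k_per_batch = (K + k_batch - 1) / k_batch; // ceil(K / k_batch)

    for(std::size_t kb = 0; kb < k_batch; ++kb) // on a GPU: independent workgroups
    {
        const std::size_t k_begin = std::min(K, kb * k_per_batch);
        const std::size_t k_end   = std::min(K, k_begin + k_per_batch);
        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.0f;
                for(std::size_t k = k_begin; k < k_end; ++k)
                    partial += a[m * K + k] * b[k * N + n];
                c[m * N + n] += partial; // on a GPU: atomic add into global memory
            }
        }
    }
    return c;
}
```

With integer-valued inputs every `k_batch` yields bit-identical results; with general floats the slice order changes the rounding, so split-K outputs are usually compared against a reference with a tolerance.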

06 Oct, 2021 3 commits
- [MIOpen Downstream] Fix Reduction Kernel (#34) · b2dc55f8
  Qianfeng authored Oct 07, 2021
```
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

* Fix with regard to implementing GetZeroVal() in both kernel and host

* Avoid convert to compType from dstDataType before writting the output value

* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator

* Add CONSTANT decorator for descriptor read buffer

* Use get_thread_local_1d_id() for thread local Id

* Rename GetZeroVal() to GetReductionZeroVal() in the kernels

* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

* Occasional tiny simplification and update in the kernel files

* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

* Update to remove OpenCL tidy checking failures

* Update for better readability

* Remove unused codes and not-needed template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
- Tweak GEMM kernel (#38) · b3e8d57d
  Chao Liu authored Oct 06, 2021
```
* add parameters

* tweak gemm

* tweak

* update conv

* update script

* adding bwd 1x1

* update script

* adding 1x1 bwd

* debugging bwd 1x1 failure

* update script

* update script

* test

* test v100

* clean up
```
- Add VectorType support into StaticBuffer (#27) · 846f462b
  zjing14 authored Oct 06, 2021
```
* init StaticBufferV2

* clean

* adopt old output stage for staticBufferV2

* clean

* remove hack

* clean

* clean

* clean code

* move c_buffer alloc into blockwise gemm

* add adaptors for m/n_thread_data_on_grid

* adjust blockwise_gemm_xdlops

* reorder ops in GEMM hot loop
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
29 Sep, 2021 1 commit
- [Enhancements] Several bugfixes and refactoring of dynamic generic reduction (#1156) · dfb80c4e
  Qianfeng authored Sep 29, 2021
```
* Squashed 'src/composable_kernel/' content from commit f6edda61

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda61

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5

5781adf5 Update develop (#5) (#6)
97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e refactor
49c33aae refactor
54b3e73d rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5

* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* tidy

* add tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* tidy

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert chagnes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make innner product compatiable on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix

* Squashed 'src/composable_kernel/' changes from 5781adf5..31b40352

31b40352 Merge pull request #16 from ROCmSoftwarePlatform/develop
b62bf8c3 Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration
ccc4a1d3 Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration
67ad47e7 refactor
16effa76 refactor
a91b68df DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element
2cbabbba use int instead of index_t in kernel wrapper
0834bc76 compiler parameter use stream
f2ac7832 make innner product compatiable on gfx900
4e57b30a rename
c03045ce rename
b2589957 update CK build script
2c48039d fix kernel filename
d626dccc fix enum issue
643ebd4f tidy
ddd49ec9 fix clang warning suppression
4f566c62 vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast
172036d7 add c-style pointer cast
76f31319 tidy
d1842890 tidy
f885c131 tidy
80120f0a tidy
c3efeb5e tidy
56fc0842 tidy
54fba515 tidy
e62bae7a tidy
24c87289 add tidy
61487e0a fix
ae98b52a remove online compilation from CK
cb954213 refactor
73ca9701 Merge commit '437cc595c6e206dfebb118985b5171bbc1e29eab' into composable_kernel_init_integration_v3
3b866461 Merge pull request #7 from ROCmSoftwarePlatform/master
d09ea4f4 Update develop (#5)
3d32ae94 add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

git-subtree-dir: src/composable_kernel
git-subtree-split: 31b40352

* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel

* Fix with regard to implementing GetZeroVal() in both kernel and host

* Avoid convert to compType from dstDataType before writting the output value

* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator

* Add CONSTANT decorator for descriptor read buffer

* Use get_thread_local_1d_id() for thread local Id

* Rename GetZeroVal() to GetReductionZeroVal() in the kernels

* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp

* Occasional tiny simplification and update in the kernel files

* Update in src/reducetensor.cpp for consistent IDs passing to the kernel

* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers

* Update to remove OpenCL tidy checking failures

* Small updates in src/reducetensor.cpp

* Update for better readability

* Remove unused codes and not-needed template parameters in the kernel wrappers
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>
```
21 Sep, 2021 5 commits
- Merge pull request #1165 from ROCmSoftwarePlatform/develop · 8557901d
  Jun Liu authored Sep 21, 2021
```
Merge develop into CK_upstream (Please don't squash when merging)
```
- Merge pull request #31 from ROCmSoftwarePlatform/miopen_downstream-dynamic_reduction_pr · f305bebd
  Chao Liu authored Sep 21, 2021
```
[MIOpen Downstream] Dynamic Reduction PR
```
- Merge remote-tracking branch 'origin/develop' into miopen_downstream-dynamic_reduction_pr · b725e3fc
  Chao Liu authored Sep 21, 2021
- Merge pull request #32 from ROCmSoftwarePlatform/develop · 88833bd9
  Chao Liu authored Sep 21, 2021
```
Merge develop into master
```
- :Merge remote-tracking branch 'origin/develop' into CK_upstream · df0d6810
  Chao Liu authored Sep 20, 2021
05 Sep, 2021 2 commits
- Add a version of Merge transform that use integerdivision and mod (#25) · f3acd251
  Chao Liu authored Sep 05, 2021
```
* add Merg_v3_division_mod

* refactor
```
- GEMM driver and kernel (#29) · 19613902
  Chao Liu authored Sep 05, 2021
```
* add gemm driver

* tweak

* add gemm kernel: mk_kn_mn and km_kn_mn

* tweak

* add GEMM km_nk_mn

* fix comment
```
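#25 adds a Merge transform variant (`Merg_v3_division_mod` in the commit message) that recovers lower indices with integer division and modulo. A hedged sketch of that arithmetic: when lower lengths (L0, ..., Ln-1) are merged into one upper index in row-major order, each lower index is the number of whole "strides" (products of the inner lengths) that fit, and the remainder is passed on to the inner dimensions. `merge_lower_index` is a hypothetical free function, not CK's transform interface.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: split a merged (row-major flattened) index back into
// N lower indices using integer division and modulo.
template <std::size_t N>
std::array<int32_t, N> merge_lower_index(int32_t idx_up, const std::array<int32_t, N>& lengths)
{
    // scan[i] = product of lengths[i+1 .. N-1], the stride of lower dimension i.
    std::array<int32_t, N> scan{};
    int32_t stride = 1;
    for(std::size_t i = N; i-- > 0;)
    {
        scan[i] = stride;
        stride *= lengths[i];
    }
    assert(idx_up >= 0 && idx_up < stride); // must lie inside the merged range

    std::array<int32_t, N> idx_low{};
    for(std::size_t i = 0; i < N; ++i)
    {
        idx_low[i] = idx_up / scan[i]; // whole strides of dimension i
        idx_up %= scan[i];             // remainder goes to the inner dimensions
    }
    return idx_low;
}
```

For lengths {2, 3, 4}, the strides are {12, 4, 1}, so upper index 17 maps to {1, 1, 1} (1*12 + 1*4 + 1 = 17). Each lower index costs one division and one modulo per dimension.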
31 Aug, 2021 1 commit
- Backward weight v4r4r2 with xdlops (#18) · 627d8ef3
  ltqin authored Aug 31, 2021
```
* start

* modify transformat

* modify device convolutiion

* modify host

* added host conv bwd and wrw

* remove bwd, seperate wrw

* clean

* hacall k to zero

* out log

* fixed

* fixed

* change to (out in wei)

* input hack

* hack to out

* format

* fix by comments

* change wei hacks(wei transform has not merge)

* fix program once issue

* fix review comment

* fix vector load issue

* tweak
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
27 Aug, 2021 2 commits
- Misc fixes (#24) · 10bb8110
  Chao Liu authored Aug 26, 2021
```
* use cast_pointer_to_generic_address_space() in v6r1 kernel wrapper, DynamcBuffer and buffer_load take customized invalid-element-value, add buffer_load/store for fp64

* use remove_cvref_t
```
- [SWDEV-281541][MSRCHA-100] Implementation of Dynamic Generic Reduction (#1108) · 9e80cdce
  Qianfeng authored Aug 27, 2021
```
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* make inner product compatible on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities)

* Some updates to dynamic composable kernel facility for the need of dynamic generic reduction

* Update to generic reduction C++ host interface layer to support dynamic generic reduction

* Update to remove tidy complaints in host interface layer

* Change the unary operator form from void op(T &x) to T op(T x)

* Update to pass single workspace pointer for all kernels (fix for OpenCL backend)

* Use cppcheck-suppress to prevent some strange warnings

* Re-use operator [] and () for DynamicBuffer and update to depending codes

* Remove useless codes in first call threadwise/warpwise/blockwise kernel wrappers

* [performance] Remove un-needed local buffer initialization
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>
```
25 Aug, 2021 1 commit
- GlobalAtomicAdd for fp32/int32 (#23) · a7a758d8
  zjing14 authored Aug 25, 2021
```
* add f32/i32 atomicAdd support into dynamicBuffer, and enable it in v1r3

* fixed

* fixed

* update comment
Co-authored-by: Chao Liu <chao.liu2@amd.com>
```
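#23 adds fp32/int32 `atomicAdd` support to DynamicBuffer. To illustrate the read-modify-write semantics of an atomic accumulation, the sketch below uses a generic host-side technique swapped in for illustration: a compare-and-swap loop on `std::atomic<float>`, plus `fetch_add` for integers. It is not the DynamicBuffer implementation, and the function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Generic compare-and-swap emulation of an atomic float add (illustration only,
// not CK's DynamicBuffer code). Returns the value held before the add, matching
// the usual HIP/CUDA atomicAdd convention.
inline float atomic_add_cas(std::atomic<float>& target, float value)
{
    float expected = target.load();
    // On failure compare_exchange_weak reloads 'expected' with the current value,
    // so the sum is recomputed against the latest contents before retrying.
    while(!target.compare_exchange_weak(expected, expected + value))
    {
    }
    return expected;
}

// For integers, std::atomic already provides an atomic add.
inline int atomic_add_int(std::atomic<int>& target, int value) { return target.fetch_add(value); }
```

The loop retries until no other writer has changed the value between the load and the swap, so concurrent adders never lose an update; the order of the additions, and therefore float rounding, is still not deterministic.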

23 Aug, 2021 2 commits
- Xdlops refactor fix (#22) · 9d3f634a
  zjing14 authored Aug 23, 2021
```
* added constexpr ahead of adptor; clean unused driver; rename M/NPerWave to M/NPerXDL

* fixed bwd

* fixed comment
```
- magic division use __umulhi() (#19) · c6f26bb4
  Chao Liu authored Aug 23, 2021
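#19 switches magic division to `__umulhi()`, the HIP device intrinsic that returns the high 32 bits of a 32x32-bit unsigned product. The sketch below shows the round-up magic-number scheme usually used for this (Granlund and Montgomery, "Division by Invariant Integers using Multiplication"): precompute a multiplier and a shift once per divisor, then replace each `n / d` with a high multiply, an add and a shift. It runs on the host, so `umulhi` is a stand-in built from a 64-bit multiply; the names `MagicNumbers`, `calculate_magic_numbers` and `magic_divide` and the divisor range 1 <= d <= 2^31 are assumptions of this example, not CK's exact interface.

```cpp
#include <cassert>
#include <cstdint>

// Host stand-in for HIP's __umulhi(): high 32 bits of the 64-bit product a * b.
inline uint32_t umulhi(uint32_t a, uint32_t b)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Hypothetical container for the per-divisor constants.
struct MagicNumbers
{
    uint32_t multiplier;
    uint32_t shift;
};

// Precompute the constants for a divisor d, assumed to satisfy 1 <= d <= 2^31.
inline MagicNumbers calculate_magic_numbers(uint32_t d)
{
    assert(d >= 1 && d <= (uint32_t{1} << 31));
    uint32_t shift = 0;
    while((uint64_t{1} << shift) < d) // shift = ceil(log2(d))
        ++shift;
    // multiplier = floor(2^32 * (2^shift - d) / d) + 1, which fits in 32 bits here.
    const uint64_t multiplier = ((uint64_t{1} << 32) * ((uint64_t{1} << shift) - d)) / d + 1;
    return {static_cast<uint32_t>(multiplier), shift};
}

// floor(n / d) computed as (umulhi(n, multiplier) + n) >> shift.
// The add is done in 64 bits so it cannot overflow for any 32-bit n.
inline uint32_t magic_divide(uint32_t n, MagicNumbers mn)
{
    const uint32_t t = umulhi(n, mn.multiplier);
    return static_cast<uint32_t>((static_cast<uint64_t>(t) + n) >> mn.shift);
}
```

For d = 3 the constants are multiplier = 1431655766 and shift = 2, so 10 / 3 becomes (umulhi(10, 1431655766) + 10) >> 2 = (3 + 10) >> 2 = 3. A device version that does the final add in 32 bits would need the dividend bounded (for example below 2^31) to avoid overflow.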
19 Aug, 2021 3 commits
- Composable kernel init integration v3 (#1097) · 6fe3627a
  Chao Liu authored Aug 19, 2021
```
* Squashed 'src/composable_kernel/' content from commit f6edda61

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda61

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5

5781adf5 Update develop (#5) (#6)
97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e refactor
49c33aae refactor
54b3e73d rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5

* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert chagnes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make innner product compatiable on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix
Co-authored-by: JD <Jehandad.Khan@amd.com>
```
- refactor dynamic xdlops iGemm (#13) · a2ad6d35
  zjing14 authored Aug 19, 2021
```
* xdlops refactor

* fixed commnt

* clean xdlops_gemm

* add make c into xldops-gemm

* change mfma_info

* refactor xdlops, hide c desc

* clean

* clean

* clean

* apply hacks changes to v4r4r4_nhwc

* rename hacks and use single stage adapter

* enable fp16 mfma
```
- Added host_conv_wrw for verification (#15) · ba6f79a7
  zjing14 authored Aug 19, 2021
```
* added host conv wrw
```
18 Aug, 2021 1 commit
- Merge pull request #16 from ROCmSoftwarePlatform/develop · 31b40352
  Chao Liu authored Aug 18, 2021
```
Merge develop into master
```
16 Aug, 2021 4 commits
- Merge pull request #14 from ROCmSoftwarePlatform/miopen_downstream_init_integration · b62bf8c3
  Chao Liu authored Aug 16, 2021
```
MIOpen Downstream: Initial integration 2nd PR
```
- Merge pull request #8 from ROCmSoftwarePlatform/miopen_downstream_init_integration · ccc4a1d3
  Chao Liu authored Aug 16, 2021
- refactor · 67ad47e7
  Chao Liu authored Aug 16, 2021
- refactor · 16effa76
  Chao Liu authored Aug 16, 2021
13 Aug, 2021 3 commits
- DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element · a91b68df
  Chao Liu authored Aug 13, 2021
- use int instead of index_t in kernel wrapper · 2cbabbba
  Chao Liu authored Aug 13, 2021
- compiler parameter use stream · 0834bc76
  Chao Liu authored Aug 13, 2021
11 Aug, 2021 2 commits
- make innner product compatiable on gfx900 · f2ac7832
  Chao Liu authored Aug 11, 2021
- rename · 4e57b30a
  Chao Liu authored Aug 11, 2021
10 Aug, 2021 1 commit
- rename · c03045ce
  Chao Liu authored Aug 10, 2021