Commits · 016ebaa7f33d2c3e86cd617210bd636fe7c99b42 · yangql / composable_kernel-1

08 Jun, 2023 1 commit
- support dynamic buffer using memory coherence glc_slc bit from template (#725) · 016ebaa7
  carlushuang authored Jun 08, 2023
31 May, 2023 1 commit
- update copyright headers (#726) · b94fd0b2
  Illia Silin authored May 31, 2023
02 Aug, 2022 1 commit

CGEMM examples bf16, fp32, int8 (#332) · fb0dc358

Adam Osewski authored Aug 02, 2022



* Add int8 specialization for elementwise Add and Subtract.

* CGEMM examples bf16, fp32, int8

* Add conversion of reference output to CDataType.

* Skip BF16 data type during testing.

* Lower K value to get rid of accumulation error.

* Fix merge artifact.

* Fix changed function name: GetElementSpaceSize()

* Fix merge artifact.
Co-authored-by: Adam Osewski <aosewski@amd.com>

25 Jun, 2022 2 commits

add license in file (#303) · d3051d75
Chao Liu authored Jun 24, 2022

Absolute include path (#281) · d1db6a0c

Chao Liu authored Jun 24, 2022

* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obsolete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporarily disable client test on Jenkins

* clean

* clean

* clean

25 May, 2022 1 commit
- minor fix for recent PR (#255) · 61851ae2
  Chao Liu authored May 24, 2022
```
* minor fix

* clean
```
24 May, 2022 1 commit

Overhaul of Reduction and its dependants (#237) · 63eee2d9

Qianfeng authored May 25, 2022

* Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type

* Update to host layer and host reduction

* Merge and remove reduction kernels

* Merge and remove reduction device interfaces and update pooling device interface

* Merge and remove useless reduction device instances

* Update to reduction profiler and reduction ctests

* Update to reduction and pooling examples and add one reduction example

* Change reduction examples to make them testable by ctest

* Add explicit pass checking for reduction and pooling examples

* Explicit assignment of tensor shapes in example reduce_blockwise_two_call

* Use atomic_add to replace atomicAdd and add atomic_add for double type

* Add reduce ctest support for double data type

* Replace to_int_vector() by using c++ std::vector::assign()

* Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise

* Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock

* Add GetAtomicOperationZeroValue() support for AtomicMax

* Tiny change to reduce example README.md

* Fix some tiny issues due to branch merging

* Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t

* Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64

* Renaming

* Clean up the header includes in device_reduce instance header files

20 May, 2022 1 commit

Gemm reduce max (#209) · 0ffe956a

rocking5566 authored May 20, 2022



* [What] Rename the example
[Why] Prepare to add unary reduction

* Add global operation to the parameters

* Add atomicmax

* Fix compile error

* Support atomicMax (hip library)

* Rename the reduction example

* Fix target name

* use p_d1_grid as the indicator directly

* Prevent performance issue. Let passthrough handle it.

* Implement the function template and specialize it for float2

* No need to separate into two lines

* Remove empty line

* add comment

* Fix compile error due to merge from develop

* make the implementation of atomic_max / atomic_add explicit for each datatype

* Fix typo

* For future CI test

* Fix compiler error in ckProfiler

* Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'

* simply use remove_pointer

* Rename type and var

* Refine example

* Modify reducemax example

* Fix bug in reduction

* Change initialize range

* Implement F64 version of atomicMax

* Move reduction code together

* Add buffer atomic_max

* Fix coding style by clang-format

* Integrate new api of DeviceGemmReduce_Xdl_CShuffle

* Integrate Batch gemm reduction

* Fix example

* fix example

* clean up

* Fix batch gemm tensor operation

* Fix coding style

* Fix template argument

* Fix clang format

* Keep the flexibility of a different stride for each D tensor

* Fix compile error for ckProfiler

* Fix typo

* [What] Fix naming
[Why] Prepare to add out elementop

* Add DoutElementOp
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: rocking <chunylai@amd.com>

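
The atomicMax items in this commit (including the F64 version) need an atomic read-modify-write that hardware may not provide natively for every floating-point type. As a rough illustration, here is a minimal host-side C++ sketch of the usual compare-and-swap emulation; `atomic_max` is a hypothetical name, and this is not the HIP device code from the commit.

```cpp
#include <atomic>

// Hypothetical sketch: atomic max for a floating-point type emulated with a
// compare-and-swap (CAS) loop. Returns the value seen before the update,
// mirroring the atomicMax convention of returning the old value.
// NaN inputs are not handled specially.
template <typename T>
T atomic_max(std::atomic<T>& target, T value)
{
    T old = target.load(std::memory_order_relaxed);
    // Keep trying while our value is larger than what is stored.
    // On failure, compare_exchange_weak writes the current value into `old`.
    while(old < value &&
          !target.compare_exchange_weak(old, value, std::memory_order_relaxed))
    {
    }
    return old;
}
```

With `std::atomic<double> m{1.5};`, calling `atomic_max(m, 3.0)` returns 1.5 and leaves 3.0 stored; a later `atomic_max(m, 2.0)` returns 3.0 and changes nothing.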

15 Apr, 2022 1 commit

Compile CK for all targets (#188) · 4221505d

Illia Silin authored Apr 15, 2022



* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try to suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add ignore header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>

31 Mar, 2022 1 commit

Compile for gfx908 and gfx90a (#130) · cd167e49

Chao Liu authored Mar 31, 2022

* adding compilation for multiple targets

* fix build

* clean

* update Jenkinsfile

* update readme

* update Jenkins

* use ck::half_t instead of ushort for bf16

* rename enum classes

* clean

* rename

* clean

09 Mar, 2022 1 commit

Reorganize files, Part 1 (#119) · 5d37d7bf

Chao Liu authored Mar 08, 2022

* delete obsolete files

* move files

* build

* update cmake

* update cmake

* fix build

* reorg examples

* update cmake for example and test

04 Mar, 2022 1 commit

Refactor threadwise copy using sfcurve (#101) · 0619ebf7

Jianfeng Yan authored Mar 04, 2022



* add space_filling_curve

* cleanup and move space_filling_curve into test

* WIP: start refactoring threadwise_transfer_v1r3

* threadwise_copy works but needs further refactoring

* add some comments

* add SpaceFillingCurve::GetIndices()

* minor changes

* removed GetIndices; refactored GetDstCoordinateResetStep

* add DynamicBuffer::Transfer, but Add is not tested

* rebased against develop

* threadwise_copy_v6r1/v6r2/v6r3 using space-filling curve started to work

* minor changes

* refactored threadcopy v3r1, v2; removed old implementations

* clang-format

* cleanup

* fix a typo in v6r3

* format
Co-authored-by: Chao Liu <chao.liu2@amd.com>

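
For readers unfamiliar with the space-filling-curve refactor: the general idea is to enumerate a thread's access positions in an order where consecutive accesses are neighbors, so a copy only has to advance its coordinate by a small step. The sketch below assumes a 2-D snake (boustrophedon) ordering; `snake_curve_2d` and `forward_step` are hypothetical helpers, not the `SpaceFillingCurve` API added in this commit.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a 2-D "snake" space-filling curve over a rows x cols
// access grid: even rows run left-to-right, odd rows right-to-left.
std::vector<std::array<int, 2>> snake_curve_2d(int rows, int cols)
{
    std::vector<std::array<int, 2>> order;
    order.reserve(static_cast<std::size_t>(rows) * cols);
    for(int r = 0; r < rows; ++r)
    {
        for(int c = 0; c < cols; ++c)
        {
            // Reverse the column direction on odd rows.
            int col = (r % 2 == 0) ? c : cols - 1 - c;
            order.push_back({r, col});
        }
    }
    return order;
}

// Coordinate step from access i to access i + 1, i.e. the move a copy routine
// would apply to its source/destination coordinate between accesses.
std::array<int, 2> forward_step(const std::vector<std::array<int, 2>>& order, std::size_t i)
{
    return {order[i + 1][0] - order[i][0], order[i + 1][1] - order[i][1]};
}
```

Every step changes exactly one coordinate by 1, so the copy only ever needs a unit move between accesses; a plain row-major order would instead jump back to column 0 at each row boundary.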

23 Feb, 2022 1 commit

Conv3d new (#94) · 6dfb92bb

Jianfeng Yan authored Feb 22, 2022



* conv3d compiles but has memory error

* conv3d works

* fix performance issue by using __builtin_amdgcn_readfirstlane

* change MakeBlock2CTileMap to MakeDefaultBlock2CTileMap; change c_blockid_to* to cblockid_to*

* clang-format

* remove CK_EXPERIMENTAL_PASS_TENSOR_DESCRIPTOR_BY_*; moved wrapper into DeviceConv3d

* format

* remove useless macro

* add comment
Co-authored-by: Chao Liu <chao.liu2@amd.com>

12 Feb, 2022 1 commit

NHWC conv 2d: fwd bf16/int8, Device level tuning and host API (#73) · 880fbee9

ltqin authored Feb 12, 2022



* add fwd bf16 conv

* change tuning parameters

* add int8 for conv fwd

* remove comments

* change tuning parameters for int8

* change init int8 example

* add test for conv2d fwd

* change device operation file location after merging develop

* fwd int8 use reference

* test_conv_fwd use reference

* add brackets for if statement

* rename fwd example name

* remove StaticBufferOfVectorTypeV2

* tweak example
Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

03 Feb, 2022 1 commit

Replace LLVM intrinsics with clang builtins (#65) · 6d92959a

zjing14 authored Feb 02, 2022

* test mfma builtins

* add fp16 builtins

* add int8 builtins

* add bf16 builtins

* simplify host conv forward

* clean

* clean

18 Nov, 2021 1 commit
- Use __builtin_memcpy to implement bit_cast and for accessing vector from pointer of scalars (#53) · 64350aff
  Chao Liu authored Nov 18, 2021
```
* reworking vector_type

* use __builtin_memcpy for bit_cast and vector access of scalar pointer

* clean up
```
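
As a rough illustration of the technique named in this commit, the sketch below implements a `bit_cast` and a vector load from a pointer of scalars with `__builtin_memcpy` (available in Clang and GCC), which avoids the strict-aliasing problems of casting pointers with `reinterpret_cast`. `Vec` and `load_vector` are hypothetical names; this is not CK's `vector_type` code.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical sketch: reinterpret the bytes of `x` as type Y by copying them,
// instead of casting pointers (which can violate strict aliasing).
template <typename Y, typename X>
Y bit_cast(const X& x)
{
    static_assert(sizeof(X) == sizeof(Y), "bit_cast requires equal sizes");
    static_assert(std::is_trivially_copyable<X>::value &&
                      std::is_trivially_copyable<Y>::value,
                  "bit_cast requires trivially copyable types");
    Y y;
    __builtin_memcpy(&y, &x, sizeof(Y));
    return y;
}

// Hypothetical fixed-size vector of N scalars.
template <typename T, int N>
struct Vec
{
    T data[N];
};

// Read N contiguous scalars starting at `p` into a Vec, again via memcpy.
template <typename T, int N>
Vec<T, N> load_vector(const T* p)
{
    Vec<T, N> v;
    __builtin_memcpy(&v, p, sizeof(v));
    return v;
}
```

On IEEE-754 targets, `bit_cast<std::uint32_t>(1.0f)` yields `0x3F800000`. Compilers typically lower these fixed-size copies to plain register moves or vector loads, so the memcpy form costs nothing at run time.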
27 Aug, 2021 2 commits

Misc fixes (#24) · 10bb8110

Chao Liu authored Aug 26, 2021

* use cast_pointer_to_generic_address_space() in v6r1 kernel wrapper; DynamicBuffer and buffer_load take a customized invalid-element value; add buffer_load/store for fp64

* use remove_cvref_t

[SWDEV-281541][MSRCHA-100] Implementation of Dynamic Generic Reduction (#1108) · 9e80cdce

Qianfeng authored Aug 27, 2021



* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* make inner product compatible on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities)

* Some updates to dynamic composable kernel facility for the need of dynamic generic reduction

* Update to generic reduction C++ host interface layer to support dynamic generic reduction

* Update to remove tidy complaints in host interface layer

* Change the unary operator form from void op(T &x) to T op(T x)

* Update to pass single workspace pointer for all kernels (fix for OpenCL backend)

* Use cppcheck-suppress to prevent some strange warnings

* Re-use operator [] and () for DynamicBuffer and update dependent code

* Remove useless code in first-call threadwise/warpwise/blockwise kernel wrappers

* [performance] Remove un-needed local buffer initialization
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: JD <Jehandad.Khan@amd.com>

25 Aug, 2021 1 commit

GlobalAtomicAdd for fp32/int32 (#23) · a7a758d8

zjing14 authored Aug 25, 2021



* add f32/i32 atomicAdd support into DynamicBuffer, and enable it in v1r3

* fixed

* fixed

* update comment
Co-authored-by: Chao Liu <chao.liu2@amd.com>

19 Aug, 2021 1 commit

Composable kernel init integration v3 (#1097) · 6fe3627a

Chao Liu authored Aug 19, 2021

* Squashed 'src/composable_kernel/' content from commit f6edda61

git-subtree-dir: src/composable_kernel
git-subtree-split: f6edda61

* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files

* Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5

5781adf5 Update develop (#5) (#6)
97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile
7b1ec41e refactor
49c33aae refactor
54b3e73d rename

git-subtree-dir: src/composable_kernel
git-subtree-split: 5781adf5



* fix

* refactor

* remove online compilation from CK

* refactor

* fix

* add ctest

* add c-style pointer cast

* vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast

* fix clang warning suppression

* tidy

* suppress cppcheck

* fix enum issue

* revert changes to hip build

* fix kernel filename

* update CK build script

* rename

* rename

* make inner product compatible on gfx900

* Update src/include/miopen/solver/ck_utility_common.hpp
Co-authored-by: JD <Jehandad.Khan@amd.com>

* compiler parameter use stream

* use int instead of index_t in kernel wrapper

* DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element

* refactor

* refactor

* change cmakelist

* change ck common utility

* fix
Co-authored-by: JD <Jehandad.Khan@amd.com>

16 Aug, 2021 1 commit
- refactor · 16effa76
  Chao Liu authored Aug 16, 2021
13 Aug, 2021 1 commit
- DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element · a91b68df
  Chao Liu authored Aug 13, 2021
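
A simplified sketch of what a customized invalid-element value means: a read that fails its validity check returns a caller-chosen value instead of a fixed default (typically zero), which matters for reductions such as max, where padded elements should never win. `SimpleDynamicBuffer` is a hypothetical stand-in written for illustration, not CK's `DynamicBuffer` or `amd_buffer_load`.

```cpp
#include <cstddef>

// Hypothetical stand-in for a dynamic buffer view over raw memory.
// Invalid or out-of-range reads return a caller-chosen value; invalid or
// out-of-range writes are dropped.
template <typename T>
class SimpleDynamicBuffer
{
    public:
    SimpleDynamicBuffer(T* p, std::size_t n, T invalid_value)
        : p_(p), n_(n), invalid_value_(invalid_value)
    {
    }

    // is_valid models a per-element predicate, e.g. "not in the padding".
    T Get(std::size_t i, bool is_valid) const
    {
        return (is_valid && i < n_) ? p_[i] : invalid_value_;
    }

    void Set(std::size_t i, bool is_valid, T x)
    {
        if(is_valid && i < n_)
            p_[i] = x;
    }

    private:
    T* p_;
    std::size_t n_;
    T invalid_value_;
};
```

For a max reduction, one would construct the buffer with the lowest representable value (e.g. `std::numeric_limits<float>::lowest()`) as the invalid value, so masked-off reads never affect the result; a sum reduction would keep zero.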
10 Aug, 2021 2 commits
- rename · c03045ce
  Chao Liu authored Aug 10, 2021
- add c-style pointer cast · 172036d7
  Chao Liu authored Aug 10, 2021
09 Aug, 2021 1 commit
- tidy · 80120f0a
  Chao Liu authored Aug 09, 2021
27 Jul, 2021 1 commit

[MIOpen Downstream] Initial MIOpen integration (#52) · f63a23ac

Chao Liu authored Jul 27, 2021

* update online kernel wrapper to bundle all descriptors in a tuple

* change __CONSTANT__ to CONSTANT

* rename

* adding tuning

* added IsValidCompileParameter

* reorganize

* adding tunable for fp16 and int8

* fix kernel compile warning and bug fixes

* suppress warning about cast CONSTANT (address space 4) pointer

* fix building issue

05 Jul, 2021 1 commit

DL GEMM fp32/fp16/int8 (#41) · b8b2d0a6

Chao Liu authored Jul 04, 2021

* add threadwise copy that copies a tensor in a single copy operation; add kpack to DL GEMM

* add kpack into fwd v4r5 nchw fp32

12 May, 2021 1 commit

Use DynamicBuffer instead of raw pointer (#32) · 78b987fb

Chao Liu authored May 12, 2021

* Use DynamicBuffer to hold raw pointer (to global and LDS memory)

* add workaround for compiler issue (inefficient ISA) of ds_write for int8x4, int8x8, int8x16
