Commits · 0cfda84d0576b7dc271755b3da214a4ab55fa33d · yangql / composable_kernel-1

15 Feb, 2023 1 commit

[Navi3x] Add Device Operations (#567) · 0cfda84d

Haocong WANG authored Feb 16, 2023

* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add all type unit test(int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

* navi3x_multipleD+example

* temp save

* workable

* batchedgemm[OK], groupconv[debug]

* groupconv: Sanity check[OK], Performance[Bad]

* navi3x_groupconv_need_optimization

* format

* Add arch limitation to all wmma examples

* fix bug: example30 input conv args

0cfda84d

18 Jan, 2023 1 commit

Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760

Raman R jana authored Jan 18, 2023



* wavelet gemm programming model support for CK

* GEMM pipeline update for wavelet progrmmaing model

* Updated wavelet programming pipeline

* fixes for global-write for math-wave

* fixed bug in global writes

* Updated comments for better readability

* fixed clang format errors

* added block_lds without barrier sync

* clean

* clean

* clean

* clean

* refactor

* prototype

4 layouts

fix default stride

all problem sizes

tidy

move file

update build script

restore old file

fix build

* refactor standalone test to use gemm test harness

* simplify gemm test

* update build script

* remove redundant

* early return when cmd arg doesn't match

* tidy

* report failure when result not validated

* tidy

* Add comment depicting B2C mapping pattern.

* Formatting & comments.

* Comparison with custom B2C mapping pattern.

* Example for wavelet gemm.

* Add wavelet to Gemm standalone test.

* Remove debug code.

* Remove dangling #endif directive.

Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

1cfa8760

17 Jan, 2023 1 commit

[Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541) · 919aeb1f

Haocong WANG authored Jan 17, 2023

* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + Add all type unit test(int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

919aeb1f

23 Aug, 2022 1 commit

Add examples of Gemm (data type: int4) (#367) · fa2d894b

Po Yen Chen authored Aug 24, 2022

* Add GEMM examples for int4

Currently the source files are just copied from int8 examples

* Re-use pre-defined alias in int4 exmples

* Distinguish user-side type from kernel-side type

* Add int4_t support for check_err()

* Allow conversion between Tensor<> specializations

* Re-format source files

* Use different type for host tensors

* Re-use CopyAsType<>() to implement copy ctor

* Re-use element-wise operation type alias

* Fix typo in alias names

* Complete the int4 examples

* Add constraint to Tensor<> templated methods

* Add type traits 'is_signed_integral<>'

* Add type constraints for integer version check_err<>()

* Allow comparing different-sized integral types in check_err()

* Check converted Tensor<int4_t> with golden Tensor<int8_t>

* Remove constraint of Tensor<>::CopyAsType()

* Avoid compilation error while disabling ck::int4_t support

* Remove debug messages

* Add #error directive to prevent compile sources with wrong setting

* Simplify tensor usages in examples

* Add constraint to check_err() input reference type

* Align design with other PR

* Use ""_uz to simplify example code

* Avoid too much generalizing check_err()

* Re-format GEMM instance template arguments

* Extract int4 example common codes

* Sort include directives

* Move #include directives into new header

* Move common codes together

* Re-format template argument in example code

* Reuse same implementation code for most of GEMM examples

* Re-format common.hpp

* Unify structured comment in examples

* Use reinterpret_cast<>() for cross-type pointer conversion

* Revert "Add type traits 'is_signed_integral<>'"

This reverts commit f2c148efaedf42c8ee66032dac6d13a1003b0f3a.

* Allow unsigned integer arguments for check_err()

* Fix compilation error in check_err()

* Remove unnecessary copy ctor for Tensor<>

* Mark Tensor<> special member functions as 'default'

* Use more strict condition to add code in examples

* Fix wrong program return value of GEMM examples

* Handle the case while user specify all the strides

* Fix never-ran examples

* Exit successfully if GEMM instance does not support given problem

* Add missing 'else' keyword

* Re-format CMakeLists.txt

* Add wrapper function to hide value conversion while copying memory

* Add new DeviceMem API to copy memory

* Use new DeviceMem API to implement examples

* Revert "Add new DeviceMem API to copy memory"

This reverts commit 3f190b0779ceedf7aaf0b380712fda0518de72c1.

* Add conversion ctor for Tensor<>

* Write Tensor<> conversion logics explicitly in example code

* Convert Tensor<> values after transfer data to host

fa2d894b

13 Aug, 2022 1 commit

Skip lds of b matrix (#326) · 10b3278b

ltqin authored Aug 13, 2022

* start

* read for gridwise gemm

* add MakeBGridDescriptor_K0_N0_N1_N2_N3_K1

* add thread  copy desc and register buffer

* add K0PerBlock dim

* add read global data

* finish gridwise gemm

* finish blockwise gemm

* add print data

* add smallest config

* add compare code for gridwis gemm

* fix NXdlPerWave

* fix k0perthread and gridewis gemm main loop

* remove b matrix lds alloc

* fix name

* add test code

* create b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3 from parameter

* add double register

* modify b_thread_desc_

* add float

* fp16 tag

* add tail for pipeline

* finish main loop

* optimize main loop

* start clear gridwise gemm

* clear code

* clear redundant code

* change file name

* change file name

* fix bug after merge develop

* fix input parameters

* using MultiK0 control b load data loop

* fix some config

* 4 buffer

* fix bug

* one can use

* change read order

* change buffer array to tuple

* change to 8 buffer

* interleave buffer load

* change to 16

* read 8 buffer

* add data buffer to template

* fix after merge develop(head file)

* format

* change to 4 buffer

* remove unnecessary lambda fun

10b3278b

31 May, 2022 1 commit

Minor fix for recent PR (#260) · 85fc91c3

Chao Liu authored May 30, 2022

* fix example

* update IsSupportedArgument

* fix

* disable fp64 conv example as test

85fc91c3

26 May, 2022 1 commit

Add FP64 XDL GEMM built-in function (#199) · 3e6c2610

ltqin authored May 27, 2022



* add intrin_mfma_f64_16x16x4f64

* add example

* gemm reference add double data type

* chang init data

* fix M N PerXdlops

* fix ifdef

* add comparsion config

* add conv fwd example

* format log out

* change rc matrix egister layout

* reorganize example

* reorganize example 2

* format,because merge develop

* fix call impl adding acc data type

* lost ;

* add compiler warning

* change example tunning parameters

* add test for fp64

* add instance

* add test/gemm/gemm_fp64.cpp

* fix get name issue

* remove some tunning parameter

* fix conflict

* format

* use integer value for GEMM test

* add acc data type

* remove typeid because fp16

* fix streamconfig etc bug from merging develop

* format

* remove test_gemm_xdl_fp64

* add AccDataType

* AccDataType problem
Co-authored-by: qinletao <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

3e6c2610

24 May, 2022 1 commit

Navi21 gemm (#197) · 40b59a63

Jianfeng Yan authored May 24, 2022



* start adding navi21 GEMM

* navi_gemm_km_kn_mn_fp32 compiles and passes one test.

* rename variables and functions in gridwise_gemm_dlops_v1r3

* add other 3 layouts; format instance

* adding more tuning parameters

add tuning parameters for other 3 layouts

* add gemm_dlops_f16

* tmp

* add dependence of DeviceGemm::IsSupportedArg() on arch

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* minor changes

* push gemm_dlops into profiler

* minor changes

* if using xdl or dlops is moved into profiler_gemm_impl

* minor changes

* minor changes

* remove is_xdl from profile_gemm_impl

* make IsSupportedArg dependent on arch for other device_gemm

* minor changes

* minor changes

* fix a bug in f_generate_tensor_value

* add 64x64x64 for gemm_dlops_int8

* add 64x64x64 for gemm_dlops_int8

* comment out 3 layouts in gemm_dlops_int8; add 32x32x32 for gemm_dlops_int8; init A values to 1

* fix

* start fixing tuning parameters

* monir

* minor changes

* minor changes

* minor changes

* fixing

* adding example

* adding example

* adding example

* add gemm fp32 example

* clean up

* use 128x128x16 as MNK tile in navi21 gemm example

* bug fix

* fix test

* use new block c tile

* clean

* fix build
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: shaojiewang <wsjmessi@163.com>

40b59a63

09 Mar, 2022 1 commit

Reorganize files, Part 1 (#119) · 5d37d7bf

Chao Liu authored Mar 08, 2022

* delete obselete files

* move files

* build

* update cmake

* update cmake

* fix build

* reorg examples

* update cmake for example and test

5d37d7bf