Commits · 334cfe1c205db73b0f88dee0b4bdbf2cc8c91bf7 · gaoqiong / composable_kernel

06 Nov, 2023 1 commit
- 4D Kernel · 334cfe1c
  muozturk authored Nov 06, 2023
  
27 Oct, 2023 2 commits
- Tensor Contraction Complex Data Type is working · d2cd5658
  Muhammed Ozturk authored Oct 27, 2023
  
- update · 160cf6ed
  Muhammed Ozturk authored Oct 27, 2023
  
10 Oct, 2023 2 commits
- bug fix · 7582c18e
  muozturk authored Oct 10, 2023
  
- complex type contraction · f0a8ee84
  Muhammed Ozturk authored Oct 10, 2023
  
05 Oct, 2023 3 commits
- Replace CMake `return` from later CMake (#970) · 59136091
  Lauren Wrubleski authored Oct 05, 2023
  
- Revert "Add support for mixed precision in contraction scale and bilinear" (#967) · 4daedf8c
  Illia Silin authored Oct 05, 2023
```
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"

This reverts commit f0748506.

* revert commits #957 and #960
```
- remove example 60 (#963) · 570ff3dd
  zjing14 authored Oct 05, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
04 Oct, 2023 1 commit

Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945) · 42facfc6

Rostyslav Geyyer authored Oct 04, 2023



* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

* Update naming (#937)

* Add a client example

* Add computetypes to device and gridwise ops

* Add instances, update instance factory

* Format

* Fix a flag

* Add ckProfiler mode

* Fix typos

* Add an example

* Add bf8 generator

* add bf8 mfma; fixed type_convert for bf8

* move verification ahead of timing

* Update reference calculation

* Fix reference

* Narrow down float init range

* Fix bf8 bf8 mfma

* Add bf8 @ fp8 mfma

* Update example

* Update instances

* Update profiler api

* Update for compatibility

* Format

* Remove extra example

* Clean up

* workaround convert

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

03 Oct, 2023 2 commits
- changed test for grouped_gemm to be random (#959) · 5311d1b3
  zjing14 authored Oct 03, 2023
```
Co-authored-by: Jing Zhang <jizha@amd.com>
```
- Fixed contraction issues (#960) · aa46039f
  zjing14 authored Oct 03, 2023
```
* add missing ComputeType

* fixed

* Update cmake-ck-dev.sh

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
```
02 Oct, 2023 3 commits

Add fp8 @ bf8 gemm support and example (#933) · bd09b5c5

Rostyslav Geyyer authored Oct 02, 2023

* Add f8 bf8 gemm example

* Add element-wise ops

* Add intrinsics

* Update reference calculation

* Add an additional type option for xdlops gemm

* Fix build process

* Add bf8 to buffer addressing

* Update blockwise op, split typeA and typeB

* Update for compatibility

* Update naming to f8->fp8

* Update naming

* Format

get rid of gfx900/906, set rocm5.7 as default (#958) · 59dbb01f
Illia Silin authored Oct 02, 2023

Contraction multi abd (#957) · 9d58c421

zjing14 authored Oct 02, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

* init commit for contraction_multi_ABD

* add examples

* add examples of multiA and broadcast

* update example

* fixed comments

* Update cmake-ck-dev.sh

* Update cmake-ck-dev.sh

* Add comments into the example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

29 Sep, 2023 1 commit

Add support for mixed precision in contraction scale and bilinear (#936) · f0748506

Bartlomiej Wroblewski authored Sep 29, 2023

* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code

28 Sep, 2023 1 commit

Add grouped conv bwd data wmma (#950) · cb538740

Bartłomiej Kocot authored Sep 28, 2023

* Add grouped conv bwd data wmma

* Fix copyrights

* Add instances with smaller NPerBlock

* Update interface test

* Minor stylistic fixes

* Minor stylistic fixes

27 Sep, 2023 2 commits

Add column to image kernel (#930) · e2243a4d

Bartłomiej Kocot authored Sep 27, 2023

* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b8695.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog

Add multiple A/B support (#906) · 11676c7e

zjing14 authored Sep 26, 2023



* add gridwise_multi_abd

* move element_op into RunRead

* merge element_wise op with data read

* add multiABD example

* allow packed elementwise_op

* changed example

* clean

* clean

* add is_detected

* fix

* minor fix

* add scaleAdd_vec4 example

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

21 Sep, 2023 1 commit

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

15 Sep, 2023 1 commit

Add fp16/fp8 support into Grouped gemm FixedNK (#874) · f9d0eddb

zjing14 authored Sep 14, 2023



* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* instance and client

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

* clean

* formatting

* clean

* clean

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

13 Sep, 2023 2 commits

Add grouped conv bwd weight dl instances and new layout (#897) · 475188ca

Bartłomiej Kocot authored Sep 13, 2023

* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

fixed fp8 issues (#894) · a66d14ed

zjing14 authored Sep 12, 2023



* fixed fp8 init; and reference gemm

* Update host_tensor_generator.hpp

* fixed convert

* fixed reference gemm

* fixed comments

* fixed comments

* fixed ci

* fixed computeType

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

12 Sep, 2023 1 commit

Refactor f8_t, add bf8_t (#792) · 62d4af74

Rostyslav Geyyer authored Sep 12, 2023

* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

08 Sep, 2023 1 commit

[Navi3x] Add fp16/int8 wmma conv forward instances (#746) · 562b4cec

Haocong WANG authored Sep 08, 2023



* fix wmma gemm int8; add grouped conv int8 example

* Add int8 gemm-bilinear instances

* compile sanity check unknown

* Sanity pass + clang-format

* add int8 conv profiler instances

* solve merge conflict

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

06 Sep, 2023 1 commit

Redesign the DPP8 GEMM kernel to use warp-wise component (#863) · 37a8c1f7

Bartlomiej Wroblewski authored Sep 06, 2023

* Redesign the DPP8 GEMM kernel to use warp-wise component

* Review: Improve error messages

* Review: Remove unnecessary empty lines

* Review: Fix M, N per thread names

* Review: Rename mfma_input_type to dpp_input_type

* Review: Fix tensor adaptor; remove unnecessary element

* Review: Remove calls to dpp_gemm's MakeCDescriptor

* Review: Add blockwise doc, change function names to include dimension names

* Review: Remove duplicated code; Move Block2CtileMap alias to the top of the file

* Review: Add __restrict__ keywords

* Review: Use MatrixPadder for padding A, B, C matrices

* Review: Remove hardcoded datatypes

* Review: Change names from FloatX to XDataType

* Review: Introduce AK0 and BK0 instead of a single K0

* Review: Remove construction of dpp_datatypes object

* Review: Rename DppInstrRunner to DppLanegroupGemm

05 Sep, 2023 1 commit

Add image to column kernel (#867) · 0077eeb3

Bartłomiej Kocot authored Sep 05, 2023

* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new measurements for correct bytes calculation

31 Aug, 2023 2 commits

Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0

zjing14 authored Aug 31, 2023



* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add an instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de

rocking authored Aug 31, 2023

* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd  (dilation and initZero)

* Support bf16 compute data type

* Force compute type to be f32, because atomicAdd only supports f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout

29 Aug, 2023 1 commit

add an example of customized type convert - bfp16_rtn (#869) · 38ada109

zjing14 authored Aug 29, 2023



* add an example of customized bfp16_rtn

* fixed threadwise_copy

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

23 Aug, 2023 1 commit
- use correct data types in cmake conditions for splitk gemm example (#862) · 7c71dc7e
  Illia Silin authored Aug 23, 2023
  
22 Aug, 2023 1 commit

Add instances/ckProfiler/client example for fp8/fp16 mixed precision Gemm (#853) · eac50708

Rostyslav Geyyer authored Aug 22, 2023



* Add ComputeType arg to splitk device and gridwise ops

* Update for gridwise op compatibility

* Update bf16 and int8 splitk gemm examples with ComputeType

* Add instances

* Update ckProfiler for mixed precision cases

* Add a mixed precision splitK gemm client example

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

14 Aug, 2023 2 commits

Implement DPP8 based GEMM for Navi21 (#826) · d4c84256
Bartlomiej Wroblewski authored Aug 14, 2023

Refactor pool fwd (#815) · f60f0a5e

rocking authored Aug 15, 2023

* Do not hardcode stride

* devicePool2DFwd inherits from devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherits from pool3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In the client example, this commit revises the following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changelog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler

10 Aug, 2023 1 commit

Average pool backward deviceOP and example (#797) · 578142db

rocking authored Aug 10, 2023

* Add avgpool bwd reference code

* Refine naming

* Fix invalid in_element op in ref_conv

* Add example (only reference now)

* Add the full example of avgpool bwd

* Fix copyright

* Imitate MakeDescriptor from  transform_conv_bwd_data_to_gemm_v1.hpp

* rename channel to c from k

* Arrange the code

* Imitate the argument from conv bwd

* Implement invoker

* Fix order of parameter in example

* Refactor reference code for different dimension

* Support different stride

* Check if argument is valid

* Fix kernel parameter for NDHWC, fastest dimension C is not reduced

* Add more data type in example

* Fix bug in example

* calculate Do Ho Wo according to the dilation

* Remove useless header

* Add comment in reference code

* Add layout parameter

* Remove layout in derived class

* Refine reference comment

09 Aug, 2023 1 commit

Enable f16/f8 mixed precision mode (#820) · 9c54eaab

Rostyslav Geyyer authored Aug 09, 2023

* Enable f16/f8 mixed precision

* Add an argument to enable mixed precision

* Update for compatibility

* Add mixed precision example

* Introduce ComputeType argument

07 Aug, 2023 2 commits

Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769

Illia Silin authored Aug 07, 2023

* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softmax_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation

Add wei_strides to grouped conv3d wei to keep consistency (#817) · 22443f7a

Bartłomiej Kocot authored Aug 07, 2023



* Add wei_strides to grouped conv3d wei to keep consistency

* Fix strides in client examples

* Unify backward weight api with forward

* Fix for example

* Fixes for examples

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

26 Jul, 2023 3 commits

initial stream-k implementation with example (#699) · e7dca79d

carlushuang authored Jul 27, 2023



* initial stream-k implementation with example

* fix unexpected change in err

* improve performance a little bit by reorganizing the pipeline.

* improve perf a little bit by swizzling block idx

* add profiler

* update example

* fix spelling

* shrink karg for streamk

* support dynamic buffer using memory coherence glc_slc bit from template

* control memory coherence while constructing dynamic buffer

* update reduction for streamk(not ready yet)

* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting

* fix build issue

* fix several bugs

* now result is correct, everything works (but has scratch)

* remove scratch by manually reset coordinate

* update device code

* fix a bug in final reduce

* fix something in example

* update async memset

* fix enum as camel case

* modify coherence enum name

* clean code and use atomic streamk by default

* remove unused var

* throw an exception on an empty pointer

* fix format

* fix CI warning

* fix type in init

* modify CI error

* filter out on gfx10+

* restore changed example code

---------
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

Disable XDL kernels on unsupported HW; add ck::is_xdl_supported (#768) · ac6d68b3

Bartłomiej Kocot authored Jul 26, 2023



* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765)

* Do not throw an error when GEMM problem is not supported.

---------
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

Refine the dimension of host tensor. This example only requires 1D (#812) · 016bd428
rocking authored Jul 26, 2023
