Commits · 6b1490c9517d6c99d4a0c16bc870909f88bfcbd2 · gaoqiong / composable_kernel

11 Oct, 2023 2 commits

Revert "Grouped Gemm with looping over the tiles. (#788)" (#982) · c99323be
zjing14 authored Oct 11, 2023
```
This reverts commit a4f72a31.
```
c99323be

Grouped Gemm with looping over the tiles. (#788) · a4f72a31

Adam Osewski authored Oct 11, 2023



* Introduce LocalBlockToCTileMap.

* Change the signature of CalculateBottomIndex() function which now does
not accept any argument. The B2C map which is already passed as an
argument to the kernel Run function is calculating block's local id
already outside at kernel entry point __global__ function.
The LocalB2C map stores as members local block ID.

* Use LocalBlockToCTile map in device ops.

* First draft of tile loop work distribution.

* Fix typo.

* Simplify kernel arguments.

Calculate descriptors & B2C maps on the device.

* Use looping kernel.

* Fix B2C constructor.

* Fix Navi21 errors.

* Calculate tile start/end in device kernel.

* Change Run API to accept user provided workspace buffer.

* Add new line at EOF.

* Move Gemm KernelArguments to device op interface.

* Remove unused code.

* Update API.

* Launch grid size which is min of occupancy vs tile count

* Get back to use constant memory for gemm descriptors.

* Remove unused code.

* Add default virtual method implementation.

* Update comments to conform with doxygen style.

* Fix doc style and unused parameters.

* Add thread cluster lengths to kernel name.

* Remove old splitk impl and replace it with tile looping one.

* Modify instances.

* set KPerBlock to 64
* maximize wherever possible vector load size.

* Fix instances cluster lengths.

* Change comment style.

* Use 128b store where possible in instances.

* Update test cases, since KPerBlock has doubled.

* Update output stream operator for Sequence.

* Add pipeline version to GroupedGEMM device op type string.

* Fix pipeline version type logging.

* Fix input tensors type after merge.

* Fix compiler error.

* Fix output stream operator for Pipeline version.

* Store using 128b.

* Set of instances with kpb 32/64

* Limit number of instances

* Remove commented out instances.

* Fix function name.

* Limit the number of instances.

Add pipline version to the regular instances

* Change thr cluster layout for reading B tensor.

* disabled failed instances

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizha@amd.com>

a4f72a31

10 Oct, 2023 1 commit
- Fix MNKPadding in gridwise_gemm_xdlops_v2r3 (#981) · 98c80714
  Bartłomiej Kocot authored Oct 10, 2023
  
  98c80714
05 Oct, 2023 2 commits
- Replace CMake `return` from later CMake (#970) · 59136091
  Lauren Wrubleski authored Oct 05, 2023
  
  59136091
- Revert "Add support for mixed precision in contraction scale and bilinear" (#967) · 4daedf8c
  Illia Silin authored Oct 05, 2023
```
* Revert "Add support for mixed precision in contraction scale and bilinear (#936)"

This reverts commit f0748506.

* revert commits #957 and #960
```
  4daedf8c
29 Sep, 2023 1 commit

Add support for mixed precision in contraction scale and bilinear (#936) · f0748506

Bartlomiej Wroblewski authored Sep 29, 2023

* Extract common functionality to separate files

* Reference contraction: Remove incorrect consts from type_converts

* Reference contraction: Add missing type_convert for dst value

* Reference contraction: Fix incorrect order of B matrix dimensions

* Add support for mixed precision in contraction scale and bilinear

* Move using statements from instances to a common file

* Move using statements from examples to a common file

* Fix the order of B matrix dimensions across examples and profiler

* Fix the computation of error threshold

* Make ComputeDataType an optional argument

* Include possible DataType -> ComputeDataType casting error in the threshold

* Remove commented code

f0748506

28 Sep, 2023 1 commit

Add grouped conv bwd data wmma (#950) · cb538740

Bartłomiej Kocot authored Sep 28, 2023

* Add grouped conv bwd data wmma

* Fix copyrights

* Add instances with smaller NPerBlock

* Update interface test

* Minor stylistic fixes

* Minor stylistic fixes

cb538740

27 Sep, 2023 4 commits

Fix gemm_splitk test, add hip_check_error after kernel calls in kernel_launch. (#951) · bc1108bb

Illia Silin authored Sep 27, 2023



* Added error check after kernel launch (#919)
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

* remove M=0 test cases for test_gemm_splitk

---------
Co-authored-by: Xiaodong Wang <xdwang@meta.com>
Co-authored-by: Xiaodong Wang <xw285@cornell.edu>

bc1108bb

Handle type conversions to a const datatype (#944) · f4af5aed

Bartlomiej Wroblewski authored Sep 27, 2023

* Handle type conversions to a const datatype

* Review: Handle X being const data type as well

* Review: Remove typo

f4af5aed

Add column to image kernel (#930) · e2243a4d

Bartłomiej Kocot authored Sep 27, 2023

* Add column to image kernel

* Minor fixes for dtypes and client examples

* Disable tests for disabled dtypes

* Disable add instances functions for disabled data types

* Minor stylistic fixes

* Revert "Disable add instances functions for disabled data types"

This reverts commit 728b8695.

* Instances reduction

* Add comments in device_column_to_image_impl

* Update changelog and Copyrights

* Improve changelog

e2243a4d

disabled failed instances · 4e5190f5
Jing Zhang authored Sep 27, 2023

4e5190f5

26 Sep, 2023 1 commit

Resolve some data type issues and cmake policy. (#940) · 2ea75bd6

Illia Silin authored Sep 26, 2023

* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test

2ea75bd6

23 Sep, 2023 1 commit

Add 3d grouped conv fwd wmma instances (#935) · c9553832

Bartłomiej Kocot authored Sep 23, 2023

* Add 3d grouped conv fwd wmma instances

* Refactor fwd conv tests

* Split wmma instances for each specialization

* Minor stylistic fixes

c9553832

21 Sep, 2023 1 commit

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

bba085d2

15 Sep, 2023 1 commit
- Stylistic improvements for grouped convolution code · bc2d0583
  Bartlomiej Kocot authored Sep 13, 2023
```
Remove unnecessary ignoring

Update test/grouped_convnd_bwd_weight/test_grouped_convnd_bwd_weight.cpp
```
  bc2d0583
13 Sep, 2023 1 commit

Add grouped conv bwd weight dl instances and new layout (#897) · 475188ca

Bartłomiej Kocot authored Sep 13, 2023

* Add grouped conv bwd weight dl instances and new layout

* Add M and N padding

* Remove todo comment

* Enable grouped conv fwd dl k,c=1 generic instance

* Comment fixes

475188ca

12 Sep, 2023 1 commit

Refactor f8_t, add bf8_t (#792) · 62d4af74

Rostyslav Geyyer authored Sep 12, 2023

* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

62d4af74

05 Sep, 2023 1 commit

Add image to column kernel (#867) · 0077eeb3

Bartłomiej Kocot authored Sep 05, 2023

* Add image to column kernel

* Add instances, tests, profiler, example

* Add client example

* Several fixes of image to column

* Fix variable name in device_image_to_column_impl

* Several fixes of image to column profiler

* Fix num_btype calculation

* Make new mesaurements for correct bytes calculation

0077eeb3

31 Aug, 2023 1 commit

MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861) · 866377de

rocking authored Aug 31, 2023

* Add maxpool instances

* Rename index pool to max pool.

* Add maxpool bwd bf16 instances

* Add avg pool bwd instances

* Rename avgpool and maxpool to avg_pool3d and max_pool

* Add bf16 pool fwd instances

* Add max pool bwd to ckProfiler

* Add avg pool3d bwd to ckProfiler

* Add avg pool bwd test

* Fix bug of reference pool fwd (dilation)

* Fix bug of max pool bwd  (dilation and initZero)

* Support bf16 compute data type

* Force compute type be f32. Because atomicAdd only support f32

* Add max pool bwd test

* Rename folder

* Rename pool

* Add max pool bwd client example

* Add avg pool bwd client example

* Add missing workspace

* clang format

* Rename macro

* remove useless header

* remove useless layout

866377de

23 Aug, 2023 1 commit

[HotFix] add config and version files to pass on build info (#856) · c8a8385f

Jun Liu authored Aug 23, 2023

* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

c8a8385f

22 Aug, 2023 3 commits
- Ck profiler splitk (#857) · ca3115e7
  zjing14 authored Aug 22, 2023
```
* updated regular gemm

* update ckProfiler

* fixed gtests

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
```
  ca3115e7
- Fix transform and instances for grouped conv bwd data (#848) · 595d23be
  Bartłomiej Kocot authored Aug 22, 2023
```
* Fix transform and instances for grouped conv bwd data

* Add instances for small K and small C

* Remove workaround after fix

* Fix interface tests
```
  595d23be
- Update test cases, since KPerBlock has doubled. · 6b1738f3
  Adam Osewski authored Aug 22, 2023
  
  6b1738f3
14 Aug, 2023 1 commit

Refactor pool fwd (#815) · f60f0a5e

rocking authored Aug 15, 2023

* Do not hardcode stride

* devicePool2DFwd Inherit devicePool3DFwd

* Move instance declaration out of common

* Add dilation

* use the pool3d rank, because pool2d inherit pooo3d

* calculate Do Ho Wo for the dilation

* Fix header name

* Modify ckProfiler

* Remove pool2d instance

* Remove pool2d in profiler

* Remove pool2d and add dilation

* In to client example, this commit revise following:
1. Add dilation.
2. Use pool3d to implement pool2d

* Refine naming and IsSupportedArgument()

* Add dilation to maxpool bwd example

* clang format

* 1. Remove useless header
2. Fix copyright
3. Refine naming

* Add layout parameter to pool fwd

* clang format

* Fix merge error

* Fix compile error

* Remove layout parameter in derived class

* Refine changlog

* Fix compile error

* Fix compiler error

* Add layout to external api and profiler

f60f0a5e

09 Aug, 2023 1 commit

Enable grouped conv with small K or C (#822) · 472fa029

Bartłomiej Kocot authored Aug 09, 2023

* Enable grouped conv with small K or C

* Add missing instances

* Refactor grouped conv fwd instances

* Fix fp16 instances since it supports src_per_vec %2 = 0

* Add generic instances

472fa029

07 Aug, 2023 2 commits

Allow building CK for specific data types and split off last remaining DL instances. (#830) · 08eb1769

Illia Silin authored Aug 07, 2023

* properly split conv_nd_bwd_data instances

* split conv2d_fwd instance data types

* split the gemm, conv2d_fwd and batched_gemm_softamx_gemm

* split the tests by data types where possible

* filter examples by DTYPES

* split few remaining examples by DTYPES

* filter most instances by DTYPES

* add new lines at end of headers, fix grouped_gemm profiler

* fix syntax

* split the ckprofiler instances by DTYPES

* split the conv2d and quantization DL and XDL instances

* fix the splitting of conv2d DL instances

* split softmax and pool_fwd tests for fp16 and fp32 types

* fix syntax

* fix the dl_int8 quantization instances isolation

08eb1769

Add wei_strides to grouped conv3d wei to keep consistency (#817) · 22443f7a

Bartłomiej Kocot authored Aug 07, 2023



* Add wei_strides to grouped conv3d wei to keep consistency

* Fix strides in client examples

* Unify backward weight api with forward

* Fix for example

* Fixes for examples

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

22443f7a

27 Jul, 2023 1 commit

Add s_nops after v_dot to avoid hazard (#808) · 7761e523

Bartłomiej Kocot authored Jul 27, 2023

* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_produxt fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop

7761e523

26 Jul, 2023 2 commits

initial stream-k implementation with example (#699) · e7dca79d

carlushuang authored Jul 27, 2023



* initial stream-k implementation with example

* fix unexpected change in err

* improve a little bit performance by reorganize pipeline.

* improve perf a little bit by swizzle block idx

* add profiler

* update example

* fix spelling

* shrink karg for streamk

* support dynamic buffer using memory coherence glc_slc bit from template

* control memory coherence while construct dynamic buffer

* update reduction for streamk(not ready yet)

* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting

* fix build issue

* fix several bug

* now result is correct, everything works (but has scratch)

* remove scratch by manually reset coordinate

* update device code

* fix a bug in final reduce

* fix something in example

* update async memset

* fix enum as camel case

* modify coherence enum name

* clean code and use atomic streamk by default

* remove unused var

* throw exception if have empty pointer

* fix format

* fix CI warning

* fix type in init

* modify CI error

* filter out on gfx10+

* restore changed example code

---------
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>

e7dca79d

Disable DL kernels by default. (#816) · 9195435c
Illia Silin authored Jul 26, 2023

9195435c

21 Jul, 2023 1 commit
- Grouped conv bwd wei NDHWGC/NDHWGK (#804) · 10732847
  Bartłomiej Kocot authored Jul 21, 2023
  
  10732847
18 Jul, 2023 2 commits

Grouped 3d conv backward data support (#799) · 49180fd6
Bartłomiej Kocot authored Jul 18, 2023
```
* Grouped 3d conv backward data support

* Fix comments
```
49180fd6

Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9

Illia Silin authored Jul 17, 2023

* allow building CK for specific data types

* add CI build and test stage on Naiv3x without some int8 instances

* add missing gemm fp16 instances

* add the changes to the missed cmake file

* add empty lines at end of source files

* Do not build quantization client example on navi3 in CI

* disable batched_gemm_multi_d_int8 instances with DTYPES

* disable device_conv2d_bwd_data_instance with DTYPES

* fix ckprofiler for conv_bwd_data for int8

* properly isolate the conv_bwd_data int8 instances

* remove empty line

189ea3b9

12 Jul, 2023 1 commit

Support NHWGC conv2d_bwd_weight (#769) · 1ee99dca

Bartłomiej Kocot authored Jul 12, 2023



* Support NHWGC conv2d_bwd_weight

* Fix client example

* Fix client example

* Fix comments

* Redesign grouped_conv_bwd_weight instances

* Clang format fix

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

1ee99dca

06 Jul, 2023 2 commits
- Move Device Ops implementations into impl directory. (#777) · f4dfc060
  Adam Osewski authored Jul 06, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
  f4dfc060
- Fix copyrights for DeviceBatchedGemmMultipleD_Dl · 2b0b6d9f
  Bartlomiej Kocot authored Jul 05, 2023
  
  2b0b6d9f
21 Jun, 2023 1 commit

Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757) · 63388e84

Bartłomiej Kocot authored Jun 21, 2023

* Support bf16/f32/f16 and NHWGC conv2d_bwd_data

* Add interface test

* clang format

* Comment fixes

* Add more friendly error message

63388e84

19 Jun, 2023 1 commit

FP8 enablement - add a pseudorandom number generator, add conversion methods (#708) · f0c620c4

Rostyslav Geyyer authored Jun 19, 2023

* Add basic fp8 definitions and prn-generator

* Format

* Add fp8<->fp32 type_convert

* Format

* Split type_convert and cast_to/from_f8

* Format

* Minor fix

* Minor fix

* Move fp8 utils to a separate header

* Add elementwise ops

* Add fp8_convert_sr

* Format

* Add element op

* Eliminate magic numbers

* Split f8_convert_sr in host and device

* Format

* Add some constexpr

* Add a datatype test

* Format

* Another format

* Add fp8<->fp16 tests

* Update type_converts

* Format

* Add fp16 casting functions

* Format

* Use seed as a runtime arg

* Use element location for PRNG

* Format

* Add fp8<->fp16 to PassThrough element op

* Clean up

* Merge host and device implementations

* Add comments on rounding modes

* Remove leftover code

* Put type_converts into a separate header

* Put random number gen to a separate header

* Rearrange f8_utils' namespaces

* Refactor type_convert.hpp

* Move f8_t definition

f0c620c4

17 Jun, 2023 1 commit

Padded Generic Kernel Instance (#730) · 0d911822

Qianfeng authored Jun 17, 2023



* Add NumReduceDim template parameter to DeviceSoftmax and Softmax client API to simplify instances collecting

* Move the generic kernel instance to be the first of the instance list for elementwise op of normalization

* Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax

* Add testing of GetGenericInstance() in client_example of Softmax

* Revert "Add testing of GetGenericInstance() in client_example of Softmax"

This reverts commit f629cd9a93ce38dfed4886d849f3c38d2e5379c8.

* Revert "Add GetGenericInstance() interface for DeviceOperationInstanceFactory class of DeviceSoftmax"

This reverts commit a9f0d000eb9fd240404112a526ef125429a351df.

* Support generic kernel instance to be the first instance returned by GetInstances() for GroupNorm

* Move generic kernel instance to separate tuple for elementwise op of normalization

* Remove un-used files for softmax instance

* Store generic kernel instance to separate tuple for softmax

* Add IsSupported checking for generic instance to client example of softmax

* Replace the get_device_normalize_from_mean_meansquare_instances() by the DeviceOperationInstanceFactory class for elementwise-normalization

* clang-format fix

* Remove int8 from softmax instances

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

0d911822

15 Jun, 2023 1 commit

Enable gfx941 and gfx942 architectures. (#752) · 027e46ee

Illia Silin authored Jun 15, 2023

* enable gfx941/942 targets

* fix clang format

* fix the cmake logic for multiple targets

* fix cmake syntax for looping over targets

* add gfx941/942 support for gemm_xdl instances

027e46ee