Commits · 72b7ae251aa9060b59e08897e1a5ac724bad43ed · gaoqiong / composable_kernel

15 May, 2023 1 commit

Update staging branch. (#706) · 72b7ae25

Illia Silin authored May 15, 2023



* update daily build from rocm 5.4.3 to 5.5 (#693)

* Fix grouped_gemm_splitk kernels on MI300. (#694)

* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>

* Fix the group of quantization_int8 kernels on MI300. (#695)

* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

* fix the group of kernels from ticket 723 on MI300

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>

* Optimize bf16 conversion (#664)

* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick converion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calcaulate reference

* Format

* Add missing include

* Normalization/split k (#615)

* Add contraction profiler and tests (#701)

* Add contraction profiler and tests

* Build and style fixes

* Allow to use any elementwise operator for ref_contraction

* Introduce profile_contraction_scale and profile_contraction_bilinear

* Make ref_contraction generic and extend interface tests

* Stylistic minor fixes

* Extend test_contraction_interface

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>
Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>

72b7ae25

15 Feb, 2023 1 commit

Add contraction_fp64 example (#570) · 24c9ee1d

zjing14 authored Feb 15, 2023



* add contraction_bilinear

* add contraction_scale_xdl_fp64

* reduce tile size to avoid register spill

---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>

24c9ee1d

15 Nov, 2022 1 commit

Introduce ck::accumulate_n() (#439) · 730204ee

Po Yen Chen authored Nov 15, 2022

We can use this template to eliminate duplicated iterator computing
logics. By providing return type to ck::accumulate_n(), we can avoid
type conversion operations.

730204ee

11 Nov, 2022 1 commit

Rangify constructor of HostTensorDescriptor & Tensor<> (#445) · 4a2a56c2

Po Yen Chen authored Nov 12, 2022

* Rangify STL algorithms

This commit adapts rangified std::copy(), std::fill() & std::transform()

* Rangify check_err()

By rangifying check_err(), we can not only compare values between
std::vector<>s, but also compare any ranges which have same value
type.

* Allow constructing Tensor<> like a HostTensorDescriptor

* Simplify Tensor<> object construction logics

* Remove more unnecessary 'HostTensorDescriptor' objects

* Re-format example code

* Re-write more HostTensorDescriptor ctor call

4a2a56c2

13 Oct, 2022 1 commit

Refactor device op implementations into `impl` subdirectory. (#420) · 30480288

Adam Osewski authored Oct 13, 2022



* Move kernel implementation files under impl directory.

* Update examples paths.

* Update device kernel impl include paths.

* Update tensor operation instances include paths.

* Update profiler and tests include paths.

* Clang-format

* Update include paths for batched gemm reduce

* Refactor UnitTest ConvNDBwdWeight.

* Refactor fwd and bwd data convND UT.

* Fix used test macro.

* Fix include path.

* Fix include paths.

* Fix include paths in profiler and tests.

* Fix include paths.
Co-authored-by: Adam Osewski <aosewski@amd.com>

30480288

29 Jul, 2022 1 commit

Clean up conv example, Instances, profiler and test (#324) · 500fa995

Chao Liu authored Jul 29, 2022

* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating refernce conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshhold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* upate contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test

500fa995

08 Jul, 2022 1 commit

GEMM pipeline v2 (#317) · 63914743

Po Yen Chen authored Jul 09, 2022



* format

* improving pipeline

* fix typo

* format

* adding thread group

* adding thread group

* adding thread group

* adding gemm pipeline

* tweak

* refactor

* refactor

* add missing type convert

* refactor

* refactor

* refactor

* clean

* fix build

* refactor

* format

* clean up

* use remove_cvref_t

* clean

* use pipeline_v2 for gemm kernel

* Remove inconsistent indent

* Fix compilation errors due to incomplete merge process

* Add missing include directives

* Fix compilation errors in currently unused files

* Add license in newly added files

* Re-format touched files by clang-format-10

* Fix wrong template argument count of DeviceGemm<>

* Use language construct to choose between types

* Use language construct to choose GEMM example instance

* Fix compilation error due to interface change

* Re-use type alias to avoid duplication

* Unify type alias usage in source file

* Only use v2 pipeline in one gridwise GEMM type

* Remove no-longer used include directives

* Add static_assert() to check pipeline type requirements

* Revert "Add static_assert() to check pipeline type requirements"

This reverts commit f0985f0a132671a1caaea92810c9f30dcf062bde.

* clean

* clean

* clean

* clean
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: shaojiewang <wsjmessi@163.com>

63914743

07 Jul, 2022 1 commit

N-D Tensor Contraction example, instance, and client example (#270) · 4fe9c393

Chao Liu authored Jul 07, 2022

* adding contraction

* add contraction example

* update examle

* update example

* format

* update readme

* clean header

* clean header

* contraction with multiple D

* rename

* fix naming issue; add instances for contraction+bilinear

* change assumed virtual layout of contraction; add client example

* update example

* update

* contraction+scale

* use type_convert

* rename

4fe9c393