Commits · db376dd8a4eb36c6f8e9b100b89d8a1371d76f4c · gaoqiong / composable_kernel_ROCM

16 Apr, 2024 1 commit

carlushuang authored Apr 16, 2024

* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

* Bump rocm-docs-core from 0.24.0 to 0.29.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.24.0 to 0.29.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.0...v0.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>

* initial enablement of gfx950

* fix clang format

* disable examples 31 and 41 int8 on gfx950

* add code

* fix build wip

* fix xx

* now can build

* naming

* minor fix

* wip fix

* fix macro for exp2; fix warpgemm a/b in transposedC

* unify as tuple_array

* Update the required Python version to 3.9

* Update executable name in test scripts

* re-structure tuple/array to avoid spill

* Merge function templates

* Fix format

* Add constraint to array<> ctor

* Re-use function

* Some minor changes

* remove wrong code in store_raw()

* fix compile issue in transpose

* Rename enum
Rename 'cood_transform_enum' to 'coord_transform_enum'

* let more integral_constant->constant, and formatting

* make sure thread_buffer can be tuple/array

* temp fix buffer_store spill

* not using custom data type by default, now we can have ISA-level same code as opt_padding

* fix compile error, fp8 not ready now

* fix fp8 duplicated move/shift/and/or problem

* Default use CK_TILE_FLOAT_TO_FP8_STOCHASTIC rounding mode

* fix scratch in fp8 kernel

* update some readme

* fix merge from upstream

* sync with upstream

* sync upstream again

* sync 22

* remove unused

* fix clang-format

* update README of ck_tile example

* fix several issues

* set the minimum python version to 3.8

* remove ck_tile example from default cmake target like all/install/check

* remove mistake

* 1).support receipe in generate.py 2).use simplified mask type 3).change left/right to pass into karg

* fix some bug in group-mode masking and codegen. update README

* F8 quantization for FMHA forward (#1224)

* Add SAccElementFunction, PComputeElementFunction, OAccElementFunction in pipeline

* Add element function to fmha api

* Adjust P elementwise function

* Fix bug of elementwise op, our elementwise op is not inout

* Add some elementwise op, prepare to quantization

* Let generate.py generate different elementwise functions

* To prevent compiler issue, remove the elementwise function we have not used.

* Remove f8 pipeline, we should share the same pipeline even in f8

* Remove remove_cvref_t

* Avoid warning

* Fix wrong fp8 QK/KV block gemm setting

* Check fp8 rounding error in check_err()

* Set fp8 rounding error for check_err()

* Use CK_TILE_FLOAT_TO_FP8_STANDARD as default fp8 rounding mode

* 1. codegen the f8 api and kernel
2. f8 host code

* prevent warning in filter mode

* Remove not-in-use elementwise function kargs

* Remove more not-in-use elementwise function kargs

* Small refinements in C++ source files

* Use conditional_t<> to simplify code

* Support heterogeneous argument for binary function types

* Re-use already-existing scales<> functor template

* Fix wrong value produced by saturating

* Generalize the composes<> template

* Unify saturates<> implementation

* Fix type errors in composes<>

* Extend less_equal<>

* Reuse the existing template less_equal<> in check_err()

* Add equal<float> & equal<double>

* Rename check_err() parameter

* Rename check_err() parameter

* Add FIXME comment for adding new macro in future

* Remove unnecessary cast to void

* Eliminate duplicated code

* Avoid dividing api pool into more than 2 groups

* Use more clear variable names

* Use affirmative condition in if stmt

* Remove blank lines

* Do not use perfect forwarding in composes<>

* To fix compile error, revert generate.py back to 4439cc107dd90302d68a6494bdd33113318709f8

* Fix bug of p element function

* Add compute element op to host softmax

* Remove element function in api interface

* Extract user parameter

* Rename pscale and oscale variable

* rename f8 to fp8

* rename more f8 to fp8

* Add pipeline::operator() without element_functor

* 1. Remove deprecated pipeline enum
2. Refine host code parameter

* Use quantization range as input

* 1. Rename max_dtype to dtype_max.
2. Rename scale to scale_s
3. Add init description

* Refine description

* prevent early return

* unify _squant kernel name in cpp, update README

* Adjust the default range.

* Refine error message and bias range

* Add fp8 benchmark and smoke test

* fix fp8 swizzle_factor=4 case

---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>

---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Po-Yen, Chen <PoYen.Chen@amd.com>
Co-authored-by: rocking <ChunYu.Lai@amd.com>

db376dd8

12 Apr, 2024 1 commit
- Update the config.h after the CK_USE_XDL/WMMA are set. (#1236) · 7cdf5a96
  Illia Silin authored Apr 12, 2024
```
* pass XDL and WMMA macros to libs that use CK

* update config.h after XDL and WMMA macros get set
```
  7cdf5a96
02 Apr, 2024 1 commit

Split the instances by architecture. (#1223) · ae57e593

Illia Silin authored Apr 02, 2024

* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only link test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* simplify the main cmake file

* add conv_fwd_bf8 instance declaration

ae57e593

03 Jan, 2024 1 commit
- fix the cmake option syntax (#1117) · fbf31a2e
  Illia Silin authored Jan 03, 2024
  
  fbf31a2e
02 Jan, 2024 1 commit
- adding -Wno-switch-default compiler flag (#1115) · b268f273
  Illia Silin authored Jan 02, 2024
  
  b268f273
20 Dec, 2023 1 commit

enable compilation of INSTANCES_ONLY for Windows (#1082) · fb5bd51b

Artur Wojcik authored Dec 20, 2023



* enable compilation of INSTANCES_ONLY for Windows

* suppress ROCMChecks warnings on GoogleTests

* suppress -Wfloat-equal warning on GoogleTests

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

fb5bd51b

19 Dec, 2023 2 commits
- ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1106) · 3ab1838f
  Jun Liu authored Dec 19, 2023
```
* ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__

* make it backward compatible

* Update .clang-tidy

* Update ClangTidy.cmake
```
  3ab1838f
- add -Wno-pass-failed compiler flag (#1105) · 3726a173
  Illia Silin authored Dec 19, 2023
  
  3726a173
15 Dec, 2023 1 commit

cmake: Add CK_PARALLEL_LINK_JOBS and CK_PARALLEL_COMPILE_JOBS options (#1063) · efaf3106

trixirt authored Dec 14, 2023



Copied from the llvm-project LLVM_PARALLEL_*_JOBS

Concurrent linking can break the build, as can having too many
compile jobs for the available memory.  These options allow the user
to fine-tune the build to fit within their machine's memory
constraints.

An example use on Linux is:
COMPILE_JOBS=`cat /proc/cpuinfo | grep -m 1 'cpu cores' | awk '{ print $4 }'`
if [ ${COMPILE_JOBS}x = x ]; then
  COMPILE_JOBS=1
fi
BUILD_MEM=4
MEM_KB=0
MEM_KB=`cat /proc/meminfo | grep MemTotal | awk '{ print $2 }'`
MEM_MB=`eval "expr ${MEM_KB} / 1024"`
MEM_GB=`eval "expr ${MEM_MB} / 1024"`
COMPILE_JOBS_MEM=`eval "expr 1 + ${MEM_GB} / ${BUILD_MEM}"`
if [ "$COMPILE_JOBS_MEM" -lt "$COMPILE_JOBS" ]; then
  COMPILE_JOBS=$COMPILE_JOBS_MEM
fi
LINK_MEM=32
LINK_JOBS=`eval "expr 1 + ${MEM_GB} / ${LINK_MEM}"`

cmake -G Ninja -DCK_PARALLEL_LINK_JOBS=$LINK_JOBS \
               -DCK_PARALLEL_COMPILE_JOBS=$COMPILE_JOBS
Signed-off-by: Tom Rix <trix@redhat.com>

efaf3106
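
The memory-based sizing described in the CK_PARALLEL_LINK_JOBS / CK_PARALLEL_COMPILE_JOBS commit above can be sketched as a small shell helper. This is only an illustrative sketch and not part of the commit: the function name `jobs_for_memory` and its three arguments are hypothetical, and the 4 GB per compile job and 32 GB per link job figures are taken from the example in the commit message. Capping the link jobs at the core count is an added assumption; the original script only caps the compile jobs.

```shell
#!/bin/sh
# Hypothetical helper (not part of CK): prints how many parallel jobs fit in memory.
# Usage: jobs_for_memory TOTAL_MEM_GB GB_PER_JOB MAX_JOBS
jobs_for_memory() {
  total_gb=$1
  gb_per_job=$2
  max_jobs=$3
  # Same arithmetic as the commit message: one job plus one per GB_PER_JOB of RAM.
  n_jobs=$((1 + total_gb / gb_per_job))
  # Never exceed the cap (for example, the number of CPU cores).
  if [ "$n_jobs" -gt "$max_jobs" ]; then
    n_jobs=$max_jobs
  fi
  echo "$n_jobs"
}

# Example: a 64 GB machine with 16 cores (values chosen for illustration).
COMPILE_JOBS=$(jobs_for_memory 64 4 16)   # 1 + 64/4 = 17, capped at 16
LINK_JOBS=$(jobs_for_memory 64 32 16)     # 1 + 64/32 = 3
echo "compile=$COMPILE_JOBS link=$LINK_JOBS"
```

The two values can then be passed to cmake as `-DCK_PARALLEL_COMPILE_JOBS` and `-DCK_PARALLEL_LINK_JOBS`, as in the commit message.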

05 Dec, 2023 1 commit

Add daily run with mainline compiler. (#1075) · afe46220

Illia Silin authored Dec 04, 2023

* add daily build with mainline compiler

* fix the compiler paths for ci

* remove the -flto flag

* build with clang by default

afe46220

30 Oct, 2023 1 commit

Enable sccache in the default docker and CI. (#1009) · 4e44a9e8

Illia Silin authored Oct 30, 2023



* replace ccache with sccache, pin package versions

* put ccache back temporarily to avoid breaking other CI jobs

* add sccache_wrapper.sh script

* fix the package version syntax

* fix the pymysql package issue

* run sccache_wrapper before build if ccache server found

* set the paths before calling the sccache_wrapper

* use /tmp instead of /usr/local for cache

* try using sccache --start-server instead of wrapper

* try using redis server with sccache

* define SCCACHE_REDIS

* add redis and ping packages, and redis port

* use the new sccache redis server

* do not use sccache with staging compiler

* fix the condition syntax

* add stunnel to redis

* add tunnel verification

* separate caches for different architectures

* fix syntax for the cache tag

* use double brackets for conditions

* add bash line to the script

* add a switch for sccache and only use it in build stage

* run check_host function when enabling sccache

* fix the invocation tags for sccache

* fix groovy syntax

* set the invocation tag in groovy

* disable sccache in clang-format stage

* try another syntax for invocation tags

* use local sccache server if can't connect to redis

* fix script syntax

* update README

* refresh readme

* readme updates

* remove the timing and verification caveat from readme

---------
Co-authored-by: Lisa Delaney <lisa.delaney@amd.com>

4e44a9e8

18 Oct, 2023 1 commit

Clean DTYPES conditions in CMake (#974) · bf435140

zjing14 authored Oct 18, 2023



* Add a condition to build fp8 instances

* simplified buffer_load/store

* add bfp8/fp8

* fixed

* remove all f8/bf8 conditions in the include folder

* fixed cmake conditions

* fixed DTYPES=fp16/bfp16

* fix

* fixed buffer_load

* fixed buffer_store

* fix

* clean example cmake files

* fixed ci

* fixed cit

---------
Co-authored-by: Rostyslav Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Jing Zhang <jizha@amd.com>

bf435140

02 Oct, 2023 1 commit
- get rid of gfx900/906, set rocm5.7 as default (#958) · 59dbb01f
  Illia Silin authored Oct 02, 2023
  
  59dbb01f
27 Sep, 2023 1 commit
- Use lower case for ckprofiler package. (#948) · 420b5a03
  Illia Silin authored Sep 26, 2023
```
* split ckProfiler gfx9 package into gfx90 and gfx94

* use lower case for package names
```
  420b5a03
26 Sep, 2023 2 commits
- split ckProfiler gfx9 package into gfx90 and gfx94 (#946) · 0b296a27
  Illia Silin authored Sep 26, 2023
  
  0b296a27
- Resolve some data type issues and cmake policy. (#940) · 2ea75bd6
  Illia Silin authored Sep 26, 2023
```
* split the types in gemm_bilinear instances, add condition to cmake policy

* fix syntax

* split the data types in batchnorm examples

* fix the batchnorm_bwd test

* fix types in the batchnorm_bwd test
```
  2ea75bd6
21 Sep, 2023 1 commit

Refactoring cmake files to build data types separately. (#932) · bba085d2

Illia Silin authored Sep 20, 2023

* refactor cmake files for the tests

* refactor cmake files for examples

* fix cmake for gemm example

* fix the cmake file for all examples

* add splitting by data types in gemm_splitk instance header

* rename test to reflect only dl instances are used

* clean up CI workspace, update cmake for instances

* change the jenkinsfile syntax

* build all instances except DL on gfx11

* move workspace cleanup after stages

* clean up workspace after every stage

* isolate data types in grouped_conv_fwd header

* isolate dl instances for grouped_conv2d_fwd

* fix syntax

* fix cmake and batchnorm instances

* fix typo

* fix reduction instances

* fix grouped_conv headers

* fix syntax

* replace parsing logic for instances, replace bfp16 with bf16

* fix the client examples build

* clean up DTYPES from instances cmake files

* update the parsing logic in cmake files

* make an exception for reduction kernels

* update few remaining cmake files to handle DTYPES

* fix syntax

* fix cmake conflicts

* replace f8 with fp8 test name

* resolve conflicts for dpp instances

bba085d2

19 Sep, 2023 1 commit
- fix the ckprofiler package build in a loop (#926) · 5a4416c8
  Illia Silin authored Sep 19, 2023
  
  5a4416c8
13 Sep, 2023 1 commit
- [Cmake] Set cmake default build type Release and path to /opt/rocm (#914) · 5fe687fa
  Jun Liu authored Sep 13, 2023
  
  5fe687fa
12 Sep, 2023 1 commit

Refactor f8_t, add bf8_t (#792) · 62d4af74

Rostyslav Geyyer authored Sep 12, 2023

* Refactor f8_t to add bf8_t

* Add check_err impl for f8_t

* Update fp8 test

* Format

* Revert the fix

* Update vector_type implementation

* Add bf8 test

* Add bf8, use BitInt types

* Add bf8 conversion methods

* Update type_convert for fp8/bf8

* Add check_err fp8/bf8 support

* Add subnorm fp8 tests

* Add subnorm bf8 tests

* Fix conversion

* Add bf8 cmake bindings

* Add macros to enable build with disabled fp8/bf8

* Remove is_native method

* Update flag combination for mixed precision instances

* Add more flag checks

* Add another flag to a client example

* Add type traits, decouple f8/bf8 casting

* Clean up

* Decouple fp8 and bf8 flags

* Remove more redundant flags

* Remove leftover comments

62d4af74

04 Sep, 2023 1 commit
- Fix config header installation (#880) · bd8024b8
  Lauren Wrubleski authored Sep 04, 2023
  
  bd8024b8
23 Aug, 2023 1 commit

[HotFix] add config and version files to pass on build info (#856) · c8a8385f

Jun Liu authored Aug 23, 2023

* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

c8a8385f

09 Aug, 2023 2 commits
- Update the rocm version threshold to apply the -fno-offload-uniform-block flag. (#839) · cbbd172f
  Illia Silin authored Aug 09, 2023
```
* add fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number

* update the threshold rocm version
```
  cbbd172f
- add no-offload-uniform-block flag for rocm5.7 and up (#838) · 68026113
  Illia Silin authored Aug 08, 2023
```
* add -fno-offload-uniform-block flag for rocm5.7 and up

* add a comment and compiler ticket number
```
  68026113
03 Aug, 2023 1 commit
- add an option to build ckProfiler package for specific architectures (#828) · 2474dddb
  Illia Silin authored Aug 03, 2023
  
  2474dddb
26 Jul, 2023 1 commit
- Disable DL kernels by default. (#816) · 9195435c
  Illia Silin authored Jul 26, 2023
  
  9195435c
21 Jul, 2023 1 commit
- add INSTANCES_ONLY cmake macro to build only instances (#807) · 7a29f711
  Illia Silin authored Jul 21, 2023
  
  7a29f711
18 Jul, 2023 1 commit

Add mechanism to build CK for select data types, add Navi3x CI. (#790) · 189ea3b9

Illia Silin authored Jul 17, 2023

* allow building CK for specific data types

* add CI build and test stage on Navi3x without some int8 instances

* add missing gemm fp16 instances

* add the changes to the missed cmake file

* add empty lines at end of source files

* Do not build quantization client example on navi3 in CI

* disable batched_gemm_multi_d_int8 instances with DTYPES

* disable device_conv2d_bwd_data_instance with DTYPES

* fix ckprofiler for conv_bwd_data for int8

* properly isolate the conv_bwd_data int8 instances

* remove empty line

189ea3b9

17 Jul, 2023 1 commit

Add check for compiler GPU target support. (#800) · 4867db42

Illia Silin authored Jul 17, 2023

* check if gpu_targets are supported by compiler

* set default list of targets and filter for them

4867db42

29 Mar, 2023 1 commit
- Add CMake Option "USE_OPT_NAVI3X" (#647) · 4e097ad2
  Haocong WANG authored Mar 30, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script
```
  4e097ad2
10 Nov, 2022 1 commit
- Add packages for examples and profiler (#502) · 37f2e918
  Lauren Wrubleski authored Nov 10, 2022
```
* Add packages for example and profiler

* correct TEST_NAME -> EXAMPLE_NAME
```
  37f2e918
27 Oct, 2022 1 commit

reduce the number of default targets (#489) · a5059f8f

Illia Silin authored Oct 27, 2022

* reduce the number of default targets

* re-write the setting of target flags

* move all options to one place

* add new custom target instances for installing CK

a5059f8f

26 Aug, 2022 1 commit

Add an option to build CK with clang directly (#387) · 1e5b59df

Illia Silin authored Aug 26, 2022

* replace hipcc compiler with clang++

* build client app with hipcc

* build client app with clang

* add an option to build with hipcc or clang

* fix the environment for client app

* fix setting up compiler in cmake_build

* change the way the compiler is set

1e5b59df

18 Aug, 2022 1 commit

int4 data type (#364) · e00149ac

Adam Osewski authored Aug 18, 2022



* Introduce int4 data type.

* Add unit-tests for int4

* Compile int4 UT only when int4 enabled.

* clang-format
Co-authored-by: Adam Osewski <aosewski@amd.com>

e00149ac

13 Aug, 2022 1 commit

Fused attention (#345) · cac014f1

Anthony Chang authored Aug 13, 2022



* initial stub for gemm_gemm_xdl_cshuffle

* set up example code

* compiles

* prevent integer overflow

* harmonize interface between ref_gemm and ref_batched_gemm

* batched_gemm_gemm

* fix example

* host tensor gen: diagonal pattern in lowest two-dimensions only

* make c descriptors containing only integral constants

* clean up

* add BlockwiseGemmXdlops_v2 while exploring a unified approach

* implement proper interface

* tidy up example

* fix compilation warnings

* coarsely controlled 2nd gemm padding

* remove rocm-cmake's hard requirement for certain revision

* clang-format

* resolve merge conflict

* fix compilation error on gfx10

* adds acc0 elementwise op to interface

* attention host validation

* add blockwise softmax v1

* iteratively update softmax+gemm

* transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum

* add init method for easier debugging

* do away with manual thread cluster calculation

* generalize blockwise softmax interface

* row-wise softmax sum & max

* format

* rename to DeviceBatchedGemmSoftmaxGemm

* add gemm_softmax_gemm instances and tests

* comment
Co-authored-by: ltqin <letao.qin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

cac014f1

30 Jun, 2022 1 commit
- Remove incorrect old packaging statement (#308) · eccf8773
  Liam Wrubleski authored Jun 30, 2022
  
  eccf8773
25 Jun, 2022 2 commits

Switch to standard ROCm packaging (#301) · b653c5eb

Liam Wrubleski authored Jun 25, 2022



* Switch to standard ROCm packaging

* Revert .gitignore changes

* install new rocm-cmake version

* update readme
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

b653c5eb

Absolute include path (#281) · d1db6a0c

Chao Liu authored Jun 24, 2022

* add gelu and fast_gelu

* added GeLU and fast GeLU

* clean up

* add gemm+fastgelu example

* add gemm+gelu instances

* update profiler

* clean up

* clean up

* adding gemm+bias+activation

* clean

* adding bias

* clean

* adding gemm multiple d

* debugging

* add gemm bias add fastgelu

* rename, clean

* refactoring; add readme

* refactor

* refactor

* refactor

* refactor

* refactor

* refactor

* fix

* fix

* update example

* update example

* rename

* update example

* add ckProfiler

* clean

* clean

* clean

* clean

* add client app example

* update readme

* delete obsolete files

* remove old client app

* delete old file

* cleaning

* clean

* remove half

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path for all examples

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* fix header path

* revert client app example

* clean build

* fix build

* temporarily disable client test on Jenkins

* clean

* clean

* clean

d1db6a0c

20 May, 2022 1 commit
- remove options.hpp.in (#240) · 44943e0e
  Chao Liu authored May 20, 2022
  
  44943e0e
13 May, 2022 1 commit

Validate examples in CI (#233) · 9f71ff48

Anthony Chang authored May 14, 2022



* validate examples in ctest runs

* format

* fix usage of check_err

* amend

* add example codes to custom target 'check'
Co-authored-by: Chao Liu <chao.liu2@amd.com>

9f71ff48