Commits · d670c5a69af6d5e5218eda275dffa7b828d10563 · gaoqiong / composable_kernel

20 Sep, 2023 2 commits
- fixed · d670c5a6
  Jing Zhang authored Sep 20, 2023
  
  d670c5a6
- add f8 comp instance · a18342d6
  Jing Zhang authored Sep 20, 2023
  
  a18342d6
31 Aug, 2023 1 commit

Grouped Gemm with Fixed K and N with SplitK (#818) · f5ec04f0

zjing14 authored Aug 31, 2023



* move all arguments into device

* add b2c_tile_map

* add examples

* add SetDeviceKernelArgs

* dedicated fixed_nk solution

* init client api

* add grouped_gemm_bias example

* add a instance

* add instances

* formatting

* fixed cmake

* Update EnableCompilerWarnings.cmake

* Update cmake-ck-dev.sh

* clean; fixed comments

* fixed comment

* add instances for fp32 output

* add instances for fp32 output

* add fp32 out client example

* fixed CI

* init commit for kbatch

* add splitk gridwise

* format

* fixed

* clean deviceop

* clean code

* finish splitk

* fixed instances

* change m_loops to tile_loops

* add setkbatch

* clean code

* add splitK+bias

* add instances

* opt mk_nk instances

* clean examples

* fixed CI

* remove zero

* finished non-zero

* clean

* clean code

* optimized global_barrier

* fixed ci

* fixed CI

* removed AddBias

* format

* fixed CI

* fixed CI

* move 20_grouped_gemm to 21_grouped_gemm

---------
Co-authored-by: Jing Zhang <jizha@amd.com>

f5ec04f0

23 Aug, 2023 1 commit

[HotFix] add config and version files to pass on build info (#856) · c8a8385f

Jun Liu authored Aug 23, 2023

* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

c8a8385f

27 Jul, 2023 1 commit

Add s_nops after v_dot to avoid hazard (#808) · 7761e523

Bartłomiej Kocot authored Jul 27, 2023

* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_produxt fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop

7761e523

06 Jul, 2023 1 commit

Add basic setup for precommit (#749) (#764) · 237f9cd3

Adam Osewski authored Jul 06, 2023



* Add basic setup for precommit

* Update README.md with instructions on installing precommit hooks

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>

237f9cd3

16 Jun, 2023 1 commit
- do not build gfx941/942 targets during daily QA runs (#758) · d140bdc9
  Illia Silin authored Jun 16, 2023
  
  d140bdc9
15 Jun, 2023 1 commit

Enable gfx941 and gfx942 architectures. (#752) · 027e46ee

Illia Silin authored Jun 15, 2023

* enable gfx941/942 targets

* fix clang format

* fix the cmake logic for multiple targets

* fix cmake syntax for looping over targets

* add gfx941/942 support for gemm_xdl instances

027e46ee

28 Apr, 2023 1 commit

Syncing up from internal repo to enable MI300. (#690) · 4feebedd

Illia Silin authored Apr 28, 2023



* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

4feebedd

29 Mar, 2023 1 commit
- Add CMake Option "USE_OPT_NAVI3X" (#647) · 4e097ad2
  Haocong WANG authored Mar 30, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script
```
  4e097ad2
15 Mar, 2023 1 commit
- Update cmake-ck-dev.sh script (#641) · fa998675
  Rostyslav Geyyer authored Mar 15, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```
  fa998675
24 Feb, 2023 1 commit
- disable tensor contraction f64 on MI100 (#602) · 209baee2
  zjing14 authored Feb 23, 2023
  
  209baee2
15 Feb, 2023 1 commit

Add contraction_fp64 example (#570) · 24c9ee1d

zjing14 authored Feb 15, 2023



* add contraction_bilinear

* add contraction_scale_xdl_fp64

* reduce tile size to avoid register spill

---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>

24c9ee1d

06 Feb, 2023 1 commit

Fix CI issues. (#572) · f73574ff

Illia Silin authored Feb 06, 2023

* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46

f73574ff

02 Nov, 2022 1 commit

Conv perlayer int8 quantization (#471) · 226bc02b

rocking5566 authored Nov 03, 2022

* Add conv2d requant example

* Fix bash error

* Rename example

* 1. Rename gemm quantization
2. shares the requantization lambda function with conv

* Refine declare type

* Add conv bias relu quantization exmaple

* clang format

* Fix compile error due to merge develop

* Fix CI error

* Extract quantization post operation into another file

* Support quantization for non piecewise linear function

* Add instance for conv quantization

* Add convolution quantization factory

* Add convolution quantization client example

* Add more instances with different template parameters

* clang format

* Sync the naming with the develop

226bc02b

26 Oct, 2022 1 commit
- fix the script parsing the QA results (#495) · 0ee3aea1
  Illia Silin authored Oct 26, 2022
  
  0ee3aea1
03 Oct, 2022 1 commit

update document: Readme, contributors, citation, (#463) · 473ba5bc

Chao Liu authored Oct 03, 2022

* update cmake script

* update readme

* Update README.md

* add citation

* add images

* Update README.md

* update

* Update README.md

* Update CONTRIBUTORS.md

* Update README.md

* Update CITATION.cff

* Update README.md

* Update CITATION.cff

473ba5bc

21 Sep, 2022 1 commit

Build the CK targets only once. (#433) · 85b0920d

Illia Silin authored Sep 21, 2022

* build CK only once, use deb package in all subsequent stages

* update jenkins file

* change prefix for build_CK stage

* update writing deb metadata to control file

* update ubuntu source for docker, script syntax for deb package metadata

* try different way to create deb metadata

* clean up DEBIAN before creating one

* fix the CI folder names, fix splitK qa

* use correct docker in all stages, separate tests for splitK verification and performance

* clean old comments, change dir before packaging

* use different package syntax

* change packaging syntax

* package with cmake

* remove unnecessary build prefix

* get rid of unnecessary paths

* change paths during unpacking

* change script syntax while unpacking

* get rid of unneccesary steps

* get rid of comments in the scripts

* use double quotes for scripts

* add ccache during build, try dpkg -x

* pull and install each package separately

* use full package names

* try to use stashing for packages

* change stash/unstash syntax

* move unstash out of shell, run tests on any gpu node

* unpack each package separately

* try re-using existing workspace

* merge the build and test stages, only stash ckProfiler

* merge the build and test stages, only stash zipped ckProfiler

* fix syntax

* add GPU check before build and test, rename docker to usual name

85b0920d

13 Sep, 2022 1 commit

Upgrade the OS and ROCM versions. (#411) · b22ebd44

Illia Silin authored Sep 13, 2022

* upgrade the OS and ROCM versions in CK docker

* add cxx flags to link code with rocm5.2 and ck-9110 compiler

* rename the docker image

* run ONNX gemms using init=1

b22ebd44

07 Sep, 2022 1 commit

Add stderr to QA logfiles, process splitK and ONNX gemm kernels (#402) · ce74cea4

Illia Silin authored Sep 07, 2022

* add processing for the onng_gemm and splitK_gemm

* add profile_onnx_gemm.sh

* add stderr to logfiles, add splitK and onnx gemm parsing

* enable splitK gemm wresults posting to db

ce74cea4

26 Aug, 2022 1 commit
- Fixed splitk gemm fp32 (#384) · 9881625b
  zjing14 authored Aug 26, 2022
```
* add scripts

* fixed splitK_gemm_fp32

* clean

* clean
```
  9881625b
25 Aug, 2022 2 commits

GEMM batched/splitK/cgemm/grouped int4 examples (#383) · 3ab20fd7

Adam Osewski authored Aug 26, 2022



* Grouped GEmm int4.

* Formatting + fix K dimension for int8.

* Batched Gemm int4 example.

* CGEMM int4 example.

* Include inc filese in clang-format.

* SplitK int4 example

* Refactoring of performance measurement.

* Fix #ifdef statements.
Co-authored-by: Adam Osewski <aosewski@amd.com>

3ab20fd7

add scripts (#382) · f246fd2c
zjing14 authored Aug 25, 2022

f246fd2c

08 Aug, 2022 1 commit

Fix QA, allow switching compiler versions, fix google test compilation error. (#348) · aba7fefc

Illia Silin authored Aug 08, 2022

* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing omkars creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument

aba7fefc

02 Aug, 2022 1 commit

Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) · 984b3722

Illia Silin authored Aug 02, 2022



* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

984b3722

21 Jul, 2022 1 commit

Add full QA with verification option, few other changes. (#331) · d8415a96

Illia Silin authored Jul 21, 2022

* add verify flag and update scripts

* replace old check_error function with the new check_err

* fix syntax

* remove blank spaces

* remove empty line

* add check_err for tensors

* fix syntax

* replace tensors with vectors in check_err calls

* fix syntax

* remove blank spaces

* fix syntax

* add new line at end of file

* disable conv2d_bwd_weight test, add gpu check

* set check_gpu using export

* check GPU using runShell

* add definition of runShell

* fix script syntax

* reduce the number of threads, add full qa option

* run processing scripts in bash

* fix the branch and host names in performance scripts, add chronos

* replace parameterizedCron with cron

* archive the perf log files

* try to fix git call

* pass branch and host names as arguments into scripts

* fix script arguments

* fix script arguments

* process results on master

* fix pipeline

* add definition of gpu_arch

* run processing scripts in docker

* fix the brackets

* add agent master for the processing stage

* get rid of show_node_info call on master

* try using mici label instead of master, disable MI100 tests for now

* fix syntax

* simplify container for results processing

* remove node(master) from the process_results stage

* put all stages in original order

* change the agent label from master to mici for gfx908

d8415a96

13 Jul, 2022 1 commit

Add switch between compilers, make 9110 compiler default, add full QA scripts. (#322) · 39acaea3

Illia Silin authored Jul 13, 2022

* adding scripts for full perf test suite

* uncomment the sql queries

* fix typo and chmod a+x for scripts

* dos2unix for all new scripts

* disable verification in full performance test

* fix reduction scripts, add gfrouped_gemm hotfix

* fix the grouped_gemm hotfix and only run reduction for fp16

* change compiler flag syntax

* fix syntax

* add predefinition of dockerArgs

* avoid redefinitions of dockerArgs

* add blank space at the end of dockerArgs

* try to build with release compiler

* adding spaces inside if condition

* limit the number of threads for building 9110 compiler

* change the way HIP_CLANG_PATH is set

* remove the export command

* change the conditional ENV syntax

* set HIP_CLANG_PATH at docker run time

* update scripts for full qa

* enable the sql write query

* fix typo

* remove a comment from a script

39acaea3

01 Jul, 2022 1 commit

Improve external interface for GEMM and GEMM+add+add+fastgelu (#311) · 0dcb3496

Chao Liu authored Jun 30, 2022

* interface for GEMM and GEMM+add+add+fastgelu

* rename namespace

* instance factory

* fix build

* fix build; add GEMM client example

* clean

0dcb3496

23 Jun, 2022 1 commit

Testing all fwd convolution specializations. (#259) · a2edd7d8

Adam Osewski authored Jun 23, 2022



* UniforFill with integer values.

* Log tested instance type string.

* Add UT for all convolution specializations.

* debugging conv

* Fix dangling reference bug.

* Small refinements.

* Fix call to error checking function.

* Small refinements to tests.

* Configure error tolerance
* Change problem size.
* Remove OddC case from types that do not support it.

* Add helper traits for AccumulatorDataType.

* Print first 5 errs in check_err for integral types.

* Rename FillUniform to FillUniformDistribution

* Refactor

* Do not use typed tests.
* Instead use plain fixture class with templatized member functions.
* Initialize tensors with integer values.

* Refine test instances.

* Properly set accumulator data type.
* Add another "big" instance.

* Refactor convolution tests.

* Revert "debugging conv"

This reverts commit b109516455631ff8fd6dce99cf7c14bf8e323ebb.

* Add pragma once + format + small refinement.

* Fix some unwanted changes.

* Clang-format

* Fix profile_convnd to use renamed tensor initializer.

* Add instances for ConvFWDND kernel case 2D

* Helpers to get ConvNDFwd 2D instances.

* Refactoring.

* Remove "small block" instance as it was generating compiler errors.
* Remove default template parameters values.

* Refine and fix test.

* Fix problem with default template parameter types.
* Adjust error thresholds for floating point values test.
* Use integer values initialization for instances test.
* Add tests for ConvNDFwd 2D case.

* Remove AccumulatorDataType type trait.

* Update unit-tests.

* Remove operator<< overload.

* Unlock conv1d/3d nd fwd instances.

* Enable skipping calculating reference using flag.

* Fix number of channels for first ResNet50 layer.

* Clang-format.
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

a2edd7d8

21 Jun, 2022 1 commit
- update readme and script (#290) · ccbd8d90
  Chao Liu authored Jun 20, 2022
  
  ccbd8d90
16 Jun, 2022 1 commit

Use new github credentials (#278) · fb9b6b1e

Illia Silin authored Jun 15, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

* dos2unix for run_perf_tests.sh

* try the new git credentials

* use env var for git credentials

fb9b6b1e

10 Jun, 2022 1 commit

Add performance tests on MI200 in CI, reporting number of CUs, add stand-alone perf test. (#277) · 1ced00a5

Illia Silin authored Jun 10, 2022

* use pre-built docker instead of building a new one

* try docker.image.pull

* change syntax in docker.image()

* add 30 min timeout

* increase timeout to 3 hours

* move performance tests to first stage for testing

* set image variable to the new container name

* update image name

* check available images

* check available images in both places

* try different image name

* use image ID to refer to image

* run performance on gfx90a

* fix the gpu_arch labeling, add parameter

* move env vars out of stages

* add stand-alone performance script, MI200 tests, CU numbers

1ced00a5

02 Jun, 2022 1 commit

Adding Resnet50 test to Performance tests (#268) · 1677cf70

Illia Silin authored Jun 02, 2022

* add resnet50 test to performance tests

* add blanks before gpu_arch in log files

* add resnet50 test with N=4 and process its results

* add ROCM and HIP versions to test tables

* uncomment the sql queries

* fix script syntax in jenkinsfile

1677cf70

24 May, 2022 2 commits

Overhaul to Reducton and its dependants (#237) · 63eee2d9

Qianfeng authored May 25, 2022

* Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type

* Update to host layer and host reduction

* Merge and remove reduction kernels

* Merge and remove reduction device interfaces and update pooling device interface

* Merge and remove useless reduction device instances

* Update to reduction profiler and reduction ctests

* Update to reduction and pooling examples and add one reduction example

* Change to reduction examples to let them testable by ctest

* Add explicit pass checking for reduction and pooling examples

* Explicit assignment of tensor shapes in example reduce_blockwise_two_call

* Use atomic_add to repace atomicAdd and add atomic_add for double type

* Add reduce ctest support for double data type

* Replace to_int_vector() by using c++ std::vector::assign()

* Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise

* Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock

* Add GetAtomicOperationZeroValue() support for AtomicMax

* Tiny change to reduce example README.md

* Fix some tiny issues due to branch merging

* Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t

* Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64

* Renaming

* Clean the header includings in device_reduce instances header files

63eee2d9

Add performance tests as a stage of CI. (#247) · 1085794d

Illia Silin authored May 24, 2022

* modify ckProfiler_gemm output

* fix syntax

* change ckProfiler output and return 0

* fix syntax

* output datatype

* fix syntax

* output datatype in another way

* fix syntax

* fix syntax

* test return values of ckProfiler

* add layout info and tests, make sure ckprofiler returns 0

* fix syntax

* change layout output

* fix syntax

* fix syntax again

* update script to process perf results

* rearrange jenkins stages

* fix typo

* add python packages to Docker file

* adding setuptools-rust package

* modify parsing for new test parameters

* test db credentials on jenkins

* fix syntax

* update python script to handle incomplete lines

* ungrade python to 3.8 and write the gemm_params table

* add sqlalchemy package to docker

* move perf data processing to master node

* move the master node inside a steps region

* add new stage for result processing

* move results processing to separate stage

* reduce number of tests to speedup debugging

* pass config to processPerfResults stage

* run script on master in a docker container

* replace show_node_info

* try loading docker on master node again

* use ansible node instead of master

* get rid of pymysql package

* try ssh connection using paramiko

* put back pymysql

* put the perf data processing back on the gpu node

* put back artifact definition

* archive the perf_log before parsing

* clean up jenkinsfile, fix parsing

* fix typo

* enable all perf tests

* put all stages in original order, finalize script

* fix gpu_arch version

* update parsing script

* remove obsolete file causing merge conflict

1085794d

08 May, 2022 1 commit

Add Benchmark test into CI (#226) · a3c910ac

Illia Silin authored May 08, 2022



* add performance test to jenkins pipeline

* fix typo

* fix the syntax in conv_fwd_util.cpp

* fix the error message syntax spacing

* fix the error message syntax spacing again

* run profile_gemm and archive results

* fix typo

* try to figure out the paths

* try to figure out the paths one more time

* skip the copying step

* build ckProfiler release only once

* change directory using dir

* fix dir syntax

* change the gemm parameters

* do not pipe script output to file

* try running ckProfiler directly

* fix typo

* use set +e

* run profile_gemm.sh || true

* run multiple gemms and parse results

* fix typo in jenkinsfile

* fix syntax

* add new gemm sizes, update scripts

* put all jenkins steps in original order
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

a3c910ac

22 Apr, 2022 1 commit
- Clang-format only modified files. (#181) · 31d869ad
  Adam Osewski authored Apr 22, 2022
  
  31d869ad
15 Apr, 2022 1 commit

Compile CK for all targets (#188) · 4221505d

Illia Silin authored Apr 15, 2022



* compile ck for all targets

* update the target criteria

* change the target condition

* fixed some typos

* fixed missed file

* revert changes in README

* revert device_conv3d_fwd_xdl_...

* update device_conv3d_fwd_xdl_...

* update device_batched_gemm_reduce...

* test the unused arguments fix

* test the warning suppression

* try suppress warnings in device_batched_gemm_reduce_xdl...

* fix the last warnings

* replace UNUSED with std::ignore

* fix a typo

* replaced std::ignore with ignore

* add igonre header to common_header

* refactor atomicAdd
Co-authored-by: Chao Liu <chao.liu2@amd.com>

4221505d

31 Mar, 2022 1 commit

Compile for gfx908 and gfx90a (#130) · cd167e49

Chao Liu authored Mar 31, 2022

* adding compilation for multiple targets

* fix build

* clean

* update Jekinsfile

* update readme

* update Jenkins

* use ck::half_t instead of ushort for bf16

* rename enum classes

* clean

* rename

* clean

cd167e49

23 Mar, 2022 1 commit

Unified conv3D API + support for all data types. (#133) · f91579aa

Adam Osewski authored Mar 23, 2022



* Convolution ND

* Code unification across dimensions for generating tensor descriptors.
* Example
* Instances

* Move convnd f32 instance file to comply with repo structure.

* Conv 1D tensor layouts.

* Formatting and use ReferenceConv

* Reference ConvFwd supporting 1D and 2D convolution.

* Debug printing TensorLayout name.

* Conv fwd 1D instance f32

* Refactor conv ND example.

Needed to support various conv dimensio.

Needed to support various conv dimensions

* Rename conv nd example director to prevent conflicts.

* Refactor some common utility to single file.

Plus some tests.

* Refactor GetHostTensorDescriptor + UT.

* Add 1D test case.

* Test reference convolution 1d/2d

* Remove some leftovers.

* Fix convolution example error for 1D

* Refactor test check errors utility function.

* Test Conv2D Fwd XDL

* More UT for 1D case.

* Parameterize input & weight initializers.

* Rename example to prevent conflicts.

* Split convnd instance into separate files for 1d/2d

* Address review comments.

* Fix data type for flops/gbytes calculations.

* Assign example number 11.

* 3D cases for convolution utility functions.

* 3D reference convolution.

* Add support for 3D convolution.

* Check for inputs bigger than  2GB.

* Formatting

* Support for bf16/f16/f32/i8 - conv instances + UT.

* Use check_err from test_util.hpp.

* Split convnd test into separate files for each dim.

* Fix data generation and use proper instances.

* Formatting

* Skip tensor initialization if not necessary.

* Fix CMakefiles.

* Remove redundant conv2d_fwd test.

* Lower problem size for conv3D UT.

* 3D case for convnd example.

* Remove leftovers after merge.

* Add Conv Specialization string to GetTypeString

* Skip instance causing numerical errors.

* Small fixes.

* Remove redundant includes.

* Fix namespace name error.

* Script for automatic testing and logging convolution fwd UTs

* Comment out numactl cmd.

* Refine weights initalization and relax rtol for fp16

* Fix weights initialization for int8.

* Add type_convert when store output in ref conv 1D.

* Get back old conv2d_fwd_xdl operation.

* Silence conv debug print.

* format

* clean

* clean

* Fix merge.

* Fix namespace for check_err
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

f91579aa