Commits · 000eefbf6ebf3c35363b05aa1b00262577b12eaa · gaoqiong / composable_kernel

13 Aug, 2022 2 commits

Merge remote-tracking branch 'origin/develop' into fused-gemm · 000eefbf
Chao Liu authored Aug 13, 2022

Anthony Chang authored Aug 13, 2022

* initial stub for gemm_gemm_xdl_cshuffle

* set up example code

* compiles

* prevent integer overflow

* harmonize interface between ref_gemm and ref_batched_gemm

* batched_gemm_gemm

* fix example

* host tensor gen: diagonal pattern in lowest two-dimensions only

* make c descriptors containing only integral constants

* clean up

* add BlockwiseGemmXdlops_v2 while exploring a unified approach

* implement proper interface

* tidy up example

* fix compilation warnings

* coarsely controlled 2nd gemm padding

* remove rocm-cmake's hard requirement for certain revision

* clang-format

* resolve merge conflict

* fix compilation error on gfx10

* adds acc0 elementwise op to interface

* attention host validation

* add blockwise softmax v1

* iteratively update softmax+gemm

* transpose both gemm0 and gemm1 xdl output so as to avoid broadcasting softmax max/sum

* add init method for easier debugging

* do away with manual thread cluster calculation

* generalize blockwise softmax interface

* row-wise softmax sum & max

* format

* rename to DeviceBatchedGemmSoftmaxGemm

* add gemm_softmax_gemm instances and tests

* comment
Co-authored-by: ltqin <letao.qin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>

cac014f1
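
The bullets above describe fusing a GEMM, a softmax, and a second GEMM, with the softmax statistics updated iteratively ("iteratively update softmax+gemm", "row-wise softmax sum & max"). The log does not show the kernel, so the following is only a host-side C++ sketch of the general running-max/running-sum ("online softmax") technique that such an iterative update typically relies on. `online_softmax_gemm` and `check_tile_invariance` are hypothetical names, not CK APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch for a single output row: out = softmax(s) * V,
// where s has length K and V is K x N (stored as V[k][n]).
// Scores are consumed `tile` columns at a time; the running max `m` and
// running sum `l` let earlier partial results be rescaled instead of recomputed.
std::vector<float> online_softmax_gemm(const std::vector<float>& s,
                                       const std::vector<std::vector<float>>& v,
                                       std::size_t tile)
{
    tile = std::max<std::size_t>(tile, 1); // guard against an infinite loop
    const std::size_t n = v.empty() ? 0 : v[0].size();
    std::vector<float> acc(n, 0.0f);                   // unnormalized output row
    float m = -std::numeric_limits<float>::infinity(); // running row max
    float l = 0.0f;                                    // running sum of exp(s - m)

    for(std::size_t k0 = 0; k0 < s.size(); k0 += tile)
    {
        const std::size_t k1 = std::min(k0 + tile, s.size());
        const float m_new = std::max(m, *std::max_element(s.begin() + k0, s.begin() + k1));
        const float scale = std::exp(m - m_new); // 0 on the first tile, since m == -inf
        l *= scale;
        for(float& a : acc)
            a *= scale;
        for(std::size_t k = k0; k < k1; ++k)
        {
            const float p = std::exp(s[k] - m_new);
            l += p;
            for(std::size_t j = 0; j < n; ++j)
                acc[j] += p * v[k][j]; // second GEMM, accumulated tile by tile
        }
        m = m_new;
    }
    for(float& a : acc)
        a /= l; // normalize once at the end
    return acc;
}

// Usage check: the result must not depend on the tile width.
inline void check_tile_invariance(const std::vector<float>& s,
                                  const std::vector<std::vector<float>>& v)
{
    const auto whole = online_softmax_gemm(s, v, s.size()); // one tile: plain softmax
    const auto tiled = online_softmax_gemm(s, v, 2);
    for(std::size_t j = 0; j < whole.size(); ++j)
        assert(std::fabs(whole[j] - tiled[j]) < 1e-5f);
}
```

Rescaling `acc` and `l` by `exp(m - m_new)` is what lets the second GEMM start before the whole score row is known; the division by `l` happens only once. An empty score row is not handled (it would divide by zero).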

12 Aug, 2022 4 commits

Move literal ""_uz & ""_zu into namespace 'ck::literals' (#354) · a670a5a0
Po Yen Chen authored Aug 13, 2022
```
* Move literal ""_uz & ""_zu into namespace 'literals'

* Move namespace 'literals' as 'ck::literals'
```
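
For context, C++23 adds the built-in `uz`/`zu` suffixes (proposal P0330) for `std::size_t` literals. A minimal sketch of user-defined `_uz`/`_zu` equivalents living in `ck::literals`, assuming they simply return the literal value as `std::size_t` (the actual CK header is not shown in this log), might look like this; the `example` namespace is purely illustrative.

```cpp
#include <cstddef>
#include <type_traits>

namespace ck {
namespace literals {
// Sketch (assumption): both suffixes mirror C++23's `uz`/`zu` and yield std::size_t.
inline constexpr std::size_t operator""_uz(unsigned long long value)
{
    return static_cast<std::size_t>(value);
}

inline constexpr std::size_t operator""_zu(unsigned long long value)
{
    return static_cast<std::size_t>(value);
}
} // namespace literals
} // namespace ck

// Callers opt in explicitly; the suffixes are not visible in the global namespace.
namespace example {
using namespace ck::literals;
constexpr auto n = 1024_uz; // std::size_t without a C-style cast
static_assert(std::is_same_v<decltype(n), const std::size_t>);
} // namespace example
```

Keeping the operators inside a namespace means code must opt in with `using namespace ck::literals;`, so the suffixes do not leak into every translation unit that includes the header.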

Add example of conv_fwd_bias_relu_add for int4, int8, bfp16, fp16, and fp32 (#343) · 0c6ef7c1

Rostyslav Geyyer authored Aug 12, 2022

* [LWPCK-359] Initial commit

* Working version for fp16, add results to readme

* Update according to PR #341

* Update results in readme

* Add fp32 example

* Add bf16 example

* Update fp16 and fp32 examples

* Add int8 example

* Add separate lengths and strides tensors for D tensors
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

add g; fixed strides (#355) · 35e49f2d
zjing14 authored Aug 12, 2022

Build docker only once in CI, fix conv_bwd logfile names. (#353) · de60d290

Illia Silin authored Aug 12, 2022

* build docker in separate stage

* build docker with only one prefix

* add parallel statement

* add docker repo url

* fix the name of perf_conv_bwd_data log file

11 Aug, 2022 24 commits
- Add examples for GEMM + AddAddFastGelu (data type: int8, bf16, fp32) (#340) · 68b61504
  Po Yen Chen authored Aug 12, 2022
```
* Add always_false<> util to delay symbol resolution

* Use always_false<> to prevent trying to instantiate unwanted method

* Add new specializations of AddAddFastGelu::operator() method

* Add GEMM + AddAddFastGelu examples for data types: int8, bf16, fp32

* Use floating point literal to simplify code

* Remove unnecessary capture in lambda expressions

* Extract fast GeLU calculation as standalone method

* Mark methods as 'constexpr'

* Add constraint for HostTensorDescriptor templated ctors

* Simplify HostTensorDescriptor ctor calls

* Add C++23 std::size_t literal suffix

* Use _uz suffix to shorten example code

* Remove unnecessary conversion to std::array<>

* Re-order include directives

* Remove C-style casting by literal suffix

* Remove unnecessary statements in main()

* Remove unused type parameter of always_false<>

* Remove unused include directive

* Exit main() by returning meaningful value

* Use 'if constexpr' to switch example flow

* Use std::is_same_v<> to shorten example code

* Add 'inline' specifier to literal functions

* Unify output methods in example

* Move common codes into .inc file

* Add type check in type_convert<>()

* Add type_convert<float>() before computation

* Merge AddAddFastGelu method specializations

* Remove always_false<>

* Add constraint to AddAddFastGelu::operator() parameter types
```
- ckProfiler for layernorm (#330) · fdfd7eb5
  rocking5566 authored Aug 12, 2022
```
* Refine parameter

* Add base class for layernorm

* Add layernorm instance

* Add layernorm to ckProfiler

* Remove redundant

* Add verification

* Fix compile error due to merge
```
- avoid LDS data hazard · b64a2860
  Anthony Chang authored Aug 11, 2022
- add gemm_gemm instances and tests · 8aa44bcd
  Anthony Chang authored Aug 11, 2022
- adds acc0 elementwise op to interface · 51fc99a8
  Anthony Chang authored Aug 11, 2022
- fix compilation error on gfx10 · 8672733f
  Anthony Chang authored Aug 10, 2022
- resolve merge conflict · 3c5a50f2
  Anthony Chang authored Aug 08, 2022
- clang-format · edc494df
  Anthony Chang authored Aug 04, 2022
- remove rocm-cmake's hard requirement for certain revision · 00331ee4
  Anthony Chang authored Aug 04, 2022
- coarsely controlled 2nd gemm padding · c9bef1c6
  Anthony Chang authored Aug 10, 2022
- fix compilation warnings · e55b67a0
  Anthony Chang authored Aug 10, 2022
- tidy up example · ed424975
  Anthony Chang authored Aug 04, 2022
- implement proper interface · 5f94555b
  Anthony Chang authored Aug 04, 2022
- add BlockwiseGemmXdlops_v2 while exploring a unified approach · 98e4c0ce
  Anthony Chang authored Aug 03, 2022
- clean up · eceea10a
  Anthony Chang authored Aug 03, 2022
- make c descriptors containing only integral constants · 4ee34028
  Anthony Chang authored Aug 03, 2022
- host tensor gen: diagonal pattern in lowest two-dimensions only · caf2b2ed
  Anthony Chang authored Aug 01, 2022
- fix example · b790e44b
  Anthony Chang authored Aug 01, 2022
- batched_gemm_gemm · 408ba59b
  Anthony Chang authored Jul 27, 2022
- harmonize interface between ref_gemm and ref_batched_gemm · b57c3879
  Anthony Chang authored Jul 27, 2022
- prevent integer overflow · 237371ad
  Anthony Chang authored Jul 27, 2022
- compiles · 047cee2b
  Anthony Chang authored Jul 20, 2022
- set up example code · 68b71534
  Anthony Chang authored Jul 12, 2022
- initial stub for gemm_gemm_xdl_cshuffle · 89a5e847
  Anthony Chang authored Jul 11, 2022
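
Several items in #340 above concern the AddAddFastGelu element-wise operation (extracting the fast GeLU calculation, converting inputs with `type_convert<float>()` before computing). Assuming AddAddFastGelu computes FastGelu(c + d0 + d1), the sketch below shows the widely used tanh approximation of GeLU rewritten with a single `exp()`. `fast_gelu`, `add_add_fast_gelu`, and `check_fast_gelu` are hypothetical names, `static_cast` stands in for CK's `type_convert`, and the constants are the textbook ones rather than values copied from CK.

```cpp
#include <cassert>
#include <cmath>

// Tanh-form GeLU approximation:
//   gelu(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
// rewritten with one exp() via 0.5 * (1 + tanh(z)) == 1 / (1 + exp(-2z)).
inline float fast_gelu(float x)
{
    constexpr float k_sqrt_2_over_pi = 0.7978845608f; // sqrt(2/pi)
    const float u   = 2.0f * k_sqrt_2_over_pi * (x + 0.044715f * x * x * x);
    const float cdf = 1.0f / (1.0f + std::exp(-u));
    return x * cdf;
}

// Hypothetical element-wise epilogue: e = FastGelu(c + d0 + d1), computed in float.
template <typename E, typename C, typename D0, typename D1>
E add_add_fast_gelu(C c, D0 d0, D1 d1)
{
    const float x = static_cast<float>(c) + static_cast<float>(d0) + static_cast<float>(d1);
    return static_cast<E>(fast_gelu(x));
}

// Usage check: compare against the textbook tanh form at one point.
inline void check_fast_gelu(float x)
{
    const float ref = 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
    assert(std::fabs(fast_gelu(x) - ref) < 1e-5f);
}
```

Evaluating the sum in `float` before the nonlinearity matches the intent of the "type_convert<float>() before computation" bullet; narrower output types are produced only at the final cast.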
10 Aug, 2022 1 commit

Add batched/grouped_gemm contraction deviceOps (#349) · e08d68d2

zjing14 authored Aug 10, 2022

* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* reorg

* add ds

* add bias

* clean

* add G

* adding group

* adding group

* adding group

* update Tensor

* clean

* update example

* update DeviceGemmMultipleD_Xdl_CShuffle

* update conv bwd-data and bwd-weight

* update contraction example

* update gemm and batch gemm with e permute

* fix example build

* instance for grouped conv1d

* update example

* adding group conv instance

* update gemm bilinear instance

* update gemm+add+add+fastgelu instance

* update profiler

* update profiler

* update test

* update test and client example

* clean

* add grouped conv into profiler

* update profiler

* clean

* add test grouped conv, update all conv test to gtest

* update test

* change gemm_c_permute with contraction

* add grouped_contraction

* add contraction in group_gemm

* add example of grouped_gemm with contraction

* add example of grouped_contraction_bias_e_permute

* clean

* fixed ds

* add m3n2 m2n3 examples into gemm_bias_e_permute
Co-authored-by: Chao Liu <chao.liu2@amd.com>
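
#349 adds a matrix padder and GEMM padding for convolution, and the fused-GEMM work above mentions "coarsely controlled 2nd gemm padding". The log does not describe the padder's interface, so as an illustration only: padding generally means rounding each GEMM extent up to a multiple of the per-block tile size. `round_up`, `pad_gemm`, and the tile sizes below are hypothetical.

```cpp
#include <cstddef>

// Round `len` up to the next multiple of `tile` (tile must be > 0).
constexpr std::size_t round_up(std::size_t len, std::size_t tile)
{
    return (len + tile - 1) / tile * tile;
}

struct PaddedGemmSizes
{
    std::size_t M, N, K;
};

// Hypothetical: pad every GEMM dimension so each workgroup sees a full tile.
constexpr PaddedGemmSizes pad_gemm(std::size_t M, std::size_t N, std::size_t K,
                                   std::size_t MPerBlock, std::size_t NPerBlock,
                                   std::size_t KPerBlock)
{
    return {round_up(M, MPerBlock), round_up(N, NPerBlock), round_up(K, KPerBlock)};
}

// Example: a 1000 x 1000 x 100 GEMM with 256 x 128 x 32 tiles runs as 1024 x 1024 x 128.
static_assert(pad_gemm(1000, 1000, 100, 256, 128, 32).M == 1024);
static_assert(pad_gemm(1000, 1000, 100, 256, 128, 32).N == 1024);
static_assert(pad_gemm(1000, 1000, 100, 256, 128, 32).K == 128);
```

How the padded region is treated is also an assumption here: for the result to stay correct, reads in the padded part of K must contribute zero, and writes outside the real M x N extent must be skipped.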

08 Aug, 2022 1 commit

Fix QA, allow switching compiler versions, fix google test compilation error. (#348) · aba7fefc

Illia Silin authored Aug 08, 2022

* allow selecting compiler version

* fix typo

* add Wno-deprecated flag for google tests

* change git repo, fix qa log files names

* change the git clone syntax

* use Omkar's git credentials

* try to use jenkins as git user

* try using illsilin username for gerrit repo with ssh key

* try new gerrit authorization

* change ssh key syntax

* try another way of passing ssh key to docker

* add mount ssh in dockerfile

* create .ssh folder

* move ssh-keyscan to later

* get rid of npm call

* build first docker image on master

* check the contents of the .ssh folder

* try replacing Omkar's creds with gerrit creds

* use open repo, clean up changes

* get rid of ssh default argument

07 Aug, 2022 1 commit
- fix bug in gemm profiler (#344) · 146972f4
  Chao Liu authored Aug 07, 2022
03 Aug, 2022 1 commit

Update Group convolution (#341) · 75ab874e

Chao Liu authored Aug 03, 2022

* add conv oddC

* update example

* update example

* fix bug in example

* fix bug in group conv example

02 Aug, 2022 2 commits

CGEMM examples bf16, fp32, int8 (#332) · fb0dc358

Adam Osewski authored Aug 02, 2022



* CGEMM examples bf16, fp32, int8

* Add convert reference output to CDataType.

* Skip BF16 data type during testing.

* Lower K value to get rid of accumulation error.

* Fix merge artifact.

* Fix changed function name: GetElementSpaceSize()

* Fix merge artifact.
Co-authored-by: Adam Osewski <aosewski@amd.com>
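
The first bullet of #332 adds int8 specializations of the element-wise Add and Subtract operations. One way a complex GEMM can be expressed with real GEMMs plus exactly those two operations is shown in this host-side sketch; it assumes separate real and imaginary planes and is not claimed to be how CK's CGEMM device operation is organized. `gemm`, `cgemm`, and `check_cgemm_1x1` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<float>; // row-major storage

// Plain real GEMM: C (M x N) = A (M x K) * B (K x N).
Matrix gemm(const Matrix& a, const Matrix& b, std::size_t M, std::size_t N, std::size_t K)
{
    Matrix c(M * N, 0.0f);
    for(std::size_t m = 0; m < M; ++m)
        for(std::size_t k = 0; k < K; ++k)
            for(std::size_t n = 0; n < N; ++n)
                c[m * N + n] += a[m * K + k] * b[k * N + n];
    return c;
}

// Complex GEMM on split real/imaginary planes using four real GEMMs:
//   Re(C) = Re(A)Re(B) - Im(A)Im(B)   (element-wise Subtract)
//   Im(C) = Re(A)Im(B) + Im(A)Re(B)   (element-wise Add)
void cgemm(const Matrix& ar, const Matrix& ai, const Matrix& br, const Matrix& bi,
           Matrix& cr, Matrix& ci, std::size_t M, std::size_t N, std::size_t K)
{
    const Matrix rr = gemm(ar, br, M, N, K);
    const Matrix ii = gemm(ai, bi, M, N, K);
    const Matrix ri = gemm(ar, bi, M, N, K);
    const Matrix ir = gemm(ai, br, M, N, K);
    cr.resize(M * N);
    ci.resize(M * N);
    for(std::size_t i = 0; i < M * N; ++i)
    {
        cr[i] = rr[i] - ii[i];
        ci[i] = ri[i] + ir[i];
    }
}

// Usage check: (1 + 2i)(3 + 4i) = -5 + 10i as a 1 x 1 x 1 problem.
inline void check_cgemm_1x1()
{
    Matrix cr, ci;
    cgemm({1.0f}, {2.0f}, {3.0f}, {4.0f}, cr, ci, 1, 1, 1);
    assert(cr[0] == -5.0f && ci[0] == 10.0f);
}
```

Because each output element sums K products in every real GEMM, accumulation error grows with K, which is consistent with the "Lower K value" bullet above.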

Run CI on MI100 nodes only, run daily QA on MI200 nodes. (#339) · 984b3722

Illia Silin authored Aug 02, 2022

* turn on full qa only on gfx90a, use int initialization

* change script syntax

* update script parsing clinfo, throw exception if 0 devices

* fix syntax

* try using toBoolean for the QA conditions

* run regular CI on MI100 only, use MI200 only for daily QA

* evaluate when conditions before agent

* launch QA on develop branch and update profile_reduce script

* update test script

* update script

* remove false dependency from dockerfile

* try removing rbuild completely
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>

29 Jul, 2022 1 commit

Clean up conv example, Instances, profiler and test (#324) · 500fa995

Chao Liu authored Jul 29, 2022

* convnd_fwd fp16 example

* update example

* update example

* update instance

* updating reference conv

* update reference conv

* update conv fwd profiler

* update conv 1d and 3d instance

* update include path

* clean

* update profiler for conv bwd data and weight

* update conv bwd weight

* clean

* update conv example

* update profiler for conv bwd weight

* update ckprofiler for conv bwd data

* fix reference conv bwd data bug; update conv bwd data test

* update examples

* fix initialization issue

* update test for conv fwd

* clean

* clean

* remove test case too sensitive to error threshold

* fix test

* clean

* fix build

* adding conv multiple d

* adding conv multiple D

* add matrix padder

* add gemm padding to convnd

* adding group conv

* update gemm multi-d

* refactor

* refactor

* refactor

* clean

* clean

* refactor

* refactor

* re...

22 Jul, 2022 2 commits

comment out cron trigger (#334) · 85978e02
Illia Silin authored Jul 22, 2022

Batched Gemm with multiD (#329) · d7d78290

zjing14 authored Jul 22, 2022

* add batched_gemm_multiD

* add ds

* rename file

* add batched_gemm_bias example

* add batch_strides into bmm_c_permute

* clean

* rename example_28 to example_29
Co-authored-by: Chao Liu <chao.liu2@amd.com>

21 Jul, 2022 1 commit

Add full QA with verification option, few other changes. (#331) · d8415a96

Illia Silin authored Jul 21, 2022

* add verify flag and update scripts

* replace old check_error function with the new check_err

* fix syntax

* remove blank spaces

* remove empty line

* add check_err for tensors

* fix syntax

* replace tensors with vectors in check_err calls

* fix syntax

* remove blank spaces

* fix syntax

* add new line at end of file

* disable conv2d_bwd_weight test, add gpu check

* set check_gpu using export

* check GPU using runShell

* add definition of runShell

* fix script syntax

* reduce the number of threads, add full qa option

* run processing scripts in bash

* fix the branch and host names in performance scripts, add chronos

* replace parameterizedCron with cron

* archive the perf log files

* try to fix git call

* pass branch and host names as arguments into scripts

* fix script arguments

* fix script arguments

* process results on master

* fix pipeline

* add definition of gpu_arch

* run processing scripts in docker

* fix the brackets

* add agent master for the processing stage

* get rid of show_node_info call on master

* try using mici label instead of master, disable MI100 tests for now

* fix syntax

* simplify container for results processing

* remove node(master) from the process_results stage

* put all stages in original order

* change the agent label from master to mici for gfx908