Commits · e2dd8f056c3c1db9bcfedba4cee236f3d0e7d44b · gaoqiong / composable_kernel

09 Feb, 2023 2 commits
- Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into PR567 · e2dd8f05
  aska-0096 authored Feb 09, 2023
- Add arch limitation to all wmma examples · b47e8c41
  aska-0096 authored Feb 09, 2023
08 Feb, 2023 4 commits

adding the first draft of changelog (#571) · b63accee
Illia Silin authored Feb 08, 2023
```
* adding the first draft of changelog

* second draft of changelog
```

Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) · 332ccc33

ltqin authored Feb 09, 2023

* add instance for gemm bias softmax gemm

* add client example

* change CGridDesc_G_M_N to CGridDesc_G_M_O

* add gridwise

* change c grid name

* device add d0s data

* fix 08 client_example

* add example 47_fused_attention

* example output correct

* add d0 to example

* add d0 element op

* rechange instance code

* change Acc0ElementwiseOperation to C0DEElementwiseOperation

* change example name

* update instance for cdeelementwiseop

* add bhalf_t ScaleAdd

* add test

* not support gemm1 bias

* remove some ignore

* fix test bug


Fix a couple more CI issues. (#578) · bb3d9546

Illia Silin authored Feb 08, 2023

* test the QA cron parameter for compiler commit

* create separate dockers for latest and fixed amd-stg-open compiler versions

* change groovy syntax

* apply cron timers back to develop branch


Merge branch 'develop' of... · 68ca5b3d

aska-0096 authored Feb 08, 2023

Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into navi3x_mD_batchedGEMM_GroupConvFwd


06 Feb, 2023 1 commit

Fix CI issues. (#572) · f73574ff

Illia Silin authored Feb 06, 2023

* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46


01 Feb, 2023 1 commit

Add the markdown tutorial hello world (#563) · afdfef74

Rostyslav Geyyer authored Feb 01, 2023

* Add the markdown tutorial

* Clean up

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>


31 Jan, 2023 2 commits

Merge branch 'develop' of... · 55a01eef

aska-0096 authored Jan 31, 2023

Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into navi3x_mD_batchedGEMM_GroupConvFwd


remove unused variable (#564) · ba40c2ce
who who who authored Jan 31, 2023
```
* remove unused variable

* format code
```

30 Jan, 2023 3 commits
- Use defined seed for deterministic test runs. (#562) · 274108d6
  Adam Osewski authored Jan 30, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
- Merge branch 'develop' of... · f1b53d78
  aska-0096 authored Jan 30, 2023
```
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into navi3x_mD_batchedGEMM_GroupConvFwd
```
- format · 0c9cdbce
  aska-0096 authored Jan 30, 2023
26 Jan, 2023 1 commit
- Add more instances for irregular GEMM sizes. (#560) · 7494c1c6
  Adam Osewski authored Jan 26, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
25 Jan, 2023 1 commit

Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f

Qianfeng authored Jan 26, 2023

* File renaming and class renaming for device element-wise operation

* Add batchnorm-infer instances, external API and client example

* Add batchnorm-infer profiler module and gtests

* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp

* Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer

* Rename class and file due to conflict from device_elementwise_2d.hpp

* Fix namespace in batchnorm_infer_nhwc client example


19 Jan, 2023 1 commit
- navi3x_groupconv_need_optimization · 0517cf08
  aska-0096 authored Jan 19, 2023
18 Jan, 2023 9 commits

Use double for all scaling values and floating-point constant values at the Device Op API (#557) · 52abc2f3

Qianfeng authored Jan 19, 2023

* Use double as alpha/beta values type in reduce device op api

* Use double as alpha/beta values type in softmax device op api

* Use double as alpha/beta values type in multiple-reduce device op api

* Use double as epsilon value type in normalization/elementwise-normalization device op api


Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760

Raman R jana authored Jan 18, 2023

* wavelet gemm programming model support for CK

* GEMM pipeline update for wavelet programming model

* Updated wavelet programming pipeline

* fixes for global-write for math-wave

* fixed bug in global writes

* Updated comments for better readability

* fixed clang format errors

* added block_lds without barrier sync

* clean

* clean

* clean

* clean

* refactor

* prototype

4 layouts

fix default stride

all problem sizes

tidy

move file

update build script

restore old file

fix build

* refactor standalone test to use gemm test harness

* simplify gemm test

* update build script

* remove redundant

* early return when cmd arg doesn't match

* tidy

* report failure when result not validated

* tidy

* Add comment depicting B2C mapping pattern.

* Formatting & comments.

* Comparison with custom B2C mapping pattern.

* Example for wavelet gemm.

* Add wavelet to Gemm standalone test.

* Remove debug code.

* Remove dangling #endif directive.

Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>


Add multiD Gemm client APIs (#534) · d66421fe

ltqin authored Jan 19, 2023

* start add example

* fix config

* fix showinfo bug

* add an elementop

* change to padding

* add xdl example

* change elementwiseop

* add instance

* add instance to profiler

* change file name

* fix device not supported issue

* add client example

* fix client gemm_add_multiply name

* change AddMultiply elementwiseop

* fix elementwiseop

* fix client example

* fix addmultiply op

* fix comments and function name
Co-authored-by: letaoqin <letaoqin@amd.com>


fix a bug for 6-dim kernels (#555) · 00ff30af
Illia Silin authored Jan 18, 2023


add multi embeddings support (#542) · 147b7db5

who who who authored Jan 19, 2023

* add multi embeddings support

* fix format

* optimize sqrt

* add reduce operation

* change to elementwise op

* fix name

* rename

* run ci cd

* format example

* format code

* format code


Add client API/examples for 3xGemm+Bias+Add+Permute{0, 2, 3, 1} (#550) · 55236709

ltqin authored Jan 19, 2023

* add example

* fix example

* add instance for gemm permute

* add to client example

* change configs

* change instance file name

* format

* change client example file name and remove example


groupconv: Sanity check[OK], Performance[Bad] · 9c3c435a
aska-0096 authored Jan 18, 2023

batchedgemm[OK], groupconv[debug] · abfc94b2
aska-0096 authored Jan 18, 2023

workable · 07180cb7
aska-0096 authored Jan 18, 2023


17 Jan, 2023 3 commits

Reduction external API and client examples (#493) · 80e05267

Qianfeng authored Jan 17, 2023

* Change the DeviceReduce base class template to include all problem description information

* Add external api for reduction

* Add client example to test the reduction external api

* Spelling correction

* Re-implement the host_reduction to follow the DeviceReduce base API format

* Change the reduce profiler to call the external API for collecting device instances

* Rename reduce client example directory from 08_reduce to 12_reduce

* Remove (void) before the function call

* Tiny update in reduce client example

* Tiny update in profile_reduce_impl.hpp

* Rename the reduce client example directory
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>


Gemm layernorm welford (#413) · 7829d729

rocking5566 authored Jan 17, 2023

* Add device op of gemm layernorm

* [What] Rename F to H
[Why] F and G are reserved for the welford tensors

* Add gridwise gemm + welford

* Extract template parameter

* Rename kernel. Prepare to add second half kernel

* Extract var

* Add second kernel for gemm+layernorm

* Move to the gemm_layernorm folder

* Rename F and G to mean and var

* Do not use snakeCurved; it makes determining the padding for welford difficult

* Rewrite the device interface and rename some var

* Add welford count

* Update interface

* Sync code, prepare to test on MI200

* Clean the code

* Implement layernorm

* Add comment to mention hipFree

* Write out e for debugging.
This could be removed and h used instead

* 1. Allocate mean, var and count via SetWorkSpacePointer.
2. Add GetWorkSpaceSize to calculate the workspace size

* Add gemm layernorm host code

* use reference layernorm

* Fix bug of blockwise welford for first kernel

* Fix bug of mean var padding for layernorm

* Use sgpr for shuffleM_index

* padding for GemmMeanVarCountGridDescriptor_M_NBlock

* Add layout parameter

* Check argument for gemm

* calculate max count for tail block

* Share E and H memory in device op

* Hard code the vector dim

* Refine the MakeDescriptor

* 1. Remove the E parameter, because E is inside the device op
2. Check vector size

* [What] Rename MakeMeanVarDescriptor_M_N
[Why] Prepare to add count version of make descriptor

* Use 1D global memory for count

* Prevent redundant IO

* Update parameter

* Add pipeline v1/v2 selector

* Rename the example name

* Add base class for gemm layernorm

* Refine naming to distinguish naive and welford

* Add comment to explain in detail

* We don't need to pad in N dimension in gemm for mean/var/count. Set NPerTile 1

* Rewrite the 2nd kernel; use multiple blocks along the N dimension in the layernorm kernel

* Share the vector size

* Refine var name

* [What] Force LayernormThreadSliceSize_N = vector size.
[Why] Memory coalesce

* Add comment

* Extract divisor out of the loop in reference layernorm

* Pad different size for E and H in layernorm kernel according to different block tile

* Refine naming

* Refine naming

* Prevent implicit cast

* [What] use ck::math::sqrt instead of __builtin_amdgcn_sqrtf
[Why] __builtin_amdgcn_sqrtf only supports float; double would cause a cast

* Cast only constant

* Change of post shuffle thread descriptor

* Add EMeanVarDataType parameter.

* Merge the mean and var threadwise copy

* Add missing index

* Fix Typo

* Sync the variable with previous if

* 1. Declare e inside the host_gemm_layernorm()
2. Prevent implicit cast in reference code
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

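The commit messages above describe a first kernel that produces mean, var and count per block tile (kept in a workspace) and a second kernel that finishes the layernorm. As a hedged illustration of why a per-tile count is needed, here is a plain-Python sketch of Welford's running update plus the standard pairwise merge of partial statistics (Chan et al.). This is not the CK kernel code; `WelfordState`, `welford_over_tiles` and `layernorm_row` are hypothetical names, and the one-row, 1-D tiling is a simplifying assumption.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class WelfordState:
    """Partial statistics for one row: element count, mean, and M2 (sum of squared deviations)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def update(self, x: float) -> None:
        # Welford's single-value update: numerically stable running mean and M2.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def merge(self, other: WelfordState) -> WelfordState:
        # Pairwise combination: each partial result is weighted by its count,
        # so tiles holding different numbers of valid elements (e.g. a tail tile)
        # still combine correctly.
        n = self.count + other.count
        if n == 0:
            return WelfordState()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return WelfordState(n, mean, m2)

    @property
    def variance(self) -> float:
        # Population variance (divide by N), as used by layernorm.
        return self.m2 / self.count if self.count else 0.0


def welford_over_tiles(row: list[float], tile: int) -> WelfordState:
    """Stage 1: per-tile partial stats. Stage 2: merge the partials into one result."""
    partials = []
    for start in range(0, len(row), tile):
        s = WelfordState()
        for x in row[start:start + tile]:
            s.update(x)
        partials.append(s)
    total = WelfordState()
    for p in partials:
        total = total.merge(p)
    return total


def layernorm_row(row: list[float], tile: int, eps: float = 1e-5) -> list[float]:
    """Normalize one row with the merged statistics (no gamma/beta, for brevity)."""
    stats = welford_over_tiles(row, tile)
    inv_std = 1.0 / math.sqrt(stats.variance + eps)
    return [(x - stats.mean) * inv_std for x in row]
```

For example, `welford_over_tiles([1.0, 2.0, 3.0, 4.0, 5.0], 2)` splits the row into tiles of 2, 2 and 1 elements and still yields count 5, mean 3.0 and variance 2.0, because the merge weights the short tail tile by its count. How the real kernels tile M and N and store the workspace may well differ from this sketch.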

[Navi3x-LWPCK-545] Block-wise GEMM + Real GEMM_WMMA_FP16 (#541) · 919aeb1f

Haocong WANG authored Jan 17, 2023

* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + add unit tests for all types (int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size


16 Jan, 2023 2 commits
- Merge branch 'develop' of... · c6de88b4
  aska-0096 authored Jan 16, 2023
```
Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into navi3x_multiD_batchedGEMM
```
- temp save · 2963dd96
  aska-0096 authored Jan 16, 2023
13 Jan, 2023 1 commit
- navi3x_multipleD+example · ccb94cea
  aska-0096 authored Jan 13, 2023
12 Jan, 2023 2 commits

Add a flag to enable/disable debug output in many kernels. (#549) · 715e8dd2

Illia Silin authored Jan 11, 2023

* add DEBUG_LOG macro to enable/disable debug output

* fix syntax

* fix syntax again

* fix syntax one more time

* remove blank spaces

* use ifdefs

* add the Print argument

* move the definition of DEBUG_LOG to ck.hpp

* add the missing argument to Print()


Remove including of cmath (#551) · a17b0414

Qianfeng authored Jan 12, 2023

* Let cmath be included when compiling host code in math_v2.hpp

* Remove including of cmath in device_base.hpp and device_permute.hpp


11 Jan, 2023 1 commit
- compiler issue fixed + increase tile size · 8efd363f
  aska-0096 authored Jan 11, 2023
19 Dec, 2022 1 commit
- Merge branch 'develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into wmma_gemm · 40ec8e5d
  aska-0096 authored Dec 19, 2022
15 Dec, 2022 5 commits
- Add MNK padding, M = 0 support into grouped_gemm (#539) · 0345963e
  zjing14 authored Dec 15, 2022
```
* add mnk padding, support m=0

* clean code

* clean code
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
```
- disable the attention test that fails on MI100 (#540) · 11151175
  Illia Silin authored Dec 15, 2022
- clang format · 5d5891b0
  aska-0096 authored Dec 15, 2022
- clang format · cfb397b1
  aska-0096 authored Dec 15, 2022
- discard some codes · 3941bd1f
  aska-0096 authored Dec 15, 2022