Commits · 14b3504d95082ffd466ae43a05951053f36718a8 · gaoqiong / composable_kernel

15 Mar, 2023 2 commits
- Update GetTypeString function to generate unique kernel IDs. (#638) · 14b3504d
  Illia Silin authored Mar 15, 2023
```
* make conv_fwd_bias_activation kernel id unique

* add more parameters to conv and gemm kernel names

* update GetTypeString for conv and gemm kernels

* fix two more kernel strings
```
- Fix arch limitation bug (#639) · ea028ac6
  Haocong WANG authored Mar 15, 2023
10 Mar, 2023 2 commits

Remove debug asserts (#629) · 5b57ab96
Rostyslav Geyyer authored Mar 10, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```

[Navi3x] Multiple issue fix (#612) · 087e3105

Haocong WANG authored Mar 11, 2023

* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

09 Mar, 2023 2 commits
- fix a bug with non-dword-aligned offset when OOB, which could cause a crash (#616) · 76fcdc60
  carlushuang authored Mar 09, 2023
```
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
- [gfx110x] support Navi3x architectures. (#628) · 0ccecc7c
  Illia Silin authored Mar 09, 2023
```
* enable building on Navi31

* fix syntax

* replace GPU_TARGETS with offload-arch

* add gfx1102 architecture

* fix typo

* update changelog
```
08 Mar, 2023 1 commit

GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7

Adam Osewski authored Mar 08, 2023

* Grouped gemm + Gelu instances.

* Device Instance Factory for GroupedGemm+Gelu

* Client example

* Rangify fill helper functions.

* Fix name clash.

* Profiler for grouped_gemm+gelu

* No need to use full namespace name.

* Add check for MRaw divisible by vector load.

* Ugly fix for big errors.

* Add grouped_gemm+gelu to profiler CMakeLists.

* Store in argument additional info.

* Information about Mraw, Nraw, Kraw values.

* Use FastGelu instead of Gelu.

* Change client ex to use FastGelu

* Remove relaxed error precision.

* Remove duplicate output elementwise-op

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

06 Mar, 2023 2 commits

Add descriptions to avoid build issues (#619) · 1e59eb3b
Rostyslav Geyyer authored Mar 06, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```

Generate output using Doxygen / Breathe (#598) · e4bf6d42

pmaybank authored Mar 06, 2023

* Modify Doxygen config to pick up include directories recursively

* Add DeviceMem struct to API Reference guide

* Add classes that are used in Flash Attention kernel

* Add a reference and config for generating bibliography
Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>

02 Mar, 2023 1 commit

Change the CI workflow. (#611) · e6cda9f8

Illia Silin authored Mar 02, 2023

* add new parallel stage on navi node

* don't run performance tests on navi, get rid of 9110 compiler

* only run navi build when not doing QA

* fix syntax

* use navi21 label

* don't stash profiler on navi nodes, scp deb package to ginger

* disable tests on navi nodes

* test posting a binary to ginger

* add sshpass and use it to copy deb package

* fix the scp example

* fix syntax

* debug the scp issues

* add jenkins user to docker

* don't try whoami

* change jenkins uid and add user with uid=1002

* try scp from the last stage on micimaster

* rename and stash the package, scp from micimaster

01 Mar, 2023 2 commits

Suppress reserved-identifier warning and catch all warnings. (#608) · 59cbb20c
Illia Silin authored Mar 01, 2023
```
* suppress the reserved-identifier warnings

* keep BUILD_DEV=On and use -Werror by default
```

[Navi3x Bug Fix] fix typo to accept MNKPadding flag correctly. (#597) · 68dbf40a

Haocong WANG authored Mar 02, 2023

* fix a bug blocking wmma_gemm_multipleD

* Utilize matrix padder in device_wmma_op

* cosmetic change for gemmpadding format

* clang format

* Change gridwise gemm from FIFO to KMN loop fashion

27 Feb, 2023 1 commit

Fast GeLU using built-in function (#587) · 8f455615

Chao Liu authored Feb 26, 2023

* clean up

* fast gelu using builtin function

* clean

* clean

* clean

* clean:

* clean

* fix compilation

* clean

* clean

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

24 Feb, 2023 1 commit
- disable tensor contraction f64 on MI100 (#602) · 209baee2
  zjing14 authored Feb 23, 2023
22 Feb, 2023 2 commits

Add Grouped Conv Backward Weight on Navi21 for ResNet50. (#505) · 246ceee4

Rostyslav Geyyer authored Feb 22, 2023

* Add DeviceOp and examples

* Format DeviceOp template arguments

* Remove bf16 example

* Format

* Format

* Update MakeABCGridDescriptor_A_K0_M_K1_B_K0_N_K1_C_M_N

* Refactor argument preparation

* Update conv_bwd_weight_dl to grouped_conv_bwd_weight_dl

* Rename device op file

* Update include directive in the example file

* Update descriptor preparation for grouped op

* Update the argument

* Update batch handling

* Add gridwise gemm supporting batched input

* Update blockwise indexing, working version

* Update copyright year

* Update check if argument is supported

* Refactor and make consistent with xdl examples

* Update check if argument is supported

* Add changelog entry

* Added comments on Dl op split_k>1 support

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Grouped conv1d client example (#589) · 830d37a7

ltqin authored Feb 23, 2023

* add conv1d fwd client example

* change 07_grouped_conv2d_fwd to 07_grouped_convnd_fwd

* add conv1d bwd weight

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

16 Feb, 2023 2 commits

fix a bug when building for gfx1030 target. (#591) · bef0cb20
Illia Silin authored Feb 16, 2023
```
* fix a bug while building for gfx1030 and add gfx1030 to targets

* fix syntax
```

Build and archive deb packages. (#590) · 584d233c

Illia Silin authored Feb 16, 2023

* build and archive deb packages

* fix syntax

* run QA to test building packages

* apply cron to develop branch again

15 Feb, 2023 7 commits

Sphinx doc (#581) · cb3fac4d

pmaybank authored Feb 15, 2023

* New docs directory with minimal config

* Based on docs directory of rocBLAS

* Config for running Doxygen then Sphinx to generate HTML

* Add minimal content - intro to doc

* Add some boilerplate sections to doc

* content still needs to be done,
* e.g., need to generate API documentation using Doxygen
* need to write contributor guide

* Start Softmax section of Support Primitives doc

* Written as a test bed for typesetting math content

* Need to decide how much detail to go into

* add doc directories to git ignore file.

* Minor edits - new line at EOF, change year in copyright notices

* Port Markdown files to ReStructuredText

* Copy Markdown files from pre-existing doc directory to docs directory

* Convert to reStructured Text (rst) - section headings, links, tables
  have a different syntax in rst

* New rst files added to index - can generate HTML with same style as
  HTML generated from rst files in previous commits

* Intention is to make all the content in doc redundant and use rst
  throughout rather than mix of md and rst

* Extend Softmax section of Primitives Guide

* rename l to z

* add material on applying softmax row-wise to matrix

* define macro for diag operator (represents diagonal matrix)

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Clean up kernel launch output (#569) · 19490ac4

Illia Silin authored Feb 15, 2023

* clean up output from kernel_launch

* set RUN_WARMUP to 0 by default

* split the warm-up into a separate issue

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Add contraction_fp64 example (#570) · 24c9ee1d

zjing14 authored Feb 15, 2023

* add contraction_bilinear

* add contraction_scale_xdl_fp64

* reduce tile size to avoid register spill

---------
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>

Improve normalization (#580) · 6a6163a3

rocking5566 authored Feb 16, 2023

* Sync the order of type string with template parameter

* Add more instances

* Check the vector size and remove redundant var

* Extract var to static, prepare to separate sweep once kernel

* Separate sweeponce flow and optimize the flow

* 1. Rename AccDatatype in normalization to computeData
2. Rename AccElementwiseOperation to YElementwiseOperation in normalization

* Remove useless code

* Update naive variance kernel

* Refine string

* Fix typo

* Support naive variance for device_normalization

* Check the blocksize

* Share the VGPR of x and y

* Share the VGPR of gamma and beta

* Add more instances

* Support fp16 sqrt for experiment

* Add CHANGELOG

* Fix typo

* clang-format

[Navi3x] Add Device Operations (#567) · 0cfda84d

Haocong WANG authored Feb 16, 2023

* wmma_op + unit test

* add arch limitation to wmma test

* change arch limitation

* Refactor + add unit tests for all types (int4 compile failed)

* Add f32_16x16x16_bf16 unit test

* tempsave

* tempsave

* tempsave

* runtime bug, cannot find symbol

* workaround for incorrect HIP warpSize return value

* debugging

* tempsave

* Correctness OK, waiting for optimization

* Tidy up + format

* temp save

* temp save, reproduce the v_bfi_b32 issue

* add inline asm for wmmaop test

* tidy up

* clean some debug purpose code

* discard some codes

* clang format

* clang format

* compiler issue fixed + increase tile size

* navi3x_multipleD+example

* temp save

* workable

* batchedgemm[OK], groupconv[debug]

* groupconv: Sanity check[OK], Performance[Bad]

* navi3x_groupconv_need_optimization

* format

* Add arch limitation to all wmma examples

* fix bug: example30 input conv args

Conv3D FWD BWD WRW fp16 fp32 client examples (#559) · e9fd1228

Adam Osewski authored Feb 15, 2023

* Conv3d bwd weight client example.

* Update year in license

* Convolution bwd data 3D fp16/fp32 client example.

* Client example for convnd fwd fp16 fp32

* clang-format

* Review remarks.

* Fix compiler err.

* Update data layout to standard one.

* Add conv 3d fwd NDHWGC instances

* clang-format

* Conv3d fwd NDHWGC instances.

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Remove the workaround for bf16 attention tests. (#586) · 06f1fc86
Illia Silin authored Feb 14, 2023
```
* remove workaround in bf16 attention test

* clean up another workaround
```

13 Feb, 2023 1 commit

GroupedGEMM: more large tiles. (#577) · 8f42780f

Adam Osewski authored Feb 13, 2023

* Adding more large tiles.

* Remove failing instance.

* Remove instances that don't improve perf.

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

10 Feb, 2023 1 commit
- enable batched_gemm_softmax_bf16 tests (#582) · 0ac0f51a
  Illia Silin authored Feb 10, 2023
09 Feb, 2023 2 commits

Gemm+layernorm instance, ckProfiler, client example (#568) · f7d28f3e

rocking5566 authored Feb 10, 2023

* Add gemm + layernorm instance

* Add ckProfiler

* Add test

* Add client example

* Detect if the user forgot to set the workspace

* Use literal in the example

* [What] use builtin function for sqrt
[Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt()

* check gemm validity in IsSupportedArgument

* Add more testcases

* Merge duplicated folder in client example

* Print more information

* Use better kernel parameter for MS problem size

* clang format

* Add constexpr for if condition and remove redundant include

* Remove cstdlib and add constexpr

Add instance for elementwise normalization (#573) · 76d144fa

guangzlu authored Feb 10, 2023

* added instances for large N

* add instance for elementwise normalization

* added support restrictions in device_elementwise_normalization_impl.hpp

08 Feb, 2023 3 commits

adding the first draft of changelog (#571) · b63accee
Illia Silin authored Feb 08, 2023
```
* adding the first draft of changelog

* second draft of changelog
```

Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) · 332ccc33

ltqin authored Feb 09, 2023

* add instance for gemm bias softmax gemm

* add client example

* change CGridDesc_G_M_N to CGridDesc_G_M_O

* add gridwise

* change c grid name

* device add d0s data

* fix 08 client_example

* add example 47_fused_attention

* example output correct

* add d0 to example

* add d0 element op

* rechange instance code

* change Acc0ElementwiseOperation to C0DEElementwiseOperation

* change example name

* update instance for cdeelementwiseop

* add bhalf_t ScaleAdd

* add test

* gemm1 bias not supported

* remove some ignore

* fix test bug

Fix a couple more CI issues. (#578) · bb3d9546

Illia Silin authored Feb 08, 2023

* test the QA cron parameter for compiler commit

* create separate dockers for latest and fixed amd-stg-open compiler versions

* change groovy syntax

* apply cron timers back to develop branch

06 Feb, 2023 1 commit

Fix CI issues. (#572) · f73574ff

Illia Silin authored Feb 06, 2023

* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46

01 Feb, 2023 1 commit

Add the markdown tutorial hello world (#563) · afdfef74

Rostyslav Geyyer authored Feb 01, 2023

* Add the markdown tutorial

* Clean up

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

31 Jan, 2023 1 commit
- remove unused variable (#564) · ba40c2ce
  who who who authored Jan 31, 2023
```
* remove unused variable

* format code
```
30 Jan, 2023 1 commit
- Use defined seed for deterministic test runs. (#562) · 274108d6
  Adam Osewski authored Jan 30, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
26 Jan, 2023 1 commit
- Add more instances for irregular GEMM sizes. (#560) · 7494c1c6
  Adam Osewski authored Jan 26, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
25 Jan, 2023 1 commit

Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f

Qianfeng authored Jan 26, 2023

* File renaming and class renaming for device element-wise operation

* Add batchnorm-infer instances, external API and client example

* Add batchnorm-infer profiler module and gtests

* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp

* Remove the using of class aliasing for DeviceElementwiseForBatchNormInfer

* Rename class and file due to conflict from device_elementwise_2d.hpp

* Fix namespace in batchnorm_infer_nhwc client example

18 Jan, 2023 2 commits

Use double for all scaling values and floating-point constant values in the Device Op API (#557) · 52abc2f3

Qianfeng authored Jan 19, 2023

* Use double as alpha/beta values type in reduce device op api

* Use double as alpha/beta values type in softmax device op api

* Use double as alpha/beta values type in multiple-reduce device op api

* Use double as epsilon value type in normalization/elementwise-normalization device op api

Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760

Raman R jana authored Jan 18, 2023

* wavelet gemm programming model support for CK

* GEMM pipeline update for wavelet programming model

* Updated wavelet programming pipeline

* fixes for global-write for math-wave

* fixed bug in global writes

* Updated comments for better readability

* fixed clang format errors

* added block_lds without barrier sync

* clean

* clean

* clean

* clean

* refactor

* prototype

4 layouts

fix default stride

all problem sizes

tidy

move file

update build script

restore old file

fix build

* refactor standalone test to use gemm test harness

* simplify gemm test

* update build script

* remove redundant

* early return when cmd arg doesn't match

* tidy

* report failure when result not validated

* tidy

* Add comment depicting B2C mapping pattern.

* Formatting & comments.

* Comparison with custom B2C mapping pattern.

* Example for wavelet gemm.

* Add wavelet to Gemm standalone test.

* Remove debug code.

* Remove dangling #endif directive.

Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
