Commits · improve_layernorm · gaoqiong / composable_kernel

15 Feb, 2023 2 commits
- Merge branch 'develop' into improve_layernorm · 7fa74c8d
  rocking5566 authored Feb 15, 2023
  
- Remove the workaround for bf16 attention tests. (#586) · 06f1fc86
  Illia Silin authored Feb 14, 2023
```
* remove workaround in bf16 attention test

* clean up another workaround
```
14 Feb, 2023 6 commits
- Merge branch 'develop' into improve_layernorm · 4fea5004
  rocking5566 authored Feb 15, 2023
  
- clang-format · 98562925
  rocking authored Feb 14, 2023
  
- Fix typo · 6df53672
  rocking authored Feb 14, 2023
  
- Add CHANGELOG · aaee60c3
  rocking authored Feb 14, 2023
  
- Support fp16 sqrt for experiment · 7460bab1
  rocking authored Feb 14, 2023
  
- Add more instances · ff5b6ea1
  rocking authored Feb 14, 2023
  
13 Feb, 2023 4 commits
- Share the VGPR of gamma and beta · 8b1f2238
  rocking authored Feb 13, 2023
  
- Share the VGPR of x and y · f61c37d0
  rocking authored Feb 13, 2023
  
- Check the blocksize · 491c0631
  rocking authored Feb 13, 2023
  
- GroupedGEMM more bigger tiles. (#577) · 8f42780f
  Adam Osewski authored Feb 13, 2023
```
* Adding more bigger tiles.

* Remove failing instance.

* Remove instances that don't improve perf.

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
10 Feb, 2023 8 commits
- Support naive variance for device_normalization · 7a7d50ec
  rocking authored Feb 10, 2023
  
- Fix typo · f174fb09
  rocking authored Feb 10, 2023
  
- Refine string · 510dfb60
  rocking authored Feb 10, 2023
  
- enable batched_gemm_softmax_bf16 tests (#582) · 0ac0f51a
  Illia Silin authored Feb 10, 2023
  
- Update naive variance kernel · 8745c0ca
  rocking authored Feb 10, 2023
  
- Remove useless code · 3b6f9c16
  rocking authored Feb 10, 2023
  
- Merge branch 'develop' into improve_layernorm · 9f453d42
  rocking5566 authored Feb 10, 2023
  
- 1. Rename AccDatatype in normalization to computeData · 9d2280d6
  rocking authored Feb 10, 2023
```
2. Rename AccElementwiseOperation to YElementwiseOperation in normalization
```
09 Feb, 2023 2 commits

Gemm+layernorm instance, ckProfiler, client example (#568) · f7d28f3e

rocking5566 authored Feb 10, 2023

* Add gemm + layernorm instance

* Add ckProfiler

* Add test

* Add client example

* Detect if user forgot to set the workspace

* Use literal in the example

* [What] use builtin function for sqrt
[Why] compiler will not use v_sqrt_f64_e64 if we use ::sqrt()

* check gemm validity in IsSupportedArgument

* Add more testcases

* Merge duplicated folder in client example

* Print more information

* Use better kernel parameter for MS problem size

* clang format

* Add constexpr for if condition and remove redundant include

* Remove cstdlib and add constexpr

Add instance for elementwise normalization (#573) · 76d144fa

guangzlu authored Feb 10, 2023

* added instances for large N

* add instance for elementwise normalization

* added supported restrict in device_elementwise_normalization_impl.hpp

08 Feb, 2023 4 commits

adding the first draft of changelog (#571) · b63accee
Illia Silin authored Feb 08, 2023
```
* adding the first draft of changelog

* second draft of changelog
```

Add GemmAddSoftmaxGemm support for MSFT ORT (instances and client API) (#576) · 332ccc33

ltqin authored Feb 09, 2023

* add instance for gemm bias softmax gemm

* add client example

* change CGridDesc_G_M_N to CGridDesc_G_M_O

* add gridwise

* change c grid name

* device add d0s data

* fix 08 client_example

* add example 47_fused_attention

* example output correct

* add d0 to example

* add d0 element op

* rechange instance code

* change Acc0ElementwiseOperation to C0DEElementwiseOperation

* change example name

* update instance for cdeelementwiseop

* add bhalf_t ScaleAdd

* add test

* not support gemm1 bias

* remove some ignore

* fix test bug

Separate sweeponce flow and optimize the flow · 1a38e362
rocking authored Feb 08, 2023

Fix a couple more CI issues. (#578) · bb3d9546

Illia Silin authored Feb 08, 2023

* test the QA cron parameter for compiler commit

* create separate dockers for latest and fixed amd-stg-open compiler versions

* change groovy syntax

* apply cron timers back to develop branch

07 Feb, 2023 2 commits
- Extract var to static, prepare to separate sweep once kernel · e12a6be2
  rocking authored Feb 07, 2023
  
- Check the vector size and remove redundant var · 73d26a88
  rocking authored Feb 07, 2023
  
06 Feb, 2023 3 commits
- Fix CI issues. (#572) · f73574ff
  Illia Silin authored Feb 06, 2023
```
* switch to recent staging compiler as default for CI

* fix the baseline query

* roll back sqlalchemy to version 1.4.46
```
- Add more instances · b53db56e
  rocking authored Feb 06, 2023
  
- Sync the order of type string with template parameter · f2930add
  rocking authored Feb 06, 2023
  
01 Feb, 2023 1 commit

Add the markdown tutorial hello world (#563) · afdfef74

Rostyslav Geyyer authored Feb 01, 2023



* Add the markdown tutorial

* Clean up

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>

31 Jan, 2023 1 commit
- remove unused variable (#564) · ba40c2ce
  who who who authored Jan 31, 2023
```
* remove unused variable

* format code
```
30 Jan, 2023 1 commit
- Use defined seed for deterministic test runs. (#562) · 274108d6
  Adam Osewski authored Jan 30, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
26 Jan, 2023 1 commit
- Add more instances for irregular GEMM sizes. (#560) · 7494c1c6
  Adam Osewski authored Jan 26, 2023
```
Co-authored-by: Adam Osewski <aosewski@amd.com>
```
25 Jan, 2023 1 commit

Batchnorm inference instances, external API, client examples and gtests (#531) · a1b2441f

Qianfeng authored Jan 26, 2023

* File renaming and class renaming for device element-wise operation

* Add batchnorm-infer instances, external API and client example

* Add batchnorm-infer profiler module and gtests

* Remove file device_elementwise_extension.hpp and move NormalizeInInfer operation to element_wise_operation.hpp

* Remove the use of class aliasing for DeviceElementwiseForBatchNormInfer

* Rename class and file due to conflict from device_elementwise_2d.hpp

* Fix namespace in batchnorm_infer_nhwc client example

18 Jan, 2023 4 commits

Use double for all scaling values and floating-point constant values at the Device Op API (#557) · 52abc2f3

Qianfeng authored Jan 19, 2023

* Use double as alpha/beta values type in reduce device op api

* Use double as alpha/beta values type in softmax device op api

* Use double as alpha/beta values type in multiple-reduce device op api

* Use double as epsilon value type in normalization/elementwise-normalization device op api

Wavelet (inter-wave consumer-producer) GEMM (#310) · 1cfa8760

Raman R jana authored Jan 18, 2023



* wavelet gemm programming model support for CK

* GEMM pipeline update for wavelet programming model

* Updated wavelet programming pipeline

* fixes for global-write for math-wave

* fixed bug in global writes

* Updated comments for better readability

* fixed clang format errors

* added block_lds without barrier sync

* clean

* clean

* clean

* clean

* refactor

* prototype

4 layouts

fix default stride

all problem sizes

tidy

move file

update build script

restore old file

fix build

* refactor standalone test to use gemm test harness

* simplify gemm test

* update build script

* remove redundant

* early return when cmd arg doesn't match

* tidy

* report failure when result not validated

* tidy

* Add comment depicting B2C mapping pattern.

* Formatting & comments.

* Comparison with custom B2C mapping pattern.

* Example for wavelet gemm.

* Add wavelet to Gemm standalone test.

* Remove debug code.

* Remove dangling #endif directive.

Co-authored-by: root <Raman Jana>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Anthony Chang <ac.chang@outlook.com>
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>

Add multiD Gemm client APIs (#534) · d66421fe

ltqin authored Jan 19, 2023



* start add example

* fix config

* fix showinfo bug

* add an elementop

* change to padding

* add xdl example

* change elementwiseop

* add instance

* add instance to profiler

* change file name

* fix device not support issue

* add client example

* fix client gemm_add_multiply name

* change AddMultiply elementwiseop

* fix elementwiseop

* fix client example

* fix addmultiply op

* fix comments and fun name
Co-authored-by: letaoqin <letaoqin@amd.com>

fix a bug for 6-dim kernels (#555) · 00ff30af
Illia Silin authored Jan 18, 2023
