Commits · 903cd19ce31c27edb7de49d5c77c09c397813de7 · yangql / composable_kernel-1

22 Apr, 2023 1 commit

Put back the split-k gemm code. (#684) · 903cd19c

Illia Silin authored Apr 21, 2023



* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>

21 Apr, 2023 2 commits
- Switch to the new rocm5.6 compiler. (#681) · 9afa44d4
  Illia Silin authored Apr 21, 2023
```
* switch to the new rocm5.6 compiler and docker

* fix syntax
```
- Update dependabot config (#682) · 938a5e0e
  Sam Wu authored Apr 20, 2023
```
Co-authored-by: samjwu <samjwu@users.noreply.github.com>
```
18 Apr, 2023 1 commit

Allow using ROCm release candidate compilers. (#679) · bb0b772d

Illia Silin authored Apr 18, 2023

* enable use of rocm5.5 release candidate 4

* upgrade to ROCM5.5 RC5

* try to fix the PUB_KEY error, remove the cmake-data package

* upgrade to latest cmake version

* use private dockerhub repo for rocm5.5 rc5

* add missing bracket

17 Apr, 2023 1 commit
- Add (#677) · fd11a4a1
  rocking5566 authored Apr 17, 2023
16 Apr, 2023 2 commits
- Fix a typo (#676) · fc26d42a
  Haocong WANG authored Apr 16, 2023
- Add more macros to turn on/off denorm fix (#678) · 03eaee6a
  Rostyslav Geyyer authored Apr 15, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```
11 Apr, 2023 5 commits
- Add memory index guard in wmma device ops (#667) · e85178b4
  Haocong WANG authored Apr 12, 2023
- [gtest] suppress unsafe buffer warn (#670) · f5329887
  Jun Liu authored Apr 11, 2023
```
ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
```
- Add dependabot config and pin rocm-docs-core (#663) · fd497f0e
  Sam Wu authored Apr 11, 2023
- fixed quant example (#672) · c203bf67
  zjing14 authored Apr 11, 2023
```
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
```
- add a macro to turn on/off denorm fix (off by default) (#673) · c54f8bcc
  zjing14 authored Apr 11, 2023
```
* add a macro to turn off denorm fix by default

* expose the macro

---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
```
10 Apr, 2023 1 commit

Groupnorm + swish external api (#668) · ed3a2e52

rocking5566 authored Apr 10, 2023

* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp

07 Apr, 2023 1 commit
- Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665) · 3248387b
  Jun Liu authored Apr 06, 2023
```
This reverts commit bb5530af.
```
30 Mar, 2023 3 commits
- add fp64 instances (#658) · fde6d274
  zjing14 authored Mar 30, 2023
```
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
```
- fix 3rd dword of buffer source descriptor (#659) · 091570f5
  Haocong WANG authored Mar 30, 2023
- simplify karg in device/grid of split-k op (#644) · bb5530af
  carlushuang authored Mar 30, 2023
```
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout
```
29 Mar, 2023 3 commits

Add a denorm test fix (#603) · dbd8f94b

Rostyslav Geyyer authored Mar 29, 2023



* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Conv + quantization + tanh (#645) · 389e84a8

rocking5566 authored Mar 30, 2023



* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Add CMake Option "USE_OPT_NAVI3X" (#647) · 4e097ad2
Haocong WANG authored Mar 30, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* remove navi3x opt compile option from cmake script
```

27 Mar, 2023 1 commit

Separate bibtex requirement from rocm-docs-core (#656) · 88d47432

Sam Wu authored Mar 27, 2023

* separate bibtex requirement from rocm-docs-core

* point requirements to source rocm-docs-core repo

24 Mar, 2023 1 commit
- standardize docs (#655) · f80776d9
  Sam Wu authored Mar 23, 2023
23 Mar, 2023 1 commit
- [Navi3x] Fix Gridwise_multiple_d operation (#649) · e5376be4
  Haocong WANG authored Mar 24, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* fix bug
```
22 Mar, 2023 2 commits
- Reduce group & batch of the tested convolutions (#648) · fe96e8fb
  Po Yen Chen authored Mar 23, 2023
- Get rid of XDL parameters in WMMA kernel string. (#646) · 36750a57
  Illia Silin authored Mar 22, 2023
```
* remove XDL parameters from WMMA kernel string

* get rid of two more parameters
```
20 Mar, 2023 2 commits

rtn in ternary way (#632) · 8a659a2e

Dan Yao authored Mar 21, 2023



* rtn in ternary way

* Check both flags to preserve NaN

* Format

* Rearrange flag1

* Apply suggestions from code review
Co-authored-by: Ronan Keryell <ronan@keryell.fr>

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
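For context on #632: round-to-nearest-even conversion from fp32 to bf16 can be written branch-free with ternaries. NaN then needs a separate flag, because a NaN whose payload lies only in the low 16 mantissa bits would otherwise truncate to Inf. The following is a minimal host-side sketch of that general technique. It assumes IEEE-754 binary32 `float`. The name `float_to_bf16_rtn` is hypothetical, and this is not the code from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "assumes IEEE-754 binary32 float");

// Hypothetical helper (not CK's API): returns the bf16 bit pattern of x,
// rounded to nearest with ties to even.
inline std::uint16_t float_to_bf16_rtn(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits); // bit-cast without undefined behavior

    // flag0: exponent is not all ones (zero, subnormal, or normal value).
    const bool flag0 = (~bits & 0x7f800000u) != 0;
    // flag1: Inf/NaN exponent with nonzero low 16 mantissa bits, i.e. a NaN
    // that could lose its whole payload when the low half is dropped.
    const bool flag1 = !flag0 && (bits & 0xffffu) != 0;

    // Ties to even: add 0x7fff plus the LSB of the half that is kept.
    bits += flag0 ? 0x7fffu + ((bits >> 16) & 1u) : 0u;
    // Set the lowest kept mantissa bit so the result is still a NaN.
    bits |= flag1 ? 0x10000u : 0u;

    return static_cast<std::uint16_t>(bits >> 16);
}
```

Adding `0x7fff` plus the parity bit means an exact halfway value rounds up only when the retained mantissa is odd. Inf and NaN skip the addition, so their exponent field is never carried into.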

workaround 637 (#640) · 6ae12434

ltqin authored Mar 21, 2023



* add workaround 637

* format

* change id

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

15 Mar, 2023 6 commits

Update cmake-ck-dev.sh script (#641) · fa998675
Rostyslav Geyyer authored Mar 15, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```

gemm/Conv xdlops + dlops quantization (#625) · 16dc18e0

rocking5566 authored Mar 16, 2023



* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other quantization ops

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layouts into different cpp files

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 820869182f6a8f62b2c9004101ba6bf76b96be14.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Device Op GroupedGemmMultipleD + example fp16 (#633) · a2d5ca8e

Adam Osewski authored Mar 15, 2023



* Pass shared mem pointer as pointer to void.

* Device Op GroupedGEMM Multiple D

* Example for grouped gemm multiple d.

* Add MI200 to supported archs.

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Add layout check to IsSupportedArgument (#627) · c10a6e82

Rostyslav Geyyer authored Mar 15, 2023



* Add layout check to IsSupportedArgument

* Format

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Update GetTypeString function to generate unique kernel IDs. (#638) · 14b3504d

Illia Silin authored Mar 15, 2023

* make conv_fwd_bias_activation kernel id unique

* add more parameters to conv and gemm kernel names

* update GetTypeString for conv and gemm kernels

* fix two more kernel strings

Fix arch limitation bug (#639) · ea028ac6
Haocong WANG authored Mar 15, 2023

10 Mar, 2023 2 commits

Remove debug asserts (#629) · 5b57ab96
Rostyslav Geyyer authored Mar 10, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```

[Navi3x] Multiple issue fix (#612) · 087e3105

Haocong WANG authored Mar 11, 2023



* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

09 Mar, 2023 2 commits
- fix a bug with non-dword-aligned offset when OOB, which could cause a crash (#616) · 76fcdc60
  carlushuang authored Mar 09, 2023
```
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
- [gfx110x] support Navi3x architectures. (#628) · 0ccecc7c
  Illia Silin authored Mar 09, 2023
```
* enable building on Navi31

* fix syntax

* replace GPU_TARGETS with offload-arch

* add gfx1102 architecture

* fix typo

* update changelog
```
08 Mar, 2023 1 commit

GroupedGEMM + Gelu client example/instances/profiler (#614) · 9096b1c7

Adam Osewski authored Mar 08, 2023



* Grouped gemm + Gelu instances.

* Device Instance Factory for GroupedGemm+Gelu

* Client example

* Rangify fill helper functions.

* Fix name clash.

* Profiler for grouped_gemm+gelu

* No need to use full namespace name.

* Add check for MRaw divisible by vector load.

* Ugly fix for big errors.

* Add grouped_gemm+gelu to profiler CMakeLists.

* Store in argument additional info.

* Information about Mraw, Nraw, Kraw values.

* Use FastGelu instead of Gelu.

* Change client ex to use FastGelu

* Remove relaxed error precision.

* Remove duplicate output elementwise-op

---------
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

06 Mar, 2023 2 commits

Add descriptions to avoid build issues (#619) · 1e59eb3b
Rostyslav Geyyer authored Mar 06, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```

Generate output using Doxygen / Breathe (#598) · e4bf6d42

pmaybank authored Mar 06, 2023



* Modify Doxygen config to pick up include directories recursively

* Add DeviceMem struct to API Reference guide

* Add classes that are used in Flash Attention kernel

* Add a reference and config for generating bibliography
Co-authored-by: Philip Maybank <Philip.Maybank@amd.com>
