Commits · 532bbe5372b6c2ba99266020deff2afe4ff132ce · gaoqiong / composable_kernel

23 May, 2023 3 commits
- Add fp16 casting functions · 532bbe53
  Rostyslav Geyyer authored May 23, 2023
  
- Format · c1ba7c63
  Rostyslav Geyyer authored May 23, 2023
  
- Update type_converts · 052ab48a
  Rostyslav Geyyer authored May 23, 2023
  
18 May, 2023 1 commit
- Add some constexpr · b9bf7fb8
  Rostyslav Geyyer authored May 18, 2023
  
16 May, 2023 1 commit
- Format · a30a0128
  Rostyslav Geyyer authored May 15, 2023
  
15 May, 2023 2 commits
- Split f8_convert_sr in host and device · 114c341f
  Rostyslav Geyyer authored May 15, 2023
  
- Eliminate magic numbers · fd2e6309
  Rostyslav Geyyer authored May 15, 2023
  
12 May, 2023 6 commits
- Add element op · 28187354
  Rostyslav Geyyer authored May 12, 2023
  
- Format · 653f9515
  Rostyslav Geyyer authored May 12, 2023
  
- Add fp8_convert_sr · 4ddb62bd
  Rostyslav Geyyer authored May 12, 2023
  
- Add elementwise ops · 4089bc68
  Rostyslav Geyyer authored May 12, 2023
  
- Move fp8 utils to a separate header · 185fb545
  Rostyslav Geyyer authored May 12, 2023
  
- Minor fix · 9e24e2bc
  Rostyslav Geyyer authored May 12, 2023
  
11 May, 2023 4 commits
- Minor fix · be7e055e
  Rostyslav Geyyer authored May 11, 2023
  
- Format · 872093b7
  Rostyslav Geyyer authored May 11, 2023
  
- Split type_convert and cast_to/from_f8 · 5038b95b
  Rostyslav Geyyer authored May 11, 2023
  
- Normalization/split k (#615) · a1e344b1
  rocking authored May 11, 2023
  
08 May, 2023 4 commits
- Format · f07a74d1
  Rostyslav Geyyer authored May 08, 2023
  
- Add fp8<->fp32 type_convert · 21481b44
  Rostyslav Geyyer authored May 08, 2023
  
- Format · d3929cb0
  Rostyslav Geyyer authored May 08, 2023
  
- Add basic fp8 definitions and prn-generator · 6f0735f5
  Rostyslav Geyyer authored May 08, 2023
  
04 May, 2023 1 commit

Optimize bf16 conversion (#664) · b076a02a

Rostyslav Geyyer authored May 04, 2023

* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include

03 May, 2023 2 commits

Fix the group of quantization_int8 kernels on MI300. (#695) · b8635a25

Illia Silin authored May 03, 2023



* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

* fix the group of kernels from ticket 723 on MI300

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>

Fix grouped_gemm_splitk kernels on MI300. (#694) · 4a51d2da

Illia Silin authored May 03, 2023



* replace amd_buffer_atomic_add with hip_atomic_add

* fix grouped_gemm_splitk kernels on mi300

* fix syntax

* revert experimental atomic_add changes

---------
Co-authored-by: Jing Zhang <jizhan@amd.com>

28 Apr, 2023 1 commit

Syncing up from internal repo to enable MI300. (#690) · 4feebedd

Illia Silin authored Apr 28, 2023



* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

26 Apr, 2023 1 commit
- add vector load check (#680) · 54c90aae
  Haocong WANG authored Apr 27, 2023
```
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
24 Apr, 2023 1 commit

Grouped Gemm + SplitK + simplified Kernel Args (#669) · 8bb2bb4a

Adam Osewski authored Apr 24, 2023



* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* B2C with 3D grid for KSplit

* Remove unused code.

* Use default B2C (3D grid) in grid gemm v2r4r2.

* Device gemm splitk use B2C map.

* Device GroupedGemmXdlSplitKCShuffle

* Example for GroupedGemm Xdl SplitK

* Introduce Device GroupedGemmSplitK

* Fix updating kbatch size.

* Add instance mk-nk-mn

* Enable set kbatch in profiler.

* Add GGemmSplitK mk-kn-mn instances

* Add more instances & split into multiple files.

* minor fix

* tuning

* clean

* disabled failed instances

* use pipe v2

* Ignore arg on not supported arch.

* fix warning

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>

22 Apr, 2023 1 commit

Put back the split-k gemm code. (#684) · 903cd19c

Illia Silin authored Apr 21, 2023



* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout

---------
Co-authored-by: carlushuang <carlus.huang@amd.com>

16 Apr, 2023 2 commits
- Fix a typo (#676) · fc26d42a
  Haocong WANG authored Apr 16, 2023
  
- Add more macros to turn on/off denorm fix (#678) · 03eaee6a
  Rostyslav Geyyer authored Apr 15, 2023
```
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
```
11 Apr, 2023 2 commits
- Add memory index guard in wmma device ops (#667) · e85178b4
  Haocong WANG authored Apr 12, 2023
  
- add a macro to turn on/off denorm fix (off by default) (#673) · c54f8bcc
  zjing14 authored Apr 11, 2023
```
* add a macro to turn off denorm fix by default

* expose the macro

---------
Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
```
10 Apr, 2023 1 commit

Groupnorm + swish external api (#668) · ed3a2e52

rocking5566 authored Apr 10, 2023

* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp

07 Apr, 2023 1 commit
- Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665) · 3248387b
  Jun Liu authored Apr 06, 2023
```
This reverts commit bb5530af.
```
30 Mar, 2023 2 commits
- fix 3rd dword of buffer source descriptor (#659) · 091570f5
  Haocong WANG authored Mar 30, 2023
  
- simplify karg in device/grid of split-k op (#644) · bb5530af
  carlushuang authored Mar 30, 2023
```
* simplify karg in device/grid split-k op

* fix mk_kn_mn instances

* add more instances

* use name from tensor layout
```
29 Mar, 2023 2 commits

Add a denorm test fix (#603) · dbd8f94b

Rostyslav Geyyer authored Mar 29, 2023



* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

Conv + quantization + tanh (#645) · 389e84a8

rocking5566 authored Mar 30, 2023



* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>

23 Mar, 2023 1 commit
- [Navi3x] Fix Gridwise_multiple_d operation (#649) · e5376be4
  Haocong WANG authored Mar 24, 2023
```
* Add CMake Option "USE_OPT_NAVI3X"

* fix bug
```
22 Mar, 2023 1 commit
- Get rid of XDL parameters in WMMA kernel string. (#646) · 36750a57
  Illia Silin authored Mar 22, 2023
```
* remove XDL parameters from WMMA kernel string

* get rid of two more parameters
```