Commits · c208a8acea1a11b9f58a9768ef1bc4749be7fcc4 · gaoqiong / composable_kernel

09 Jun, 2023 4 commits
- Put random number gen to a separate header · c208a8ac
  Rostyslav Geyyer authored Jun 09, 2023
  
- Put type_converts into a separate header · d6a666fa
  Rostyslav Geyyer authored Jun 09, 2023
  
- Remove leftover code · f61c7704
  Rostyslav Geyyer authored Jun 09, 2023
  
- Add comments on rounding modes · f730c3fb
  Rostyslav Geyyer authored Jun 09, 2023
  
24 May, 2023 3 commits
- Merge host and device implementations · f1c2ec74
  Rostyslav Geyyer authored May 24, 2023
  
- Clean up · 8386868b
  Rostyslav Geyyer authored May 24, 2023
  
- Clean-up the headers (#713) · ac9e01e2
  Illia Silin authored May 24, 2023
```
* fix headers for gpu instances

* remove unused headers

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```
23 May, 2023 7 commits
- Format · 8107bbb5
  Rostyslav Geyyer authored May 23, 2023
  
- Use element location for PRNG · c5e22952
  Rostyslav Geyyer authored May 23, 2023
  
- Use seed as a runtime arg · 789862ca
  Rostyslav Geyyer authored May 23, 2023
  
- Format · 502942fe
  Rostyslav Geyyer authored May 23, 2023
  
- Add fp16 casting functions · 532bbe53
  Rostyslav Geyyer authored May 23, 2023
  
- Format · c1ba7c63
  Rostyslav Geyyer authored May 23, 2023
  
- Update type_converts · 052ab48a
  Rostyslav Geyyer authored May 23, 2023
  
18 May, 2023 1 commit
- Add some constexpr · b9bf7fb8
  Rostyslav Geyyer authored May 18, 2023
  
16 May, 2023 1 commit
- Format · a30a0128
  Rostyslav Geyyer authored May 15, 2023
  
15 May, 2023 2 commits
- Split f8_convert_sr in host and device · 114c341f
  Rostyslav Geyyer authored May 15, 2023
  
- Eliminate magic numbers · fd2e6309
  Rostyslav Geyyer authored May 15, 2023
  
12 May, 2023 5 commits
- Add element op · 28187354
  Rostyslav Geyyer authored May 12, 2023
  
- Format · 653f9515
  Rostyslav Geyyer authored May 12, 2023
  
- Add fp8_convert_sr · 4ddb62bd
  Rostyslav Geyyer authored May 12, 2023
  
- Move fp8 utils to a separate header · 185fb545
  Rostyslav Geyyer authored May 12, 2023
  
- Minor fix · 9e24e2bc
  Rostyslav Geyyer authored May 12, 2023
  
11 May, 2023 3 commits
- Minor fix · be7e055e
  Rostyslav Geyyer authored May 11, 2023
  
- Format · 872093b7
  Rostyslav Geyyer authored May 11, 2023
  
- Split type_convert and cast_to/from_f8 · 5038b95b
  Rostyslav Geyyer authored May 11, 2023
  
08 May, 2023 4 commits
- Format · f07a74d1
  Rostyslav Geyyer authored May 08, 2023
  
- Add fp8<->fp32 type_convert · 21481b44
  Rostyslav Geyyer authored May 08, 2023
  
- Format · d3929cb0
  Rostyslav Geyyer authored May 08, 2023
  
- Add basic fp8 definitions and prn-generator · 6f0735f5
  Rostyslav Geyyer authored May 08, 2023
  
04 May, 2023 1 commit

Optimize bf16 conversion (#664) · b076a02a

Rostyslav Geyyer authored May 04, 2023

* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include


28 Apr, 2023 1 commit

Syncing up from internal repo to enable MI300. (#690) · 4feebedd

Illia Silin authored Apr 28, 2023



* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>


10 Apr, 2023 1 commit

Groupnorm + swish external api (#668) · ed3a2e52

rocking5566 authored Apr 10, 2023

* Rename to proper naming

* Add example of groupnorm + swish

* Extract duplicate code in example

* Add groupnorm + swish instances

* Refactor instance generation, split into multiple cpp files

* Add external api and client example

* Refine profiler message

* Use ck math version of exp

* Refine problem size in example

* Add host version of exp


29 Mar, 2023 2 commits

Add a denorm test fix (#603) · dbd8f94b

Rostyslav Geyyer authored Mar 29, 2023



* Add type_convert implementations for bf16

* Add the fix for conv_fwd

* Add the fix for conv_bwd_data

* Add the fix for conv_bwd_weight

* Format

* Format

* Another format

* Add a macro to use workaround on MI200 only

* Format

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>


Conv + quantization + tanh (#645) · 389e84a8

rocking5566 authored Mar 30, 2023



* Rename file. Prepare to support another activation

* Add comment for quantization

* Extract out_elementop

* Add tanh example

* Add conv + bias + tanh quantization instance

* Add missing parameter

* Refine cmake

* Add external api and client example

* Extract variable in example

* Fix the comment

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


20 Mar, 2023 1 commit

rtn in ternary way (#632) · 8a659a2e

Dan Yao authored Mar 21, 2023



* rtn in ternary way

* Check both flags to preserve NaN

* Format

* Rearrange flag1

* Apply suggestions from code review
Co-authored-by: Ronan Keryell <ronan@keryell.fr>

---------
Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>


15 Mar, 2023 2 commits

gemm/Conv xdlops + dlops quantization (#625) · 16dc18e0

rocking5566 authored Mar 16, 2023

* Add conv perlayer quantization

* Add gemm_dlops quantization

* Support int8 for innerproduct

* Refine gemm dlops int8 kernel parameter

* Support gfx908(MI100) and gfx90a(MI200)

* clang-format

* Rename example number

* Support different layout for d tensor

* Add conv dlops perchannel quantization example

* Move to example 40

* Extract the common code for different platform (dlops and xdlops)

* Move to subfolder. Prepare to add other quantization ops

* Refine the quantization instance library

* Add conv dl instances and client example

* Remove unnecessary type

* Add gemm quantization instance

* Add external api and client example

* Refine num_bytes

* Separate different layouts into different cpp files

* Add more xdl instances

* Revert "Remove unnecessary type"

This reverts commit 82086918.

* Remove CShuffleDataType in dlops
Let acc and CShuffleDataType be the same in xdlops

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>


Fix arch limitation bug (#639) · ea028ac6
Haocong WANG authored Mar 15, 2023


10 Mar, 2023 1 commit

[Navi3x] Multiple issue fix (#612) · 087e3105

Haocong WANG authored Mar 11, 2023



* Change gridwise gemm mD blockwise gemm to naive

* RRR Gemm fix

* Fix RCR gemm bug

* Isolate wmma instructions

* Update amd_inline_asm.hpp

* Update amd_wmma.hpp

* Update amd_wmma.hpp

* fix syntax and update Jenkinsfile

---------
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>


09 Mar, 2023 1 commit
- fix a bug with non-dword-aligned offset when OOB, which could cause a crash (#616) · 76fcdc60
  carlushuang authored Mar 09, 2023
```
Co-authored-by: zjing14 <zhangjing14@gmail.com>
```