Commits · ef5e60f6bded28416d731ca14aff2e428f001cff · gaoqiong / composable_kernel_ROCM

09 Dec, 2024 2 commits
- Add ckProfiler gemm instances for new mfma instructions and fix ckProfiler build on MI350 · 9eb75a08
  root authored Dec 06, 2024
  
  9eb75a08
- Add new mfma instructions and examples · 22902b1f
  root authored Nov 13, 2024
  
  22902b1f
06 Dec, 2024 1 commit
- Add ckProfiler gemm instances for new mfma instructions and fix ckProfiler build on MI350 · b3c4677b
  root authored Dec 06, 2024
  
  b3c4677b
27 Nov, 2024 1 commit
- clean-up · 001a32c5
  illsilin authored Nov 26, 2024
  
  001a32c5
22 Nov, 2024 1 commit
- Fix typo · bdc1dd6f
  Rostyslav Geyyer authored Nov 22, 2024
  
  bdc1dd6f
13 Nov, 2024 1 commit
- Add new mfma instructions and examples · 24673871
  root authored Nov 13, 2024
  
  24673871
07 Nov, 2024 1 commit
- enable compilation for generic navi targets (#1645) · 75c5bfa3
  Illia Silin authored Nov 07, 2024
  
  75c5bfa3
04 Nov, 2024 1 commit
- Add stochastic rounding tests · 4c47048f
  Rostyslav Geyyer authored Nov 04, 2024
  
  4c47048f
21 Aug, 2024 1 commit

Set RNE fp8 conversion as a default (#1458) · e20f20ef

Rostyslav Geyyer authored Aug 21, 2024

* Set RNE fp8 conversion as a default

* Update f8 tests

* Disable failing test on gfx11

* Update bf8 tests

* Add a flag

* Fix the flag

* Raise flag for gfx10 as well

* Temp commit for tolerance testing

* Update tolerances

e20f20ef

27 Jun, 2024 1 commit
- Merging the gfx12 code into public repo. (#1362) · 941d1f7c
  Illia Silin authored Jun 27, 2024
  
  941d1f7c
17 Jun, 2024 1 commit
- disabled lds direct load inline asm (#1331) · e0210316
  zjing14 authored Jun 16, 2024
  
  e0210316
10 May, 2024 1 commit
- Code clean-up (#1285) · 566b6480
  Illia Silin authored May 10, 2024
  * code clean-up
  * remove the profiling output samples
  566b6480
07 May, 2024 1 commit

Enable logging in CK with environment variable. (#1278) · bf420976

Illia Silin authored May 07, 2024



* enable logging using environment variable

* update ck.hpp header

* fix typo

* fix clang format

* Update include/ck/utility/env.hpp
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

---------
Co-authored-by: Bartłomiej Kocot <barkocot@amd.com>

bf420976
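
The commit above gates logging on an environment variable. A minimal sketch of that general pattern follows; it is not the code in `include/ck/utility/env.hpp`. The variable name `EXAMPLE_CK_LOGGING`, the accepted values, and the helper names `parse_enabled`, `logging_enabled` and `log_if_enabled` are all assumptions made for illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Interpret an environment-variable value as an on/off switch.
// The accepted spellings are an assumption for this sketch.
inline bool parse_enabled(const char* value)
{
    if(value == nullptr)
        return false; // variable not set
    return std::strcmp(value, "1") == 0 || std::strcmp(value, "on") == 0 ||
           std::strcmp(value, "ON") == 0 || std::strcmp(value, "true") == 0;
}

// Hypothetical variable name; the real library may use a different one.
inline bool logging_enabled(const char* name = "EXAMPLE_CK_LOGGING")
{
    return parse_enabled(std::getenv(name));
}

// Print a message to stderr only when the switch is on.
inline void log_if_enabled(const char* message)
{
    if(logging_enabled())
        std::fprintf(stderr, "%s\n", message);
}
```

Keeping the parsing in a pure function (`parse_enabled`) separate from the `std::getenv` lookup makes the switch easy to test without touching the process environment.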

29 Apr, 2024 1 commit

Mark unneeded instances as "getting deprecated" (#1265) · 6ced3c12

Rostyslav Geyyer authored Apr 29, 2024



* Add a flag

* Add flag check and messages

---------
Co-authored-by: root <root@aus-g7-rogeyyer.amd.com>

6ced3c12

02 Apr, 2024 1 commit

Split the instances by architecture. (#1223) · ae57e593

Illia Silin authored Apr 02, 2024

* parse examples inside the add_example_executable function

* fix the example 64 cmake file

* add xdl flag to the gemm_bias_softmax_gemm_permute example

* add filtering of tests based on architecture type

* enable test_grouped_gemm for gfx9 only

* enable test_transpose only for gfx9

* only link test_transpose if it gets built

* split the gemm instances by architectures

* split gemm_bilinear,grouped_conv_bwd_weight instances by targets

* split instances by architecture

* split grouped_conv instances by architecture

* fix clang format

* fix the if-else logic in group_conv headers

* small fix for grouped convolution instances

* fix the grouped conv bwd weight dl instances

* fix client examples

* only enable client examples 3 and 4 on gfx9

* set the gfx9 macro

* make sure the architecture macros are set by cmake

* use separate set of xdl/wmma flags for host code

* simplify the main cmake file

* add conv_fwd_bf8 instance declaration

ae57e593

12 Mar, 2024 1 commit
- some small changes · 9a9cb884
  illsilin authored Mar 11, 2024
  
  9a9cb884
27 Feb, 2024 1 commit
- remove unnecessary changes · 924639f9
  aska-0096 authored Feb 27, 2024
  
  924639f9
21 Feb, 2024 1 commit
- add support for more dl kernels on navi4 · 4c683df4
  illsilin authored Feb 21, 2024
  
  4c683df4
17 Feb, 2024 1 commit
- fixed block_sync_lds · 8831b0d8
  Jing Zhang authored Feb 16, 2024
  
  8831b0d8
16 Feb, 2024 1 commit
- enabled dl_gemm · 28d672d5
  Jing Zhang authored Feb 15, 2024
  
  28d672d5
15 Feb, 2024 2 commits
- remove extra endif · 22509c0b
  illsilin authored Feb 15, 2024
  
  22509c0b
- initial navi4x enablement · 03874cbd
  illsilin authored Feb 15, 2024
  
  03874cbd
14 Feb, 2024 1 commit
- initial enablement of gfx950 · d66da6be
  illsilin authored Feb 14, 2024
  
  d66da6be
02 Feb, 2024 1 commit

Add support for more Navi2x and Navi3x models. (#1152) · 180f16f9

Illia Silin authored Feb 02, 2024

* add support for navi2x and navi3x models

* fix syntax

* use common macro for different mi300 architectures

180f16f9

15 Jan, 2024 1 commit

Add cppcheck to CK CI. (#1125) · e6d099c8

Illia Silin authored Jan 15, 2024

* add cppcheck to the CK CI

* fix the path to CK source for cppcheck

* fix the path to CK source for cppcheck one more time

* fix the path to CK source for cppcheck third time

* change the path to ck_cppcheck.log

* install latest cppcheck from source

* fix bug in ck.hpp and use 20 threads for cppcheck

* create a switch to turn cppcheck on and off in CI

e6d099c8

03 Dec, 2023 1 commit

Add support for double buffering in direct load GEMM kernel (#1052) · bc4bf9bd

Bartlomiej Wroblewski authored Dec 03, 2023

This PR introduces support for double buffering in LDS into GEMM kernels that use direct load instructions.

Direct loads now use inline asm instead of intrinsics. With intrinsics, the compiler inserts additional waitcnt instructions, which breaks the load/compute overlap that double buffering relies on.

Because inline asm hides the data dependencies between the global->LDS and LDS->register transfers from the compiler, sched_barrier is needed to keep the compiler from rescheduling those instructions incorrectly.

bc4bf9bd
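
The double-buffering scheme described above can be sketched on the host as a ping-pong loop. This is only an illustration of the control flow, not the kernel from the PR: the tile size `kTile`, the helpers `load_tile` and `compute_tile`, and the use of a plain sum as the "compute" step are all assumptions. On the GPU the load would be an asynchronous direct load into LDS, and the comment in the loop marks where a wait and a scheduling barrier would have to sit.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTile = 4; // hypothetical tile size

using Tile = std::array<float, kTile>;

// Stand-in for a global -> LDS direct load of one tile.
inline void load_tile(const std::vector<float>& global, std::size_t tile, Tile& lds)
{
    for(std::size_t i = 0; i < kTile; ++i)
        lds[i] = global[tile * kTile + i];
}

// Stand-in for the LDS -> registers read plus the math on one tile.
inline float compute_tile(const Tile& lds)
{
    float sum = 0.0f;
    for(float v : lds)
        sum += v;
    return sum;
}

// Sum all full tiles while alternating between two buffers.
// Elements past the last full tile are ignored in this sketch.
inline float double_buffered_sum(const std::vector<float>& global)
{
    const std::size_t num_tiles = global.size() / kTile;
    std::array<Tile, 2> lds{};
    float acc = 0.0f;
    if(num_tiles == 0)
        return acc;

    load_tile(global, 0, lds[0]); // prologue: fill the first buffer

    for(std::size_t t = 0; t < num_tiles; ++t)
    {
        const std::size_t cur = t % 2;
        const std::size_t nxt = 1 - cur;

        // Start fetching the next tile into the other buffer.
        if(t + 1 < num_tiles)
            load_tile(global, t + 1, lds[nxt]);

        // On a GPU, the wait for the *current* buffer and a scheduling
        // barrier would go here, so that the compute below is not moved
        // ahead of the load it depends on.
        acc += compute_tile(lds[cur]);
    }
    return acc;
}
```

Because the load of tile `t + 1` targets a different buffer than the compute on tile `t`, the two can overlap on hardware; any extra wait placed between them would serialize the loop again, which is the problem the commit message attributes to the intrinsic-based version.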

28 Nov, 2023 1 commit

Switch default f8 conversion to stochastic rounding (#1048) · 6ef034f6

Rostyslav Geyyer authored Nov 27, 2023

* Switch default f8 conversion to stochastic rounding

* Refactor f8-related type_converts

* Add an element-wise op

6ef034f6

19 Oct, 2023 1 commit
- Fix the DL kernel issues on Navi3x. (#998) · f7331c60
  Illia Silin authored Oct 19, 2023
  * apply the patch for dl kernels on gfx11
  * build DL kernels on navi32 CI
  f7331c60
23 Aug, 2023 1 commit

[HotFix] add config and version files to pass on build info (#856) · c8a8385f

Jun Liu authored Aug 23, 2023

* experiment with config file

* experiment with version.h config

* add more info to version.h

* minor updates

* minor updates

* fix case where DTYPE is not used

* large amount of files but minor changes

* remove white space

* minor changes to add more MACROs

* fix cmakedefine01

* fix issue with CK internal conflict

* fix define and define value

* fix clang-format

* fix formatting issue

* experiment with cmake

* clang format v12 to be consistent with miopen

* avoid clang-format for config file

c8a8385f

22 Aug, 2023 1 commit

Fix transform and instances for grouped conv bwd data (#848) · 595d23be

Bartłomiej Kocot authored Aug 22, 2023

* Fix transform and instances for grouped conv bwd data

* Add instances for small K and small C

* Remove workaround after fix

* Fix interface tests

595d23be

14 Aug, 2023 1 commit
- Implement DPP8 based GEMM for Navi21 (#826) · d4c84256
  Bartlomiej Wroblewski authored Aug 14, 2023
  
  d4c84256
03 Aug, 2023 2 commits
- Change to github_issue prefix · aac65a03
  Bartlomiej Kocot authored Aug 01, 2023
  
  aac65a03
- Rename the workaround to a proper issue name · e6a826d3
  Bartlomiej Kocot authored Aug 01, 2023
  
  e6a826d3
27 Jul, 2023 1 commit

Add s_nops after v_dot to avoid hazard (#808) · 7761e523

Bartłomiej Kocot authored Jul 27, 2023

* Add s_nops after v_dot to avoid hazard

* Fix builtin for inner_product fp16

* Skip inline version to builtin

* Add comments regarding isa

* Fix comment regarding s_nop

7761e523

06 Jul, 2023 1 commit

Split GEMM instance library & enable pipeline v2 optimization (#783) · 850144a0

Po Yen Chen authored Jul 06, 2023

* Move source file into sub-directories

* Add missing include directive

* Split DeviceGemmXdl<> fp16 instances

* Fix format

* Remove unnecessary CMakeLists.txt

* Add macros to toggle new features

* Remove debug message

* Turn off GEMM v2 pipeline optimization by default

* Fix format

* Extract duplicated string as list

* Enlarge indent in CMakeLists.txt

850144a0

21 Jun, 2023 1 commit

Support bf16/f32/f16 and NHWGC conv2d_bwd_data (#757) · 63388e84

Bartłomiej Kocot authored Jun 21, 2023

* Support bf16/f32/f16 and NHWGC conv2d_bwd_data

* Add interface test

* clang format

* Comment fixes

* Add more friendly error message

63388e84

15 Jun, 2023 1 commit

Enable gfx941 and gfx942 architectures. (#752) · 027e46ee

Illia Silin authored Jun 15, 2023

* enable gfx941/942 targets

* fix clang format

* fix the cmake logic for multiple targets

* fix cmake syntax for looping over targets

* add gfx941/942 support for gemm_xdl instances

027e46ee

31 May, 2023 1 commit
- update copyright headers (#726) · b94fd0b2
  Illia Silin authored May 31, 2023
  
  b94fd0b2
04 May, 2023 1 commit

Optimize bf16 conversion (#664) · b076a02a

Rostyslav Geyyer authored May 04, 2023

* Add TypeConvert class and start refactoring

* Refactor TypeConvert as a struct

* Get back to template functions type_convert

* Add a type_convert_bf16_rtn, set rtz as default

* Clean up

* Add UnaryConvertPrecision struct for high-precision workloads

* Format

* Update type_convert to UnaryConvert on threadwise level

* Update UnaryConvertPrecision

* Format

* Fix chmod

* Add a flag to pick conversion method

* Format

* Remove the added flag

* Merge elementwise op with type conversion

* Move type_convert to elemwise op, update the op

* Update type_convert_precision -> bf16_convert_rtn

* Clean up

* Update comments

* Update the CK_WORKAROUND_DENORM_FIX flag handling

* Update the unneeded op to work but warn user

* Remove the message

* Use a PassThrough instead of ConvertBF16RTN to calculate reference

* Format

* Add missing include

b076a02a
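
The commit above distinguishes a round-toward-zero (rtz) bf16 conversion from a round-to-nearest (rtn) one. As a rough illustration of the difference, the sketch below uses the common bit-manipulation approach on the float32 pattern. It is not the `type_convert` code from the commit; the function names `float_to_bf16_rtz`, `float_to_bf16_rtn` and `bf16_to_float` are made up for this example, and round-to-nearest-even is assumed as the tie-breaking rule.

```cpp
#include <cstdint>
#include <cstring>

// Round toward zero: keep the upper 16 bits of the float32 bit pattern.
inline std::uint16_t float_to_bf16_rtz(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Round to nearest, ties to even (assumed tie-breaking rule).
inline std::uint16_t float_to_bf16_rtn(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));

    // NaN: truncate and force the quiet bit so the result stays a NaN.
    if((bits & 0x7fffffffu) > 0x7f800000u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Adding 0x7fff plus the lowest kept bit rounds half-way cases to even.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Widen a bf16 bit pattern back to float32 (exact).
inline float bf16_to_float(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

Truncation is cheaper but always rounds the magnitude down, while the nearest-even variant costs a few extra integer operations and has no systematic bias, which is the usual trade-off between the two.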

28 Apr, 2023 1 commit

Syncing up from internal repo to enable MI300. (#690) · 4feebedd

Illia Silin authored Apr 28, 2023



* enable gfx940

* switch between intrinsic mfma routines on mi100/200 and mi300

* fix mfma_int8 on MI300

* disable 2 int8 examples on MI300

* Update cmake-ck-dev.sh

* restore gitignore file

* modify Jenkinsfile to the internal repo

---------
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>

4feebedd