Commits · 019d4b7ca5bfa9304605c957a19d29edca49c13c · gaoqiong / composable_kernel_ROCM

17 Jan, 2025 4 commits

merge from public repo · 019d4b7c
illsilin authored Jan 17, 2025

Merge pull request #285 from ROCm/merge_from_public · 5063a39f
Illia Silin authored Jan 17, 2025
```
Merge from public
```
merge from public · 96a0d5f6
illsilin authored Jan 16, 2025

Implementing Test Filters for Smoke and Regression Tests (#1819) · 54de3e55

Aviral Goel authored Jan 16, 2025

* smoke and regression targets working with tests

* test filters work for both examples and test

* removed unnecessary comments

* added a missing comment

* added a missing comment

* fixed typo in the comments

* updated README

* Update PULL_REQUEST_TEMPLATE.md

updating the template for future addition of test cases

* Update PULL_REQUEST_TEMPLATE.md

16 Jan, 2025 2 commits

Fix and optimize dynamic unary elementwise (#1818) · 1519ce91
Bartłomiej Kocot authored Jan 16, 2025
```
* Fix and optimize dynamic unary elementwise

* fix
```

[CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808) · 1ff50e78

carlushuang authored Jan 16, 2025

* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu

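A minimal host-side sketch of the "rcp for silu" item, assuming the change rewrites SiLU's division x / (1 + exp(-x)) as a multiply by a reciprocal (which a GPU can evaluate with a fast instruction). The `rcp` helper is a plain `1.0f / v` stand-in, and the g1u1/g1u0 handling (gate with or without an up projection) is an assumed reading of the commit title; none of these names are CK APIs.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a hardware reciprocal instruction.
inline float rcp(float v) { return 1.0f / v; }

// SiLU(x) = x * sigmoid(x), written with a division...
inline float silu_div(float x) { return x / (1.0f + std::exp(-x)); }

// ...and with a multiply by the reciprocal of the denominator.
inline float silu_rcp(float x) { return x * rcp(1.0f + std::exp(-x)); }

// Assumed gated-MLP epilogue: g1u1 = gate and up projections, g1u0 = gate only.
inline float gated_act(float gate, float up, bool has_up) {
    const float g = silu_rcp(gate);
    return has_up ? g * up : g;
}
```

On the host both forms agree to within a few ulp; whether the reciprocal form is faster or equally accurate on the device depends on the reciprocal instruction actually used.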

15 Jan, 2025 4 commits

disable inductor codegen tests on legacy OS (#1816) · 8c29e06f
Illia Silin authored Jan 15, 2025

Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3

Bartłomiej Kocot authored Jan 15, 2025

* Add rounding for float to bf16 conversion

* Add bhalf test

* Add inf test bhalf

* Refactor

* update cmake

* Fixes

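Background on "rounding for float to bf16 conversion as default": a bf16 value is the upper 16 bits of an IEEE-754 float, so plain truncation always rounds toward zero. The sketch below shows the common round-to-nearest-even alternative on the host, keeping NaN a quiet NaN and letting overflow round to infinity. It illustrates the technique under those assumptions and is not CK's actual conversion routine; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Truncating conversion: keep the upper 16 bits (rounds toward zero).
inline std::uint16_t float_to_bf16_truncate(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return static_cast<std::uint16_t>(u >> 16);
}

// Round-to-nearest-even conversion.
inline std::uint16_t float_to_bf16_rne(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if ((u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0) {
        // NaN: force the quiet bit so truncation cannot turn it into infinity.
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    }
    const std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives
    u += 0x7FFFu + lsb;                        // exact ties round to an even result
    return static_cast<std::uint16_t>(u >> 16);
}

inline float bf16_to_float(std::uint16_t h) {
    const std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Ties go to the neighbour whose last kept bit is 0, which avoids the systematic downward bias that truncation introduces when many converted values are accumulated.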

[CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148

ruanjm authored Jan 15, 2025

* Add shortcut to RMSNorm

* Modify test for adding shortcut for RMSNorm

* Add fused parameter into tests

* 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp

* 1. Supports various strides and precisions.

* Add support of Epilogue

* Add fuse and epilogue support to rmsnorm ref

* Modify rmsnorm example

* Refactor tests/examples

* Bug fix for newly added tests/examples

* Bug fix for new tests 2

* Modify smoke test scripts

remove dbg code

* Supports non-smooth dynamic quant

* Update Rmsnorm2dFwd::GetName()

* rename xscale and prec_sx to smoothscale and prec_sm

Bug fix after rename

Remove files

* change example_rmsnorm2d_fwd.cpp

* update performance calculator

* Fix issue in two-pass when fuse add is enabled

* Remove comment of beta

---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>

MX FP GEMM - Example Template (#277) · 07307ea1

Andriy Roshchenko authored Jan 14, 2025

Temporarily uses `DeviceGemmMultiD_ABScale_Xdl_CShuffle_V3` kernel and 128x128 scaling matrices.
Must be modified to use MX-native GEMM kernel with 16 or 32 component vectors per scale.

Verified on the emulator.

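The MX-native kernel mentioned above applies one shared scale to each small block of consecutive K elements (16 or 32 per scale) rather than one scale per 128x128 tile. A minimal host reference of that idea for a single dot product, assuming elements and scales are already decoded to float (real MX data stores FP8/FP6/FP4 elements and E8M0 scales); `block_scaled_dot` and `kBlock` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of a and b along K, where each operand carries one scale
// per kBlock consecutive elements (MX-style block scaling).
inline float block_scaled_dot(const std::vector<float>& a, const std::vector<float>& a_scale,
                              const std::vector<float>& b, const std::vector<float>& b_scale,
                              std::size_t kBlock) {
    assert(kBlock > 0 && a.size() == b.size() && a.size() % kBlock == 0);
    const std::size_t num_blocks = a.size() / kBlock;
    assert(a_scale.size() == num_blocks && b_scale.size() == num_blocks);

    float acc = 0.0f;
    for (std::size_t blk = 0; blk < num_blocks; ++blk) {
        // Accumulate the unscaled products inside one block...
        float partial = 0.0f;
        for (std::size_t i = blk * kBlock; i < (blk + 1) * kBlock; ++i) {
            partial += a[i] * b[i];
        }
        // ...then apply both block scales once per block.
        acc += a_scale[blk] * b_scale[blk] * partial;
    }
    return acc;
}
```

Applying the scales once per block keeps the inner loop a plain low-precision dot product, which is what lets hardware MX instructions consume the elements directly.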

13 Jan, 2025 5 commits

fix parsing instances for pt inductor (#1796) · c0b90f13

Max Podkorytov authored Jan 13, 2025

add unit test for gen instances for gemms

add unit tests for conv and batched gemms

add unit test for preselected gemm instances

apply ruff lint

add license header for the unit test

add inductor pytest to CI

verbose pip install

switch the directory before installing python packages

move the inductor codegen test

try yet another workdir

Update Jenkinsfile

The directory looks right, fixing pip module not found by invoking pip directly

Update Jenkinsfile

invoke pytest directly since the module is not found

Update Dockerfile

Install setuptools

update package structure

bump setuptools

maybe fix data path for library sources

fix library search path for conv instances

fix path in pyproject definition

compare path used in gen_instances with one in pyproject.toml; fix the difference
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>

Dev/merge u8w8 (#1774) · 53ab1b90

feli authored Jan 14, 2025

* port tiles from a8w8

* rm debug used files

* add instances

* remove all non gemm in cmake

* merge; impl fp16

* recover cmake from develop

* add missed files; fix clang format

---------
Co-authored-by: coderfeli <coderfeli@163.com>

CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f

Thomas Ning authored Jan 12, 2025

* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm

* Finished the 2x2 warp gemm policy and the block selection mechanism

* Clang format

* address poyen's comment

* Address feedbacks

* Fixed the compilation issue

* Change the function name

[CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779) · 0b8f117f
ClementLinCF authored Jan 13, 2025
```
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
```

Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f

Qianfeng authored Jan 13, 2025

* Update for fmha_fwd qs_ks_vs pipeline

* Remove __builtin_amdgcn_sched_barrier(0)

* Move the p_compute-to-p conversion earlier to try to increase VGPR reuse

* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation

* Re-add __builtin_amdgcn_sched_barrier(0)

---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

10 Jan, 2025 2 commits

Grouped convolution backward weight special vector size loads (#1772) · fd46a01d

Bartłomiej Kocot authored Jan 10, 2025

* Grouped convolution backward weight special vector size loads

* Instances and tests

* Fixes

* Add 7 and 13 special cases

* fix comments

* Fix

* Fix2

* fixes

* fix atomic add bf16

Ck tile/gemm perf measure (#1750) · 73a076ee

Thomas Ning authored Jan 09, 2025

* Finished adding the performance benchmark for ck tile gemm

* Fix the executable rename problem

* fix the executable name error

* delete the unsupported layout combinations

* Update run_full_test.sh

* Update benchmark_mem_pipeline.sh

* Update benchmark_basic.sh

* change the executable of gemm_universal

* change ck_tile_gemm script permissions

* Addressed the comment

* Addressed the comment

* Fixed the comments

* Fixed Comment

* roll back the malfunctioning change

* Fix the Typo

* finalize the tile_gemm_fp16 performance monitoring

* fix the stash names for ck_tile gemm logs

* change the stashing logic

* change stashing syntax

---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>

08 Jan, 2025 12 commits
- Disable building DPP kernels by default (#1804) · 26b3829c
  darren-amd authored Jan 08, 2025
```
* Disable building DPP kernels by default

* Disable building dpp instances, examples, or tests if DPP_KERNELS is not set

* Add new DPP_KERNELS flag to readme
```
- mark unused args · ad697c78
  Max Podkorytov authored Jan 07, 2025
- run clang-format -style=file · a2e6ad62
  Max Podkorytov authored Jan 07, 2025
- run clang-format==12 · aa59ecaa
  Max Podkorytov authored Dec 19, 2024
- update comment in the policy · 82fb3f84
  Max Podkorytov authored Dec 19, 2024
- update qsksvs comment · 4daa82b4
  Max Podkorytov authored Dec 19, 2024
- remove dead code · 66c5b715
  Max Podkorytov authored Dec 19, 2024
- clang-format and remove dead code · edb78a47
  Max Podkorytov authored Dec 19, 2024
- roll back splitkv · 60113859
  Max Podkorytov authored Dec 18, 2024
- update qsksvs pipeline · bfc997a7
  Max Podkorytov authored Dec 18, 2024
- qsksvs pipeline changes to mirror qrksvs · f7942b99
  Max Podkorytov authored Dec 17, 2024
- enable bias feature that adds bias before adding residual (for rtpllm project) (#1741) · d5c8a334
  AMD-dteng authored Jan 08, 2025
```
* 1. enable bias feature that adds bias before adding residual; 2. change block size from 128->64 when m<64 in fp16

* delete comment

* 1. remove fmha change; 2. change buffer name from bias to xbias

* Now bias can be used independently from fadd

* change kbias to kxbias

---------
Co-authored-by: feli <felix.li@amd.com>
```
07 Jan, 2025 5 commits

[MX FP8] Add Scaled Type Convert Functions for OCP FP8/BF8 data types (#271) · c4a05057

Andriy Roshchenko authored Jan 07, 2025

* Move scaled_type_convert functions to a separate header

* Introduce MX data tests

* Build MX tests only on relevant architectures

* Refactor E8M0 scale implementation

* Fix `config.h` typo

* Cleanup deprecated symbols

* Refactor `amd_ck_fp8.hpp`

* `scaled_type_convert` for `f8_ocp_t`

* Implement test for MX FP8 scaled type convert

* Implement test for MX BF8 scaled type convert

* Scaled type convert for vectors of 2 FP8 elements

* Scaled type convert for vectors of 16 FP8 elements

* Implementation of scaled conversion from F32 to F8

* Add tests for scaled conversions from FP32 to FP8

* Add documentation to the test functions

* Implementation of scaled conversion from F32x2 to F8x2

* Implementation of scaled conversion from F32x16 to F8x16

* Implementation of scaled conversion from F32x32 to F8x32

* Implementation of scaled conversion from F8x32 to F32x32

* Verified on the emulator

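Background for the scaled conversions above: in the OCP MX format an E8M0 scale is a bare 8-bit biased exponent (value 2^(e-127), with 0xFF reserved for NaN), and an OCP FP8 E4M3 element has a 4-bit exponent with bias 7, no infinities, and S.1111.111 as NaN. A host-side sketch of the FP8-to-float direction multiplied by an E8M0 scale; the function names are hypothetical and do not mirror CK's `scaled_type_convert` signatures, and BF8 (E5M2) would need its own decoder.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// E8M0 scale: value = 2^(e - 127); 0xFF encodes NaN.
inline float e8m0_to_float(std::uint8_t e) {
    if (e == 0xFF) return std::nanf("");
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// OCP FP8 E4M3: 1 sign, 4 exponent (bias 7), 3 mantissa bits; no infinities.
inline float fp8_e4m3_to_float(std::uint8_t x) {
    const bool negative = (x & 0x80) != 0;
    const int exp = (x >> 3) & 0xF;
    const int man = x & 0x7;
    if (exp == 0xF && man == 0x7) return std::nanf("");  // S.1111.111 is NaN
    const float mag = (exp == 0)
        ? std::ldexp(static_cast<float>(man), -9)  // subnormal: (man / 8) * 2^-6
        : std::ldexp(1.0f + static_cast<float>(man) / 8.0f, exp - 7);
    return negative ? -mag : mag;
}

// Scaled conversion: decoded element times its block's E8M0 scale.
inline float scaled_fp8_to_float(std::uint8_t scale_e8m0, std::uint8_t x) {
    return e8m0_to_float(scale_e8m0) * fp8_e4m3_to_float(x);
}
```

Because an E8M0 scale is always a power of two, multiplying by it only shifts the exponent, so the scaled result is exact unless it leaves the float range.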

enable smfmac test · 23e2309d
illsilin authored Jan 07, 2025

Update LICENSE to 2025 (#1797) · a6b761c3
spolifroni-amd authored Jan 07, 2025

Bump rocm-docs-core from 1.12.1 to 1.13.0 in /docs/sphinx (#1798) · 9f6bf9ab

dependabot[bot] authored Jan 07, 2025

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.1...v1.13.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[CK_TILE] fmha fwd splitkv optimization for decode (seqlen_q=1) (#1789) · 24b12d04

Po Yen Chen authored Jan 07, 2025

* Update license year

* Add initial code to override decode problem

* Fix splitkv traits/args overriding error

* Reshape and transpose lse for decode

* Remove debug code

* Prettify example code

* Use better function name

* Add kMergeNumHeadGroupsSeqLenQ flag

Kernel user can use this switch to turn on/off optimization for
some problem sizes

* Add missing flag declarations

* Default turn off kMergeNumHeadGroupsSeqLenQ in codegen

* Group similar statements together

* Remove assumption of seqlen_q=1

* Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel

* Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel

* Run kMergeNumHeadGroupsSeqLenQ=true kernels when need

* Fix group mode block skip logics

* Undo changes of normal fwd kernel

* Update in GridSize() and using GridSize() for splitkv kernel (#1799)

---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>

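One way to read `kMergeNumHeadGroupsSeqLenQ`: with grouped-query attention, nhead_q = nhead_k * ngroups query heads share each K/V head, so for decode (seqlen_q = 1) the ngroups query heads of one K/V head can be processed as ngroups rows of a single Q tile, reusing each K/V load. The sketch below only checks that, for an assumed contiguous [batch, nhead_q, seqlen_q, hdim] Q layout, this merge is a pure reshape to [batch, nhead_k, ngroups * seqlen_q, hdim]. It is an illustration, not the kernel's indexing code, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Offset of q[b][h][s][d] in a contiguous [batch, nhead_q, seqlen_q, hdim] tensor.
inline std::size_t offset_bhsd(std::size_t b, std::size_t h, std::size_t s, std::size_t d,
                               std::size_t nhead_q, std::size_t seqlen_q, std::size_t hdim) {
    return ((b * nhead_q + h) * seqlen_q + s) * hdim + d;
}

// Offset of the same element viewed as [batch, nhead_k, ngroups * seqlen_q, hdim],
// where query head h = kvh * ngroups + g shares K/V head kvh.
inline std::size_t offset_merged(std::size_t b, std::size_t kvh, std::size_t g, std::size_t s,
                                 std::size_t d, std::size_t nhead_k, std::size_t ngroups,
                                 std::size_t seqlen_q, std::size_t hdim) {
    const std::size_t row = g * seqlen_q + s;  // merged "sequence" index
    return ((b * nhead_k + kvh) * (ngroups * seqlen_q) + row) * hdim + d;
}

// Returns true if both views address every element identically.
inline bool merge_is_reshape(std::size_t batch, std::size_t nhead_k, std::size_t ngroups,
                             std::size_t seqlen_q, std::size_t hdim) {
    const std::size_t nhead_q = nhead_k * ngroups;
    for (std::size_t b = 0; b < batch; ++b)
        for (std::size_t kvh = 0; kvh < nhead_k; ++kvh)
            for (std::size_t g = 0; g < ngroups; ++g)
                for (std::size_t s = 0; s < seqlen_q; ++s)
                    for (std::size_t d = 0; d < hdim; ++d)
                        if (offset_bhsd(b, kvh * ngroups + g, s, d, nhead_q, seqlen_q, hdim) !=
                            offset_merged(b, kvh, g, s, d, nhead_k, ngroups, seqlen_q, hdim))
                            return false;
    return true;
}
```

For a [batch, seqlen_q, nhead_q, hdim] layout the same equivalence holds only when seqlen_q = 1; otherwise the merged rows are not evenly strided and need extra index arithmetic.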

06 Jan, 2025 3 commits

fix test_bf6 · ebc4561f
illsilin authored Jan 06, 2025

replace the fp6 with bf6 convert calls in test_bf6 · d44b24d1
illsilin authored Jan 06, 2025

Add MXFP6 and MXBF6 conversion methods (#270) · e093146e

Rostyslav Geyyer authored Jan 06, 2025

* Add conversions

* Add tests

* Add docstrings

* Add scaled conversions

* Add fp6/bf6 tests

* Remove misleading fp4 test case

* Add docstrings

* Clean up

* Address comments

* Set stricter tolerances for RNE tests

* Add missing tests

* Add native conversions to float

* Revert "Add native conversions to float"

This reverts commit 09467111f73b753c8cc3d597533b187940353dab.

* Update copyright years

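For reference, the OCP MX specification defines FP6 as E2M3 (exponent bias 1, largest magnitude 7.5) and BF6 as E3M2 (bias 3, largest magnitude 28), neither with infinity or NaN encodings. A host-side decoder sketch for the 6-bit-to-float direction; `decode_minifloat` and the wrapper names are hypothetical, and the commit's scaled and round-to-nearest-even float-to-6-bit conversions are not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a sign/exponent/mantissa minifloat with no Inf/NaN encodings.
inline float decode_minifloat(std::uint32_t bits, int exp_bits, int man_bits, int bias) {
    const std::uint32_t exp_mask = (1u << exp_bits) - 1u;
    const std::uint32_t man_mask = (1u << man_bits) - 1u;
    const bool negative = ((bits >> (exp_bits + man_bits)) & 1u) != 0;
    const int exp = static_cast<int>((bits >> man_bits) & exp_mask);
    const int man = static_cast<int>(bits & man_mask);
    const float frac = std::ldexp(static_cast<float>(man), -man_bits);  // man / 2^man_bits
    const float mag = (exp == 0) ? std::ldexp(frac, 1 - bias)           // subnormal
                                 : std::ldexp(1.0f + frac, exp - bias); // normal
    return negative ? -mag : mag;
}

// FP6 = E2M3 (bias 1), BF6 = E3M2 (bias 3); only the low 6 bits are used.
inline float fp6_e2m3_to_float(std::uint8_t x) { return decode_minifloat(x & 0x3Fu, 2, 3, 1); }
inline float bf6_e3m2_to_float(std::uint8_t x) { return decode_minifloat(x & 0x3Fu, 3, 2, 3); }
```

The two formats trade range for precision: E2M3 has finer steps but saturates at 7.5, while E3M2 reaches 28 with coarser steps.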

04 Jan, 2025 3 commits

Fix universal gemm profiler for pk_i4_t (#1790) · 888317e6
Bartłomiej Kocot authored Jan 04, 2025
```
* Fix universal gemm profiler for pk_i4_t

* fix
```

Bump rocm-docs-core from 1.12.0 to 1.12.1 in /docs/sphinx (#1788) · 37b35146

dependabot[bot] authored Jan 03, 2025

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.12.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

terminology clean-up (#1792) · 8ea375bb
Illia Silin authored Jan 03, 2025