Commits · df1bad99be38c474f00f1ab6da2b478ec2266a35 · gaoqiong / composable_kernel_ROCM

31 Jan, 2025 5 commits
- Update tests and pack functions · df1bad99
  Rostyslav Geyyer authored Jan 31, 2025
  
  df1bad99
- Fix a typo · 544aad11
  Rostyslav Geyyer authored Jan 31, 2025
  
  544aad11
- Use pointers instead of array indices · 91fa13b0
  Rostyslav Geyyer authored Jan 31, 2025
  
  91fa13b0
- Move flag logic to scaled_type_convert header · 9f58449c
  Rostyslav Geyyer authored Jan 31, 2025
  
  9f58449c
- Add a flag to config file · 7336b04b
  Rostyslav Geyyer authored Jan 31, 2025
  
  7336b04b
30 Jan, 2025 4 commits
- Add docstrings · d032ea56
  Rostyslav Geyyer authored Jan 30, 2025
  
  d032ea56
- Remove unneeded AsType accessors · 97c7e725
  Rostyslav Geyyer authored Jan 30, 2025
  
  97c7e725
- Update pack/unpack methods · bcc12098
  Rostyslav Geyyer authored Jan 30, 2025
  
  bcc12098
- Fix build logic · acf8854e
  Rostyslav Geyyer authored Jan 30, 2025
  
  acf8854e
29 Jan, 2025 2 commits
- Add conversions · b8f4de71
  Rostyslav Geyyer authored Jan 29, 2025
  
  b8f4de71
- Add a flag · c98974ee
  Rostyslav Geyyer authored Jan 29, 2025
  
  c98974ee
27 Jan, 2025 1 commit
- Add size checks in pack function · 2a807013
  Rostyslav Geyyer authored Jan 27, 2025
  
  2a807013
24 Jan, 2025 3 commits
- Fix merge · 7c6a541b
  Rostyslav Geyyer authored Jan 24, 2025
  
  7c6a541b
- Merge branch 'gfx950' into lwpck-2619 · efab74a3
  Rostyslav Geyyer authored Jan 24, 2025
  
  efab74a3
- Update unpack signature · 86950b3a
  Rostyslav Geyyer authored Jan 24, 2025
  
  86950b3a
22 Jan, 2025 4 commits
- Merge pull request #288 from ROCm/lwpck-2747 · bcef33c1
  Illia Silin authored Jan 22, 2025
```
Fix build logic when building for multiple targets, including gfx950.
```
  bcef33c1
- fix typo · 6a747f03
  illsilin authored Jan 22, 2025
  
  6a747f03
- fix typo · 108f2733
  illsilin authored Jan 22, 2025
  
  108f2733
- fix build for multiple archs · 50010cf9
  illsilin authored Jan 21, 2025
  
  50010cf9
21 Jan, 2025 1 commit
- disable CK_USE_AMD_MFMA_GFX950 by default · 74a743e2
  illsilin authored Jan 20, 2025
  
  74a743e2
20 Jan, 2025 1 commit
- only build mx example for gfx950 · 81b69ad7
  illsilin authored Jan 20, 2025
  
  81b69ad7
17 Jan, 2025 4 commits

- merge from public repo · 019d4b7c
  illsilin authored Jan 17, 2025
  
  019d4b7c
- Merge pull request #285 from ROCm/merge_from_public · 5063a39f
  Illia Silin authored Jan 17, 2025
```
Merge from public
```
  5063a39f
- merge from public · 96a0d5f6
  illsilin authored Jan 16, 2025
  
  96a0d5f6

- Implementing Test Filters for Smoke and Regression Tests (#1819) · 54de3e55
  Aviral Goel authored Jan 16, 2025
```
* smoke and regression targets working with tests

* test filters work for both examples and test

* removed unnecessary comments

* added a missing comment

* added a missing comment

* fixed typo in the comments

* updated README

* Update PULL_REQUEST_TEMPLATE.md

updating the template for future addition of test cases

* Update PULL_REQUEST_TEMPLATE.md
```
  54de3e55

16 Jan, 2025 6 commits
- Fix and optimize dynamic unary elementwise (#1818) · 1519ce91
  Bartłomiej Kocot authored Jan 16, 2025
```
* Fix and optimize dynamic unary elementwise

* fix
```
  1519ce91
- Fix test naming · 6af32900
  Rostyslav Geyyer authored Jan 16, 2025
  
  6af32900
- Add missing type aliases · 17d1e68b
  Rostyslav Geyyer authored Jan 16, 2025
  
  17d1e68b
- Add tests · d8214b28
  Rostyslav Geyyer authored Jan 16, 2025
  
  d8214b28
- Add vector support · 3a64757f
  Rostyslav Geyyer authored Jan 16, 2025
  
  3a64757f
- [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808) · 1ff50e78
  carlushuang authored Jan 16, 2025
```
* fix mock token id

* prepare host for g1u1

* reformat inline-asm

* restructure uk_0

* restructure gate_up

* done

* change default to init=1

* update readme

* fix a bug in interleave pipeline

* rcp for silu
```
  1ff50e78
15 Jan, 2025 4 commits

- disable inductor codegen tests on legacy OS (#1816) · 8c29e06f
  Illia Silin authored Jan 15, 2025
  
  8c29e06f

- Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3
  Bartłomiej Kocot authored Jan 15, 2025
```
* Add rounding for float to bf16 conversion

* Add bhalf test

* Add inf test bhalf

* Refactor

* update cmake

* Fixes
```
  7790e8c3
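
The commit above makes rounding the default for float to bf16 conversion. The log does not say which rounding mode is used, so the following is only a minimal sketch that assumes round-to-nearest-even and contrasts it with plain truncation. All helper names (`float_to_bf16_rne`, `float_to_bf16_truncate`, `bits_to_float`, `float_to_bits`, `check_tie_to_even`) are hypothetical and are not taken from the composable_kernel sources.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration; not the composable_kernel API.

// Reinterpret a float as its raw IEEE-754 binary32 bit pattern.
inline std::uint32_t float_to_bits(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}

// Reinterpret a raw 32-bit pattern as a float.
inline float bits_to_float(std::uint32_t bits)
{
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Truncation: keep the upper 16 bits and drop the rest.
inline std::uint16_t float_to_bf16_truncate(float x)
{
    return static_cast<std::uint16_t>(float_to_bits(x) >> 16);
}

// Round-to-nearest-even (assumed mode): add a bias of 0x7FFF plus the
// lowest kept bit, then drop the lower 16 bits.
inline std::uint16_t float_to_bf16_rne(float x)
{
    std::uint32_t bits = float_to_bits(x);
    if((bits & 0x7FFFFFFFu) > 0x7F800000u)
    {
        // NaN: set a mantissa bit in the kept half so the result stays NaN.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Small usage example: an exact tie rounds to the even neighbour,
// while truncation always rounds toward zero.
inline void check_tie_to_even()
{
    assert(float_to_bf16_rne(bits_to_float(0x3F808000u)) == 0x3F80);
    assert(float_to_bf16_rne(bits_to_float(0x3F818000u)) == 0x3F82);
    assert(float_to_bf16_truncate(bits_to_float(0x3F818000u)) == 0x3F81);
}
```

With this scheme, infinity maps to bf16 infinity, and a finite float just below the float maximum can round up to infinity, which is the expected behaviour for round-to-nearest-even.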

- [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
  ruanjm authored Jan 15, 2025
```
* Add shortcut to RMSNorm

* Modify test for adding shortcut for RMSNorm

* Add fused parameter into tests

* 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp

* 1. Supports various stride and precisions.

* Add support of Epilogue

* Add fuse and epilogue support to rmsnorm ref

* Modify rmsnorm example

* Refactor tests/examples

* Bug fix for newly added tests/examples

* Bug fix for new tests 2

* Modify smoke test scripts

remove dbg code

* Supports non-smooth dynamic quant

* Update Rmsnorm2dFwd::GetName()

* rename xscale and prec_sx to smoothscale and prec_sm

Bug fix after rename

Remove files

* change example_rmsnorm2d_fwd.cpp

* update performance calculator

* Fix issue in two-pass when fuse add is enabled

* Remove comment of beta

---------
Co-authored-by: rocking <ChunYu.Lai@amd.com>
```
  04dd3148

- MX FP GEMM - Example Template (#277) · 07307ea1
  Andriy Roshchenko authored Jan 14, 2025
```
Temporarily uses `DeviceGemmMultiD_ABScale_Xdl_CShuffle_V3` kernel and 128x128 scaling matrices.
Must be modified to use MX-native GEMM kernel with 16 or 32 component vectors per scale.

Verified on the emulator.
```
  07307ea1
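
The commit message above refers to one scale shared by a vector of 16 or 32 components. As a rough host-side illustration of that idea, the sketch below applies one scale per fixed-size block of elements. It assumes the scale is stored as a biased 8-bit power-of-two exponent (E8M0 with bias 127, as in the OCP Microscaling formats); the log itself does not state the scale encoding. The names `decode_e8m0` and `apply_block_scales` are hypothetical and unrelated to the kernel named in the commit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helpers for illustration; not the composable_kernel API.

// Decode a power-of-two scale stored as a biased 8-bit exponent.
// Assumption: E8M0 encoding with bias 127, where 0xFF denotes NaN.
inline float decode_e8m0(std::uint8_t e)
{
    if(e == 0xFF)
    {
        return std::numeric_limits<float>::quiet_NaN();
    }
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Multiply every element by the scale of the block it belongs to.
// block_size would be 16 or 32 for the vectors mentioned in the commit.
inline std::vector<float> apply_block_scales(const std::vector<float>& elems,
                                             const std::vector<std::uint8_t>& scales,
                                             std::size_t block_size)
{
    assert(block_size > 0);
    assert(elems.size() == scales.size() * block_size);

    std::vector<float> out(elems.size());
    for(std::size_t i = 0; i < elems.size(); ++i)
    {
        out[i] = elems[i] * decode_e8m0(scales[i / block_size]);
    }
    return out;
}
```

For example, 64 elements with `block_size = 32` need two scale bytes; a scale byte of 127 leaves its block unchanged and 128 doubles it.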

13 Jan, 2025 5 commits

- fix parsing instances for pt inductor (#1796) · c0b90f13
  Max Podkorytov authored Jan 13, 2025
```
add unit test for gen instances for gemms

add unit tests for conv and batched gemms

add unit test for preselected gemm instances

apply ruff lint

add license header for the unit test

add inductor pytest to CI

verbose pip install

switch the directory before installing python packages

move the inductor codegen test

try yet another workdir

Update Jenkinsfile

The directory looks right, fixing pip module not found by invoking pip directly

Update Jenkinsfile

invoke pytest directly since the module is not found

Update Dockerfile

Install setuptools

update package structure

bump setuptools

maybe fix data path for library sources

fix library search path for conv instances

fix path in pyproject definition

compare path used in gen_instances with one in pyproject.toml; fix the difference
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
```
  c0b90f13

- Dev/merge u8w8 (#1774) · 53ab1b90
  feli authored Jan 14, 2025
```
* port tiles from a8w8

* rm debug used files

* add instances

* remove all non gemm in cmake

* merge; impl fp16

* recover cmake from develop

* add missed files; fix clang format

---------
Co-authored-by: coderfeli <coderfeli@163.com>
```
  53ab1b90

- CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
  Thomas Ning authored Jan 12, 2025
```
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm

* Finished the 2x2 warp gemm policy and the block selection mechanism

* Clang format

* address poyen's comment

* Address feedbacks

* Fixed the compilation issue

* Change the function name
```
  5d671a5f

- [CK_TILE] Adjust kBlockSize of reduce example for better perf (#1779) · 0b8f117f
  ClementLinCF authored Jan 13, 2025
```
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
```
  0b8f117f

- Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f
  Qianfeng authored Jan 13, 2025
```
* Update for fmha_fwd qs_ks_vs pipeline

* Remove __builtin_amdgcn_sched_barrier(0)

* Move p_compute to p converting earlier for trying to increase vgprs re-using

* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation

* Re-add __builtin_amdgcn_sched_barrier(0)

---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
```
  3d50f57f