Commits · b3054feacd5af855625c8c5630949e71ad78b730 · gaoqiong / composable_kernel_ROCM

22 Jan, 2025 5 commits
- Fix VecLoadSize and TransposeC for mem pipeline. · e62670af
  Adam Osewski authored Jan 22, 2025
- Add error logging messages. · ce7753f4
  Adam Osewski authored Jan 22, 2025
- Fix VectorSize & refactor. · f0c85b0b
  Adam Osewski authored Jan 22, 2025
- Formatting & fix IsTranspose · 23377f7b
  Adam Osewski authored Jan 22, 2025
- add fp8 as dst (#1830) · 052a7265
  carlushuang authored Jan 22, 2025
21 Jan, 2025 12 commits
- Simplify static_cast if-lands (#1828) · 3db77bc4
  Mateusz Ozga authored Jan 21, 2025
- CK-Tile Grouped GEMM refactor and post PR fixes (#1756) · 3c93d3c4
  Mateusz Ozga authored Jan 21, 2025
```
* Grouped gemm simple code refactor

* Offset invoker

* Invoke generic Run, and replace name of partitioner variable

* Tests fix type

* Removed namespaces

* Add template param to avoid implicit cast

* Remove generic function

* Constant value

* Set underlying enum type to int16_t

* Generalize partitioner function

* Remove whitespaces

* Rename function

* Using support

* Clang-format

* Clang-format

* Fn-partitioner description fn

* Typo

* Typo 2

* Better description

* Better description

* Refactor after review

* Use ctr instead of set fn

* Invoke ctr and typo

* Comments

* Remove unnecessary comment

* Review, remove modulo
```
- Fix static assert. · c6dcf20d
  Adam Osewski authored Jan 21, 2025
- Update B LDS layout and set up tile distribution pattern at class level. · 465f8e6a
  Adam Osewski authored Jan 21, 2025
- A/B smem pack size taken from WarpGemm attributes · 69d6660c
  Adam Osewski authored Jan 21, 2025
- Take contiguous dim size when calculating dram vector load size. · e0d67738
  Adam Osewski authored Jan 21, 2025
- Transpose A/B register tile if needed for comp v3 pipeline. · d79d1a38
  Adam Osewski authored Jan 21, 2025
- Enable reading on contiguous dimension in all layouts. · 69b6d2ab
  Adam Osewski authored Jan 21, 2025
- Small refactoring + doc · bd5008af
  Adam Osewski authored Jan 21, 2025
- Add transpose_tile2d · bee700b0
  Adam Osewski authored Jan 21, 2025
- Fix err in reverse tuple. · b0bf4912
  Adam Osewski authored Jan 21, 2025
- Adding shuffled encoding patterns. · 57e6fd46
  Adam Osewski authored Jan 21, 2025
18 Jan, 2025 1 commit
- [CK_TILE] Add error threshold calculation for gemm examples (#1821) · bdddf1ea
  Bartłomiej Kocot authored Jan 18, 2025
16 Jan, 2025 1 commit
- [CK_TILE] Fix mock token id, support g1u1/g1u0 through same inline code block (#1808) · 1ff50e78
  carlushuang authored Jan 16, 2025

  * fix mock token id
  * prepare host for g1u1
  * reformat inline-asm
  * restructure uk_0
  * restructure gate_up
  * done
  * change default to init=1
  * update readme
  * fix a bug in interleave pipeline
  * rcp for silu
15 Jan, 2025 1 commit
- [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
  ruanjm authored Jan 15, 2025

  * Add shortcut to RMSNorm
  * Modify test for adding shortcut for RMSNorm
  * Add fused parameter into tests
  * 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
  * Support various strides and precisions.
  * Add support of Epilogue
  * Add fuse and epilogue support to rmsnorm ref
  * Modify rmsnorm example
  * Refactor tests/examples
  * Bug fix for newly added tests/examples
  * Bug fix for new tests 2
  * Modify smoke test scripts

    remove dbg code
  * Support non-smooth dynamic quant
  * Update Rmsnorm2dFwd::GetName()
  * rename xscale and prec_sx to smoothscale and prec_sm

    Bug fix after rename

    Remove files
  * change example_rmsnorm2d_fwd.cpp
  * update performance calculator
  * Fix issue in two-pass when fuse add is enabled
  * Remove comment of beta

  ---------
  Co-authored-by: rocking <ChunYu.Lai@amd.com>
13 Jan, 2025 2 commits
- CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
  Thomas Ning authored Jan 12, 2025

  * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
  * Finished the 2x2 warp gemm policy and the block selection mechanism
  * Clang format
  * address Po Yen's comment
  * Address feedback
  * Fixed the compilation issue
  * Change the function name
- Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f
  Qianfeng authored Jan 13, 2025

  * Update for fmha_fwd qs_ks_vs pipeline
  * Remove `__builtin_amdgcn_sched_barrier(0)`
  * Move the p_compute-to-p conversion earlier to try to increase VGPR reuse
  * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
  * Re-add `__builtin_amdgcn_sched_barrier(0)`

  ---------
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
09 Jan, 2025 1 commit
- Introduce static encoding pattern · c400e5b3
  Adam Osewski authored Jan 09, 2025
08 Jan, 2025 11 commits
- mark unused args · ad697c78
  Max Podkorytov authored Jan 07, 2025
- run clang-format -style=file · a2e6ad62
  Max Podkorytov authored Jan 07, 2025
- run clang-format==12 · aa59ecaa
  Max Podkorytov authored Dec 19, 2024
- update comment in the policy · 82fb3f84
  Max Podkorytov authored Dec 19, 2024
- update qsksvs comment · 4daa82b4
  Max Podkorytov authored Dec 19, 2024
- remove dead code · 66c5b715
  Max Podkorytov authored Dec 19, 2024
- clang-format and remove dead code · edb78a47
  Max Podkorytov authored Dec 19, 2024
- roll back splitkv · 60113859
  Max Podkorytov authored Dec 18, 2024
- update qsksvs pipeline · bfc997a7
  Max Podkorytov authored Dec 18, 2024
- qsksvs pipeline changes to mirror qrksvs · f7942b99
  Max Podkorytov authored Dec 17, 2024
- enable bias feature that adds bias before adding residual (for rtpllm project) (#1741) · d5c8a334
  AMD-dteng authored Jan 08, 2025
```
* 1. enable bias feature that adds bias before adding residual; 2. change block size from 128->64 when m<64 in fp16

* delete comment

* 1. remove fmha change 2. change buffer name from bias to xbias

* Now bias can be used independently from fadd

* change kbias to kxbias

---------
Co-authored-by: feli <felix.li@amd.com>
```
07 Jan, 2025 1 commit
- [CK_TILE] fmha fwd splitkv optimization for decode (seqlen_q=1) (#1789) · 24b12d04
  Po Yen Chen authored Jan 07, 2025

  * Update license year
  * Add initial code to override decode problem
  * Fix splitkv traits/args overriding error
  * Reshape and transpose lse for decode
  * Remove debug code
  * Prettify example code
  * Use better function name
  * Add kMergeNumHeadGroupsSeqLenQ flag

    Kernel user can use this switch to turn on/off optimization for
    some problem sizes
  * Add missing flag declarations
  * Turn off kMergeNumHeadGroupsSeqLenQ by default in codegen
  * Group similar statements together
  * Remove assumption of seqlen_q=1
  * Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel
  * Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel
  * Run kMergeNumHeadGroupsSeqLenQ=true kernels when needed
  * Fix group mode block skip logic
  * Undo changes of normal fwd kernel
  * Update in GridSize() and using GridSize() for splitkv kernel (#1799)

  ---------
  Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
03 Jan, 2025 2 commits
- [CK_TILE] naive attn support FP8 KVCache quant (#1747) · 6df5fe2a
  carlushuang authored Jan 03, 2025

  * quant
  * fix bug
  * simple smoothquant after softmax
  * update kv-quant
  * update stride
  * fix fp8-pertoken-kvcache
  * update int8/fp8 quant support

  ---------
  Co-authored-by: so <a.com>
  Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
- Ck tile/layernorm: implement naive reduce, opt performance (#1784) · 4bc61041
  feli authored Jan 03, 2025

  * add no welford
  * enable output raw
  * raw of int8
  * fix build
  * fix smoke test err
  * [ck_tile]layernorm: fix welford ok, set int8 and bf16 small N as default and others open by generate
  * [cktile]layernorm, fix err commit files and remove useless
  * fix quant 8192 err & change norm_reduce class and file name

  ---------
  Co-authored-by: coderfeli <coderfeli@163.com>
  Co-authored-by: carlushuang <carlus.huang@amd.com>

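The "no welford" path in #1784 presumably trades Welford's online mean/variance update for a plain sum and sum-of-squares reduction. As a host-side illustration only (not the ck_tile implementation; `MeanVar`, `welford_mean_var` and `naive_mean_var` are hypothetical names), the two reductions can be sketched as:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean and population variance of one row (hypothetical helper type).
struct MeanVar {
    double mean;
    double var;
};

// Welford's online update: a single pass that stays numerically stable.
MeanVar welford_mean_var(const std::vector<double>& x)
{
    assert(!x.empty());
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t count = 0;
    for (double v : x) {
        ++count;
        const double delta = v - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (v - mean);  // second factor uses the updated mean
    }
    return {mean, m2 / static_cast<double>(count)};
}

// Naive reduction: accumulate sum and sum of squares, then
// var = E[x^2] - E[x]^2. No division inside the loop, but prone to
// cancellation when |mean| is large compared with the spread.
MeanVar naive_mean_var(const std::vector<double>& x)
{
    assert(!x.empty());
    double sum = 0.0;
    double sum_sq = 0.0;
    for (double v : x) {
        sum += v;
        sum_sq += v * v;
    }
    const double n = static_cast<double>(x.size());
    const double mean = sum / n;
    return {mean, sum_sq / n - mean * mean};
}
```

The naive form needs no per-element division and its two partial sums combine trivially across threads, which is the likely performance appeal; the cost is that `E[x^2] - E[x]^2` cancels badly when the mean dwarfs the spread. For inputs `1e9 + 1` through `1e9 + 4`, the Welford sketch still returns exactly 1.25, while the naive sketch can lose most of its significant digits.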
29 Dec, 2024 1 commit
- Remove using partitioner for all fmha kernels (#1778) · 4e076909
  Qianfeng authored Dec 29, 2024

  * Remove using tile partitioner for fmha_fwd_kernel
  * Remove using tile partitioner for fmha_fwd_splitkv and splitkv-combine kernels
  * Remove using tile partitioner for fmha_fwd_appendkv kernel
  * Unify the format of GetTileIndex
28 Dec, 2024 1 commit
- [CK TILE] GEMM and Batched GEMM SplitK support (#1724) · af664948
  Bartłomiej Kocot authored Dec 28, 2024

  * [CK TILE] Add split K support in GEMM
  * Updates
  * Fixes
  * rebase
  * fix
  * Fix
  * fixes
  * support for batched gemm

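Split-K, as added in #1724, divides the reduction (K) dimension of a GEMM into several slices that can be processed independently, with the partial products summed into C. A minimal serial host-side sketch of that decomposition, assuming row-major storage; `gemm_split_k` and `k_batch` are hypothetical names rather than the CK Tile API, and the plain `+=` stands in for the cross-slice reduction (for example atomic adds) that a GPU kernel would need:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: C (m x n) = A (m x k) * B (k x n), all row-major,
// with the K loop split into k_batch slices whose partial results are
// accumulated into C one slice at a time.
std::vector<float> gemm_split_k(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t n, std::size_t k,
                                std::size_t k_batch)
{
    assert(k_batch > 0);
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    // Ceil-divide K so the last slice may be shorter than the others.
    const std::size_t k_per_batch = (k + k_batch - 1) / k_batch;
    for (std::size_t kb = 0; kb < k_batch; ++kb) {
        const std::size_t k_begin = kb * k_per_batch;
        if (k_begin >= k) break;  // more slices than K elements: nothing left
        const std::size_t k_end = std::min(k, k_begin + k_per_batch);
        for (std::size_t i = 0; i < m; ++i) {
            for (std::size_t j = 0; j < n; ++j) {
                float partial = 0.0f;  // this slice's contribution to C(i, j)
                for (std::size_t p = k_begin; p < k_end; ++p)
                    partial += a[i * k + p] * b[p * n + j];
                c[i * n + j] += partial;  // cross-slice reduction
            }
        }
    }
    return c;
}
```

Each slice only shortens the inner loop, so the result matches an unsplit GEMM up to floating-point reassociation; splitting exposes extra parallelism for problems with a long K and small M/N, at the cost of the additional reduction.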
23 Dec, 2024 1 commit
- [CK_TILE] optimize moe-sorting kernel (#1771) · 3d15f364
  carlushuang authored Dec 23, 2024
```
* opt moe sorting

* remove commented code
```