- 29 Dec, 2024 8 commits
-
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
This reverts commit 09486ebf.
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
- 24 Dec, 2024 1 commit
-
-
Po Yen Chen authored
-
- 23 Dec, 2024 3 commits
-
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
- 19 Dec, 2024 7 commits
-
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
- 18 Dec, 2024 1 commit
-
-
aledudek authored
* Gemm Kernel Refactor part1
* Gemm Kernel Refactor common gemm pipeline part2
* [CK TILE] Refactor batched gemm to reuse GemmKernel
* [CK TILE] Refactor GemmKernel - review changes part1
* [CK TILE] Refactor GemmKernel - references fix
* [CK TILE] Refactor GemmKernel - naming changes, add problem
* [CK_TILE] Refactor GemmKernel - update tests
* [CK_TILE] Refactor GemmKernel - review changes
* [CK_TILE] Refactor GemmKernel - update test
* [CK_TILE] Refactor GemmKernel - constness fixes
* [CK_TILE] Refactor GemmKernel - update tests
-
- 17 Dec, 2024 1 commit
-
-
Adam Osewski authored
* Added object print with all template parameters
* fix clang format
---------
Co-authored-by: ravil-mobile <ravil.aviva.com@gmail.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 15 Dec, 2024 1 commit
-
-
Xu, Shengnan authored
* added moe interleaving pipeline
* remove redundant code
* formatter
---------
Co-authored-by: root <root@hjbog-srdc-14.amd.com>
-
- 13 Dec, 2024 1 commit
-
-
chenjun authored
* add ck_tile/smoothquant out stride parameter
* Remove the default stride value
---------
Co-authored-by: so <a.com>
-
- 12 Dec, 2024 1 commit
-
-
carlushuang authored
* add reference attention fwd
* refactor addresser
* update
* paged, and i8 reflect-quant
* lets call it forward-quant
* fix error in decode variation
* update naive-attn
* fix page table
* fix build err
-
- 06 Dec, 2024 1 commit
-
-
Po Yen Chen authored
-
- 05 Dec, 2024 1 commit
-
-
jakpiase authored
* add IsSupportedArgument to gemm_kernel
* add ut and do some refactoring
* switched to ck_tile's integral_constant
-
- 04 Dec, 2024 2 commits
-
-
Mateusz Ozga authored
* Ck-tile, impl. grouped gemm
* Workspace is allocated by user, and is passed to the function
* Prepare test to new api design
* Unify GemTransKernelArgs, removing N0 param
* Add 1 to dim3 in partitioner
* Typo: gem -> gemm
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
Po Yen Chen authored
* Use 'false' for highest dimension padding flags
* Update padding flag of bias
-
- 03 Dec, 2024 4 commits
-
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
Po Yen Chen authored
-
- 02 Dec, 2024 1 commit
-
-
Po Yen Chen authored
-
- 30 Nov, 2024 1 commit
-
-
Bartłomiej Kocot authored
-
- 29 Nov, 2024 1 commit
-
-
aledudek authored
* [CK Tile] Batched GEMM Example
* [CK Tile] Batched GEMM Example - minor refactor
* [CK Tile] Batched GEMM Example - README update
* [CK Tile] Batched Gemm Example - review changes
  - Added tensor data layouts as input parameters
  - Changed structure of Host and Kernel args
  - Removed bug with invalid vector read on non-contiguous memory
* [CK Tile] Batched Gemm Example - remove comment
* [CK Tile] Batched Gemm Example - Add GTests part1
* [CK Tile] Batched Gemm Example - GTests part2 + review changes
* [CK TILE] Batched GEMM post merge fixes
* [CK Tile] Batched GEMM Example - fix pad views
-
- 28 Nov, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK TILE] Add gemm compute pipeline v3
* Enable universal gemm compute pipeline.
* Rename example and add compute pipeline.
* Introduce ag bg cr pipeline impl base.
* Refactor to reuse code.
* Cleaning
* Formatting.
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
-
- 27 Nov, 2024 1 commit
-
-
jakpiase authored
* add interwave scheduler for gemm mem pipeline
* Fix merge artifacts.
* Refactor unit tests.
* Switch to interwave scheduler for mem example
---------
Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
-
- 26 Nov, 2024 3 commits
-
-
rocking authored
* Fix cmake example build
* Support max3 in smoothquant one pass
* support max3 in two pass
* support max3 in add_rmsnorm_rdquant
-
Po Yen Chen authored
* Allow getting batch size from splitkv tile partitioner
* Fix wrong paged-kvcache impl for group mode
* Fix wrong example code for page-kvcache
* Undo changes in fmha_fwd.cpp
* Always use 2D block table
* Add is_gappy kernel argument for paged-kvcache
  The is_gappy argument is used for differentiating seqstart_k_ptr usage in flash-attention & xformers
* Remove out-of-date comments
* Remove no-longer used method
* Fix wrong # page-block calculation
* Fix wrong comment
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
-
Adam Osewski authored
* Block universal gemm.
* Universal block gemm with interwave scheduler - draft.
* Refactoring
* Move a/b_warp_tiles into BlockGemmImpl
* set BlockGemmImpl as a class member
* Change tile size for more suitable to memory bound cases.
* Introduce kKPerThread to WarpGemm
* Add documentation comment.
* Fix Interwave scheduler block gemm.
* Add compute/memory friendly tile configuration.
* Clean
* New tile configurations in gemm mem example.
* Add more static checks and fix loop order in block gemm.
* Add more static checks and use warp gemm mfma dispatcher.
* Add default scheduler block gemm.
* Remove logging in example.
-