- 07 Feb, 2025 1 commit
-
-
jakpiase authored
-
- 04 Feb, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Fix pk_int4 cast and add pk_int4 dtype in ck tile * fixes * Improvements * fix typo
-
- 30 Jan, 2025 2 commits
-
-
Adam Osewski authored
* Add spatially local tile partitioner * Use 1D Grid size & create partitioner object. * Docs & use 1D partitioner in example. * Clang format. * Change kernel grid size Now: X is the # of output C-tiles, Y is the batch count Z is the splitK * Formatting & more doc. * Clang format. * Fix batched gemm test. Use 1d partitioner. * Move condition. * FIx ctor. * clang-format. -
Bartłomiej Kocot authored
* [CK TILE] Implement cschuflle algorithm * Rebase * Vector store size fixes * fixes * Fixes * fixes * fmha fix * fixes * fixes of fixes
-
- 27 Jan, 2025 1 commit
-
-
Adam Osewski authored
* Refactor universal gemm policy. * Adapt example to refactor changes. * Introduce static encoding pattern * Adding shuffled encoding patterns. * Fix err in reverse tuple. * Add transpose_tile2d * Small refactoring + doc * Enable reading on contiguous dimension in all layouts. * Transpose A/B register tile if needed for comp v3 pipeline. * Take contiguous dim size when calculating dram vector load size. * A/B smem pack size taken from WarpGemm attributes * Update B LDS layout and setup tile distribution pattern at class level. * Fix static assert. * Fix errors in examples. * Formatting & fix IsTranspose * Fix VectorSize & refactor. * Add error loging messages. * Fix VecLoadSize and TranspseC for mem pipeline. * Update unit-tests & disable mem pipeline. * Clang format * Update include/ck_tile/core/tensor/tile_window.hpp Co-authored-by:
jakpiase <jakub.piasecki@amd.com> * Fix compilation and reviewers comments. * Refactor unit-test. Fallback to non-universal gemm. Need to use GemmPipelineAGmemBGmemCRegV1 for now, since GemmKernel is now supporting also non-K major vector reads. --------- Co-authored-by:
jakpiase <jakub.piasecki@amd.com>
-
- 21 Jan, 2025 1 commit
-
-
Mateusz Ozga authored
* Grouped gemm simple code refactor * Offset invoker * Invoke generic Run, and replace name of parrtitioner variable * Tests fix type * Removed namespaces * Add template param to avoid implicit cast * Remove generic function * Constant value * underline enum to int16_t * Generalize partitioner function * Remove whitespaces * Rename function * Using support * Clang-format * Clang-format * Fn-partitioner description fn * Typo * Typo 2 * Better description * Better description * Refactor after review * Use ctr instead of set fn * Inovke ctr and typo * Comments * Remove unnecessary comment * Review, remove modulo
-
- 28 Dec, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK TILE] Add split K support in GEMM * Updates * Fixes * rebase * fix * Fix * fixes * support for batched gemm
-
- 18 Dec, 2024 1 commit
-
-
aledudek authored
* Gemm Kernel Refactor part1 * Gemm Kernel Refactor common gemm pipeline part2 * [CK TILE] Refactor batched gemm to reuse GemmKernel * [CK TILE] Refactor GemmKernel - review changes part1 * [CK TILE] Refactor GemmKernel - references fix * [CK TILE] Refactor GemmKernel - naming changes, add problem * [CK_TILE] Refactor GemmKernel - update tests * [CK_TILE] Refactor GemmKernel - review changes * [CK_TILE] Refactor GemmKernel - update test * [CK_TILE] Refactor GemmKernel - constness fixes * [CK_TILE] Refactor GemmKernel - update tests
-
- 17 Dec, 2024 1 commit
-
-
jakpiase authored
-
- 05 Dec, 2024 1 commit
-
-
jakpiase authored
* add IsSupportedArgument to gemm_kernel * add ut and do some refactoring * switched to ck_tile's integral_constant
-
- 04 Dec, 2024 1 commit
-
-
Mateusz Ozga authored
* Ck-tile, impl. grouped gemm * Workspace is allocated by user, and is passed to the function * Prepare test to new api design * Unify GemTransKernelArgs, removing N0 param * Add 1 to dim3 in paritioner * Typo: gem - > gemm --------- Co-authored-by:Adam Osewski <19374865+aosewski@users.noreply.github.com>
-
- 29 Nov, 2024 1 commit
-
-
aledudek authored
* [CK Tile] Batched GEMM Example * [CK Tile] Batched GEMM Example - minor refactor * [CK Tile] Batched GEMM Example - README update * [CK Tile] Batched Gemm Example - review changes - Added tensor data layours as input parameters - Changed structure of Host and Kernel args - Removed bug with invalid vector read on non-contiguous memory * [CK Tile] Batched Gemm Example - remove comment * [CK Tile] Batched Gemm Example - Add GTests part1 * [CK Tile] Batched Gemm Example - GTests part2 + review changes * [CK TILE] Batched GEMM post merge fixes * [CK Tile] Batched GEMM Example - fix pad views
-
- 27 Nov, 2024 1 commit
-
-
jakpiase authored
* add interwave scheduler for gemm mem pipeline * Fix merge artifacts. * Refactor unit tests. * Switch to interwave scheduler for mem example --------- Co-authored-by:
Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by:
Adam Osewski <Adam.Osewski@amd.com>
-
- 12 Nov, 2024 1 commit
-
-
Thomas Ning authored
* Finished the feature * Modified the test file * Test case update * addresss comment * Addressed the review comment * Fixed the CI error
-
- 30 Oct, 2024 1 commit
-
-
Adam Osewski authored
* CK-Tile GEMM with memory bound pipeline. * Memory bound gemm pipeline. * Fix not closed namespace. * Block gemm mem pipeline draft. * Do not use ck_tile:: within ck_tile namespace. * Refactoring & Move Layout info to pipeline problem. * Get hot loop and TailNum information before lunching kernel. * Fixes in pipeline. * Add comment to load_tile_raw and change variable naming style. * Few small changes & formatting. * Do not use macro. * Add gtests. * Use AccDataType for Output of MFMA instruction. * Formatting. * Refactor gemm examples. * Switch over to current block gemm. * Use currently available pipeline policy. * Refactoring and review comment.s * Fixes after merge. * Add missing include. * Add load tile overload which accepts output tensor as parameter. * This give 8% perf boost at the cost of using more registers. * Rename example. * Small changes. * Fix compilation err and lower K. * Support different layouts for A/B * Fix vector size for different layouts. * Rename Alignment into VectorSize * Unblock tests.
-
- 27 Sep, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK_TILE] Image to Column kernel * Fixes * Vector loads and stores * Fixes * Fixes * change test dir name
-