- 30 Oct, 2024 1 commit
-
-
Adam Osewski authored
* CK-Tile GEMM with memory bound pipeline. * Memory bound gemm pipeline. * Fix not closed namespace. * Block gemm mem pipeline draft. * Do not use ck_tile:: within ck_tile namespace. * Refactoring & Move Layout info to pipeline problem. * Get hot loop and TailNum information before lunching kernel. * Fixes in pipeline. * Add comment to load_tile_raw and change variable naming style. * Few small changes & formatting. * Do not use macro. * Add gtests. * Use AccDataType for Output of MFMA instruction. * Formatting. * Refactor gemm examples. * Switch over to current block gemm. * Use currently available pipeline policy. * Refactoring and review comment.s * Fixes after merge. * Add missing include. * Add load tile overload which accepts output tensor as parameter. * This give 8% perf boost at the cost of using more registers. * Rename example. * Small changes. * Fix compilation err and lower K. * Support different layouts for A/B * Fix vector size for different layouts. * Rename Alignment into VectorSize * Unblock tests.
-
- 15 Oct, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK_TILE] Add block universal gemm pipeline policy * Fixes * fixes2 * Fixes3 * fixeS
-
- 10 Oct, 2024 1 commit
-
-
Thomas Ning authored
* ake the cshuffle compilable * modify Mhe reference on gpu and cpu. Correaccess of cshuffle * fix the cpu reference code * Complete the in tile shuffle logic * restructure the kernel template input * change the naming pattern of ck_tile gemm pipeline * Re-format files using remod.py * Solve the fmha conflict with gemm * Comment Addressed from Carlus --------- Co-authored-by:Po Yen, Chen <PoYen.Chen@amd.com>
-
- 08 Oct, 2024 1 commit
-
-
Po Yen Chen authored
* Fix text alignment of ArgParser::print() * Update example README files * Clarify make-ck-dev.sh <arch> usage * Only keep some of the argument from '-?' output * Undo command line output changes in README * Only keep existing argument on doc and update description * Fix text alignment * Make cmake-ck-*.sh compatible with 'sh' command
-
- 18 Sep, 2024 1 commit
-
-
Thomas Ning authored
* Support the N dimension padding * Finished the padding feature for different dimension of K
-
- 14 Sep, 2024 1 commit
-
-
Thomas Ning authored
* Finished the feature of gpu verification * Add the ck_tile_gemm test in the CI CD * add the include of tensor_layou in reference_gemm * Comment Addressed * split ck_tile fhma and gemm tests into separate stages * restructure the reference gemm * restructure a new reference_gemm api that could read the device mem --------- Co-authored-by:
carlushuang <carlus.huang@amd.com> Co-authored-by:
illsilin <Illia.Silin@amd.com>
-
- 07 Sep, 2024 1 commit
-
-
Thomas Ning authored
* Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout * Finished the Matrix Layout feature set up. Note: Need to modify the inner block to solve the shuffle problem in the future. * Fix: Clang Format, API fixed from fmha * fix with better naming convention * revert back the pipeline code of fmha * Fixed: Addressed the comments and merge the GEMM shape of GEMM Operator and FMHA Operator to one. * clang format with the reference_gemm file * convert the clang format with the remod.py * Changed the format and variable name of the kernel gemm_shape and partitioner --------- Co-authored-by:thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>
-