- 24 Oct, 2024 2 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 23 Oct, 2024 3 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 22 Oct, 2024 2 commits
-
-
Andriy Roshchenko authored
-
illsilin authored
-
- 21 Oct, 2024 3 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
illsilin authored
-
- 19 Oct, 2024 1 commit
-
-
Andriy Roshchenko authored
-
- 18 Oct, 2024 2 commits
-
-
Andriy Roshchenko authored
-
illsilin authored
-
- 16 Oct, 2024 5 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 15 Oct, 2024 4 commits
-
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 14 Oct, 2024 2 commits
-
-
illsilin authored
-
Illia Silin authored
Merge from public
-
- 11 Oct, 2024 5 commits
-
-
illsilin authored
-
Andriy Roshchenko authored
-
Illia Silin authored
-
Andriy Roshchenko authored
-
Andriy Roshchenko authored
-
- 10 Oct, 2024 5 commits
-
-
Illia Silin authored
-
spolifroni-amd authored
-
Andriy Roshchenko authored
-
Rostyslav Geyyer authored
-
Thomas Ning authored
* Make the cshuffle compilable
* modify the reference on gpu and cpu. Correct access of cshuffle
* fix the cpu reference code
* Complete the in tile shuffle logic
* restructure the kernel template input
* change the naming pattern of ck_tile gemm pipeline
* Re-format files using remod.py
* Solve the fmha conflict with gemm
* Comment Addressed from Carlus

---------

Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
-
- 09 Oct, 2024 3 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
Christopher Millette authored
-
- 08 Oct, 2024 3 commits
-
-
Rostyslav Geyyer authored
* Add a gpu gemm reference kernel
* Switch to gpu reference in gemm examples
* Remove redundant arguments
* Update all related examples
* Update more examples
* Try less threads per block
* Try even less threads per block
* Add support for all matrix layouts
* Increase block size
* Clean up
* Remove hardcoded strides
* Clean up
* Try a column-major case
* Revert back to row-major
* Run both CPU and GPU verification

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Po Yen Chen authored
* Fix text alignment of ArgParser::print()
* Update example README files
* Clarify make-ck-dev.sh <arch> usage
* Only keep some of the argument from '-?' output
* Undo command line output changes in README
* Only keep existing argument on doc and update description
* Fix text alignment
* Make cmake-ck-*.sh compatible with 'sh' command
-
Qianfeng authored
* Simplify the codes in splitkv_combine pipeline
* Always set kPadSeqLenK=true for fmha splitkv kernels
* Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-