1. 13 Jan, 2025 1 commit
      CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
      Thomas Ning authored
      * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
      
      * Finished the 2x2 warp gemm policy and the block selection mechanism
      
      * Clang format
      
      * Address Po Yen's comment
      
      * Address feedback
      
      * Fixed the compilation issue
      
      * Change the function name
  2. 10 Jan, 2025 1 commit
      Ck tile/gemm perf measure (#1750) · 73a076ee
      Thomas Ning authored
      
      
      * Finished adding the performance benchmark for ck tile gemm
      
      * Fix the executable rename problem
      
      * fix the executable name error
      
      * delete the unsupported layout combinations
      
      * Update run_full_test.sh
      
      * Update benchmark_mem_pipeline.sh
      
      * Update benchmark_basic.sh
      
      * change the executable of gemm_universal
      
      * change ck_tile_gemm script permissions
      
      * Addressed the comment
      
      * Addressed the comment
      
      * Fixed the comments
      
      * Fixed Comment
      
      * Roll back the malfunctioning change
      
      * Fix the Typo
      
      * finalize the tile_gemm_fp16 performance monitoring
      
      * fix the stash names for ck_tile gemm logs
      
      * change the stashing logic
      
      * change stashing syntax
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  3. 28 Dec, 2024 1 commit
  4. 18 Dec, 2024 2 commits
      [CK TILE] Refactor GemmKernel to be reused by other GEMM related operators (#1730) · 453ca373
      aledudek authored
      * Gemm Kernel Refactor part1
      
      * Gemm Kernel Refactor common gemm pipeline part2
      
      * [CK TILE] Refactor batched gemm to reuse GemmKernel
      
      * [CK TILE] Refactor GemmKernel - review changes part1
      
      * [CK TILE] Refactor GemmKernel - references fix
      
      * [CK TILE] Refactor GemmKernel - naming changes, add problem
      
      * [CK_TILE] Refactor GemmKernel - update tests
      
      * [CK_TILE] Refactor GemmKernel - review changes
      
      * [CK_TILE] Refactor GemmKernel - update test
      
      * [CK_TILE] Refactor GemmKernel - constness fixes
      
      * [CK_TILE] Refactor GemmKernel - update tests
      [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm (#1743) · f6c4d614
      aledudek authored
      * [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm
      
      * [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm - review changes
      
      * [CK_TILE] Move hipmalloc/memcpy calls out of gpu reference gemm - review fix
  5. 05 Dec, 2024 1 commit
  6. 28 Nov, 2024 1 commit
  7. 27 Nov, 2024 1 commit
  8. 26 Nov, 2024 1 commit
      CK-Tile first draft of universal block gemm with interwave & intrawave scheduler (#1676) · b6bcd76d
      Adam Osewski authored
      * Block universal gemm.
      
      * Universal block gemm with interwave scheduler - draft.
      
      * Refactoring
      
      * Move a/b_warp_tiles into BlockGemmImpl
      * Set BlockGemmImpl as a class member
      
      * Change tile size to be more suitable for memory-bound cases.
      
      * Introduce kKPerThread to WarpGemm
      
      * Add documentation comment.
      
      * Fix Interwave scheduler block gemm.
      
      * Add compute/memory friendly tile configuration.
      
      * Clean up
      
      * New tile configurations in gemm mem example.
      
      * Add more static checks and fix loop order in block gemm.
      
      * Add more static checks and use warp gemm mfma dispatcher.
      
      * Add default scheduler block gemm.
      
      * Remove logging in example.
  9. 12 Nov, 2024 1 commit
  10. 30 Oct, 2024 1 commit
      [CK-Tile] Universal gemm memory bound pipeline (#1558) · 24d996aa
      Adam Osewski authored
      * CK-Tile GEMM with memory bound pipeline.
      
      * Memory bound gemm pipeline.
      
      * Fix unclosed namespace.
      
      * Block gemm mem pipeline draft.
      
      * Do not use ck_tile:: within ck_tile namespace.
      
      * Refactoring & Move Layout info to pipeline problem.
      
      * Get hot loop and TailNum information before launching kernel.
      
      * Fixes in pipeline.
      
      * Add comment to load_tile_raw and change variable naming style.
      
      * Few small changes & formatting.
      
      * Do not use macro.
      
      * Add gtests.
      
      * Use AccDataType for Output of MFMA instruction.
      
      * Formatting.
      
      * Refactor gemm examples.
      
      * Switch over to current block gemm.
      
      * Use currently available pipeline policy.
      
      * Refactoring and review comments.
      
      * Fixes after merge.
      
      * Add missing include.
      
      * Add load tile overload which accepts output tensor as parameter.
      
      * This gives an 8% perf boost at the cost of using more registers.
      
      * Rename example.
      
      * Small changes.
      
      * Fix compilation err and lower K.
      
      * Support different layouts for A/B
      
      * Fix vector size for different layouts.
      
      * Rename Alignment into VectorSize
      
      * Unblock tests.
  11. 15 Oct, 2024 1 commit
  12. 10 Oct, 2024 1 commit
      Ck tile gemm cshuffle & CK Tile GEMM restructure (#1535) · 6f27bc98
      Thomas Ning authored
      
      
      * Make the cshuffle compilable
      
      * Modify the reference on GPU and CPU; correct access of cshuffle
      
      * fix the cpu reference code
      
      * Complete the in tile shuffle logic
      
      * restructure the kernel template input
      
      * change the naming pattern of ck_tile gemm pipeline
      
      * Re-format files using remod.py
      
      * Solve the fmha conflict with gemm
      
      * Addressed comments from Carlus
      
      ---------
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
  13. 08 Oct, 2024 1 commit
      [CK_TILE] Update example README files & fix script compatibility issue (#1548) · 0c094daa
      Po Yen Chen authored
      * Fix text alignment of ArgParser::print()
      
      * Update example README files
      
      * Clarify make-ck-dev.sh <arch> usage
      
      * Only keep some of the arguments from '-?' output
      
      * Undo command line output changes in README
      
      * Only keep existing arguments in the doc and update descriptions
      
      * Fix text alignment
      
      * Make cmake-ck-*.sh compatible with 'sh' command
  14. 18 Sep, 2024 1 commit
  15. 14 Sep, 2024 1 commit
  16. 07 Sep, 2024 1 commit
      Ck tile gemm example (#1488) · caacd388
      Thomas Ning authored
      
      
      * Checkpoint: Finished with the tile example & kernel verification, working on the different matrix layout
      
      * Finished the Matrix Layout feature setup. Note: the inner block needs to be modified in the future to solve the shuffle problem.
      
      * Fix: Clang Format, API fixed from fmha
      
      * fix with better naming convention
      
      * Revert the pipeline code of fmha
      
      * Fixed: Addressed the comments and merged the GEMM shape of the GEMM Operator and FMHA Operator into one.
      
      * clang format with the reference_gemm file
      
      * convert the clang format with the remod.py
      
      * Changed the format and variable name of the kernel gemm_shape and partitioner
      
      ---------
      Co-authored-by: thomasning <thomasning@banff-cyxtera-s70-4.ctr.dcgpu>