1. 18 Oct, 2024 2 commits
  2. 16 Oct, 2024 1 commit
    • [CK_TILE] Improve headdim96 performance for fmha-bwd (#1573) · 14c3cfb1
      Qianfeng authored
      
      * Add kQKHeaddimForGemmN and kVHeaddimForGemmN in order to support headdim 96
      
      * Remove the use of MakeKRegBlockDescriptor and MakeVRegBlockDescriptor
      
      * Fix in bwd_pipeline_default_policy
      
      * Remove kQKHeaddim and rename kQKHeaddimForGemmN to kQKHeaddim in the bwd kernel and pipelines
      
      * Replace kVHeaddimForGemmN by kVHeaddim and kDoDvHeaddim
      
      * Update to hd96 tile settings
      
      * Add smoke test scripts for fmha-bwd hd96
      
      * Revert "Add smoke test scripts for fmha-bwd hd96"
      
      This reverts commit 7ca7e1a93dc65eb99ce3ff4e82693589830e42a2.
      
      * Remove hd96 tile settings in fmha_bwd codegen to save compiling
      
      * Fix lost code line in bwd_pipeline_default_policy
      
      * Merge kDoDvHeaddim/kPadHeadDimDoDv to kVHeaddim/kPadHeadDimV and remove TileFmhaBwdTraits
      
      * Rename KRegSliceBlockDescriptor/VRegSliceBlockDescriptor to KRegBlockDescriptor/VRegBlockDescriptor
      
      * tiny adjustments
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
      Co-authored-by: danyao12 <Dan.Yao@amd.com>
  3. 15 Oct, 2024 3 commits
  4. 14 Oct, 2024 3 commits
  5. 12 Oct, 2024 1 commit
  6. 11 Oct, 2024 1 commit
  7. 10 Oct, 2024 4 commits
  8. 09 Oct, 2024 3 commits
  9. 08 Oct, 2024 3 commits
    • Add a gpu gemm reference kernel (#1528) · aa932445
      Rostyslav Geyyer authored
      
      * Add a gpu gemm reference kernel
      
      * Switch to gpu reference in gemm examples
      
      * Remove redundant arguments
      
      * Update all related examples
      
      * Update more examples
      
      * Try fewer threads per block
      
      * Try even fewer threads per block
      
      * Add support for all matrix layouts
      
      * Increase block size
      
      * Clean up
      
      * Remove hardcoded strides
      
      * Clean up
      
      * Try a column-major case
      
      * Revert back to row-major
      
      * Run both CPU and GPU verification
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
    • [CK_TILE] Update example README files & fix script compatibility issue (#1548) · 0c094daa
      Po Yen Chen authored
      
      * Fix text alignment of ArgParser::print()
      
      * Update example README files
      
      * Clarify make-ck-dev.sh <arch> usage
      
      * Only keep some of the arguments from '-?' output
      
      * Undo command line output changes in README
      
      * Only keep existing arguments in doc and update description
      
      * Fix text alignment
      
      * Make cmake-ck-*.sh compatible with 'sh' command
    • [CK_TILE] Simplify the code in splitkv_combine pipeline (#1549) · 74d68e3b
      Qianfeng authored
      
      * Simplify the code in splitkv_combine pipeline
      
      * Always set kPadSeqLenK=true for fmha splitkv kernels
      
      * Change in Oacc Alignment and TileDistribution to be more adaptable to tile sizes
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  10. 07 Oct, 2024 4 commits
  11. 04 Oct, 2024 3 commits
  12. 02 Oct, 2024 2 commits
  13. 01 Oct, 2024 4 commits
  14. 27 Sep, 2024 1 commit
  15. 26 Sep, 2024 1 commit
  16. 25 Sep, 2024 3 commits
  17. 24 Sep, 2024 1 commit