  1. 07 Feb, 2025 1 commit
  2. 06 Feb, 2025 1 commit
  3. 05 Feb, 2025 3 commits
  4. 04 Feb, 2025 1 commit
  5. 03 Feb, 2025 1 commit
  6. 31 Jan, 2025 1 commit
  7. 30 Jan, 2025 3 commits
  8. 29 Jan, 2025 1 commit
  9. 27 Jan, 2025 3 commits
    • Add OCP FP8 support in CK_TILE (#1829) · 35aebe59
      Andriy Roshchenko authored
      * Add OCP FP8 to CK_TILE
      
      * Validate OCP FP8 in FMHA FWD under VALID=1
    • [CK-Tile] Enable vectorized reads on all layouts & improve perf. (#1835) · 39dc25a9
      Adam Osewski authored
      
      
      * Refactor universal gemm policy.
      
      * Adapt example to refactor changes.
      
      * Introduce static encoding pattern
      
      * Adding shuffled encoding patterns.
      
      * Fix err in reverse tuple.
      
      * Add transpose_tile2d
      
      * Small refactoring + doc
      
      * Enable reading on contiguous dimension in all layouts.
      
      * Transpose A/B register tile if needed for comp v3 pipeline.
      
      * Take contiguous dim size when calculating dram vector load size.
      
      * A/B smem pack size taken from WarpGemm attributes
      
      * Update B LDS layout and setup tile distribution pattern at class level.
      
      * Fix static assert.
      
      * Fix errors in examples.
      
      * Formatting & fix IsTranspose
      
      * Fix VectorSize & refactor.
      
      * Add error logging messages.
      
      * Fix VecLoadSize and TransposeC for mem pipeline.
      
      * Update unit-tests & disable mem pipeline.
      
      * Clang format
      
      * Update include/ck_tile/core/tensor/tile_window.hpp
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
      
      * Fix compilation and address reviewers' comments.
      
      * Refactor unit-test. Fallback to non-universal gemm.
      
      Need to use GemmPipelineAGmemBGmemCRegV1 for now,
      since GemmKernel now also supports non-K-major vector reads.
      
      ---------
      Co-authored-by: jakpiase <jakub.piasecki@amd.com>
  10. 24 Jan, 2025 3 commits
  11. 23 Jan, 2025 1 commit
  12. 22 Jan, 2025 1 commit
  13. 21 Jan, 2025 2 commits
    • Simplify static_cast if-lands (#1828) · 3db77bc4
      Mateusz Ozga authored
    • CK-Tile Grouped GEMM refactor and post PR fixes (#1756) · 3c93d3c4
      Mateusz Ozga authored
      * Grouped gemm simple code refactor
      
      * Offset invoker
      
      * Invoke generic Run, and replace name of partitioner variable
      
      * Tests fix type
      
      * Removed namespaces
      
      * Add template param to avoid implicit cast
      
      * Remove generic function
      
      * Constant value
      
      * Underlying enum type to int16_t
      
      * Generalize partitioner function
      
      * Remove whitespaces
      
      * Rename function
      
      * Using support
      
      * Clang-format
      
      * Clang-format
      
      * Fn-partitioner description fn
      
      * Typo
      
      * Typo 2
      
      * Better description
      
      * Better description
      
      * Refactor after review
      
      * Use ctr instead of set fn
      
      * Invoke ctr and typo
      
      * Comments
      
      * Remove unnecessary comment
      
      * Review, remove modulo
  14. 18 Jan, 2025 1 commit
  15. 17 Jan, 2025 2 commits
  16. 16 Jan, 2025 1 commit
  17. 15 Jan, 2025 1 commit
    • [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
      ruanjm authored
      
      
      * Add shortcut to RMSNorm
      
      * Modify test for adding shortcut for RMSNorm
      
      * Add fused parameter into tests
      
      * 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
      
      * Support various strides and precisions.
      
      * Add support of Epilogue
      
      * Add fuse and epilogue support to rmsnorm ref
      
      * Modify rmsnorm example
      
      * Refactor tests/examples
      
      * Bug fix for newly added tests/examples
      
      * Bug fix for new tests 2
      
      * Modify smoke test scripts
      
      remove dbg code
      
      * Support non-smooth dynamic quant
      
      * Update Rmsnorm2dFwd::GetName()
      
      * rename xscale and prec_sx to smoothscale and prec_sm
      
      Bug fix after rename
      
      Remove files
      
      * change example_rmsnorm2d_fwd.cpp
      
      * update performance calculator
      
      * Fix issue in two-pass when fuse add is enabled
      
      * Remove comment of beta
      
      ---------
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
  18. 13 Jan, 2025 2 commits
    • CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
      Thomas Ning authored
      * refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
      
      * Finished the 2x2 warp gemm policy and the block selection mechanism
      
      * Clang format
      
      * Address poyen's comment
      
      * Address feedback
      
      * Fixed the compilation issue
      
      * Change the function name
    • Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f
      Qianfeng authored
      
      
      * Update for fmha_fwd qs_ks_vs pipeline
      
      * Remove __builtin_amdgcn_sched_barrier(0)
      
      * Move the p_compute-to-p conversion earlier to try to increase VGPR reuse
      
      * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
      
      * Re-add __builtin_amdgcn_sched_barrier(0)
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  19. 08 Jan, 2025 11 commits