1. 22 Nov, 2024 3 commits
  2. 21 Nov, 2024 2 commits
  3. 20 Nov, 2024 1 commit
  4. 15 Nov, 2024 2 commits
  5. 14 Nov, 2024 1 commit
  6. 12 Nov, 2024 1 commit
  7. 08 Nov, 2024 1 commit
  8. 07 Nov, 2024 1 commit
  9. 06 Nov, 2024 2 commits
  10. 05 Nov, 2024 2 commits
  11. 04 Nov, 2024 2 commits
  12. 01 Nov, 2024 1 commit
      Reduce build time. (#1621) · 03c6448b
      Illia Silin authored
      * disable fp8 gemm_universal on gfx90a and gfx908 by default
      
      * fix cmake syntax
      
      * fix clang format
      
      * add ifdefs in amd_xdlops
      
      * disable fp8 gemm instances on gfx90a by default
      
      * update readme
  13. 30 Oct, 2024 4 commits
    • Andriy Roshchenko's avatar
    • Andriy Roshchenko's avatar
      Add conversion tests · d3c89355
      Rostyslav Geyyer authored
      [CK-Tile] Universal gemm memory bound pipeline (#1558) · 24d996aa
      Adam Osewski authored
      * CK-Tile GEMM with memory bound pipeline.
      
      * Memory bound gemm pipeline.
      
      * Fix unclosed namespace.
      
      * Block gemm mem pipeline draft.
      
      * Do not use ck_tile:: within ck_tile namespace.
      
      * Refactoring & Move Layout info to pipeline problem.
      
      * Get hot loop and TailNum information before launching kernel.
      
      * Fixes in pipeline.
      
      * Add comment to load_tile_raw and change variable naming style.
      
      * Few small changes & formatting.
      
      * Do not use macro.
      
      * Add gtests.
      
      * Use AccDataType for Output of MFMA instruction.
      
      * Formatting.
      
      * Refactor gemm examples.
      
      * Switch over to current block gemm.
      
      * Use currently available pipeline policy.
      
      * Refactoring and review comments.
      
      * Fixes after merge.
      
      * Add missing include.
      
      * Add load tile overload which accepts output tensor as parameter.
      
      * This gives an 8% perf boost at the cost of using more registers.
      
      * Rename example.
      
      * Small changes.
      
      * Fix compilation error and lower K.
      
      * Support different layouts for A/B
      
      * Fix vector size for different layouts.
      
      * Rename Alignment into VectorSize
      
      * Unblock tests.
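      The item "Get hot loop and TailNum information before launching kernel" refers to deciding on the host, from the problem's K dimension, how many K-block iterations the pipeline runs, whether a steady-state "hot loop" exists beyond the prefetch stages, and how many iterations fall into the tail, so the kernel can be instantiated with that control flow fixed at compile time. The following is a minimal sketch of that idea, assuming a two-stage prefetch and purely hypothetical names (`LoopInfo`, `compute_loop_info`, `run_kernel`, `launch`); it is not CK-Tile's actual API, and the tail convention (`num_loop % prefetch_stages`) is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical sketch, NOT CK-Tile's real API: names and the exact tail
// semantics are assumptions for illustration only.
struct LoopInfo {
    int num_loop;      // number of K-dimension block iterations
    bool has_hot_loop; // true when iterations remain after filling the prefetch stages
    int tail_num;      // assumed convention: num_loop % prefetch_stages (0 = full-depth tail)
};

// Computed on the host, before the kernel is launched.
LoopInfo compute_loop_info(int K, int k_per_block, int prefetch_stages) {
    const int num_loop = (K + k_per_block - 1) / k_per_block; // ceil(K / k_per_block)
    return {num_loop, num_loop > prefetch_stages, num_loop % prefetch_stages};
}

// Stand-in for a kernel whose loop structure is fixed by template parameters.
template <bool HasHotLoop, int TailNum>
void run_kernel(int num_loop) {
    std::printf("num_loop=%d hot_loop=%d tail=%d\n", num_loop, HasHotLoop ? 1 : 0, TailNum);
}

// Turn the runtime hot-loop flag into a compile-time template argument.
template <int TailNum>
void dispatch_hot_loop(const LoopInfo& info) {
    if (info.has_hot_loop) {
        run_kernel<true, TailNum>(info.num_loop);
    } else {
        run_kernel<false, TailNum>(info.num_loop);
    }
}

// Assumes prefetch_stages == 2, so the tail is either 0 or 1.
void launch(const LoopInfo& info) {
    assert(info.tail_num == 0 || info.tail_num == 1);
    if (info.tail_num == 0) {
        dispatch_hot_loop<0>(info);
    } else {
        dispatch_hot_loop<1>(info);
    }
}
```

      For example, `launch(compute_loop_info(192, 64, 2))` yields three K-block iterations, a hot loop, and a tail of one. Resolving these values on the host lets each kernel instantiation avoid runtime branching on loop structure, at the cost of compiling one variant per (hot loop, tail) combination.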
  14. 29 Oct, 2024 1 commit
  15. 21 Oct, 2024 1 commit
  16. 18 Oct, 2024 1 commit
  17. 16 Oct, 2024 1 commit
  18. 15 Oct, 2024 3 commits
  19. 14 Oct, 2024 1 commit
  20. 11 Oct, 2024 2 commits
  21. 10 Oct, 2024 1 commit
  22. 07 Oct, 2024 2 commits
  23. 27 Sep, 2024 1 commit
  24. 20 Sep, 2024 1 commit
  25. 17 Sep, 2024 1 commit
  26. 16 Sep, 2024 1 commit