1. 27 Nov, 2024 2 commits
    • Illia Silin's avatar
    • Polished Grouped GEMM APIs and new BF16 instances (#1600) · 061ac064
      Adam Osewski authored
      * A few small fixes.
      
      * New GroupedGemm instances (BF16)
      
      * Unify and refactor GroupedGEMM device API.
      
      * Adapt changes to new API.
      
      * Adapt grouped gemm profiler.
      
      * Accept multiple kbatches for grouped gemm profiler.
      
      - delete the obsolete two-stage version, as it is now covered by grouped gemm
      
      * Update unit test for grouped gemm.
      
      * Fix thresholds for BF16 and F8. Unblock tests.
      
      * Fix a few instances.
      
      * Multiple small fixes.
      
      * Adapt to new API, check dynamic casting.
      
      * Uncomment a few data types in the grouped gemm profiler.
      
      * Fix call to SetDeviceArgs.
      
      * Fix profile grouped gemm multiply tile loop.
      
      * Fix grouped gemm tile loop kernel args in client examples.
      
      * Review comments.
  2. 26 Nov, 2024 7 commits
  3. 25 Nov, 2024 4 commits
  4. 23 Nov, 2024 2 commits
  5. 22 Nov, 2024 2 commits
  6. 21 Nov, 2024 3 commits
  7. 20 Nov, 2024 2 commits
    • Optimize docker file. (#1679) · d31e8249
      Illia Silin authored
      * reduce the docker image size and layers
      
      * clean up docker file
      
      * fix linker error for client example 24
      
      * install CK into the default /opt/rocm/ path
      
      * restore installing CK to alternative path in CI
      
      * add linking for utility lib
    • fix bug (#1680) · 81ec5eff
      Haocong WANG authored
  8. 19 Nov, 2024 2 commits
  9. 18 Nov, 2024 2 commits
  10. 15 Nov, 2024 3 commits
  11. 14 Nov, 2024 2 commits
  12. 13 Nov, 2024 3 commits
  13. 12 Nov, 2024 2 commits
  14. 11 Nov, 2024 3 commits
  15. 09 Nov, 2024 1 commit
    • Ck tile/moe sorting (#1624) · bec6fbc6
      dummycoderfe authored
      * add moe_sorting & check ok
      
      * fix comments & typo
      
      * Run remod.py under include/ck_tile & example/ck_tile directories
      
      * format code
      
      * fix output CI check bug
      
      * fix moe sorting README and a mistakenly committed file
      
      * use magic division to accelerate computation
      
      * add a loop unroll for moe LDS ops
      
      * add extblocksnel to set zeros for moebufs
      
      * [Ck_tile] moe set zero run ok, add size check and fix ref check
      
      * [Ck_tile] fix moe_sorting fuse set_zero remod
      
      * [Ck_tile] change name style, fix zero buffer size error, change folder
      
      * [Ck_tile] moe_sorting: fix name style
      
      * [Ck_tile] moe_sorting, remove useless params in traits
      
      * [Ck_tile] change outputtile cnt * unit_size; change output buf alloc
      
      ---------
      Co-authored-by: dummycoderfe <noplydummmycoder@163.com>
      Co-authored-by: Po Yen, Chen <PoYen.Chen@amd.com>
      Co-authored-by: carlushuang <carlus.huang@amd.com>