1. 13 Feb, 2025 1 commit
  2. 10 Feb, 2025 1 commit
    • Added Int4 mixed batch gemm support (#1839) · d9f1ead3
      Mingtao Gu authored
      
      
      * remove redundant kernels.
      
      * added batched_gemm_xdl_fp16int4_b_scale_v3
      
      * Enabled the split K.
      
      * added the batched_gemm_b_scale ckProfiler, meet function issue
      
      * fix some typo
      
      * fix ckProfiler build issue
      
      * fix some bugs
      
      * updated some debug info
      
      * comment some code
      
      * Fix
      
      * fixed some bugs and refactor the code
      
      * fixed a function bug.
      
      * formatted files.
      
      * formatted
      
      * uncommented the ckProfiler CMakeLists
      
      * fixed.
      
      * fix ckProfiler for batched_gemm_b_scale
      
      ---------
      Co-authored-by: mtgu0705 <mtgu@amd.com>
      Co-authored-by: aska-0096 <haocwang@amd.com>
      Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
  3. 05 Feb, 2025 1 commit
  4. 20 Jan, 2025 1 commit
    • Added bf16 instances grouped gemm fixed nk (#1825) · e7dce4d2
      deepsek authored
      * Feat: Add bf16 input instances
      
      * feat: Add BF16 profiler code
      
      * fix: reorder enum types
      
      * fix: CI fail due to clang-format
      
      * fix: clang script format issue
      
      * fix: clang format broke cmakelist file
  5. 17 Jan, 2025 1 commit
  6. 13 Jan, 2025 1 commit
    • Dev/merge u8w8 (#1774) · 53ab1b90
      feli authored
      
      
      * port tiles from a8w8
      
      * rm debug used files
      
      * add instances
      
      * remove all non gemm in cmake
      
      * merge; impl fp16
      
      * recover cmake from develop
      
      * add missed files; fix clang format
      
      ---------
      Co-authored-by: coderfeli <coderfeli@163.com>
  7. 03 Jan, 2025 1 commit
  8. 02 Jan, 2025 3 commits
  9. 31 Dec, 2024 1 commit
  10. 30 Dec, 2024 2 commits
  11. 27 Dec, 2024 3 commits
  12. 25 Dec, 2024 1 commit
  13. 24 Dec, 2024 2 commits
  14. 23 Dec, 2024 1 commit
  15. 13 Dec, 2024 1 commit
    • Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33
      Bartłomiej Kocot authored
      
      
      * add bmm api
      
      * add bf16 multi_d
      
      * add ckProfiler for bf16
      
      * add ckProfiler files
      
      * add more instance; fixed 64bit index issue
      
      * fixed naming
      
      * enabled batched Ds
      
      * use long_index for ds offsets
      
      * clean
      
      * add bmm fp8 ckProfiler
      
      * Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/src/profile_gemm_universal_batched.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/include/profiler/profile_gemm_universal_batched_impl.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * clean
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * refactor batch offset func
      
      * add splitk support into bmm_v3
      
      * clean
      
      * clean
      
      * format
      
      * fixed
      
      * fix
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
  16. 27 Nov, 2024 1 commit
    • Polished Grouped GEMM APIs and new BF16 instances (#1600) · 061ac064
      Adam Osewski authored
      * Few small fixes.
      
      * New GroupedGemm instances (BF16)
      
      * Unify and refactor GroupedGEMM device API.
      
      * Adapt changes to new API.
      
      * Adapt grouped gemm profiler.
      
      * Accept multiple kbatches for grouped gemm profiler.
      
      - delete obsolete two stage as it is now covered by grouped gemm
      
      * Update unit test for grouped gemm.
      
      * Fix thresholds for BF16 and F8. Unblock tests.
      
      * Fix few instances.
      
      * Multiple small fixes.
      
      * Adapt to new API, check dynamic casting.
      
      * Uncomment few data types in grouped gemm profiler.
      
      * Fix call to SetDeviceArgs.
      
      * Fix profile grouped gemm multiply tile loop.
      
      * Fix grouped gemm tile loop kernel args in client examples.
      
      * Review comments.
  17. 21 Nov, 2024 1 commit
  18. 18 Nov, 2024 1 commit
  19. 15 Nov, 2024 1 commit
  20. 06 Nov, 2024 1 commit
  21. 05 Nov, 2024 2 commits
  22. 01 Nov, 2024 1 commit
    • Reduce build time. (#1621) · 03c6448b
      Illia Silin authored
      * disable fp8 gemm_universal on gfx90a and gfx908 by default
      
      * fix cmake syntax
      
      * fix clang format
      
      * add ifdefs in amd_xdlops
      
      * disable fp8 gemm instances on gfx90a by default
      
      * update readme
  23. 30 Oct, 2024 1 commit
  24. 26 Oct, 2024 1 commit
  25. 23 Oct, 2024 1 commit
  26. 22 Oct, 2024 1 commit
  27. 21 Oct, 2024 4 commits
  28. 07 Oct, 2024 1 commit
  29. 20 Sep, 2024 1 commit
  30. 17 Sep, 2024 1 commit