1. 10 Feb, 2025 2 commits
    • clang · fd650950
      M.Emin Ozturk authored
    • Added Int4 mixed batch gemm support (#1839) · d9f1ead3
      Mingtao Gu authored
      
      
      * remove redundant kernels.
      
      * added batched_gemm_xdl_fp16int4_b_scale_v3
      
      * Enabled the split K.
      
      * added the batched_gemm_b_scale ckProfiler; encountered a function issue
      
      * fix some typos
      
      * fix ckProfiler build issue
      
      * fix some bugs
      
      * updated some debug info
      
      * comment some code
      
      * Fix
      
      * fixed some bugs and refactor the code
      
      * fixed a function bug.
      
      * formatted files.
      
      * formatted
      
      * uncommented the ckProfiler CMakeLists
      
      * fixed.
      
      * fix ckProfiler for batched_gemm_b_scale
      
      ---------
      Co-authored-by: mtgu0705 <mtgu@amd.com>
      Co-authored-by: aska-0096 <haocwang@amd.com>
      Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>
  2. 06 Feb, 2025 1 commit
  3. 20 Jan, 2025 1 commit
    • Added bf16 instances grouped gemm fixed nk (#1825) · e7dce4d2
      deepsek authored
      * Feat: Add bf16 input instances
      
      * feat: Add BF16 profiler code
      
      * fix: reorder enum types
      
      * fix: CI fail due to clang-format
      
      * fix: clang script format issue
      
      * fix: clang format broke cmakelist file
  4. 17 Jan, 2025 1 commit
  5. 13 Jan, 2025 1 commit
    • Dev/merge u8w8 (#1774) · 53ab1b90
      feli authored
      
      
      * port tiles from a8w8
      
      * rm debug used files
      
      * add instances
      
      * remove all non gemm in cmake
      
      * merge; impl fp16
      
      * recover cmake from develop
      
      * add missed files; fix clang format
      
      ---------
      Co-authored-by: coderfeli <coderfeli@163.com>
  6. 03 Jan, 2025 1 commit
  7. 02 Jan, 2025 2 commits
  8. 13 Dec, 2024 1 commit
    • Add SplitK support into Batched GEMM V3 (#1729) · 4d8fce33
      Bartłomiej Kocot authored
      
      
      * add bmm api
      
      * add bf16 multi_d
      
      * add ckProfiler for bf16
      
      * add ckProfiler files
      
      * add more instance; fixed 64bit index issue
      
      * fixed naming
      
      * enabled batched Ds
      
      * use long_index for ds offsets
      
      * clean
      
      * add bmm fp8 ckProfiler
      
      * Update example/24_batched_gemm/batched_gemm_xdl_bf16_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/batched_gemm_xdl_fp8_rowwise_v3.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update example/24_batched_gemm/run_batched_gemm_example_rowwise.inc
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v1_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_mem_v2_default_instance.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/src/profile_gemm_universal_batched.cpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * Update profiler/include/profiler/profile_gemm_universal_batched_impl.hpp
      Co-authored-by: Bartłomiej Kocot <bartlomiejkocot98@gmail.com>
      
      * clean
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update library/src/tensor_operation_instance/gpu/gemm_universal_batched/device_batched_gemm_xdl_universal_bf16_bf16_bf16/device_batched_gemm_xdl_universal_bf16_bf16_bf16_mk_nk_mn_comp_default_instance.cpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * Update include/ck/tensor_operation/gpu/device/impl/device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp
      
      * refactor batch offset func
      
      * add splitk support into bmm_v3
      
      * clean
      
      * clean
      
      * format
      
      * fixed
      
      * fix
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
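Several bullets in this commit concern batch offsets ("fixed 64bit index issue", "use long_index for ds offsets", "refactor batch offset func"). The underlying hazard is general: `batch_idx * batch_stride` can exceed the range of a 32-bit `int` for large batched problems. The sketch below only illustrates that point; the helper name `batch_offset` is hypothetical and `long_index_t` here is a local alias standing in for a 64-bit index type, not CK's definition.

```cpp
#include <cassert>
#include <cstdint>

// Local alias standing in for a 64-bit index type (assumption for this sketch).
using long_index_t = std::int64_t;

// Hypothetical helper: element offset of batch `batch_idx` in a tensor whose
// consecutive batches are `batch_stride` elements apart. Both operands are
// widened before the multiply, so the product is computed in 64-bit arithmetic
// instead of overflowing a 32-bit int.
long_index_t batch_offset(int batch_idx, int batch_stride)
{
    assert(batch_idx >= 0 && batch_stride >= 0);
    return static_cast<long_index_t>(batch_idx) * static_cast<long_index_t>(batch_stride);
}
```

For example, batch 4096 with a stride of 2^20 elements lands at offset 2^32, which a 32-bit product cannot represent.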
  9. 27 Nov, 2024 1 commit
    • Polished Grouped GEMM APIs and new BF16 instances (#1600) · 061ac064
      Adam Osewski authored
      * Few small fixes.
      
      * New GroupedGemm instances (BF16)
      
      * Unify and refactor GroupedGEMM device API.
      
      * Adapt changes to new API.
      
      * Adapt grouped gemm profiler.
      
      * Accept multiple kbatches for grouped gemm profiler.
      
      - delete obsolete two stage as it is now covered by grouped gemm
      
      * Update unit test for grouped gemm.
      
      * Fix thresholds for BF16 and F8. Unblock tests.
      
      * Fix few instances.
      
      * Multiple small fixes.
      
      * Adapt to new API, check dynamic casting.
      
      * Uncomment few data types in grouped gemm profiler.
      
      * Fix call to SetDeviceArgs.
      
      * Fix profile grouped gemm multiply tile loop.
      
      * Fix grouped gemm tile loop kernel args in client examples.
      
      * Review comments.
  10. 21 Nov, 2024 1 commit
  11. 18 Nov, 2024 1 commit
  12. 15 Nov, 2024 1 commit
  13. 06 Nov, 2024 1 commit
  14. 01 Nov, 2024 1 commit
    • Reduce build time. (#1621) · 03c6448b
      Illia Silin authored
      * disable fp8 gemm_universal on gfx90a and gfx908 by default
      
      * fix cmake syntax
      
      * fix clang format
      
      * add ifdefs in amd_xdlops
      
      * disable fp8 gemm instances on gfx90a by default
      
      * update readme
  15. 26 Oct, 2024 1 commit
  16. 23 Oct, 2024 1 commit
  17. 22 Oct, 2024 1 commit
  18. 21 Oct, 2024 1 commit
    • Ck profiler instance support (#1575) · 560917b1
      Thomas Ning authored
      * The draft on ckProfiler instance add
      
      * support the ck profiler instance with same data types
      
      * add a small feature on the M and N variable switch.
      
      * Partially solve the incorrect result problem
      
      * fix based on ci cd
  19. 07 Oct, 2024 1 commit
  20. 20 Sep, 2024 1 commit
  21. 17 Sep, 2024 1 commit
  22. 13 Sep, 2024 1 commit
  23. 12 Sep, 2024 1 commit
  24. 11 Sep, 2024 1 commit
    • Rewrite pool2d fwd (#1462) · e8d2887c
      jakpiase authored
      
      
      * added pool2d fwd
      
      * add tests
      
      * add reviewers changes
      
      * Revert "Merge remote-tracking branch 'origin/develop' into jakpiase/pool2d_fwd_new"
      
      This reverts commit 6b2ba7ff8960b0a6ddbe30d8dac53eeb55a8597e, reversing
      changes made to 22c82bea0caf3e0f29399100c1bb67b8003fc042.
      
      * Revert "add reviewers changes"
      
      This reverts commit 22c82bea0caf3e0f29399100c1bb67b8003fc042.
      
      * added reviewers comments
      
      * revert some old files
      
      * add reviewers requests
      
      ---------
      Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
  25. 05 Sep, 2024 1 commit
  26. 03 Sep, 2024 1 commit
  27. 19 Aug, 2024 1 commit
  28. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  29. 06 Aug, 2024 2 commits
  30. 05 Aug, 2024 1 commit
  31. 31 Jul, 2024 1 commit
  32. 19 Jul, 2024 2 commits
    • [GEMM] F8 GEMM, performance optimized. (#1384) · 8c90f25b
      Haocong WANG authored
      
      
      * add ab_scale init support
      
      * enabled interwave
      
      * add scale type; update isSupport
      
      * adjust example
      
      * clean
      
      * enable f8 pure gemm rcr ckprofiler
      
      * Add gemm_multiply_multiply instances
      
      * clang format
      
      * Optimize for ScaleBlockMNK=128
      
      * enable abscale f8 gemm ck profiler
      
      * Add pure f8 gemm test suite
      
      * Reverting to the state of project at f60fd77
      
      * update copyright
      
      * clang format
      
      * update copyright
      
      ---------
      Co-authored-by: root <jizhan@amd.com>
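This commit adds "ab_scale" support and tunes for ScaleBlockMNK=128. One common formulation of a block-scaled low-precision GEMM keeps one float scale per block of A and one per block of B, and applies them to each per-K-block partial dot product. The host-side reference below sketches that formulation; it is not CK's implementation, the scale tensor layouts and the name `gemm_block_scaled` are assumptions, and fp8 inputs are represented as already-decoded `float` values because standard C++ has no fp8 type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference host-side GEMM with per-block scales for A and B (generic sketch).
// A: M x K, B: K x N, C: M x N, all row-major. a_scale holds
// ceil(M/BM) x ceil(K/BK) scales and b_scale holds ceil(K/BK) x ceil(N/BN)
// scales, both row-major (assumed layouts).
std::vector<float> gemm_block_scaled(const std::vector<float>& A,
                                     const std::vector<float>& B,
                                     const std::vector<float>& a_scale,
                                     const std::vector<float>& b_scale,
                                     int M, int N, int K, int BM, int BN, int BK)
{
    assert(BM > 0 && BN > 0 && BK > 0);
    const int MB = (M + BM - 1) / BM;
    const int NB = (N + BN - 1) / BN;
    const int KB = (K + BK - 1) / BK;
    assert(A.size() == std::size_t(M) * K && B.size() == std::size_t(K) * N);
    assert(a_scale.size() == std::size_t(MB) * KB && b_scale.size() == std::size_t(KB) * NB);

    std::vector<float> C(std::size_t(M) * N, 0.0f);
    for (int m = 0; m < M; ++m)
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int kb = 0; kb < KB; ++kb) {
                // Unscaled dot product over one K block...
                float partial = 0.0f;
                const int k_end = (kb + 1) * BK < K ? (kb + 1) * BK : K;
                for (int k = kb * BK; k < k_end; ++k)
                    partial += A[std::size_t(m) * K + k] * B[std::size_t(k) * N + n];
                // ...then one scale from the A block and one from the B block.
                acc += a_scale[std::size_t(m / BM) * KB + kb] *
                       b_scale[std::size_t(kb) * NB + n / BN] * partial;
            }
            C[std::size_t(m) * N + n] = acc;
        }
    return C;
}
```

A reference like this is mainly useful for validating a fast kernel's output, as the profiler and test suite in this commit do against their own host references.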
    • Universal gemm splitk using reduce (with multi-d) (#1341) · c544eb4d
      ltqin authored
      
      
      * init for reduce_threadwise multi_d
      
      * add reduce_threadwise_multi_d
      
      * add reduce_multi_d
      
      * clean
      
      * start adding another splitk device op
      
      * add reduce template parameter to SplitKBatchOffset
      
      * add reduce c matrix
      
      * clean up code
      
      * change example data type to bf16
      
      * add bf16Ai8B example
      
      * remove reduce template parameter
      
      * add splitk atomic status to v4
      
      * example add multi d parameters
      
      * device op add multi-d parameters
      
      * add multi-d to reduce
      
      * fix kbatch=1 bug
      
      * change B layout to col in bf16Ai8B example
      
      * remove float adding struct
      
      * change multi-d interface
      
      * change file and class name
      
      * remove multi-d of bf16Ai8B example
      
      * change IsReduce function to IsReduceAdd
      
      * change example layout to RRR from RCR
      
      * set ds stride according to layout
      
      * reset parameter layout
      
      * add gemm universal reduce instance
      
      * add reduce factory
      
      * add profile_gemm_universal_reduce
      
      * add reduce to profiler
      
      * fix reduce instance
      
      * fix profiler reduce compiling bug
      
      * format
      
      * format library instance code
      
      * add mem instance for reduce library
      
      * fix call instance names
      
      * add workspace for reduce in ckProfiler
      
      * format
      
      * add mnpadding to reduce library instance
      
      * add fp16 instance to reduce of profiler
      
      * change copyright time
      
      * restore profiler cmake file
      
      * add reduce text to instances
      
      * add DsLayout and DsDataType to instances template parameter
      
      * fixed gemm_reduce_multi_d
      
      * add an example without multi_d
      
      * Update common.hpp
      
      * Update gtest.cmake
      
      * Update gemm_xdl_splitk_reduce_bf16.cpp
      
      * clean
      
      * Update gtest.cmake
      
      * format
      
      * fix api
      
      * format
      
      * default parameter change to RRR
      
      * add vector_len for multi_d
      
      * format
      
      * Update gtest.cmake
      
      * fix bf16Ai8B elementwiseop
      
      * add ReduceDataType
      
      * move ReduceDataType to end position
      
      * format
      
      * remove googletest git method address
      
      * fix copyright time
      
      * update init data
      
      ---------
      Co-authored-by: root <jizhan@amd.com>
      Co-authored-by: letaoqin <letaoqin@amd.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
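This commit combines split-K partial results with a separate reduce step (optionally with extra "multi-d" inputs) instead of atomics. The general data flow can be sketched on the host: K is divided into `kbatch` slices, each slice writes its partial M x N product into its own workspace buffer, and a second pass sums the buffers and applies an epilogue that adds a D tensor. This is a generic illustration, not CK's device op; the function name `splitk_gemm_reduce` and the single additive D input are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Split-K GEMM with a separate reduction pass (generic host-side sketch).
// A: M x K, B: K x N, D and the result: M x N, all row-major.
// D is an optional extra input added in the epilogue (pass {} to skip it).
std::vector<float> splitk_gemm_reduce(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      const std::vector<float>& D,
                                      int M, int N, int K, int kbatch)
{
    assert(kbatch > 0);
    assert(D.empty() || D.size() == std::size_t(M) * N);
    const std::size_t MN = std::size_t(M) * N;
    const int k_per_batch = (K + kbatch - 1) / kbatch;

    // Pass 1: each K slice writes its partial product to its own workspace
    // buffer, so no two slices write the same element (no atomics needed).
    std::vector<float> workspace(std::size_t(kbatch) * MN, 0.0f);
    for (int b = 0; b < kbatch; ++b) {
        const int k_begin = b * k_per_batch;
        const int k_end = std::min(K, k_begin + k_per_batch);
        for (int m = 0; m < M; ++m)
            for (int n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (int k = k_begin; k < k_end; ++k)
                    acc += A[std::size_t(m) * K + k] * B[std::size_t(k) * N + n];
                workspace[b * MN + std::size_t(m) * N + n] = acc;
            }
    }

    // Pass 2: reduce over the kbatch dimension, then apply the D epilogue.
    std::vector<float> C(MN, 0.0f);
    for (std::size_t i = 0; i < MN; ++i) {
        float sum = 0.0f;
        for (int b = 0; b < kbatch; ++b)
            sum += workspace[b * MN + i];
        C[i] = D.empty() ? sum : sum + D[i];
    }
    return C;
}
```

Compared with atomic accumulation, the workspace costs `kbatch * M * N` extra elements but gives a deterministic summation order and a natural place to fuse the D inputs.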
  33. 08 Jul, 2024 1 commit
  34. 06 Jul, 2024 1 commit
    • Universal streamk with atomics (#1360) · 75e622f0
      Harisankar Sadasivan authored
      * universal streamk with atomics, with ckProfiler support. grid_size and the streamk strategy are tunable. A grid_size of -1 sets the number of workgroups (#WGs) to maximum occupancy × num_CUs. The implementation supports several streamk policies: 1-tile, 2-tile, 3-tile and 4-tile. A streamk strategy of -1 selects the default streamk policy (4-tile).
      
      * Update README.md
      
      * fixing clang-format issues
      
      * removed conflicts in struct members between streamk and universal streamk
      
      * corrected arg parsing for streamk and universal streamk
      
      * added stream-k policies for 3 tile and 4 tile
      
      * fixed argument type issue with parsing cmd args
      
      * changes suggested in PR review are made- removing comments and correcting copyright
      
      * file permissions updated
      
      * added default value support for grid_size and streamk-policy selection set to -1
      
      * print messages for arguments
      
      * print messages for arguments
      
      * print messages for arguments1
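The first bullet of this commit describes two sentinel defaults: a grid_size of -1 launches maximum occupancy × num_CUs workgroups, and a streamk strategy of -1 selects the 4-tile policy. The sketch below shows that resolution logic in isolation. The enum, struct and function names and the 1..4 integer mapping of the policies are hypothetical; in real code `max_occupancy` and `num_cus` would come from a device/occupancy query, which is not shown here.

```cpp
#include <cassert>

// Hypothetical names for the stream-k policies listed in the commit message;
// the integer values are an assumption for this sketch.
enum class StreamKPolicy { OneTile = 1, TwoTile = 2, ThreeTile = 3, FourTile = 4 };

struct StreamKConfig {
    int grid_size;        // number of workgroups (WGs) to launch
    StreamKPolicy policy; // tile-splitting policy
};

// Resolve the -1 sentinel values described in the commit message.
// max_occupancy (WGs per CU) and num_cus are plain parameters here.
StreamKConfig resolve_streamk_args(int grid_size, int policy, int max_occupancy, int num_cus)
{
    assert(max_occupancy > 0 && num_cus > 0);
    assert(grid_size == -1 || grid_size > 0);
    assert(policy == -1 || (policy >= 1 && policy <= 4));

    StreamKConfig cfg{};
    // grid_size == -1: fill every CU up to its maximum occupancy.
    cfg.grid_size = (grid_size == -1) ? max_occupancy * num_cus : grid_size;
    // policy == -1: fall back to the default 4-tile policy.
    cfg.policy = (policy == -1) ? StreamKPolicy::FourTile : static_cast<StreamKPolicy>(policy);
    return cfg;
}
```

Sizing the default grid to full occupancy is what lets stream-k spread the remaining partial tiles evenly across all resident workgroups instead of leaving CUs idle in the last wave.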
  35. 28 Jun, 2024 1 commit
  36. 27 Jun, 2024 1 commit