1. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  2. 21 May, 2024 1 commit
  3. 11 Apr, 2024 1 commit
  4. 03 Apr, 2024 1 commit
  5. 21 Mar, 2024 1 commit
  6. 15 Mar, 2024 1 commit
  7. 29 Feb, 2024 1 commit
  8. 28 Nov, 2023 1 commit
    • Split the static library into several files. (#1044) · 7965d66a
      Illia Silin authored
      * split the static library into several files
      
      * update lib paths and fix client example
      
      * do not use device_mha_operations for client examples
      
      * use appropriate libs to link to client examples
      
      * remove the gpu/transpose path from the list
      
      * try fixing client examples 3,4,9
      
      * add necessary libs for client examples
      
      * fix the layernorm client example
      
      * fix the client examples 23 and 24
      
      * fix typo
      
      * add interface library and refresh clang format
  9. 14 Nov, 2023 1 commit
  10. 04 Oct, 2023 1 commit
  11. 15 Jun, 2023 1 commit
  12. 15 Feb, 2023 1 commit