1. 21 Nov, 2024 2 commits
  2. 20 Nov, 2024 2 commits
  3. 19 Nov, 2024 1 commit
  4. 14 Nov, 2024 1 commit
  5. 07 Nov, 2024 1 commit
  6. 04 Nov, 2024 1 commit
  7. 31 Oct, 2024 1 commit
  8. 30 Oct, 2024 1 commit
  9. 29 Oct, 2024 1 commit
  10. 26 Oct, 2024 1 commit
  11. 25 Oct, 2024 1 commit
    • Generic threshold calculation (#1546) · 9385caa3
      aledudek authored
      * Calculate generic relative threshold pool3dfwd
      
      * Calculate absolute error threshold pool3d fwd
      
      * Generic threshold calculation take max input for relative error pool3dfwd
      
      * Remove max possible value for error calculation at runtime
      
      * Remove debug print in pool3dfwd
      
      * Pool3d fwd adjusted types in generic threshold calculation
      
      * Generic threshold calculation take into account number of accumulations and accdatatype
      
      * Generic threshold fix final error formula
      
      * Generic threshold calculation - num of accs fix
      
      * Generic threshold calculation - adjust absolute error
      
      * Generic threshold calculation - OutDataType in absolute error
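The bullet points in #1546 describe error thresholds that depend on the data types involved and on the number of accumulations. The sketch below is one way such a calculation could look; it is not taken from the commit itself. The names `half_ulp`, `relative_threshold` and `absolute_threshold` are hypothetical, and the formula (half a unit in the last place per rounding step, scaled linearly by the accumulation count) is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Half a unit in the last place for values in [1, 2): 2^-digits.
// For float (24 significand bits) this is 2^-24.
template <typename T>
double half_ulp()
{
    return std::ldexp(1.0, -std::numeric_limits<T>::digits);
}

// Hypothetical relative threshold (assumption, not the library's formula):
// the larger of the single-rounding error of the compute/output types and
// the accumulator's rounding error grown linearly with the number of adds.
template <typename ComputeT, typename OutT, typename AccT>
double relative_threshold(int num_accumulations)
{
    const double rounding_error = std::max(half_ulp<ComputeT>(), half_ulp<OutT>());
    const double acc_error      = half_ulp<AccT>() * num_accumulations;
    return std::max(rounding_error, acc_error);
}

// Hypothetical absolute threshold: the relative threshold scaled by the
// smallest power of two strictly greater than the largest expected magnitude.
// For max_abs_value == 0, frexp reports exponent 0, so the scale is 1.
template <typename ComputeT, typename OutT, typename AccT>
double absolute_threshold(double max_abs_value, int num_accumulations)
{
    int exponent = 0;
    static_cast<void>(std::frexp(max_abs_value, &exponent)); // only the exponent is needed
    const double scale = std::ldexp(1.0, exponent);
    return scale * relative_threshold<ComputeT, OutT, AccT>(num_accumulations);
}
```

Under these assumptions, a pooling window of 27 elements accumulated in `float` would give a relative threshold of `27 * 2^-24`, while a single accumulation falls back to the rounding error of the coarser of the compute and output types.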
  12. 23 Oct, 2024 1 commit
  13. 22 Oct, 2024 1 commit
  14. 21 Oct, 2024 1 commit
  15. 18 Oct, 2024 1 commit
  16. 16 Oct, 2024 2 commits
  17. 15 Oct, 2024 3 commits
  18. 14 Oct, 2024 3 commits
  19. 11 Oct, 2024 3 commits
  20. 10 Oct, 2024 1 commit
  21. 03 Oct, 2024 1 commit
  22. 20 Sep, 2024 1 commit
  23. 12 Sep, 2024 1 commit
  24. 11 Sep, 2024 1 commit
  25. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
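Several items in #1453 (atomic add changes, `splitk_gemm_multi_d`, the kbatch printout) relate to split-K GEMM, where the K dimension is divided into `kbatch` slices and each slice adds its partial product into the same output. The following is a minimal CPU sketch of that idea, simulated sequentially; the function name `splitk_gemm` and the row-major layout are assumptions for illustration and are not the kernel from the commit. On a GPU the slices run concurrently, so the marked accumulation would have to be an atomic add.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: C = A * B with row-major A (M x K), B (K x N), C (M x N).
// K is cut into `kbatch` slices (kbatch >= 1 is assumed); each slice computes
// a partial dot product and adds it into C.
std::vector<float> splitk_gemm(const std::vector<float>& a,
                               const std::vector<float>& b,
                               std::size_t M,
                               std::size_t N,
                               std::size_t K,
                               std::size_t kbatch)
{
    // C must start at zero because every slice adds into it.
    std::vector<float> c(M * N, 0.0f);

    // Slice length, rounded up so that all of K is covered.
    const std::size_t chunk = (K + kbatch - 1) / kbatch;

    for(std::size_t batch = 0; batch < kbatch; ++batch)
    {
        const std::size_t k_begin = batch * chunk;
        const std::size_t k_end   = std::min(K, k_begin + chunk);

        for(std::size_t m = 0; m < M; ++m)
        {
            for(std::size_t n = 0; n < N; ++n)
            {
                float partial = 0.0f;
                for(std::size_t k = k_begin; k < k_end; ++k)
                {
                    partial += a[m * K + k] * b[k * N + n];
                }
                // Sequential here; with concurrent slices this add must be atomic.
                c[m * N + n] += partial;
            }
        }
    }
    return c;
}
```

Because floating-point addition is not associative, different `kbatch` values can change the rounding of the result for general inputs, which is one reason a profiler might report the kbatch value alongside its measurements.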
  26. 07 Aug, 2024 1 commit
  27. 06 Aug, 2024 2 commits
  28. 24 Jul, 2024 1 commit
  29. 17 Jul, 2024 1 commit
  30. 04 Jul, 2024 1 commit