1. 16 Aug, 2024 1 commit
  2. 15 Aug, 2024 2 commits
  3. 14 Aug, 2024 1 commit
    • [GEMM] gemm_universal related optimization (#1453) · 3049b546
      Haocong WANG authored
      
      
      * replace buffer_atomic with global_atomic
      
      * fixed global_atomic_add
      
      * added bf16 atomic_add
      
      * format
      
      * clang-format-12
      
      * clean
      
      * clean
      
      * add guards
      
      * Update gtest.cmake
      
      * enabled splitk_gemm_multi_d
      
      * format
      
      * add ckProfiler
      
      * format
      
      * fixed naming
      
      * format
      
      * clean
      
      * clean
      
      * add guards
      
      * fix clang format
      
      * format
      
      * add kbatch printout
      
      * clean
      
      * Add rocm6.2 related gemm optimization
      
      * Limit bf16 atomic usage
      
      * remove redundant RCR gemm_universal instance
      
      * Add RRR fp8 gemm universal instance
      
      * Bug fix
      
      * Add GPU_TARGET guard to FP8/BF8 target
      
      * bug fix
      
      * update cmake
      
      * remove all fp8/bf8 example if arch not support
      
      * Enable fp8 RRR support in ckProfiler
      
      * limit greedy-reverse flag to gemm_universal in ckProfiler
      
      ---------
      Co-authored-by: Jing Zhang <jizhan@fb.com>
      Co-authored-by: Jing Zhang <jizhan@meta.com>
      Co-authored-by: zjing14 <zhangjing14@gmail.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
  4. 13 Aug, 2024 2 commits
    • Fix compilation errors with libc++ (#1461) · 50c42348
      AngryLoki authored
      
      
      This fixes two issues that occur when compiling with libc++.
      
      The first issue is an attempt to call std::numeric_limits<ranges::range_value_t<_Float16>>::min().
      _Float16 is an extension of libstdc++; it does not exist in the C++ standard[2].
      Luckily, composable_kernel has a NumericLimits class that does everything needed.
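      
      The idea of a project-owned limits class can be sketched as follows. This is a minimal illustration, not composable_kernel's actual NumericLimits: the names `fake_half_t` and `MyNumericLimits` are hypothetical, and the stand-in type is used so the sketch compiles with any standard library.
      
      ```cpp
      #include <cstdint>
      #include <limits>
      
      // Hypothetical stand-in for a 16-bit float type that the standard
      // library may not provide std::numeric_limits for.
      struct fake_half_t
      {
          std::uint16_t bits;
      };
      
      // Hypothetical project-owned limits class. By default it simply
      // forwards to std::numeric_limits.
      template <typename T>
      struct MyNumericLimits
      {
          static constexpr T Min() { return std::numeric_limits<T>::min(); }
          static constexpr T Max() { return std::numeric_limits<T>::max(); }
      };
      
      // Explicit specialization for the stand-in type, using IEEE binary16
      // bit patterns, so no standard-library support is required.
      template <>
      struct MyNumericLimits<fake_half_t>
      {
          // 0x0400: smallest positive normal value (2^-14)
          static constexpr fake_half_t Min() { return fake_half_t{0x0400}; }
          // 0x7BFF: largest finite value (65504)
          static constexpr fake_half_t Max() { return fake_half_t{0x7BFF}; }
      };
      ```
      
      Code that queries `MyNumericLimits<T>` instead of `std::numeric_limits<T>` then behaves the same regardless of which standard library is in use.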
      
      The second issue is that the call to 'check_err' is ambiguous: there are 2 candidates.
      This happens because composable_kernel relies on the idea that f8_t (defined as _BitInt(8)) does not satisfy the is_integral trait.
      However, libc++ treats _BitInt(N) as integral (per the standard, "any implementation-defined extended integer types" can be integral).
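      
      The ambiguity described above can be sketched with two SFINAE-constrained overloads. This is a hedged illustration under stated assumptions, not the actual fix: `fake_f8_t`, `is_integral_like`, `is_fake_f8` and `check_kind` are hypothetical names, and `is_integral_like` merely simulates a standard library that reports the 8-bit float storage type as integral. Excluding that type explicitly from the integer overload leaves exactly one candidate per call.
      
      ```cpp
      #include <cstdint>
      #include <string>
      #include <type_traits>
      
      // Hypothetical stand-in for an 8-bit float storage type.
      struct fake_f8_t
      {
          std::uint8_t bits;
      };
      
      // Hypothetical trait that simulates a standard library reporting the
      // storage type as integral (the situation described for libc++).
      template <typename T>
      struct is_integral_like : std::is_integral<T>
      {
      };
      template <>
      struct is_integral_like<fake_f8_t> : std::true_type
      {
      };
      
      template <typename T>
      struct is_fake_f8 : std::is_same<T, fake_f8_t>
      {
      };
      
      static_assert(is_integral_like<fake_f8_t>::value,
                    "the stand-in type looks integral to the trait");
      
      // Integer overload. Without the `!is_fake_f8<T>::value` term, a call
      // with fake_f8_t would match both overloads and be ambiguous.
      template <typename T,
                std::enable_if_t<is_integral_like<T>::value && !is_fake_f8<T>::value, int> = 0>
      std::string check_kind(T)
      {
          return "integer";
      }
      
      // Dedicated overload for the 8-bit float stand-in.
      template <typename T, std::enable_if_t<is_fake_f8<T>::value, int> = 0>
      std::string check_kind(T)
      {
          return "f8";
      }
      ```
      
      With this constraint in place, `check_kind(42)` selects the integer overload and `check_kind(fake_f8_t{0})` selects the f8 overload, whatever the trait reports.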
      
      Closes: #1460
      Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
  5. 12 Aug, 2024 2 commits
  6. 10 Aug, 2024 1 commit
  7. 09 Aug, 2024 2 commits
  8. 08 Aug, 2024 3 commits
  9. 07 Aug, 2024 4 commits
  10. 06 Aug, 2024 7 commits
  11. 05 Aug, 2024 2 commits
  12. 01 Aug, 2024 1 commit
  13. 31 Jul, 2024 4 commits
  14. 30 Jul, 2024 2 commits
  15. 26 Jul, 2024 2 commits
  16. 25 Jul, 2024 2 commits
  17. 24 Jul, 2024 2 commits
    • Adding more instances of grouped convolution 3d forward for FP8 with ConvScale+Bias element-wise operation. (#1412) · 4a8a1bef
      Andriy Roshchenko authored
      
      * Add CMakePresets configurations.
      
      * Add binary elementwise ConvScaleAdd and an example.
      
      * Numerical verification of results.
      
      Observed significant irregularities in F8 to F32 type conversions:
      ```log
      ConvScaleAdd: float=145.000000   f8_t=160.000000    e=144.000000
      ConvScaleAdd: float=97.000000   f8_t=96.000000    e=104.000000
      ConvScaleAdd: float=65.000000   f8_t=64.000000    e=72.000000
      ```
      
      * Implemented ConvScaleAdd + Example.
      
      * Add ConvScale+Bias Instances
      
      * Add Client Example for ConvScale+Bias
      
      * Fix number of bytes in an example.
      
      * Cleanup.
    • Add support for half_t and bfloat to reduction operations (#1395) · ffabd70a
      Bartłomiej Kocot authored
      * Add support for half_t and bfloat to reduction operations
      
      * Fix bhalf convert
      
      * Next fix bf16