1. 04 Feb, 2025 4 commits
  2. 03 Feb, 2025 1 commit
  3. 01 Feb, 2025 1 commit
  4. 31 Jan, 2025 4 commits
  5. 30 Jan, 2025 5 commits
  6. 29 Jan, 2025 6 commits
  7. 24 Jan, 2025 1 commit
  8. 22 Jan, 2025 4 commits
  9. 21 Jan, 2025 2 commits
  10. 20 Jan, 2025 1 commit
  11. 17 Jan, 2025 4 commits
  12. 16 Jan, 2025 2 commits
  13. 15 Jan, 2025 4 commits
    • Illia Silin's avatar
      8c29e06f
    • Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3
      Bartłomiej Kocot authored
      * Add rounding for float to bf16 conversion
      
      * Add bhalf test
      
      * Add inf test bhalf
      
      * Refactor
      
      * update cmake
      
      * Fixes
    • [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
      ruanjm authored
      
      
      * Add shortcut to RMSNorm
      
      * Modify test for adding shortcut for RMSNorm
      
      * Add fused parameter into tests
      
      * 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
      
      * Support various strides and precisions
      
      * Add support of Epilogue
      
      * Add fuse and epilogue support to rmsnorm ref
      
      * Modify rmsnorm example
      
      * Refactor tests/examples
      
      * Bug fix for newly added tests/examples
      
      * Bug fix for new tests 2
      
      * Modify smoke test scripts
      
      remove dbg code
      
      * Support non-smooth dynamic quant
      
      * Update Rmsnorm2dFwd::GetName()
      
      * rename xscale and prec_sx to smoothscale and prec_sm
      
      Bug fix after rename
      
      Remove files
      
      * change example_rmsnorm2d_fwd.cpp
      
      * update performance calculator
      
      * Fix issue in two-pass when fuse add is enabled
      
      * Remove comment of beta
      
      ---------
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
    • MX FP GEMM - Example Template (#277) · 07307ea1
      Andriy Roshchenko authored
      Temporarily uses the `DeviceGemmMultiD_ABScale_Xdl_CShuffle_V3` kernel and 128x128 scaling matrices.
      Must be modified to use an MX-native GEMM kernel with 16- or 32-component vectors per scale.
      
      Verified on the emulator.
  14. 14 Jan, 2025 1 commit