1. 27 Jan, 2025 7 commits
  2. 24 Jan, 2025 4 commits
  3. 23 Jan, 2025 1 commit
  4. 22 Jan, 2025 3 commits
  5. 21 Jan, 2025 2 commits
    • Mateusz Ozga's avatar
      Simplify static_cast if-lands (#1828) · 3db77bc4
      Mateusz Ozga authored
      3db77bc4
    • Mateusz Ozga's avatar
      CK-Tile Grouped GEMM refactor and post PR fixes (#1756) · 3c93d3c4
      Mateusz Ozga authored
      * Grouped gemm simple code refactor
      
      * Offset invoker
      
      * Invoke generic Run, and replace name of parrtitioner variable
      
      * Tests fix type
      
      * Removed namespaces
      
      * Add template param to avoid implicit cast
      
      * Remove generic function
      
      * Constant value
      
      * underline enum to int16_t
      
      * Generalize partitioner function
      
      * Remove whitespaces
      
      * Rename function
      
      * Using support
      
      * Clang-format
      
      * Clang-format
      
      * Fn-partitioner description fn
      
      * Typo
      
      * Typo 2
      
      * Better description
      
      * Better description
      
      * Refactor after review
      
      * Use ctr instead of set fn
      
      * Inovke ctr and typo
      
      * Comments
      
      * Remove unnecessary comment
      
      * Review, remove modulo
      3c93d3c4
  6. 20 Jan, 2025 2 commits
  7. 19 Jan, 2025 1 commit
  8. 18 Jan, 2025 1 commit
  9. 17 Jan, 2025 2 commits
  10. 16 Jan, 2025 2 commits
  11. 15 Jan, 2025 3 commits
    • Illia Silin's avatar
      8c29e06f
    • Bartłomiej Kocot's avatar
      Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3
      Bartłomiej Kocot authored
      * Add rounding for float to bf16 conversion
      
      * Add bhalf test
      
      * Add inf test bhalf
      
      * Refactor
      
      * update cmake
      
      * Fixes
      7790e8c3
    • ruanjm's avatar
      [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
      ruanjm authored
      
      
      * Add shortcut to RMSNorm
      
      * Modify test for adding shortcut for RMSNorm
      
      * Add fused parameter into tests
      
      * 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
      
      * 1. Supports various stride and percisions.
      
      * Add support of Epilogue
      
      * Add fuse and epilogue support to rmsnorm ref
      
      * Modify rmsnorm example
      
      * Refactor tests/examples
      
      * Bug fix for newly added tests/examples
      
      * Bug fix for new tests 2
      
      * Modify smoke test scripts
      
      remove dbg code
      
      * Supports non-smooth dyanmic quant
      
      * Update Rmsnorm2dFwd::GetName()
      
      * rename xscale and prec_sx to smoothscale and prec_sm
      
      Bug fix after rename
      
      Remove files
      
      * change example_rmsnorm2d_fwd.cpp
      
      * update performance calculator
      
      * Fix issue in two-pass when fuse add is enabled
      
      * Remove comment of beta
      
      ---------
      Co-authored-by: default avatarrocking <ChunYu.Lai@amd.com>
      04dd3148
  12. 14 Jan, 2025 3 commits
  13. 13 Jan, 2025 7 commits
  14. 10 Jan, 2025 2 commits
    • Bartłomiej Kocot's avatar
      Grouped convolution backward weight special vector size loads (#1772) · fd46a01d
      Bartłomiej Kocot authored
      * Grouped convolution backward weight special vector size loads
      
      * Instnaces and tests
      
      * Fixes
      
      * Add 7 and 13 special cases
      
      * fix comments
      
      * Fix
      
      * Fix2
      
      * fixes
      
      * fix atomic add bf16
      fd46a01d
    • Thomas Ning's avatar
      Ck tile/gemm perf measure (#1750) · 73a076ee
      Thomas Ning authored
      
      
      * Finished adding the performance benchmark for ck tile gemm
      
      * Fix the executable rename problem
      
      * fix the executable name error
      
      * delete the unsupported layout combinations
      
      * Update run_full_test.sh
      
      * Update benchmark_mem_pipeline.sh
      
      * Update benchmark_basic.sh
      
      * change the executable of gemm_universal
      
      * change ck_tile_gemm script permissions
      
      * Addressed the comment
      
      * Addressed the comment
      
      * Fixed the comments
      
      * Fixed Comment
      
      * roll back the malfunctioned change
      
      * Fix the Typo
      
      * finalize the tile_gemm_fp16 performance monitoring
      
      * fix the stash names for ck_tile gemm logs
      
      * change the stashing logic
      
      * change stashing syntax
      
      ---------
      Co-authored-by: default avatarIllia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: default avatarillsilin <Illia.Silin@amd.com>
      73a076ee