  1. 24 Jan, 2025 1 commit
  2. 22 Jan, 2025 3 commits
  3. 16 Jan, 2025 4 commits
  4. 15 Jan, 2025 2 commits
    • Add rounding for float to bf16 conversion as default (#1812) · 7790e8c3
      Bartłomiej Kocot authored
      * Add rounding for float to bf16 conversion
      
      * Add bhalf test
      
      * Add inf test bhalf
      
      * Refactor
      
      * update cmake
      
      * Fixes
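To illustrate what "rounding for float to bf16 conversion" typically means, here is a minimal sketch of round-to-nearest-even next to plain truncation. The function names (`float_to_bf16_rne`, `float_to_bf16_truncate`, `bf16_to_float`, `float_from_bits`) are hypothetical and are not taken from the library; the commit's actual implementation and its NaN handling may differ.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
inline float float_from_bits(std::uint32_t bits)
{
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Hypothetical helper: truncation simply drops the low 16 bits of the float.
inline std::uint16_t float_to_bf16_truncate(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: round-to-nearest-even conversion to bf16 bits.
inline std::uint16_t float_to_bf16_rne(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));

    // NaN: set a quiet bit in the kept mantissa so the result stays a NaN
    // instead of collapsing to infinity when the low bits are dropped.
    if((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0u)
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);

    // Add 0x7FFF plus the lowest kept bit: values above the halfway point
    // round up, and exact ties round to the even neighbour.
    const std::uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: widen bf16 bits back to float (exact).
inline float bf16_to_float(std::uint16_t h)
{
    return float_from_bits(static_cast<std::uint32_t>(h) << 16);
}
```

With this sketch, a value exactly halfway between two bf16 neighbours goes to the one with an even mantissa, whereas truncation always rounds toward zero. Infinity passes through unchanged, which is the kind of case the "inf test" bullet above would exercise.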
    • [CK_TILE] Add Various Fusion Functions to RMSNorm (#1802) · 04dd3148
      ruanjm authored
      
      
      * Add shortcut to RMSNorm
      
      * Modify test for adding shortcut for RMSNorm
      
      * Add fused parameter into tests
      
      * 1. Add YDataType. 2. Move rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
      
      * Support various strides and precisions.
      
      * Add support of Epilogue
      
      * Add fuse and epilogue support to rmsnorm ref
      
      * Modify rmsnorm example
      
      * Refactor tests/examples
      
      * Bug fix for newly added tests/examples
      
      * Bug fix for new tests 2
      
      * Modify smoke test scripts
      
      remove dbg code
      
      * Support non-smooth dynamic quant
      
      * Update Rmsnorm2dFwd::GetName()
      
      * rename xscale and prec_sx to smoothscale and prec_sm
      
      Bug fix after rename
      
      Remove files
      
      * change example_rmsnorm2d_fwd.cpp
      
      * update performance calculator
      
      * Fix issue in two-pass when fuse add is enabled
      
      * Remove comment of beta
      
      ---------
      Co-authored-by: rocking <ChunYu.Lai@amd.com>
  5. 13 Jan, 2025 2 commits
    • CK Tile GEMM CICD fixed & register block method refactor (#1776) · 5d671a5f
      Thomas Ning authored
      * Refactor block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
      
      * Finished the 2x2 warp gemm policy and the block selection mechanism
      
      * Clang format
      
      * address poyen's comment
      
      * Address feedback
      
      * Fixed the compilation issue
      
      * Change the function name
    • Update for fmha_fwd qs_ks_vs pipeline (#1810) · 3d50f57f
      Qianfeng authored
      
      
      * Update for fmha_fwd qs_ks_vs pipeline
      
      * Remove __builtin_amdgcn_sched_barrier(0)
      
      * Move the p_compute to p conversion earlier to try to increase VGPR reuse
      
      * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
      
      * Re-add __builtin_amdgcn_sched_barrier(0)
      
      ---------
      Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
  6. 10 Jan, 2025 1 commit
  7. 08 Jan, 2025 12 commits
  8. 07 Jan, 2025 2 commits
    • [MX FP8] Add Scaled Type Convert Functions for OCP FP8/BF8 data types (#271) · c4a05057
      Andriy Roshchenko authored
      * Move scaled_type_convert functions to a separate header
      
      * Introduce MX data tests
      
      * Build MX tests only on relevant architectures
      
      * Refactor E8M0 scale implementation
      
      * Fix `config.h` typo
      
      * Cleanup deprecated symbols
      
      * Refactor `amd_ck_fp8.hpp`
      
      * `scaled_type_convert` for `f8_ocp_t`
      
      * Implement test for MX FP8 scaled type convert
      
      * Implement test for MX BF8 scaled type convert
      
      * Scaled type convert for vectors of 2 FP8 elements
      
      * Scaled type convert for vectors of 16 FP8 elements
      
      * Implementation of scaled conversion from F32 to F8
      
      * Add tests for scaled conversions from FP32 to FP8
      
      * Add documentation to the test functions
      
      * Implementation of scaled conversion from F32x2 to F8x2
      
      * Implementation of scaled conversion from F32x16 to F8x16
      
      * Implementation of scaled conversion from F32x32 to F8x32
      
      * Implementation of scaled conversion from F8x32 to F32x32
      
      * Verified on the emulator
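As a rough illustration of the scaled conversions listed in this commit, the sketch below decodes an E8M0 scale and an OCP FP8 E4M3 element and multiplies them. It follows the bit layouts in the OCP MX and OCP FP8 specifications (E8M0: 8 exponent bits, bias 127, `0xFF` is NaN; E4M3: bias 7, no infinities, `S.1111.111` is NaN). The function names are hypothetical and do not reflect the library's `scaled_type_convert` signatures, which are not shown in this log.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: decode an E8M0 scale, i.e. a power of two 2^(e - 127).
inline float e8m0_to_float(std::uint8_t e)
{
    if(e == 0xFFu) // the all-ones encoding is reserved for NaN
        return std::numeric_limits<float>::quiet_NaN();
    return std::ldexp(1.0f, static_cast<int>(e) - 127);
}

// Hypothetical helper: decode one OCP FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits).
inline float fp8_e4m3_to_float(std::uint8_t v)
{
    const int sign = v >> 7;
    const int exp  = (v >> 3) & 0xF;
    const int man  = v & 0x7;

    if(exp == 0xF && man == 0x7) // only S.1111.111 encodes NaN; there is no infinity
        return std::numeric_limits<float>::quiet_NaN();

    float mag;
    if(exp == 0)
        mag = std::ldexp(static_cast<float>(man), -9); // subnormal: (man / 8) * 2^-6
    else
        mag = std::ldexp(1.0f + static_cast<float>(man) / 8.0f, exp - 7);

    return sign ? -mag : mag;
}

// Hypothetical helper: scaled conversion, element value times the shared block scale.
inline float scaled_convert_e4m3_to_float(std::uint8_t scale, std::uint8_t v)
{
    return e8m0_to_float(scale) * fp8_e4m3_to_float(v);
}
```

For example, the scale byte `129` decodes to 4 and the element `0x38` decodes to 1.0, so their scaled conversion is 4.0. The vectorized variants named in the bullets above would apply the same per-element rule across 2, 16, or 32 elements.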
    • [CK_TILE] fmha fwd splitkv optimization for decode (seqlen_q=1) (#1789) · 24b12d04
      Po Yen Chen authored
      
      
      * Update license year
      
      * Add initial code to override decode problem
      
      * Fix splitkv traits/args overriding error
      
      * Reshape and transpose lse for decode
      
      * Remove debug code
      
      * Prettify example code
      
      * Use better function name
      
      * Add kMergeNumHeadGroupsSeqLenQ flag
      
      Kernel user can use this switch to turn on/off optimization for
      some problem sizes
      
      * Add missing flag declarations
      
      * Turn off kMergeNumHeadGroupsSeqLenQ by default in codegen
      
      * Group similar statements together
      
      * Remove assumption of seqlen_q=1
      
      * Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel
      
      * Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel
      
      * Run kMergeNumHeadGroupsSeqLenQ=true kernels when needed
      
      * Fix group mode block skip logic
      
      * Undo changes of normal fwd kernel
      
      * Update in GridSize() and using GridSize() for splitkv kernel (#1799)
      
      ---------
      Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
  9. 06 Jan, 2025 1 commit
    • Add MXFP6 and MXBF6 conversion methods (#270) · e093146e
      Rostyslav Geyyer authored
      * Add conversions
      
      * Add tests
      
      * Add docstrings
      
      * Add scaled conversions
      
      * Add fp6/bf6 tests
      
      * Remove misleading fp4 test case
      
      * Add docstrings
      
      * Clean up
      
      * Address comments
      
      * Set stricter tolerances for RNE tests
      
      * Add missing tests
      
      * Add native conversions to float
      
      * Revert "Add native conversions to float"
      
      This reverts commit 09467111f73b753c8cc3d597533b187940353dab.
      
      * Update copyright years
  10. 04 Jan, 2025 2 commits
  11. 03 Jan, 2025 3 commits
  12. 02 Jan, 2025 2 commits
  13. 29 Dec, 2024 1 commit
    • Remove using partitioner for all fmha kernels (#1778) · 4e076909
      Qianfeng authored
      * Remove using tile partitioner for fmha_fwd_kernel
      
      * Remove using tile partitioner for fmha_fwd_splitkv and splitkv-combine kernels
      
      * Remove using tile partitioner for fmha_fwd_appendkv kernel
      
      * Unify the format of GetTileIndex
  14. 28 Dec, 2024 1 commit
  15. 23 Dec, 2024 1 commit
  16. 20 Dec, 2024 2 commits