- 28 Jan, 2025 1 commit
-
-
darren-amd authored
* Change flag from CK_WORKAROUND_DENORM_FIX to CK_GFX90A_DENORM_WORKAROUND for more clarity. Also changed the definition macros to be more clear.
-
- 27 Jan, 2025 3 commits
-
-
Astha Rai authored
-
Andriy Roshchenko authored
* Add OCP FP8 to CK_TILE * Validate OCP FP8 in FMHA FWD under VALID=1
-
Adam Osewski authored
* Refactor universal gemm policy. * Adapt example to refactor changes. * Introduce static encoding pattern * Adding shuffled encoding patterns. * Fix err in reverse tuple. * Add transpose_tile2d * Small refactoring + doc * Enable reading on contiguous dimension in all layouts. * Transpose A/B register tile if needed for comp v3 pipeline. * Take contiguous dim size when calculating dram vector load size. * A/B smem pack size taken from WarpGemm attributes * Update B LDS layout and setup tile distribution pattern at class level. * Fix static assert. * Fix errors in examples. * Formatting & fix IsTranspose * Fix VectorSize & refactor. * Add error logging messages. * Fix VecLoadSize and TransposeC for mem pipeline. * Update unit-tests & disable mem pipeline. * Clang format * Update include/ck_tile/core/tensor/tile_window.hpp Co-authored-by: jakpiase <jakub.piasecki@amd.com> * Fix compilation and reviewers' comments. * Refactor unit-test. Fallback to non-universal gemm. Need to use GemmPipelineAGmemBGmemCRegV1 for now, since GemmKernel now also supports non-K major vector reads. --------- Co-authored-by: jakpiase <jakub.piasecki@amd.com>
-
- 24 Jan, 2025 3 commits
-
-
ruanjm authored
-
carlushuang authored
* not using structures under ck_tile/ops for ck_tile/host * update as constexpr function * Rename fn * Update other examples. --------- Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com> Co-authored-by: Adam Osewski <Adam.Osewski@amd.com>
-
Astha Rai authored
-
- 23 Jan, 2025 1 commit
-
-
Astha Rai authored
-
- 22 Jan, 2025 1 commit
-
-
carlushuang authored
-
- 21 Jan, 2025 2 commits
-
-
Mateusz Ozga authored
-
Mateusz Ozga authored
* Grouped gemm simple code refactor * Offset invoker * Invoke generic Run, and replace name of partitioner variable * Tests fix type * Removed namespaces * Add template param to avoid implicit cast * Remove generic function * Constant value * underline enum to int16_t * Generalize partitioner function * Remove whitespaces * Rename function * Using support * Clang-format * Clang-format * Fn-partitioner description fn * Typo * Typo 2 * Better description * Better description * Refactor after review * Use ctr instead of set fn * Invoke ctr and typo * Comments * Remove unnecessary comment * Review, remove modulo
-
- 20 Jan, 2025 1 commit
-
-
lucbruni-amd authored
* Disable CK_TIME_KERNEL by Default, Add as CMake Variable * Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality. * Fix build error.
-
- 19 Jan, 2025 1 commit
-
-
Mingtao Gu authored
Co-authored-by:mtgu0705 <mtgu@amd.com>
-
- 18 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
-
- 16 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Fix and optimize dynamic unary elementwise * fix
-
carlushuang authored
* fix mock token id * prepare host for g1u1 * reformat inline-asm * restructure uk_0 * restructure gate_up * done * change default to init=1 * update readme * fix a bug in interleave pipeline * rcp for silu
-
- 15 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Add rounding for float to bf16 conversion * Add bhalf test * Add inf test bhalf * Refactor * update cmake * Fixes
-
ruanjm authored
* Add shortcut to RMSNorm * Modify test for adding shortcut for RMSNorm * Add fused parameter into tests * 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp * 1. Supports various strides and precisions. * Add support of Epilogue * Add fuse and epilogue support to rmsnorm ref * Modify rmsnorm example * Refactor tests/examples * Bug fix for newly added tests/examples * Bug fix for new tests 2 * Modify smoke test scripts remove dbg code * Supports non-smooth dynamic quant * Update Rmsnorm2dFwd::GetName() * rename xscale and prec_sx to smoothscale and prec_sm Bug fix after rename Remove files * change example_rmsnorm2d_fwd.cpp * update performance calculator * Fix issue in two-pass when fuse add is enabled * Remove comment of beta --------- Co-authored-by:rocking <ChunYu.Lai@amd.com>
-
- 14 Jan, 2025 1 commit
-
-
Astha Rai authored
-
- 13 Jan, 2025 3 commits
-
-
Astha Rai authored
-
Thomas Ning authored
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm * Finished the 2x2 warp gemm policy and the block selection mechanism * Clang format * address poyen's comment * Address feedbacks * Fixed the compilation issue * Change the function name
-
Qianfeng authored
* Update for fmha_fwd qs_ks_vs pipeline * Remove _builtin_amdgcn_sched_barrier(0) * Move p_compute to p converting earlier for trying to increase vgprs re-using * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation * Re-add __builtin_amdgcn_sched_barrier(0) --------- Co-authored-by:Po Yen Chen <PoYen.Chen@amd.com>
-
- 10 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Grouped convolution backward weight special vector size loads * Instances and tests * Fixes * Add 7 and 13 special cases * fix comments * Fix * Fix2 * fixes * fix atomic add bf16
-
- 08 Jan, 2025 14 commits
-
-
Astha Rai authored
-
darren-amd authored
* Disable building DPP kernels by default * Disable building dpp instances, examples, or tests if DPP_KERNELS is not set * Add new DPP_KERNELS flag to readme
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Astha Rai authored
-
AMD-dteng authored
* 1. enable bias feature that add bias before adding residual; 2. change block size from 128->64 when m<64 in fp16 * delete comment * 1.remove fmha change 2.change buffer name from bias to xbias * Now bias can be used independently from fadd * change kbias to kxbias --------- Co-authored-by:feli <felix.li@amd.com>
-
- 07 Jan, 2025 1 commit
-
-
Po Yen Chen authored
* Update license year * Add initial code to override decode problem * Fix splitkv traits/args overriding error * Reshape and transpose lse for decode * Remove debug code * Prettify example code * Use better function name * Add kMergeNumHeadGroupsSeqLenQ flag Kernel user can use this switch to turn on/off optimization for some problem sizes * Add missing flag declarations * Default turn off kMergeNumHeadGroupsSeqLenQ in codegen * Group similar statements together * Remove assumption of seqlen_q=1 * Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel * Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel * Run kMergeNumHeadGroupsSeqLenQ=true kernels when needed * Fix group mode block skip logic * Undo changes of normal fwd kernel * Update in GridSize() and using GridSize() for splitkv kernel (#1799) --------- Co-authored-by:Qianfeng <qianfeng.zhang@amd.com>
-
- 06 Jan, 2025 2 commits