- 22 Jan, 2025 5 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
carlushuang authored
-
- 21 Jan, 2025 12 commits
-
-
Mateusz Ozga authored
-
Mateusz Ozga authored
* Grouped gemm simple code refactor
* Offset invoker
* Invoke generic Run, and replace name of partitioner variable
* Tests fix type
* Removed namespaces
* Add template param to avoid implicit cast
* Remove generic function
* Constant value
* underline enum to int16_t
* Generalize partitioner function
* Remove whitespaces
* Rename function
* Using support
* Clang-format
* Clang-format
* Fn-partitioner description fn
* Typo
* Typo 2
* Better description
* Better description
* Refactor after review
* Use ctr instead of set fn
* Invoke ctr and typo
* Comments
* Remove unnecessary comment
* Review, remove modulo
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
- 20 Jan, 2025 1 commit
-
-
lucbruni-amd authored
* Disable CK_TIME_KERNEL by Default, Add as CMake Variable
* Enable CK_TIME_KERNEL by Default, Maintaining CMake Variable Functionality.
* Fix build error.
-
- 19 Jan, 2025 1 commit
-
-
Mingtao Gu authored
Co-authored-by: mtgu0705 <mtgu@amd.com>
-
- 18 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
-
- 16 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Fix and optimize dynamic unary elementwise
* fix
-
carlushuang authored
* fix mock token id
* prepare host for g1u1
* reformat inline-asm
* restructure uk_0
* restructure gate_up
* done
* change default to init=1
* update readme
* fix a bug in interleave pipeline
* rcp for silu
-
- 15 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Add rounding for float to bf16 conversion
* Add bhalf test
* Add inf test bhalf
* Refactor
* update cmake
* Fixes
-
ruanjm authored
* Add shortcut to RMSNorm
* Modify test for adding shortcut for RMSNorm
* Add fused parameter into tests
* 1. Add YDataType. 2. rmsnorm2d_fwd_traits_ from rmsnorm2d_fwd.hpp to rmsnorm2d_fwd_api.cpp and rmsnorm2d_fwd_instance_common.hpp
* 1. Supports various stride and precisions.
* Add support of Epilogue
* Add fuse and epilogue support to rmsnorm ref
* Modify rmsnorm example
* Refactor tests/examples
* Bug fix for newly added tests/examples
* Bug fix for new tests 2
* Modify smoke test scripts remove dbg code
* Supports non-smooth dynamic quant
* Update Rmsnorm2dFwd::GetName()
* rename xscale and prec_sx to smoothscale and prec_sm
  Bug fix after rename
  Remove files
* change example_rmsnorm2d_fwd.cpp
* update performance calculator
* Fix issue in two-pass when fuse add is enabled
* Remove comment of beta

---------

Co-authored-by: rocking <ChunYu.Lai@amd.com>
-
- 13 Jan, 2025 2 commits
-
-
Thomas Ning authored
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
* Finished the 2x2 warp gemm policy and the block selection mechanism
* Clang format
* address poyen's comment
* Address feedback
* Fixed the compilation issue
* Change the function name
-
Qianfeng authored
* Update for fmha_fwd qs_ks_vs pipeline
* Remove __builtin_amdgcn_sched_barrier(0)
* Move p_compute to p converting earlier for trying to increase vgprs re-using
* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
* Re-add __builtin_amdgcn_sched_barrier(0)

---------

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 10 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Grouped convolution backward weight special vector size loads
* Instances and tests
* Fixes
* Add 7 and 13 special cases
* fix comments
* Fix
* Fix2
* fixes
* fix atomic add bf16
-
- 09 Jan, 2025 1 commit
-
-
Adam Osewski authored
-
- 08 Jan, 2025 12 commits
-
-
darren-amd authored
* Disable building DPP kernels by default
* Disable building dpp instances, examples, or tests if DPP_KERNELS is not set
* Add new DPP_KERNELS flag to readme
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
AMD-dteng authored
* 1. enable bias feature that adds bias before adding residual; 2. change block size from 128->64 when m<64 in fp16
* delete comment
* 1. remove fmha change 2. change buffer name from bias to xbias
* Now bias can be used independently from fadd
* change kbias to kxbias

---------

Co-authored-by: feli <felix.li@amd.com>
-