- 06 Feb, 2025 1 commit
-
-
AMD-dteng authored
-
- 24 Jan, 2025 1 commit
-
-
AMD-dteng authored
-
- 22 Jan, 2025 1 commit
-
-
AMD-dteng authored
-
- 14 Jan, 2025 1 commit
-
-
AMD-dteng authored
-
- 13 Jan, 2025 3 commits
-
-
Thomas Ning authored
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
* Finished the 2x2 warp gemm policy and the block selection mechanism
* Clang format
* address poyen's comment
* Address feedbacks
* Fixed the compilation issue
* Change the function name
-
ClementLinCF authored
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
-
Qianfeng authored
* Update for fmha_fwd qs_ks_vs pipeline
* Remove __builtin_amdgcn_sched_barrier(0)
* Move p_compute to p converting earlier for trying to increase vgprs re-using
* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
* Re-add __builtin_amdgcn_sched_barrier(0)
---------
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 10 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Grouped convolution backward weight special vector size loads
* Instances and tests
* Fixes
* Add 7 and 13 special cases
* fix comments
* Fix
* Fix2
* fixes
* fix atomic add bf16
-
Thomas Ning authored
* Finished adding the performance benchmark for ck tile gemm
* Fix the executable rename problem
* fix the executable name error
* delete the unsupported layout combinations
* Update run_full_test.sh
* Update benchmark_mem_pipeline.sh
* Update benchmark_basic.sh
* change the executable of gemm_universal
* change ck_tile_gemm script permissions
* Addressed the comment
* Addressed the comment
* Fixed the comments
* Fixed Comment
* roll back the malfunctioned change
* Fix the Typo
* finalize the tile_gemm_fp16 performance monitoring
* fix the stash names for ck_tile gemm logs
* change the stashing logic
* change stashing syntax
---------
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 08 Jan, 2025 12 commits
-
-
darren-amd authored
* Disable building DPP kernels by default
* Disable building dpp instances, examples, or tests if DPP_KERNELS is not set
* Add new DPP_KERNELS flag to readme
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-
AMD-dteng authored
* 1. enable bias feature that adds bias before adding residual; 2. change block size from 128->64 when m<64 in fp16
* delete comment
* 1. remove fmha change 2. change buffer name from bias to xbias
* Now bias can be used independently from fadd
* change kbias to kxbias
---------
Co-authored-by: feli <felix.li@amd.com>
-
- 07 Jan, 2025 3 commits
-
-
spolifroni-amd authored
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.1...v1.13.0)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Po Yen Chen authored
* Update license year
* Add initial code to override decode problem
* Fix splitkv traits/args overriding error
* Reshape and transpose lse for decode
* Remove debug code
* Prettify example code
* Use better function name
* Add kMergeNumHeadGroupsSeqLenQ flag
  Kernel user can use this switch to turn on/off optimization for some problem sizes
* Add missing flag declarations
* Default turn off kMergeNumHeadGroupsSeqLenQ in codegen
* Group similar statements together
* Remove assumption of seqlen_q=1
* Remove kMergeNumHeadGroupsSeqLenQ from splitkv combine kernel
* Support kMergeNumHeadGroupsSeqLenQ=true in fmha splitkv kernel
* Run kMergeNumHeadGroupsSeqLenQ=true kernels when need
* Fix group mode block skip logics
* Undo changes of normal fwd kernel
* Update in GridSize() and using GridSize() for splitkv kernel (#1799)
---------
Co-authored-by: Qianfeng <qianfeng.zhang@amd.com>
-
- 04 Jan, 2025 3 commits
-
-
Bartłomiej Kocot authored
* Fix universal gemm profiler for pk_i4_t
* fix
-
dependabot[bot] authored
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.12.0...v1.12.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Illia Silin authored
-
- 03 Jan, 2025 4 commits
-
-
carlushuang authored
* quant
* fix bug
* simple smoothquant after softmax
* update kv-quant
* update stride
* fix fp8-pertoken-kvcache
* update int8/fp8 quant support
---------
Co-authored-by: so <a.com>
Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
Mingtao Gu authored
* enable int4 scale (weight only) kernel
* format some files
* Add unit test for int4 weight only
* fixed and formatted code
* fixed
* formatted
* formatted
* fixed
* fixed a bug in the ckProfiler, and formatted the code
---------
Co-authored-by: mtgu0705 <mtgu@amd.com>
-
feli authored
* add no welford
* enable output raw
* raw of int8
* fix build
* fix smoke test err
* [ck_tile]layernorm: fix welford ok, set int8 and bf16 small N as default and others open by generate
* [cktile]layernorm, fix err commit files and remove useless
* fix quant 8192 err & change norm_reduce class and file name
---------
Co-authored-by: coderfeli <coderfeli@163.com>
Co-authored-by: carlushuang <carlus.huang@amd.com>
-
John Afaganis authored
-
- 02 Jan, 2025 2 commits
-
-
Muhammed Emin Ozturk authored
* initial
* Cmake file
* successful compilation but validation failed
* Cmake
* update
* gpu validation
* gemm universal
* gemm universal sk update
* sk bf16 universal instance
* gemm_universal_streamk.hpp
* only build for gfx94
* Cmakelist
* profiler update, bf16 sk only works at gfx42
* clang
* clang
* clang all
* no need flags
* cmake script
* delete comment
* gemm universal sk fix
* clang
* profiler fix
* clang
* update
* update
* delete comment
* code formatting
* cmake
* fix instance
* clang
* argument supported
* argument supported and clang
* update
* fix
* removing unnecessary comments
* clang formatting
* Update library/src/tensor_operation_instance/gpu/CMakeLists.txt
  Co-authored-by: afagaj <john.afaganis@gmail.com>
* CopyRight Comment 2025
* clang reformatting
* copy right 2025
---------
Co-authored-by: Emin Ozturk <ozturk.27@osu.edu>
Co-authored-by: root <root@ctr-ubbsmc16.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-008.hpcfund>
Co-authored-by: root <root@splinter-126-wr-d3.amd.com>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t006-001.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@login1.hpcfund>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t004-004.hpcfund>
Co-authored-by: Emin Ozturk <emin.ozturk@utah.edu>
Co-authored-by: Muhammed Emin Ozturk <meozturk@t008-001.hpcfund>
Co-authored-by: afagaj <john.afaganis@gmail.com>
-
Adam Osewski authored
* add a prototype of int4
* clean
* debug
* clean
* clean
* move packed into dynamic_buffer
* fixed coord reset
* add fast pki4 to half conversion
* fix
* fixed reference and host_tensor
* fixed tensor init
* format
* debug i4_to_f16_convert
* format
* fixed splitk
* weight permute
* add b tile permute
* clean
* weight permute with splitki
* format
* improve weight layout
* add and_or_b32
* fixed splitk crush
* add permute switch as a template
* recover v3r1
* clean
* failure with intrawave v2
* fixed
* fixed
* add ckProfiler
* add bfp16 support
* add bf16 example
* fixed int4 to bhalf_t conversion
* format
* fixed int4 to bf16 conversion
* clean
* add instances for mem
* clean
* fixed host tensor size
* fixed
* debug
* fixed
* add pk_i4_t as a struct
* fix
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* revert
* Update example/01_gemm/gemm_xdl_bf16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update example/01_gemm/gemm_xdl_fp16_pk_i4_v3.cpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed comments
* revert
* clean
* revert
* revert
* fixed
* Update CMakeLists.txt
* Update script/cmake-ck-dev.sh
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update include/ck/tensor_operation/gpu/element/unary_element_wise_operation.hpp
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* Update CMakeLists.txt
  Co-authored-by: Adam Osewski <19374865+aosewski@users.noreply.github.com>
* fixed
* fixed
* fixed
* revert
* revert
* add comments
* format
* fixed assert
* fixed
* Fix I4 define in ckProfiler
* Fixed example_gemm_xdl_bf16_pk_i4_v3 test failed issue
---------
Co-authored-by: Jing Zhang <jizhan@fb.com>
Co-authored-by: zjing14 <zhangjing14@gmail.com>
Co-authored-by: mtgu0705 <mtgu@amd.com>
-
- 01 Jan, 2025 1 commit
-
-
Bartłomiej Kocot authored
* Add NGCHW bf16 grouped conv fwd instances
* add missed cmake
-
- 29 Dec, 2024 1 commit
-
-
Qianfeng authored
* Remove using tile partitioner for fmha_fwd_kernel
* Remove using tile partitioner for fmha_fwd_splitkv and splitkv-combine kernels
* Remove using tile partitioner for fmha_fwd_appendkv kernel
* Unify the format of GetTileIndex
-
- 28 Dec, 2024 1 commit
-
-
Bartłomiej Kocot authored
* [CK TILE] Add split K support in GEMM
* Updates
* Fixes
* rebase
* fix
* Fix
* fixes
* support for batched gemm
-
- 25 Dec, 2024 1 commit
-
-
Po Yen Chen authored
-
- 23 Dec, 2024 1 commit
-
-
carlushuang authored
* opt moe sorting
* remove commented code
-
- 20 Dec, 2024 2 commits
-
-
Illia Silin authored
-
carlushuang authored
-