- 04 Feb, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 03 Feb, 2025 5 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 02 Feb, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 31 Jan, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 30 Jan, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 29 Jan, 2025 1 commit
-
-
Qianfeng Zhang authored
-
- 26 Jan, 2025 6 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 24 Jan, 2025 4 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 22 Jan, 2025 1 commit
-
-
Qianfeng Zhang authored
-
- 13 Jan, 2025 5 commits
-
-
Max Podkorytov authored
* add unit test for gen instances for gemms * add unit tests for conv and batched gemms * add unit test for preselected gemm instances * apply ruff lint * add license header for the unit test * add inductor pytest to CI * verbose pip install * switch the directory before installing python packages * move the inductor codegen test * try yet another workdir * Update Jenkinsfile: The directory looks right, fixing pip module not found by invoking pip directly * Update Jenkinsfile: invoke pytest directly since the module is not found * Update Dockerfile: Install setuptools * update package structure * bump setuptools * maybe fix data path for library sources * fix library search path for conv instances * fix path in pyproject definition * compare path used in gen_instances with one in pyproject.toml; fix the difference --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
feli authored
* port tiles from a8w8 * rm debug used files * add instances * remove all non gemm in cmake * merge; impl fp16 * recover cmake from develop * add missed files; fix clang format --------- Co-authored-by: coderfeli <coderfeli@163.com>
-
Thomas Ning authored
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm * Finished the 2x2 warp gemm policy and the block selection mechanism * Clang format * address poyen's comment * Address feedbacks * Fixed the compilation issue * Change the function name
-
ClementLinCF authored
* Observed a 2x perf improvement with kBlockSize = 256 * Using 512 threads may lead to redundant computations
-
Qianfeng authored
* Update for fmha_fwd qs_ks_vs pipeline * Remove __builtin_amdgcn_sched_barrier(0) * Move p_compute to p converting earlier to try to increase vgpr reuse * Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation * Re-add __builtin_amdgcn_sched_barrier(0) --------- Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-
- 10 Jan, 2025 2 commits
-
-
Bartłomiej Kocot authored
* Grouped convolution backward weight special vector size loads * Instances and tests * Fixes * Add 7 and 13 special cases * fix comments * Fix * Fix2 * fixes * fix atomic add bf16
-
Thomas Ning authored
* Finished adding the performance benchmark for ck tile gemm * Fix the executable rename problem * fix the executable name error * delete the unsupported layout combinations * Update run_full_test.sh * Update benchmark_mem_pipeline.sh * Update benchmark_basic.sh * change the executable of gemm_universal * change ck_tile_gemm script permissions * Addressed the comment * Addressed the comment * Fixed the comments * Fixed Comment * roll back the malfunctioned change * Fix the Typo * finalize the tile_gemm_fp16 performance monitoring * fix the stash names for ck_tile gemm logs * change the stashing logic * change stashing syntax --------- Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com> Co-authored-by: illsilin <Illia.Silin@amd.com>
-
- 08 Jan, 2025 4 commits
-
-
darren-amd authored
* Disable building DPP kernels by default * Disable building dpp instances, examples, or tests if DPP_KERNELS is not set * Add new DPP_KERNELS flag to readme
-
Max Podkorytov authored
-
Max Podkorytov authored
-
Max Podkorytov authored
-