- 09 Feb, 2025 2 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 08 Feb, 2025 2 commits
-
-
Qianfeng Zhang authored
Merge branch 'ck_tile/improve_async_pipeline' of https://github.com/ROCm/composable_kernel into ck_tile/improve_async_pipeline
-
Qianfeng Zhang authored
-
- 06 Feb, 2025 1 commit
-
-
Qianfeng Zhang authored
-
- 04 Feb, 2025 4 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 03 Feb, 2025 5 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 02 Feb, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 31 Jan, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 30 Jan, 2025 3 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 29 Jan, 2025 1 commit
-
-
Qianfeng Zhang authored
-
- 26 Jan, 2025 6 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 24 Jan, 2025 4 commits
-
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
Qianfeng Zhang authored
-
- 22 Jan, 2025 1 commit
-
-
Qianfeng Zhang authored
-
- 13 Jan, 2025 5 commits
-
-
Max Podkorytov authored
* add unit test for gen instances for gemms
* add unit tests for conv and batched gemms
* add unit test for preselected gemm instances
* apply ruff lint
* add license header for the unit test
* add inductor pytest to CI
* verbose pip install
* switch the directory before installing python packages
* move the inductor codegen test
* try yet another workdir
* Update Jenkinsfile: The directory looks right, fixing pip module not found by invoking pip directly
* Update Jenkinsfile: invoke pytest directly since the module is not found
* Update Dockerfile: Install setuptools
* update package structure
* bump setuptools
* maybe fix data path for library sources
* fix library search path for conv instances
* fix path in pyproject definition
* compare path used in gen_instances with one in pyproject.toml; fix the difference

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
feli authored
* port tiles from a8w8
* rm debug used files
* add instances
* remove all non gemm in cmake
* merge; impl fp16
* recover cmake from develop
* add missed files; fix clang format

Co-authored-by: coderfeli <coderfeli@163.com>
-
Thomas Ning authored
* refactor the block_gemm_areg_breg_creg_v1 and add the v2 policy with 2x2 warp gemm
* Finished the 2x2 warp gemm policy and the block selection mechanism
* Clang format
* address poyen's comment
* Address feedbacks
* Fixed the compilation issue
* Change the function name
-
ClementLinCF authored
* Observed a 2x perf improvement with kBlockSize = 256
* Using 512 threads may lead to redundant computations
-
Qianfeng authored
* Update for fmha_fwd qs_ks_vs pipeline
* Remove __builtin_amdgcn_sched_barrier(0)
* Move p_compute to p converting earlier for trying to increase vgprs re-using
* Enable GetQKBlockGemm to use WarpGemm-16x16x16 for QLoadOnce==false situation
* Re-add __builtin_amdgcn_sched_barrier(0)

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>
-