- 01 Aug, 2023 1 commit
-
-
ltqin authored
-
- 28 Jul, 2023 4 commits
- 27 Jul, 2023 9 commits
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard
* Fix builtin for inner_product fp16
* Skip inline version to builtin
* Add comments regarding isa
* Fix comment regarding s_nop
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
Merge branch 'mha-train-develop' of https://github.com/ROCmSoftwarePlatform/composable_kernel into mha-train-develop
-
ltqin authored
-
danyao12 authored
-
- 26 Jul, 2023 9 commits
-
-
carlushuang authored
* initial stream-k implementation with example
* fix unexpected change in err
* improve a little bit performance by reorganize pipeline.
* improve perf a little bit by swizzle block idx
* add profiler
* update example
* fix spelling
* shrink karg for streamk
* support dynamic buffer using memory coherence glc_slc bit from template
* control memory coherence while construct dynamic buffer
* update reduction for streamk(not ready yet)
* Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
* fix build issue
* fix several bug
* now result is correct, everything works (but has scratch)
* remove scratch by manually reset coordinate
* update device code
* fix a bug in final reduce
* fix something in example
* update async memset
* fix enum as camel case
* modify coherence enum name
* clean code and use atomic streamk by default
* remove unused var
* throw exception if have empty pointer
* fix format
* fix CI warning
* fix type in init
* modify CI error
* filter out on gfx10+
* restore changed example code
---------
Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765)
* Do not throw an error when GEMM problem is not supported.
---------
Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com>
Co-authored-by: Adam Osewski <aosewski@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
ltqin authored
-
danyao12 authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
rocking authored
-
- 25 Jul, 2023 14 commits
-
-
Po Yen Chen authored
* Use better ThreadClusterLengths to speed up
* Update B tile reading pattern for layout=NN instance
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
ltqin authored
-
danyao12 authored
-
danyao12 authored
-
danyao12 authored
-
ltqin authored
* first change bias load
* add bias dim and scalar vector parameter
* make CDE0BlockTransferSrcVectorDim not work
* changes to instance
* add limit for CDE0BlockTransferSrcScalarPerVector
-
- 24 Jul, 2023 1 commit
-
-
ltqin authored
-
- 21 Jul, 2023 2 commits
-
-
Illia Silin authored
-
Illia Silin authored
-