- 10 Aug, 2023 3 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
* set KPerBlock to 64 * maximize vector load size wherever possible.
-
Adam Osewski authored
-
- 07 Aug, 2023 1 commit
-
-
Adam Osewski authored
-
- 03 Aug, 2023 4 commits
-
-
Illia Silin authored
-
Bartlomiej Kocot authored
-
Bartlomiej Kocot authored
-
Bartlomiej Wroblewski authored
* Improve formatting of docs; Add a note about the DL_KERNELS flag * Change the recommended version of ROCm to 5.6
-
- 02 Aug, 2023 1 commit
-
-
Po Yen Chen authored
* Enable pipeline v2 opt for layout=TT instance * Use better thread mapping for reading A tile * Conditionally enable pipeline v2 opt * Allow enabling only fp16 gemm instances in profiler * Fix formatting error * Fix compilation error if we enable fp32 in profiler
-
- 01 Aug, 2023 1 commit
-
-
Adam Osewski authored
-
- 27 Jul, 2023 4 commits
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard * Fix builtin for inner_product fp16 * Skip inline version to builtin * Add comments regarding isa * Fix comment regarding s_nop
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
- 26 Jul, 2023 5 commits
-
-
carlushuang authored
* initial stream-k implementation with example * fix unexpected change in err * improve performance slightly by reorganizing pipeline. * improve perf slightly by swizzling block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while constructing dynamic buffer * update reduction for streamk (not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bugs * now result is correct, everything works (but has scratch) * remove scratch by manually resetting coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception on empty pointer * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765) * Do not throw an error when GEMM problem is not supported. --------- Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
Adam Osewski authored
-
rocking authored
-
- 25 Jul, 2023 3 commits
-
-
Po Yen Chen authored
* Use better ThreadClusterLengths to speed up * Update B tile reading pattern for layout=NN instance
-
Adam Osewski authored
-
ltqin authored
* first change bias load * add bias dim and scalar vector parameter * make CDE0BlockTransferSrcVectorDim not work * changes to instance * add limit for CDE0BlockTransferSrcScalarPerVector
-
- 21 Jul, 2023 3 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
Bartłomiej Kocot authored
-
- 20 Jul, 2023 2 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
- 18 Jul, 2023 7 commits
-
-
Bartłomiej Kocot authored
* Grouped 3d conv backward data support * Fix comments
-
Rostyslav Geyyer authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
Illia Silin authored
* allow building CK for specific data types * add CI build and test stage on Navi3x without some int8 instances * add missing gemm fp16 instances * add the changes to the missed cmake file * add empty lines at end of source files * Do not build quantization client example on navi3 in CI * disable batched_gemm_multi_d_int8 instances with DTYPES * disable device_conv2d_bwd_data_instance with DTYPES * fix ckprofiler for conv_bwd_data for int8 * properly isolate the conv_bwd_data int8 instances * remove empty line
-
- 17 Jul, 2023 2 commits
-
-
Illia Silin authored
* check if gpu_targets are supported by compiler * set default list of targets and filter for them
-
Adam Osewski authored
-
- 15 Jul, 2023 1 commit
-
-
arvindcheru authored
* Disable Werror to ignore xnack+ warnings
-
- 12 Jul, 2023 2 commits
-
-
Bartłomiej Kocot authored
* Support NHWGC conv2d_bwd_weight * Fix client example * Fix client example * Fix comments * Redesign grouped_conv_bwd_weight instances * Clang format fix --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
Adam Osewski authored
-
- 10 Jul, 2023 1 commit
-
-
Adam Osewski authored
-