- 10 Aug, 2023 5 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
* Set KPerBlock to 64 * Maximize vector load size wherever possible.
-
Adam Osewski authored
-
rocking authored
* Add avgpool bwd reference code * Refine naming * Fix invalid in_element op in ref_conv * Add example (only reference now) * Add the full example of avgpool bwd * Fix copyright * Imitate MakeDescriptor from transform_conv_bwd_data_to_gemm_v1.hpp * rename channel to c from k * Arrange the code * Imitate the argument from conv bwd * Implement invoker * Fix order of parameter in example * Refactor reference code for different dimension * Support different stride * Check if argument is valid * Fix kernel parameter for NDHWC, fastest dimension C is not reduced * Add more data type in example * Fix bug in example * calculate Do Ho Wo according to the dilation * Remove useless header * Add comment in reference code * Add layout parameter * Remove layout in derived class * Refine reference comment
-
- 09 Aug, 2023 6 commits
-
-
Illia Silin authored
* add fno-offload-uniform-block flag for rocm5.7 and up * add a comment and compiler ticket number * update the threshold rocm version
-
Illia Silin authored
* add linting and update contributors list * skip the linting and doc changes * add Astha * add YanXing
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Enable grouped conv with small K or C * Add missing instances * Refactor grouped conv fwd instances * Fix fp16 instances since it supports src_per_vec %2 = 0 * Add generic instances
-
Rostyslav Geyyer authored
* Enable f16/f8 mixed precision * Add an argument to enable mixed precision * Update for compatibility * Add mixed precision example * Introduce ComputeType argument
-
Illia Silin authored
* add -fno-offload-uniform-block flag for rocm5.7 and up * add a comment and compiler ticket number
-
- 07 Aug, 2023 3 commits
-
-
Illia Silin authored
* properly split conv_nd_bwd_data instances * split conv2d_fwd instance data types * split the gemm, conv2d_fwd and batched_gemm_softmax_gemm * split the tests by data types where possible * filter examples by DTYPES * split the few remaining examples by DTYPES * filter most instances by DTYPES * add new lines at end of headers, fix grouped_gemm profiler * fix syntax * split the ckprofiler instances by DTYPES * split the conv2d and quantization DL and XDL instances * fix the splitting of conv2d DL instances * split softmax and pool_fwd tests for fp16 and fp32 types * fix syntax * fix the dl_int8 quantization instances isolation
-
Bartłomiej Kocot authored
* Add wei_strides to grouped conv3d wei to keep consistency * Fix strides in client examples * Unify backward weight api with forward * Fix for example * Fixes for examples --------- Co-authored-by: zjing14 <zhangjing14@gmail.com>
-
Adam Osewski authored
-
- 03 Aug, 2023 4 commits
-
-
Illia Silin authored
-
Bartlomiej Kocot authored
-
Bartlomiej Kocot authored
-
Bartlomiej Wroblewski authored
* Improve formatting of docs; Add a note about the DL_KERNELS flag * Change the recommended version of ROCm to 5.6
-
- 02 Aug, 2023 1 commit
-
-
Po Yen Chen authored
* Enable pipeline v2 opt for layout=TT instance * Use better thread mapping for reading A tile * Conditionally enable pipeline v2 opt * Allow enabling only fp16 gemm instances in profiler * Fix formatting error * Fix compilation error if we enable fp32 in profiler
-
- 01 Aug, 2023 1 commit
-
-
Adam Osewski authored
-
- 27 Jul, 2023 4 commits
-
-
Bartłomiej Kocot authored
* Add s_nops after v_dot to avoid hazard * Fix builtin for inner_product fp16 * Skip inline version to builtin * Add comments regarding isa * Fix comment regarding s_nop
-
Adam Osewski authored
-
Adam Osewski authored
-
Adam Osewski authored
-
- 26 Jul, 2023 5 commits
-
-
carlushuang authored
* initial stream-k implementation with example * fix unexpected change in err * improve performance a little by reorganizing the pipeline * improve perf a little by swizzling block idx * add profiler * update example * fix spelling * shrink karg for streamk * support dynamic buffer using memory coherence glc_slc bit from template * control memory coherence while constructing dynamic buffer * update reduction for streamk (not ready yet) * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting * fix build issue * fix several bugs * now result is correct, everything works (but has scratch) * remove scratch by manually resetting coordinate * update device code * fix a bug in final reduce * fix something in example * update async memset * fix enum as camel case * modify coherence enum name * clean code and use atomic streamk by default * remove unused var * throw exception if a pointer is empty * fix format * fix CI warning * fix type in init * modify CI error * filter out on gfx10+ * restore changed example code --------- Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
-
Illia Silin authored
-
Bartłomiej Kocot authored
* Disable XDL kernels on unsupported HW; Add ck::is_xdl_supported function (#765) * Do not throw an error when GEMM problem is not supported. --------- Co-authored-by: Bartlomiej Wroblewski <bwroblewski10@gmail.com> Co-authored-by: Adam Osewski <aosewski@amd.com> Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
-
Adam Osewski authored
-
rocking authored
-
- 25 Jul, 2023 3 commits
-
-
Po Yen Chen authored
* Use better ThreadClusterLengths to speed up * Update B tile reading pattern for layout=NN instance
-
Adam Osewski authored
-
ltqin authored
* first change bias load * add bias dim and scalar-per-vector parameter * make CDE0BlockTransferSrcVectorDim not work * changes to instance * add limit for CDE0BlockTransferSrcScalarPerVector
-
- 21 Jul, 2023 3 commits
-
-
Illia Silin authored
-
Illia Silin authored
-
Bartłomiej Kocot authored
-
- 20 Jul, 2023 2 commits
-
-
Adam Osewski authored
-
Adam Osewski authored
-
- 18 Jul, 2023 3 commits
-
-
Bartłomiej Kocot authored
* Grouped 3d conv backward data support * Fix comments
-
Rostyslav Geyyer authored
-
Adam Osewski authored
-