- 07 Oct, 2021 1 commit
-
-
Jing Zhang authored
-
- 06 Oct, 2021 3 commits
-
-
Qianfeng authored
* Tiny fix in using data type template parameters in blockwise and direct_threadwise kernel
* Fix with regard to implementing GetZeroVal() in both kernel and host
* Avoid converting to compType from dstDataType before writing the output value
* Add half_t support to NumericLimits and make constexpr GetZeroVal() of binary operator
* Add CONSTANT decorator for descriptor read buffer
* Use get_thread_local_1d_id() for thread local Id
* Rename GetZeroVal() to GetReductionZeroVal() in the kernels
* Remove constexpr from initialized zeroVal and tiny fix in reduction_operator.hpp
* Occasional tiny simplification and update in the kernel files
* Update to re-order tensor dimensions on the host, split second_call kernel wrapper files and simplify reduce_all kernel wrappers
* Update to remove OpenCL tidy checking failures
* Update for better readability
* Remove unused code and unneeded template parameters in the kernel wrappers

Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
Chao Liu authored
* add parameters
* tweak gemm
* tweak
* update conv
* update script
* adding bwd 1x1
* update script
* adding 1x1 bwd
* debugging bwd 1x1 failure
* update script
* update script
* test
* test v100
* clean up
-
zjing14 authored
* init StaticBufferV2
* clean
* adopt old output stage for staticBufferV2
* clean
* remove hack
* clean
* clean
* clean code
* move c_buffer alloc into blockwise gemm
* add adaptors for m/n_thread_data_on_grid
* adjust blockwise_gemm_xdlops
* reorder ops in GEMM hot loop

Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 04 Oct, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 02 Oct, 2021 4 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 01 Oct, 2021 1 commit
-
-
Jing Zhang authored
-
- 30 Sep, 2021 1 commit
-
-
Jing Zhang authored
-
- 29 Sep, 2021 1 commit
-
-
Jing Zhang authored
-
- 21 Sep, 2021 3 commits
- 17 Sep, 2021 1 commit
-
-
Jing Zhang authored
-
- 15 Sep, 2021 4 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 14 Sep, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 13 Sep, 2021 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 12 Sep, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 11 Sep, 2021 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 10 Sep, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 09 Sep, 2021 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 08 Sep, 2021 1 commit
-
-
Jing Zhang authored
-
- 05 Sep, 2021 2 commits
- 31 Aug, 2021 1 commit
-
-
ltqin authored
* start
* modify transformat
* modify device convolution
* modify host
* added host conv bwd and wrw
* remove bwd, separate wrw
* clean
* hacall k to zero
* out log
* fixed
* fixed
* change to (out in wei)
* input hack
* hack to out
* format
* fix by comments
* change wei hacks (wei transform has not merged)
* fix program once issue
* fix review comment
* fix vector load issue
* tweak

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
-