- 09 Aug, 2021 1 commit
-
-
Chao Liu authored
-
- 27 Jul, 2021 1 commit
-
-
Chao Liu authored
* update online kernel wrapper bundle all descriptors in a tuple
* change __CONSTANT__ to CONSTANT
* rename
* adding tuning
* added IsValidCompileParameter
* reorganize
* adding tunable for fp16 and int8
* fix kernel compile warning and bug fixes
* suppress warning about cast CONSTANT (address space 4) pointer
* fix building issue
-
- 17 Jul, 2021 1 commit
-
-
zjing14 authored
* init for v4r4 xdlops olc
* refactor wrap
* init impl of v4r4 nchw xdlops olc
* tuning
* test perf
* fixed v4r4 nhwc
* tuned v4r4 nhwc
* use gridwise_gemm_xdlops_v2r3
* swap a/b
* add pointer support into offline v2r3
* debugging v4r4r4 transform for olc
* change timer of olc
* refactor v4r4 xdlops nchw olc
* remove transform fun in v4r4 xdlops nhwc olc

Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
- 08 Jul, 2021 1 commit
-
-
Chao Liu authored
* deprecate static kernels
-
- 05 Jul, 2021 1 commit
-
-
Chao Liu authored
* add threadwise copy that copies a tensor in one copy, added kpack to DL GEMM
* add kpack into fwd v4r5 nchw fp32
-
- 10 Jun, 2021 1 commit
-
-
Chao Liu authored
* experimenting magic number division
* overhauling fwd-v4r4 to clearly reflect transformation graph
* added fwd-v4r5
* bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2
* bug fix and added sanity-check in transform_dynamic_tensor_descriptor
* added conv_driver_v2
-
- 12 May, 2021 1 commit
-
-
Chao Liu authored
* Use DynamicBuffer to hold raw pointer (to global and LDS memory)
* add workaround for compiler issue (inefficient ISA) of ds_write for int8x4, int8x8, int8x16
-
- 11 May, 2021 1 commit
-
-
Chao Liu authored
* Replace most raw index calculation with coordinate transformation
* Overhaul blockwise and threadwise GEMM
* Overhaul driver for gridwise GEMM kernel

Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 28 Apr, 2021 1 commit
-
-
Chao Liu authored
* replacing array with tuple and vector for tensor data
-
- 13 Apr, 2021 1 commit
-
-
Chao Liu authored
* initial implementation for magic number division and DynamicMerge_v2_magic_division that uses it
* turn off DynamicMerge_v2_magic_division that uses magic number division by default
-
- 25 Mar, 2021 1 commit
-
-
Chao Liu authored
* support dynamic tensor descriptor
* use buffer load OOB feature for padding case
* add navi support
* add int8x4 inference kernel

Co-authored-by: Chao Liu <chao@ixt-rack-81.local.lan>
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 06 Aug, 2020 1 commit
-
-
Chao Liu authored
* fix buffer_store bug
* remove obsolete kernels
* add bwd-data-v5r1-nhwc
-
- 24 Jun, 2020 1 commit
-
-
Chao Liu authored
* tuning parameters
* testing on v100
* add fp16
* remove deprecated tensor descriptor
* sync with miopen
* update build script

Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 03 Dec, 2019 1 commit
-
-
Chao Liu authored
* enabled atomic add in tensor copy
* added gridwise GEMM
* added backward data conv using GEMM + atomic
* added backward data conv using GEMM, no atomic
-
- 04 Nov, 2019 1 commit
-
-
Chao Liu authored
-
- 11 Oct, 2019 1 commit
-
-
Chao Liu authored
Refactor so that multi-index transformation and padding support can be brought into MIOpen
-
- 22 Sep, 2019 1 commit
-
-
Chao Liu authored
done: explicitly separate the offset into compile-time, block-invariant and per-thread components. Experimenting
-
- 21 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 10 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 09 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 02 Sep, 2019 1 commit
-
-
Chao Liu authored
-
- 19 Jun, 2019 2 commits
- 13 Jun, 2019 1 commit
-
-
Chao Liu authored
-