"vscode:/vscode.git/clone" did not exist on "111511750379ecd6865eed4d4bfa453b348b0476"
- 27 Aug, 2021 1 commit
-
-
Qianfeng authored
* add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files * make inner product compatible on gfx900 * Update src/include/miopen/solver/ck_utility_common.hpp * compiler parameter use stream * use int instead of index_t in kernel wrapper * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element * Add dynamic generic reduction kernel layer (kernel wrappers, kernel implementations and utilities) * Some updates to dynamic composable kernel facility for the need of dynamic generic reduction * Update to generic reduction C++ host interface layer to support dynamic generic reduction * Update to remove tidy complaints in host interface layer * Change the unary operator form from void op(T &x) to T op(T x) * Update to pass single workspace pointer for all kernels (fix for OpenCL backend) * Use cppcheck-suppress to prevent some strange warnings * Re-use operator [] and () for DynamicBuffer and update to depending codes * Remove useless codes in first call threadwise/warpwise/blockwise kernel wrappers * [performance] Remove un-needed local buffer initialization Co-authored-by:
Chao Liu <chao.liu2@amd.com> Co-authored-by:
JD <Jehandad.Khan@amd.com>
-
- 19 Aug, 2021 1 commit
-
-
Chao Liu authored
* Squashed 'src/composable_kernel/' content from commit f6edda61 git-subtree-dir: src/composable_kernel git-subtree-split: f6edda61 * add solver ConvIgemmFwdV6r1DlopsNchwKcyxNkhw; rename static ck source files * Squashed 'src/composable_kernel/' changes from f6edda61..5781adf5 5781adf5 Update develop (#5) (#6) 97e6d514 Merge pull request #4 from ROCmSoftwarePlatform/separate_online_compile 7b1ec41e refactor 49c33aae refactor 54b3e73d rename git-subtree-dir: src/composable_kernel git-subtree-split: 5781adf5 * fix * refactor * remove online compilation from CK * refactor * fix * add ctest * add c-style pointer cast * vector/scalar pointer cast use c-style pointer cast instead of reinterpret_cast * fix clang warning suppression * tidy * suppress cppcheck * fix enum issue * revert chagnes to hip build * fix kernel filename * update CK build script * rename * rename * make innner product compatiable on gfx900 * Update src/include/miopen/solver/ck_utility_common.hpp Co-authored-by:
JD <Jehandad.Khan@amd.com> * compiler parameter use stream * use int instead of index_t in kernel wrapper * DynamicBuffer, StaticBuffer, amd_buffer_load support customized value for invalid element * refactor * refactor * change cmakelist * change ck common utility * fix Co-authored-by:
JD <Jehandad.Khan@amd.com>
-
- 13 Aug, 2021 1 commit
-
-
Chao Liu authored
-
- 09 Aug, 2021 1 commit
-
-
Chao Liu authored
-
- 27 Jul, 2021 1 commit
-
-
Chao Liu authored
* update online kernel wrapper bundle all descriptors in a tuple * change __CONSTANT__ to CONSTANT * rename * adding tuning * added IsValidCompileParameter * reorginze * adding tunable for fp16 and int8 * fix kernel compile warning and bug fixes * suppress warning about cast CONSTANT (address space 4) pointer * fix building issue
-
- 01 Jul, 2021 1 commit
-
-
zjing14 authored
* create files for xdlops * working on blockwise_gemm_xdlops * add KReduction * add m/n repeats * add 2x2 pipeline * added 128x128 wavegemm * use StaticBuffer of vector_type * break vector type to blk_size * add kpack into xldops_gemm and blockwise_gemm * abroadcast only * add fp32 mfma instructions * adding fp16 mfma * pack half4_t * rename kperwave to kpack * add 32x32x8fp16 * add fp16 mfma * clean code * clean code * V4r4 xdlops kpack (#35) * add kpack with incorrect results * bug fix for make_dynamic_naive_tensor_descriptor_aligned_v2 * add 1x1 kernel * add gridwise_gemm_v2 - single_buffer * enabled dwordx4 for fp16 Co-authored-by:
Chao Liu <chao.liu2@amd.com> * refactor fwd-v4r4-xdlops * add v4r4-nhwc-xdlop * improve some perf of nhwc and nchw by tuning parameters, and change scheuduling in gridwise-gemm loop * tweak scheduling in gridwise gemm * add v4r3 with a single output copy * init commit: output with slice win * adding sliceWin * add multiple repeats pattern * starting adding bwd-v4r1-xdlops * use tuple as SrcBuffer * adding bwd-data v4r1 nhwc xdlops * fix bug in make_dynamic_naive_tensor_descriptor_aligned_v2() * fix bug in host bwd-data conv * initial implementation of bwd-data v4r1 nhwc xdlops * add launch bound flags * enable launch bound * add m/nrepeat=4 * tweak bwd-data v4r1 nhwc xdlops * added bwd-data v4r1 nhwc xlops with output A and weight B * add fwd-v4r4 nhwc xdlops, A input, B weight, C output Co-authored-by:
Chao Liu <chao.liu2@amd.com>
-
- 12 May, 2021 1 commit
-
-
Chao Liu authored
* Use DynamicBuffer to hold raw pointer (to global and LDS memory) * add workaround for compiler issue (inefficient ISA) of ds_write for int8x4, int8x8, int8x16
-
- 13 Apr, 2021 1 commit
-
-
Chao Liu authored
* overhaul vector_type, make int8x4_t real vector instead of aliasing from int32_t
-
- 07 Apr, 2021 1 commit
-
-
zjing14 authored
* Hybrid direct + implicit GEMM forward convolution NCHWc v5r1. Input tensor bypass LDS. Support fp32/fp16/int8
-
- 25 Mar, 2021 1 commit
-
-
Chao Liu authored
* support dynamic tensor descriptor * use buffer load OOB feature for padding case * add navi support * add int8x4 inference kernel Co-authored-by:
Chao Liu <chao@ixt-rack-81.local.lan> Co-authored-by:
Jing Zhang <jizhan@amd.com>
-