- 15 Nov, 2021 6 commits
-
-
Chao Liu authored
-
zjing14 authored
* init StaticBufferV2
* clean
* adopt old output stage for staticBufferV2
* clean
* remove hack
* clean
* clean
* add parameters
* clean code
* move c_buffer alloc into blockwise gemm
* add adaptors for m/n_thread_data_on_grid
* tweak gemm
* adjust blockwise_gemm_xdlops
* tweak
* update conv
* update script
* adding bwd 1x1
* update script
* adding 1x1 bwd
* debugging bwd 1x1 failure
* update script
* update script
* test
* test v100
* add bf16_1k
* clang-format
* clean
* add bfp16 for gfx908
* add verification
* clean up
* clean code
* restore bfl16
* clean
* add bfp16 support into gemm_driver
* apply new generator to other drivers
* add int8 support
* clean
* clean
* clean
* clean

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Chao Liu <lc.roy86@gmail.com>
Co-authored-by: root <root@hayabusa6111.amd.com>
-
Jing Zhang authored
-
Jing Zhang authored
-
Chao Liu authored
* start fixing 16bit data packing
* adding StaticTensor
* adding StaticTensor
* adding StaticTensor
* add missing constexpr
* adding static tensor
* adding static tensor
* adding transpose
* add inline asm for transpose 2x2 of half_t
* add general transpose_vectors(), but have unnecessary register initialization using v_mov
* fix unnecessary register initialization in transpose_vector by using more pass-by-reference
* add hardcoded logic for NHWC wrw
* improve asm for v_pack
* make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
* tweak
* reorganize file
-
Jing Zhang authored
-
- 14 Nov, 2021 1 commit
-
-
Chao Liu authored
* add DeviceGemmXdl
* update script
* fix naming issue
* fix comment
* output HostTensorDescriptor
* rename
* padded GEMM for fwd v4r4r4 nhwc
* refactor
* refactor
* refactor
* adding ckProfiler
* adding ckProfiler
* refactor
* fix tuning parameter bug
* add more gemm instances
* add more fp16 GEMM instances
* fix profiler driver
* fix bug in tuning parameter
* add fp32 gemm instances
* small fix
* refactor
* rename
* refactor gemm profiler; adding DeviceConv and conv profiler
* refactor
* fix
* add conv profiler
* refactor
* adding more GEMM and Conv instance
* Create README.md Add build instruction for ckProfiler
* Create README.md Add Readme for gemm_xdl example
* Update README.md Remove build instruction from top most folder
* Update README.md
* clean up
-
- 01 Nov, 2021 1 commit
-
-
Jing Zhang authored
-
- 29 Oct, 2021 6 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 28 Oct, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 27 Oct, 2021 6 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
ltqin authored
* change method of computing kpad
* remove unused variable: batchlen
* change KPerBlock to K0PerBlock
* fix bug for k0 == k0perblock
* fix bug for get k0 index
* use math::integer_divide_ceil

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
-
Chao Liu authored
update ck from miopen ck_upstream
-
ltqin authored
-
Jing Zhang authored
-
- 26 Oct, 2021 1 commit
-
-
Jun Liu authored
Merge pull request #1236 from ROCmSoftwarePlatform/develop
-
- 22 Oct, 2021 1 commit
-
-
Jing Zhang authored
-
- 21 Oct, 2021 1 commit
-
-
Chao Liu authored
-
- 19 Oct, 2021 2 commits
-
-
Chao Liu authored
-
ltqin authored
* add new algorithm from v4r4r2
* program once issue
* add split k function
* redefine code
* add a matrix unmerge
* add b matrix unmerge k0
* trans a and b to grid gemm
* nhwc init
* no hacks and vector load
* add hacks
* modify some parameter
* fix tuning parameter for fp32
* fix tuning parameter for fp16
* start change gridwise k split
* init ok
* remove a b matrix k0mk1 desc in grid
* rewrite calculate gridsize
* add kbatch to CalculateBottomIndex
* remove some unused function
* add clear data function before call kernel
* out hacks
* in hacks
* rename device convolution file and function name
* modify kBatch value
* fix some tuning code
* start from v4r4 nhwc
* nhwc atomic is able to run
* just for fp32
* enable nchw atomic
* tweak
* tweak
* re-arrange gridwise gemm hot loop for wrw
* add wrw v4r5
* v4r4r5 fp16
* v4r4r4 fp16
* v4r4r2 fp16
* V4R4R4XDLNHWC fp16
* V4R4R2XDLATOMICNCHW fp16
* adjust for fp16
* input gridsize
* change kbatch to gridsize
* testing wrw
* clean up
* k_batch to gridsize
* fix bug
* wrw v4r4r4 kbatch change to grid size
* wrw v4r4r2 kbatch change to grid size
* after merge, change gridwise gemm v2r4
* change MakeCBlockClusterAdaptor
* other method use new gridwise gemm
* clean up
* change pad method to make_right_pad_transform
* kbatch out from transform function
* clean up and fix bug
* fix bug
* using function type reduce template parameters
* using auto replace define function type
* clean up

Co-authored-by: ltqin <letaoqin@amd.com>
Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: Jing Zhang <jizhan@amd.com>
-
- 15 Oct, 2021 3 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 14 Oct, 2021 5 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
Jing Zhang authored
-
- 13 Oct, 2021 1 commit
-
-
Jing Zhang authored
-
- 12 Oct, 2021 2 commits
-
-
Jing Zhang authored
-
Jing Zhang authored
-
- 11 Oct, 2021 1 commit
-
-
Jing Zhang authored
-
- 10 Oct, 2021 1 commit
-
-
Jing Zhang authored
-