[Perf][Bwd-weights]Lds re-layout to avoid ds read/write bank conflict and...
[Perf][Bwd-weights]Lds re-layout to avoid ds read/write bank conflict and balance ds ops with address calculations (#190)
* add some instance to develop
* avoid bank conflicts for wrw for all instance
* add small K1 test
* delete some unused instance
* reset buffer load oob and ds memcpy to default option
* remove useless instances
* remove redandunt space
* remove printf code
* clang-format-10 change
* fix clang format for the other files
* add bank length computation
* add template to distinguish the instance that need lds padding for wrw
* use rocm5.1 as docker
* use integer value for GEMM test
* 1. move dedicated transform into gridwisegemm's head file. 2. make lds tensor params a struct templete. 3. remove useless code
* use a new gridwise gemm header for bwd-weight
* revert gridwise gemm v2r4r2
* change foramt
* rename kernel invoker
Co-authored-by:
Chao Liu <chao.liu2@amd.com>
Showing
This diff is collapsed.
Please register or sign in to comment