Add xdlops v4r4r4 into online compilation (#48)
* init for v4r4 xdlops olc
* refactor wrap
* init impl of v4r4 nchw xdlops olc
* tuning
* test perf
* fixed v4r4 nhwc
* tuned v4r4 nhwc
* use gridwise_gemm_xdlops_v2r3
* swap a/b
* add pointer support into offline v2r3
* debugging v4r4r4 transform for olc
* change timer of olc
* refactor v4r4 xdlops nchw olc
* remove transform fun in v4r4 xdlops nhwc olc
Co-authored-by: Chao Liu <chao.liu2@amd.com>