Batched GEMM for fp16 (#79)
* prepare host for batched_gemm * init commit of batched kernels * fixed * refine transform with freeze * m/n padding * fixed a bug; clean * add small tiles * clean * clean code * clean code * add nt, tn, tt layout * add missing file * use StaticBufferTupleOfVector instead * add reference_batched_gemm * fixed a macro
Showing
This diff is collapsed.
This diff is collapsed.
Please register or sign in to comment