- 30 Mar, 2022 1 commit
Jianfeng Yan authored
* adding batched_gemm_and_reduction
* batched_gemm_reduce works with batch_count=1
* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
* adding profiler for batched_gemm_fp16
* fixed a bug in declaration of d1 and d0; both example and profiler work
* clang-format
* cleanup
* batched_gemm_reduce: add test
* minor change
* fixed some typos in function names
-
- 22 Mar, 2022 1 commit
Jianfeng Yan authored
* add bf16 for batched gemm
* batched_gemm_bf16 works
* recover accidentally changed files
-
- 21 Mar, 2022 1 commit
Jianfeng Yan authored
* changed long_index_t to index_t when computing memory offset
* uncomment other ops in profiler
* added test for batched_gemm
-
- 11 Feb, 2022 1 commit
zjing14 authored
* prepare host for batched_gemm
* init commit of batched kernels
* fixed
* refine transform with freeze
* m/n padding
* fixed a bug; clean
* add small tiles
* clean
* clean code
* clean code
* add nt, tn, tt layout
* add missing file
* use StaticBufferTupleOfVector instead
* add reference_batched_gemm
* fixed a macro
-