- 12 May, 2022 1 commit
-
-
JD authored
* Add host API * manually rebase on develop * clean * manually rebase on develop * exclude tests from all target * address review comments * update client app name * fix missing lib name * clang-format update * refactor * refactor * refactor * refactor * refactor * fix test issue * refactor * refactor * refactor * upate cmake and readme Co-authored-by:Chao Liu <chao.liu2@amd.com>
-
- 07 Apr, 2022 1 commit
-
-
Jianfeng Yan authored
* forgot passing BatchedCount in some profiler_batched_gemm * delete default BatchCount
-
- 24 Mar, 2022 1 commit
-
-
Chao Liu authored
* add gridwise gemm v4r1 * rename * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * use sfc in shuffling * remove hardcode * remove hardcode * refactor * fix build * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * adding gemm+reduce * format * clean * adding gemm+reduce * adding profiler for gemm+reduce * adding gemm+reduce profiler * fix build * clean up * gemm+reduce * fix build * update DeviceGemm_Xdl_CShuffle; update enum to enum class * clean up * add test for gemm+reduce * clean up * refactor * fix build * fix build
-
- 22 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
* add bf16 for batched gemm * batched_gemm_bf16 works * recover accidently changed files
-
- 21 Mar, 2022 1 commit
-
-
Jianfeng Yan authored
changed long_index_t to index_t when computing memory offset uncomment other ops in profiler added test for batched_gemm
-
- 11 Feb, 2022 1 commit
-
-
zjing14 authored
* prepare host for batched_gemm * init commit of batched kernels * fixed * refine transform with freeze * m/n padding * fixed a bug; clean * add small tiles * clean * clean code * clean code * add nt, tn, tt layout * add missing file * use StaticBufferTupleOfVector instead * add reference_batched_gemm * fixed a macro
-