Commits · 34c661e71cc8cf4753843a58786c8f6211ec5e22 · gaoqiong / composable_kernel

30 Mar, 2022 1 commit

Batched gemm and reduction (#156) · 34c661e7

Jianfeng Yan authored Mar 30, 2022

* adding batched_gemm_and_reduction

* batched_gemm_reduce works with batch_count=1

* fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1

* adding profiler for batched_gemm_fp16

* fixed a bug in declaration of d1 and d0; both example and profiler work

* clang-format

* cleanup

* batched_gemm_reduce: add test

* minor change

* fixed some typos in function names

34c661e7

22 Mar, 2022 1 commit

Batched gemm bf16 (#142) · d91f9f11

Jianfeng Yan authored Mar 22, 2022

* add bf16 for batched gemm

* batched_gemm_bf16 works

* recover accidentally changed files

d91f9f11

21 Mar, 2022 1 commit

refactored deviceBatchedGemm; removed GridwiseBatchedGemm; added fp32 and int8 to profiler (#120) · cb87b049

Jianfeng Yan authored Mar 21, 2022

* changed long_index_t to index_t when computing memory offset

* uncomment other ops in profiler

* added test for batched_gemm

cb87b049

11 Feb, 2022 1 commit

Batched GEMM for fp16 (#79) · b53e9d08

zjing14 authored Feb 11, 2022

* prepare host for batched_gemm

* init commit of batched kernels

* fixed

* refine transform with freeze

* m/n padding

* fixed a bug; clean

* add small tiles

* clean

* clean code

* clean code

* add nt, tn, tt layout

* add missing file

* use StaticBufferTupleOfVector instead

* add reference_batched_gemm

* fixed a macro

b53e9d08