1. 31 Mar, 2022 2 commits
  2. 30 Mar, 2022 1 commit
    • Batched gemm and reduction (#156) · 34c661e7
      Jianfeng Yan authored
      * adding batched_gemm_and_reduction
      
      * batched_gemm_reduce works with batch_count=1
      
      * fix a bug in grid_size; batched_gemm_reduce works for batch_count > 1
      
      * adding profiler for batched_gemm_fp16
      
      * fixed a bug in declaration of d1 and d0; both example and profiler work
      
      * clang-format
      
      * cleanup
      
      * batched_gemm_reduce: add test
      
      * minor change
      
      * fixed some typos in function names
  3. 22 Mar, 2022 1 commit
  4. 21 Mar, 2022 1 commit
  5. 11 Feb, 2022 1 commit
    • Batched GEMM for fp16 (#79) · b53e9d08
      zjing14 authored
      * prepare host for batched_gemm
      
      * init commit of batched kernels
      
      * fixed
      
      * refine transform with freeze
      
      * m/n padding
      
      * fixed a bug; clean
      
      * add small tiles
      
      * clean
      
      * clean code
      
      * clean code
      
      * add nt, tn, tt layout
      
      * add missing file
      
      * use StaticBufferTupleOfVector instead
      
      * add reference_batched_gemm
      
      * fixed a macro