1. 15 Nov, 2021 1 commit
    • Chao Liu's avatar
      FP16 data in-register transpose (#41) · b491ebf3
      Chao Liu authored
      * start fixing 16bit data packing
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * add missing constexpr
      
      * adding static tensor
      
      * adding static tensor
      
      * adding transpose
      
      * add inline asm for transpose 2x2 of half_t
      
      * add general transpose_vectors(), but have unnecessary register initialization using v_mov
      
      * fix unnecessary register initialization in transpose_vector by using more pass-by-reference
      
      * add hardcoded logic for NHWC wrw
      
      * improve asm for v_pack
      
      * make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
      
      * tweak
      
      * reorganize file
      b491ebf3
  2. 06 Oct, 2021 1 commit
    • Chao Liu's avatar
      Tweak GEMM kernel (#38) · b3e8d57d
      Chao Liu authored
      * add parameters
      
      * tweak gemm
      
      * tweak
      
      * update conv
      
      * update script
      
      * adding bwd 1x1
      
      * update script
      
      * adding 1x1 bwd
      
      * debugging bwd 1x1 failure
      
      * update script
      
      * update script
      
      * test
      
      * test v100
      
      * clean up
      b3e8d57d
  3. 05 Sep, 2021 1 commit
    • Chao Liu's avatar
      GEMM driver and kernel (#29) · 19613902
      Chao Liu authored
      * add gemm driver
      
      * tweak
      
      * add gemm kernel: mk_kn_mn and km_kn_mn
      
      * tweak
      
      * add GEMM km_nk_mn
      
      * fix comment
      19613902