  1. 15 Nov, 2021 2 commits
    • Add bfp16/int8 support into XDL GEMM operator (#50) · 3737bb03
      zjing14 authored
      
      
      * init StaticBufferV2
      
      * clean
      
      * adopt old output stage for staticBufferV2
      
      * clean
      
      * remove hack
      
      * clean
      
      * clean
      
      * add parameters
      
      * clean code
      
      * move c_buffer alloc into blockwise gemm
      
      * add adaptors for m/n_thread_data_on_grid
      
      * tweak gemm
      
      * adjust blockwise_gemm_xdlops
      
      * tweak
      
      * update conv
      
      * update script
      
      * adding bwd 1x1
      
      * update script
      
      * adding 1x1 bwd
      
      * debugging bwd 1x1 failure
      
      * update script
      
      * update script
      
      * test
      
      * test v100
      
      * add bf16_1k
      
      * clang-format
      
      * clean
      
      * add bfp16 for gfx908
      
      * add verification
      
      * clean up
      
      * clean code
      
      * restore bfl16
      
      * clean
      
      * add bfp16 support into gemm_driver
      
      * apply new generator to other drivers
      
      * add int8 support
      
      * clean
      
      * clean
      
      * clean
      
      * clean
      Co-authored-by: Chao Liu <chao.liu2@amd.com>
      Co-authored-by: Chao Liu <lc.roy86@gmail.com>
      Co-authored-by: root <root@hayabusa6111.amd.com>
    • FP16 data in-register transpose (#41) · b491ebf3
      Chao Liu authored
      * start fixing 16bit data packing
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * add missing constexpr
      
      * adding static tensor
      
      * adding static tensor
      
      * adding transpose
      
      * add inline asm for transpose 2x2 of half_t
      
      * add general transpose_vectors(), but have unnecessary register initialization using v_mov
      
      * fix unnecessary register initialization in transpose_vector by using more pass-by-reference
      
      * add hardcoded logic for NHWC wrw
      
      * improve asm for v_pack
      
      * make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
      
      * tweak
      
      * reorganize file
  2. 06 Oct, 2021 1 commit
    • Tweak GEMM kernel (#38) · b3e8d57d
      Chao Liu authored
      * add parameters
      
      * tweak gemm
      
      * tweak
      
      * update conv
      
      * update script
      
      * adding bwd 1x1
      
      * update script
      
      * adding 1x1 bwd
      
      * debugging bwd 1x1 failure
      
      * update script
      
      * update script
      
      * test
      
      * test v100
      
      * clean up
  3. 05 Sep, 2021 1 commit
    • GEMM driver and kernel (#29) · 19613902
      Chao Liu authored
      * add gemm driver
      
      * tweak
      
      * add gemm kernel: mk_kn_mn and km_kn_mn
      
      * tweak
      
      * add GEMM km_nk_mn
      
      * fix comment