"example/01_gemm/gemm_xdl_fp64.cpp" did not exist on "22d438ae9e02b674e1656263df07799ee06fb466"
  1. 24 Nov, 2021 1 commit
  2. 15 Nov, 2021 1 commit
    • Chao Liu's avatar
      FP16 data in-register transpose (#41) · b491ebf3
      Chao Liu authored
      * start fixing 16bit data packing
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * adding StaticTensor
      
      * add missing constexpr
      
      * adding static tensor
      
      * adding static tensor
      
      * adding transpose
      
      * add inline asm for transpose 2x2 of half_t
      
      * add general transpose_vectors(), but have unnecessary register initialization using v_mov
      
      * fix unnecessary register initialization in transpose_vector by using more pass-by-reference
      
      * add hardcoded logic for NHWC wrw
      
      * improve asm for v_pack
      
      * make ThreadwiseTensorSliceTransfer_v3r2 support any tensor
      
      * tweak
      
      * reorganize file
      b491ebf3