    Batched GEMM for fp16 (#79) · b53e9d08
    zjing14 authored
    * prepare host for batched_gemm
    
    * init commit of batched kernels
    
    * fixed
    
    * refine transform with freeze
    
    * m/n padding
    
    * fixed a bug; clean
    
    * add small tiles
    
    * clean
    
    * clean code
    
    * clean code
    
    * add nt, tn, tt layout
    
    * add missing file
    
    * use StaticBufferTupleOfVector instead
    
    * add reference_batched_gemm
    
    * fixed a macro
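The log names `reference_batched_gemm`, the nt/tn/tt layouts and m/n padding, but it does not show the code. The following is a minimal host-side sketch under stated assumptions, not the commit's implementation. It assumes that the first layout letter describes A and the second describes B, with 'n' meaning stored as given (A is M x K, B is K x N, row-major) and 't' meaning stored transposed. It also assumes that batches are packed contiguously with no extra stride. `Layout`, `load_a`, `load_b` and `pad_to_tile` are hypothetical helpers. The sketch uses `float` throughout instead of fp16.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tag: first letter = A, second letter = B.
// Assumption: N = stored as given (A: M x K, B: K x N, row-major),
//             T = stored transposed (A: K x M, B: N x K, row-major).
enum class Layout { NN, NT, TN, TT };

// Read logical element A(m, k) from one batch, honoring A's storage order.
inline float load_a(const float* a, std::size_t m, std::size_t k,
                    std::size_t M, std::size_t K, bool a_trans) {
    return a_trans ? a[k * M + m] : a[m * K + k];
}

// Read logical element B(k, n) from one batch, honoring B's storage order.
inline float load_b(const float* b, std::size_t k, std::size_t n,
                    std::size_t K, std::size_t N, bool b_trans) {
    return b_trans ? b[n * K + k] : b[k * N + n];
}

// Naive reference: C[g] = A[g] * B[g] for each batch g in [0, G).
// C is always row-major M x N; batches are assumed packed back to back.
void reference_batched_gemm(const std::vector<float>& a,
                            const std::vector<float>& b,
                            std::vector<float>& c,
                            std::size_t G, std::size_t M, std::size_t N,
                            std::size_t K, Layout layout) {
    assert(a.size() == G * M * K && b.size() == G * K * N);
    const bool a_trans = (layout == Layout::TN || layout == Layout::TT);
    const bool b_trans = (layout == Layout::NT || layout == Layout::TT);
    c.assign(G * M * N, 0.0f);
    for (std::size_t g = 0; g < G; ++g) {
        const float* ag = a.data() + g * M * K;  // batch stride of A
        const float* bg = b.data() + g * K * N;  // batch stride of B
        float* cg = c.data() + g * M * N;        // batch stride of C
        for (std::size_t m = 0; m < M; ++m) {
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k) {
                    acc += load_a(ag, m, k, M, K, a_trans) *
                           load_b(bg, k, n, K, N, b_trans);
                }
                cg[m * N + n] = acc;
            }
        }
    }
}

// "m/n padding" idea: round a problem dimension up to a multiple of the
// tile size so a tiled kernel covers it fully (tile sizes are hypothetical).
constexpr std::size_t pad_to_tile(std::size_t dim, std::size_t tile) {
    return (dim + tile - 1) / tile * tile;
}
```

A GPU batched GEMM would typically be checked against a host reference like this one, element by element. For fp16 inputs, the comparison would need a tolerance rather than exact equality.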
profiler.cpp 2.09 KB