    Add gridwise GEMM pipeline (#89) · 22d438ae
    Chao Liu authored
    * clean up
    
    * add multiple thread scratch to ThreadwiseTensorSliceTransfer_v3r1
    
    * add 2 stage prefetch
    
    * add more sanity check into transform_tensor_descriptor
    
    * tweak
    
    * enabling 2 stage prefetch to existing gridwise gemm; tweak
    
    * enabling 2 stage prefetch to existing gridwise gemm
    
    * move gridwise gemm pipeline into class; clean up
    
    * add some irregular tile size
    
    * update CalculateHasMainK0BlockLoop for multi-stage-prefetch
    
    * refactor gridwise gemm pipeline class
device_gemm_xdl.hpp 20.2 KB