    Remove gemm copy and simplify rocblas call (#356) · a0f9b785
    Shucai Xiao authored
    * Remove extra copy in gemm
    
    * combine rocblas gemm call
    
    * clang format
    
    * fix a bug in calling rocblas function
    
    * clang format
    
    * backup of temporary changes
    
    * clang format
    
    * unify the gemm call to avoid multiple GPU implementations
    
    * clang format
    
    * remove unnecessary code
    
    * backup temp changes
    
    * clang format
    
    * fix cppcheck error
    
    * code backup
    
    * clang format
    
    * remove unnecessary synchronization function
    
    * clang format
    
    * fix bugs
    
    * clang format
    
    * more optimization related to gemm
    
    * clang format
    
    * code cleanup
    
    * implementation that achieves better performance
    
    * clang format
    
    * temp changes to try performance
    
    * clang format
    
    * revert to previous commits
    
    * fixed review comments
    
    * clang format
    
    * fix review comments
gemm_impl.cpp 5.55 KB