    Polished Grouped GEMM APIs and new BF16 instances (#1600) · 061ac064
    Adam Osewski authored
    * A few small fixes.
    
    * New GroupedGemm instances (BF16)
    
    * Unify and refactor GroupedGEMM device API.
    
    * Adapt changes to new API.
    
    * Adapt grouped gemm profiler.
    
    * Accept multiple kbatches for grouped gemm profiler.
    
    - Delete the obsolete two-stage version, as it is now covered by grouped gemm.
    
    * Update unit test for grouped gemm.
    
    * Fix thresholds for BF16 and F8. Unblock tests.
    
    * Fix a few instances.
    
    * Multiple small fixes.
    
    * Adapt to new API, check dynamic casting.
    
    * Uncomment a few data types in grouped gemm profiler.
    
    * Fix call to SetDeviceArgs.
    
    * Fix profile grouped gemm multiply tile loop.
    
    * Fix grouped gemm tile loop kernel args in client examples.
    
    * Review comments.
profile_grouped_gemm.cpp 18.2 KB