    Grouped GEMM Multiple D tile loop. (#1247) · b4032629
    Adam Osewski authored
    * Overload output stream operator for LoopScheduler and PipelineVersion
    
    * Add Run overload accepting grid descriptors MK.
    
    * Add __device__ keyword for CalculateGridSize
    
    * Create device op GroupedGemmMultipleD
    
    * Add GroupedGemm MultipleD Tile Loop implementation.
    
    * Add an example for GroupedGemm MultipleD tile loop.
    
    * Device Op GroupedGEMMTileLoop.
    
    * Bunch of small changes in the example.
    
    * CkProfiler
    
    * Remove unused tparam.
    
    * Fix include statement.
    
    * Fix output stream overloads.
    
    * Do not make descriptors and check validity until we find the group.
    
    * Fix gemm desc initialization.
    
    * Revert device op
    
    * Fix compilation for DTYPES=FP16
    
    * Validate tensor transfer parameters.
    
    * Validate only NK dims on host if M is not known.
    
    * Fix bug.
    
    * A convenient debug func for selecting threads.
    
    * Fix has main k block loop bug.
    
    * Make sure that b2c has up to date tile offset.
    
    * Output stream operator for Sequence type.
    
    * Cmake file formatting.
debug.hpp 2.61 KB