    initial stream-k implementation with example (#699) · e7dca79d
    carlushuang authored
    
    
    * initial stream-k implementation with example
    
    * fix unexpected change in err
    
    * improve performance slightly by reorganizing the pipeline
    
    * improve perf slightly by swizzling block idx
    
    * add profiler
    
    * update example
    
    * fix spelling
    
    * shrink karg for streamk
    
    * support dynamic buffer using memory coherence glc_slc bit from template
    
    * control memory coherence while constructing dynamic buffer
    
    * update reduction for streamk (not ready yet)
    
    * Add template parameter to make_dynamic_buffer to support amd_buffer coherence setting
    
    * fix build issue
    
    * fix several bugs
    
    * now result is correct, everything works (but has scratch)
    
    * remove scratch by manually resetting the coordinate
    
    * update device code
    
    * fix a bug in final reduce
    
    * fix something in example
    
    * update async memset
    
    * fix enum as camel case
    
    * modify coherence enum name
    
    * clean code and use atomic streamk by default
    
    * remove unused var
    
    * throw exception if a pointer is empty
    
    * fix format
    
    * fix CI warning
    
    * fix type in init
    
    * modify CI error
    
    * filter out on gfx10+
    
    * restore changed example code
    
    ---------
    Co-authored-by: Qianfeng Zhang <Qianfeng.Zhang@amd.com>
common.hpp 4.61 KB