"vscode:/vscode.git/clone" did not exist on "eee5ca5f5bd207dbf061d64ab550002984d08361"
  • Manual control of MAC cluster for improved interwave performance (#184) · 76764d8c
    Anthony Chang authored
    * manual control of MAC cluster for improved 2-wave performance
    
    ensure setprio's order; ensure inner loop size >= local read size
    
    synchronize when there is only a single MAC cluster
    
    * format
    
    * use value field from ck::integral_constant
    
    * roll out inter-wave loop scheduler to c-shuffle gemm variants
    
    will gradually roll out to other applicable device ops once the occasional register spill is resolved
    
    * additional comments
    
    * format
    
    * fix mismatch between inter-wave pipeline and interwave blockwise gemm
    
    * address review feedback
    
    * amend
config.hpp 4.99 KB