    Canonicalize TMA usages (#410) · 2012e310
    Chenggang Zhao authored
    * Remove redundant TMA flushes
    
    * Reduce barrier initialization overhead
    
    * Simplify `elect_one_sync`
    
    * Use `elect_one_sync` instead of explicit lane-index checks
    
    * Minor fix
    
    * Polish testing prints
    
    * Refactor for internode kernels
    
    * Improve performance
utils.cuh 22.6 KB