    Use TMA instead of LD/ST for intra-node normal kernels (#191) · c8dceba1
    Chenggang Zhao authored
    * Update CMake files
    
    * Use TMA instead of LD/ST for intranode dispatch
    
    * Use TMA instead of LD/ST for intranode combine
    
    * Adjust configs
    
    * Test default configs as well
    
    * More warps for combine
    
    * Add inter-thread fence
    
    * Enable more warps
    
    * Do not use TMA for senders
    
    * Update configs
    
    * Remove useless wait
test_intranode.py 13.1 KB