  1. 11 Jun, 2025 1 commit
    • Support Ampere architecture (#204) · b8d90fb7
      Chenggang Zhao authored
      * Update README
      
      * Update `setup.py`
      
      * Fix headers
      
      * Add `DISABLE_NVSHMEM` for APIs
      
      * Fix launch
      
      * Fix TMA settings
      
      * Fix TMA usages
      
      * Fix dlink
      
      * Separate layout kernels
      
      * Update version
      
      * Add `is_sm90_compiled`
      
      * Fix tests
      
      * Add NVLink connection checks
      
      * Update README
      
      * Fix tests
      
      * Add some comments
      
      * Minor fix
      
      * Minor fix
      
      * Fix bugs
  2. 10 Jun, 2025 1 commit
  3. 09 Jun, 2025 1 commit
  4. 06 Jun, 2025 1 commit
    • Use TMA instead of LD/ST for intra-node normal kernels (#191) · c8dceba1
      Chenggang Zhao authored
      * Update CMake files
      
      * Use TMA instead of LD/ST for intranode dispatch
      
      * Use TMA instead of LD/ST for intranode combine
      
      * Adjust configs
      
      * Test default configs as well
      
      * More warps for combine
      
      * Add inter-thread fence
      
      * Enable more warps
      
      * Do not use TMA for senders
      
      * Update configs
      
      * Remove useless wait
  5. 27 Feb, 2025 1 commit
  6. 25 Feb, 2025 1 commit