    [Enhancement] Add eviction policy support for TMA operations, enhance CUDA... · 6664d170
    Wenhao Xie authored
    [Enhancement] Add eviction policy support for TMA operations, enhance CUDA codegen, and introduce new pass config (#690)
    
    * Enhance TMA and barrier handling in CUDA code generation
    
    - Updated `CodeGenTileLangCUDA` to support eviction policies for TMA operations, giving control over how TMA-transferred data is retained in the L2 cache.
    - Introduced a new `CacheHintSm90` enum to define eviction strategies in `copy_sm90.h`.
    - Modified TMA load/store functions to accept an eviction policy, so cache behavior can be tuned for performance on each target architecture.
    - Enhanced `TmaBarrierCollector` and `TmaBarrierRewriter` to account for SIMT copies, ensuring correct barrier insertion.
    - Refactored thread synchronization logic to use barrier IDs, making partial (sub-block) thread synchronization more efficient.
    - Updated the Python interface for `copy` and `c2d_im2col` to accept an optional eviction policy parameter.
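    The two codegen-side ideas in the list above (passing an eviction policy down to TMA, and synchronizing only a subset of threads via a barrier ID) can be sketched as small C++ helpers. This is a hypothetical illustration, not the code from this commit: the enum values mirror CUTLASS's `cute::TMA::CacheHintSm90` and are assumed, not confirmed, to match TileLang's `copy_sm90.h`; the policy strings (`"evict_first"` etc.) and the helper names `CacheHintFromPolicy` and `EmitNamedBarrierSync` are invented for the example. The barrier constraints (IDs 0 to 15, thread count a multiple of the warp size) come from the PTX `bar.sync` specification.

    ```cpp
    // Hypothetical sketch of two codegen helpers; names and policy strings are
    // assumptions, not TileLang's actual API.
    #include <cassert>
    #include <cstdint>
    #include <stdexcept>
    #include <string>

    // 64-bit L2 cache-hint operands for SM90 TMA instructions.
    // Values mirror CUTLASS's cute::TMA::CacheHintSm90 (assumed to match copy_sm90.h).
    enum class CacheHintSm90 : std::uint64_t {
      EVICT_NORMAL = 0x1000000000000000ULL,
      EVICT_FIRST  = 0x12F0000000000000ULL,
      EVICT_LAST   = 0x14F0000000000000ULL,
    };

    // Hypothetical: translate an eviction-policy name (as an optional Python-side
    // argument might supply it) into the cache-hint constant passed to a TMA load/store.
    inline CacheHintSm90 CacheHintFromPolicy(const std::string& policy) {
      if (policy == "evict_normal") return CacheHintSm90::EVICT_NORMAL;
      if (policy == "evict_first") return CacheHintSm90::EVICT_FIRST;
      if (policy == "evict_last") return CacheHintSm90::EVICT_LAST;
      throw std::invalid_argument("unknown eviction policy: " + policy);
    }

    // Hypothetical: emit a PTX named-barrier sync so that only `num_threads`
    // threads wait on barrier `barrier_id`, instead of the whole block.
    // PTX allows IDs 0..15 and needs a thread count that is a multiple of 32.
    // __syncthreads() uses barrier 0, so this sketch reserves 0 and accepts 1..15.
    inline std::string EmitNamedBarrierSync(int barrier_id, int num_threads) {
      if (barrier_id < 1 || barrier_id > 15)
        throw std::invalid_argument("barrier id must be in [1, 15]");
      if (num_threads <= 0 || num_threads % 32 != 0)
        throw std::invalid_argument("thread count must be a positive multiple of 32");
      return "asm volatile(\"bar.sync " + std::to_string(barrier_id) + ", " +
             std::to_string(num_threads) + ";\");";
    }
    ```

    For example, `EmitNamedBarrierSync(1, 128)` yields `asm volatile("bar.sync 1, 128;");`, which lets one 128-thread warpgroup synchronize without stalling producer warps that never reach that barrier. Reserving barrier 0 is a design choice of this sketch that avoids colliding with block-wide `__syncthreads()`.
    
    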
    
    * update shuffle and elect optimization
    
    * fix bug
    
    * fix bug
    
    * fix potential bug
    
    * lint fix
    
    * lint fix
    
    * update shuffle_elect template
    
    * fix bug
    
    * fix bug
    
    * fix template
    
    * lint and fix
    
    * fix typo
builtin.cc 5.6 KB