    [Language] Expose `T.warpgroup_fence_operand` for nvcc code motion (#986) · aef0a6bb
    Lei Wang authored
    
    
    * remove debug print
    
    * pipeline fix
    
    * use the correct buffer access scope
    
    * rs support
    
    * wrap warpgroup_fence_operand
    
    * fix
    
    * fp8 dtype ptx enhance
    
    * mma fix
    
    * TCGEN05 Interface
    
    * tcgen05 support
    
    * rebase
    
    * update
    
    * Enhance TCGEN05 support by adding new intrinsic operations and descriptors. Introduce `ptx_tcgen05_mma_ts` for tensor-memory to shared-memory instructions and `tcgen05_mma_arrive` for signaling barrier completion. Update existing descriptors and code generation logic to accommodate these changes and stay compatible with the new instruction sets. Refactor related allocation functions and improve handling of shared memory descriptors.
    
    * lint fix
    
    * Refactor buffer reference handling in CUDA code generation and update test execution in tilelang. Ensure default annotations for unrolling are set correctly in TIR IR module.
    
    * wgmma fix
    
    ---------
    Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
layout.cc 20.5 KB