    Lei Wang authored
    
    [Layout] Introduce Flexible Parallel to Support T.serial and local buffers inside T.Parallel loop (#844)
    
    * Support T.serial and local buffers inside T.Parallel loops.
    
    * Fix reducer layout in T.Parallel nested inside other loops
    
    * Add debug output with LOG(INFO)
    
    * Add an option to disable WGMMA.
    
    * fix
    
    * Use DLOG instead of LOG(INFO); fix missing registration for the new pass config
    
    * bug fix
    
    * lint fix
    
    * Extend the GEMM instruction set with UTCMMA and improve local buffer handling in the casting example
    
    * Update the format.sh shebang, improve logging in layout inference, and document the buffer store wrapper with detailed comments
    
    * Refine GEMM instantiation logic and improve local buffer detection in layout inference
    
    - Updated the GEMM instantiation logic to check for WGMMA compatibility, making the conditions for selecting WGMMA more robust.
    - Refined layout inference to better detect loops that access only local buffers, improving the accuracy of thread binding decisions in parallel loops.
    
    ---------
    Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>
    c382dcbc