    [Bugfix] Fix layout inference for free fragment buffer (#443) · 2ea45ae9
    Lei Wang authored
    * [Enhancement] Improve layout inference accuracy in ParallelOp (#441)
    
    * Added logic to use non-replicated buffers as source buffers for more accurate layout inference.
    * Enhanced comments to clarify the rationale behind buffer selection in the layout inference process.
    
    * [Enhancement] Add error handling macros and refactor loop partitioning logic
    
    * Introduced TILELANG_CHECK macro for improved error handling in CUDA and HIP code, providing detailed error messages for kernel launches.
    * Enhanced loop partitioning logic to handle fragment buffers more effectively, ensuring correct replication based on thread extent.
    * Added logging for thread range in PlanLoopPartition to aid in debugging and performance analysis.
    * Updated pass configuration management to streamline vectorization control in the optimization process.
    
    * lint fix
    
    * remove debug print
loop_partition.cc 6.63 KB