    [Language] Support explicit programming for identified warp groups (#445) · 6972aed7
    Lei Wang authored
    * [Refactor] Update KernelLaunch to clarify CPU and GPU kernel launch logic
    
    * Added comments to distinguish between CPU and GPU kernel launch sections for better code readability.
    * Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management.
    
    * [Refactor] Rename operations for consistency in lower_hopper_intrin and related files
    
    * Updated function names from CamelCase to snake_case for better consistency across the codebase.
    * Refactored calls to `CreateTMADescriptorOp`, `CreateListofMBarrierOp`, and similar functions to their new names: `create_tma_descriptor`, `create_list_of_mbarrier`, etc.
    * Adjusted corresponding test cases to reflect these changes, ensuring compatibility with the new naming conventions.
    
    * [Refactor] Rename operations to snake_case for consistency
    
    * Updated function names from CamelCase to snake_case across various files, including `CreateTMADescriptorOp` to `create_tma_descriptor`, `GetMBarrierOp` to `get_mbarrier`, and others.
    * Adjusted corresponding calls and definitions in the codebase to reflect these naming changes, ensuring uniformity and improved readability.
    * Enhanced layout inference and loop partitioning logic to accommodate the new naming conventions.
    
    * [Feature] Introduce Warp Specialization and Eliminate Storage Sync for MBarrier
    
    * Added a new example `gemm_ws.py` demonstrating matrix multiplication with warp specialization using TileLang.
    * Implemented `WarpSpecializeFrame` and `WarpSpecialize` functionality to manage warp group indices in TIR frames.
    * Introduced `EliminateStorageSyncForMBarrier` transformation to optimize storage synchronization in mbarrier regions.
    * Enhanced the TileLang API with new methods for retrieving block and thread extents.
    * Updated the `LowerAndLegalize` and `OptimizeForTarget` functions to incorporate the new transformation.
    * Improved layout inference and kernel launch logic for better performance and clarity.
    
    * [Refactor] Clean up code formatting and improve readability
    
    * Added blank lines for better separation of code blocks in `gemm_ws.py`, `phase.py`, `kernel.py`, and `warpgroup.py`.
    * Reformatted the `tilelang.compile` call in `gemm_ws.py` for improved clarity.
    * Updated comments in `warpgroup.py` to clarify the availability of the `WarpSpecialize` function for NVIDIA GPUs.
    * Ensured consistent spacing and formatting across multiple files to enhance overall code readability.
    
    * lint fix
    
    * [Refactor] Update mbarrier functions for improved clarity and consistency
    
    * Refactored `mbarrier_wait_parity` and `mbarrier_arrive` functions in `builtin.py` to accept explicit parameters for better readability.
    * Updated calls in `gemm_ws.py` to use the new function signatures, enhancing code clarity.
    * Adjusted `warpgroup.py` to remove an unused thread-extent variable, streamlining the code.
    * Added detailed docstrings to clarify usage examples for memory barrier functions.
    
    * Added blank lines to the `mbarrier_wait_parity` and `mbarrier_arrive` functions in `builtin.py` to separate logical sections and improve readability.
builtin.h 5.95 KB