    [TileOp] Introduce an experimental Python-defined `T.gemm_v2` (#793) · 91a7bb2b
    Lei Wang authored
    * Refactor GEMM and GEMM-SP operations to enhance clarity and maintainability
    
    - Removed deprecated prime factorization functions from `gemm.cc` and `gemm_sp.cc`.
    - Introduced a new `GemmWarpPolicy` class to manage warp policy attributes and methods, improving encapsulation.
    - Updated reflection methods to include the new policy structure, ensuring proper registration and introspection capabilities.
    - Enhanced `GetArchInt` function in `utils.cc` for better readability and type safety.
    - Added new `gemm_v2` function in `gemm.py` for improved GEMM operation with additional parameters and checks.
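
To make the warp-policy idea concrete, here is a minimal sketch of how a policy object might split a thread block's warps between the M and N dimensions of a GEMM tile. `WarpPolicy` and `partition_warps` are hypothetical names chosen for illustration; they are not the `GemmWarpPolicy` API from this commit, and the actual partitioning rules in `gemm.cc` may differ.

```python
from enum import Enum


class WarpPolicy(Enum):
    """Hypothetical stand-in for a warp policy; not TileLang's GemmWarpPolicy."""
    SQUARE = 0    # balance warps between M and N
    FULL_ROW = 1  # assumption: put every warp along M
    FULL_COL = 2  # assumption: put every warp along N


def partition_warps(num_warps: int, policy: WarpPolicy) -> tuple[int, int]:
    """Return (m_warps, n_warps) such that m_warps * n_warps == num_warps."""
    if num_warps < 1:
        raise ValueError("num_warps must be positive")
    if policy is WarpPolicy.FULL_ROW:
        return num_warps, 1
    if policy is WarpPolicy.FULL_COL:
        return 1, num_warps
    # SQUARE: scan all divisor pairs and keep the most balanced one.
    best = (num_warps, 1)
    for m in range(1, num_warps + 1):
        if num_warps % m == 0:
            n = num_warps // m
            if abs(m - n) < abs(best[0] - best[1]):
                best = (m, n)
    return best


if __name__ == "__main__":
    for policy in WarpPolicy:
        print(policy.name, partition_warps(8, policy))
```

Scanning divisor pairs directly keeps the sketch free of any prime-factorization helper, which is one plausible reason such helpers become unnecessary once the policy owns the partitioning logic.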
    
    * Refactor GEMM and frontend legalize operations for improved clarity and functionality
    
    - Updated `gemm_py.h` to include the correct header for GEMM operations.
    - Renamed `FrontendLegalizer` class to `LetInliner` and updated related methods to reflect this change, enhancing code clarity.
    - Modified the pass function from `FrontendLegalize` to `LetInline` for better alignment with its purpose.
    - Updated test cases to utilize the new `gemm_v2` function and adjusted the testing framework for improved output and clarity.
    - Removed obsolete test file `test_tilelang_transform_frontend_legalize.py` to streamline the test suite.
    - Enhanced the `LowerAndLegalize` function to utilize the new `LetInline` pass, improving the overall transformation process.
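
As a rough illustration of what a let-inlining pass does in general, the toy sketch below substitutes each `let`-bound variable with its value in a tiny expression tree. The node classes and `inline_lets` are hypothetical and written only for this example; the real `LetInline` pass operates on TIR in C++, and its exact inlining conditions are not shown here.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Let:
    var: Var
    value: "Expr"
    body: "Expr"


Expr = Union[Var, Const, Add, Let]


def inline_lets(expr, env=None):
    """Replace every let-bound variable with its (already inlined) value."""
    env = env or {}
    if isinstance(expr, Var):
        # Bound variables are substituted; free variables are left alone.
        return env.get(expr, expr)
    if isinstance(expr, Const):
        return expr
    if isinstance(expr, Add):
        return Add(inline_lets(expr.lhs, env), inline_lets(expr.rhs, env))
    if isinstance(expr, Let):
        value = inline_lets(expr.value, env)
        # Extend the environment for the body only; inner lets shadow outer ones.
        return inline_lets(expr.body, {**env, expr.var: value})
    raise TypeError(f"unsupported node: {expr!r}")


if __name__ == "__main__":
    x = Var("x")
    program = Let(x, Add(Const(1), Const(2)), Add(x, x))
    print(inline_lets(program))
```

This toy version inlines unconditionally, so a bound value is duplicated at every use; a production pass would typically restrict inlining to cases where that is safe and cheap.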
    
    * Enhance CUDA code generation and testing for GEMM operations
    
    - Added indentation printing in `codegen_cuda.cc` for improved assembly code formatting.
    - Updated `test_tilelang_tilelibrary_gemm.py` to include additional GEMM test cases and shared memory allocation with specified scope.
    - Introduced new `matmul_sr` and `run_gemm_sr` functions for GEMM operations with shared and fragment memory layouts.
    - Refactored layout inference in `mma_macro_generator.py` to improve clarity and correctness in shared memory handling.
    - Enhanced `gemm/__init__.py` to support new GEMM operation combinations and layout inference logic.
    
    These changes improve the clarity, functionality, and testing coverage of GEMM operations in the TileLang framework.
    
    * Refactor GEMM layout and testing for improved clarity and functionality
    
    - Updated `gemm_layouts.cc` to enhance the layout generation logic for transposed and non-transposed GEMM operations.
    - Renamed and modified functions in `test_tilelang_tilelibrary_gemm.py` to reflect changes in GEMM function signatures and improve test coverage.
    - Introduced new GEMM operation combinations in `gemm/__init__.py` to support additional layouts and configurations.
    - Enhanced layout inference in `mma_layout.py` and `mma_macro_generator.py` for better handling of shared memory layouts.
    
    These changes improve the clarity, functionality, and testing coverage of GEMM operations in the TileLang framework.
    
    * Refactor GEMM layout and Python integration for improved functionality
    
    - Updated `gemm_layouts.cc` to correct the order of layout replication and repetition for transposed and non-transposed GEMM operations.
    - Enhanced `gemm_py.cc` to handle block realization more robustly, ensuring correct assignment of global symbols and block attributes.
    - Refactored `inject_pipeline.cc` to streamline buffer read/write region handling, improving clarity and maintainability.
    - Cleaned up test cases in `test_tilelang_tilelibrary_gemm.py` by removing unnecessary print statements and adjusting function calls for better test execution flow.
    
    These changes enhance the clarity, functionality, and robustness of GEMM operations and their testing in the TileLang framework.
    
    * Refactor GEMM layout and testing for improved clarity and functionality
    
    - Updated `gemm_layouts.cc` to enhance layout generation logic for transposed and non-transposed GEMM operations.
    - Improved block realization handling in `gemm_py.cc` for better assignment of global symbols.
    - Streamlined buffer read/write region handling in `inject_pipeline.cc` for clarity.
    - Enhanced test cases in `test_tilelang_tilelibrary_gemm.py` by adjusting function calls and adding new GEMM operation combinations.
    
    These changes improve the clarity, functionality, and robustness of GEMM operations and their testing in the TileLang framework.
    
    * tfloat32 support.
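
For context on the data type: TF32 keeps float32's 8 exponent bits but only 10 explicit mantissa bits. The sketch below emulates that precision on the host by clearing the low 13 mantissa bits of a float32 value. The helper name `truncate_to_tf32` is hypothetical, and truncation is an assumption made for simplicity; hardware conversions may round instead, and this is not code from the commit.

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 precision by zeroing the low 13 mantissa bits of a float32."""
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits), and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    for value in (1.0, 1.0 + 2.0 ** -10, 1.0 + 2.0 ** -11):
        print(value, "->", truncate_to_tf32(value))
```

A helper like this can be handy when building a reference result for a TF32 GEMM test, since comparing against a full float32 reference usually needs a looser tolerance.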
    
    * lint fix
    
    * lint fix
    
    * Refactor shared memory allocation in GEMM tests
    
    - Removed unnecessary scope specification in shared memory allocation for matrices A and B in `test_tilelang_tilelibrary_gemm.py`.
    - This change simplifies the allocation process and aligns with the updated GEMM function signatures.