1. 06 Sep, 2025 1 commit
    • Lei Wang's avatar
      [TMA] Automatically lower 1d tma in appropriate cases (#788) · 9d7d45be
      Lei Wang authored
      * Enhance layout inference and copy operations with 1D TMA support
      
      - Updated `CopyNode` to introduce separate handling for 1D bulk load/store operations, including new methods for checking and lowering these operations.
      - Modified `InferLayout` and `GetCopyInst` to accommodate additional parameters for layout maps and analyzers.
      - Enhanced `AtomicAddNode` and `FillNode` to utilize the updated layout inference logic.
      - Improved buffer out-of-bounds checks during layout inference to ensure safe memory access.
      
      This update improves the efficiency and correctness of memory operations in the TileLang framework.
      
      * Refactor layout inference calls for improved readability
      
      - Updated `InferLayout` calls in `AtomicAddNode`, `CopyNode`, and `FillNode` to enhance code clarity by formatting parameters across multiple lines.
      - Cleaned up whitespace and formatting in `copy.h` and `layout_inference.cc` to adhere to coding standards and improve maintainability.
      
      This refactor aims to streamline the layout inference logic and improve overall code organization.
      
      * Fix shared tensor check in CopyNode for bulk copy operations
      
      - Updated the condition in `CheckBulkCopy1D` to verify contiguity of `shared_tensor` instead of `dst`, ensuring correct handling of shared memory layouts during bulk copy operations.
      - This change enhances the accuracy of memory operations in the TileLang framework.
      
      * Update test_example_gdn_compilation.py to invoke test function directly
      
      - Commented out the call to `tilelang.testing.main()` in `test_example_gdn_compilation.py` and replaced it with a direct call to `test_example_chunk_delta_bwd_compilation()`. This change simplifies the test execution flow and focuses on the specific test case.
      
      * Enhance bulk load/store checks in CopyNode with last dimension validation
      
      - Updated `CheckBulkLoad` and `CheckBulkStore` methods in `CopyNode` to include an optional parameter for validating the last dimension during bulk copy operations.
      - Adjusted related methods `CheckBulkLoad1D` and `CheckBulkStore1D` to pass the new parameter, improving the accuracy of bulk copy checks.
      - This change enhances the robustness of memory operations in the TileLang framework by ensuring compliance with dimensional requirements.
      
      * Refactor CheckBulkLoad and CheckBulkStore methods for improved readability
      
      - Reformatted the parameter lists of `CheckBulkLoad` and `CheckBulkStore` methods in `CopyNode` to enhance code clarity by aligning parameters across multiple lines.
      - This change improves the maintainability of the code and adheres to coding standards.
      9d7d45be
  2. 04 Sep, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Support python reflection for tile operators (#783) · 3cfefc8e
      Lei Wang authored
      * Implement Fill operator and related reflection methods in TileLang
      
      - Added Fill operator implementation in `fill.cc` and `fill.h` for element-wise filling of buffers.
      - Introduced reflection methods for Fill, AtomicAdd, Copy, Conv2DIm2Col, FinalizeReducer, Gemm, and Parallel operators to enhance introspection capabilities.
      - Updated relevant files to register reflection methods and ensure proper initialization in static blocks.
      - Removed outdated comments and unnecessary code in various operator files to improve clarity and maintainability.
      - Added new Python bindings for the Fill operator in `tilelang/ir/fill.py` and updated the module imports accordingly.
      
      * Refactor operator reflection methods and improve code clarity
      
      - Updated reflection methods for AtomicAdd, Copy, FinalizeReducer, Gemm, and Parallel operators to enhance readability by using `empty()` instead of size checks.
      - Consolidated static initialization blocks for various operators to a single line for improved consistency.
      - Cleaned up whitespace and formatting in multiple files to adhere to coding standards and improve maintainability.
      - Added new Python bindings for operators in the `tilelang/ir` module, ensuring proper registration and organization of imports.
      
      * Refactor GEMM and AtomicAdd operations for improved clarity
      
      - Updated the `GetArchInt` function in `atomic_add.cc` to use `std::string` and `std::stoi` for better readability and type safety.
      - Removed unnecessary variables and comments in `gemm_sp.cc` and `gemm.cc` to streamline the `ComputeWarpPartition` method.
      - Cleaned up the `layout_reducer.cc` file by removing unused variable declarations, enhancing code clarity.
      - Added import for the `ir` module in `tilelang/__init__.py` to ensure proper organization of module imports.
      
      * Remove deprecated operator files from the tilelang IR module
      
      - Deleted files for Fill, AtomicAdd, Copy, Gemm, GemmSP, FinalizeReducer, Parallel, Reduce, and Region operators to streamline the codebase.
      - This cleanup enhances maintainability by removing unused code and improving overall organization of the module.
      
      * Refactor imports in tilelang IR module for improved organization
      
      - Updated import statements in `tilelang/ir.py` to reflect changes in the TVM library structure, enhancing clarity and maintainability of the codebase.
      
      * lint fix
      
      * Refactor GEMM and GEMM-SP operations to enhance clarity and maintainability
      
      - Updated the `Gemm` and `GemmSP` classes to utilize a new `GemmWarpPolicy` object for warp partitioning, improving encapsulation and readability.
      - Removed deprecated `ComputeWarpPartition` methods and replaced them with calls to the new policy object, streamlining the code.
      - Cleaned up comments and unnecessary code in `gemm.cc`, `gemm_sp.cc`, and related header files to enhance overall clarity.
      - Introduced a new `GemmWarpPolicyNode` class to manage warp policy attributes and methods, facilitating better organization of related functionalities.
      - Updated reflection methods to include the new policy structure, ensuring proper registration and introspection capabilities.
      
      * Refactor Reduce operation to utilize ReduceType class for improved clarity and maintainability
      
      - Replaced multiple conditional checks for reduce types with a single ReduceType object, simplifying the code structure.
      - Introduced a new ReduceTypeNode class to encapsulate reduce type logic and methods, enhancing organization.
      - Updated MakeInitValue, MakeReduce, and Lower methods to leverage the new ReduceType class, improving readability.
      - Added Python bindings for the ReduceType class in tilelang IR module to ensure proper registration and usability.
      
      * comment
      
      * Refactor operator header files for improved readability
      
      - Cleaned up formatting and whitespace in `atomic_add.h`, `copy.h`, `fill.h`, `reduce.cc`, and `reduce.h` to enhance code clarity.
      - Consolidated comments and adjusted line breaks for better organization and maintainability across multiple operator definitions.
      
      * Refactor MakeReduce method in ReduceOpNode for clarity
      
      - Updated the parameter name in the MakeReduce method from `rhs` to `b` and assigned it to `rhs` for improved readability.
      - This change enhances the clarity of the method's purpose and aligns with the overall refactoring efforts in the Reduce operation.
      
      * Update Reduce operation type checks for consistency
      
      - Changed string comparisons for reduce types in the MakeReduce method from "abs_sum" to "abssum" and "abs_max" to "absmax" for uniformity.
      - This adjustment enhances the clarity and consistency of the reduce type handling in the codebase.
      3cfefc8e
  3. 02 Sep, 2025 1 commit
    • Lei Wang's avatar
      [Lint] Introduce clang-tidy into format.sh (#777) · cdc5d8d3
      Lei Wang authored
      * [Refactor] Update Clang-Tidy Checks and Improve Code Consistency
      
      - Enhanced .clang-tidy configuration by adding specific checks for better bug detection and performance optimization.
      - Refactored function signatures across multiple files to use `const` references for parameters, improving performance and code clarity.
      - Updated various methods to ensure consistent handling of parameters, particularly in `AddPredicate`, `Substitute`, and `PlanLoopPartition` functions.
      - Improved readability by replacing size checks with `empty()` method calls in several locations, ensuring clearer intent in the code.
      - General code cleanup and adherence to best practices for better maintainability.
      
      * [Refactor] Enhance Code Consistency and Clang-Tidy Configuration
      
      - Updated .clang-tidy configuration to include additional checks for improved code quality and performance.
      - Refactored function signatures across multiple files to use `const` references, enhancing performance and clarity.
      - Replaced size checks with `empty()` method calls in various locations for clearer intent.
      - Improved handling of parameters in several functions, ensuring consistent usage of `std::move` where applicable.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [Refactor] Integrate Clang-Tidy Checks and Enhance Code Consistency
      
      - Added clang-tidy checks to the format script for improved code quality assurance.
      - Refactored function signatures across multiple files to consistently use `const` references, enhancing performance and clarity.
      - Updated the requirements-lint.txt file to include clang-tidy as a dependency.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [CI] Update AMD CI Workflow to Include Build Directory Creation
      
      - Added steps to create a build directory and configure CMake with ROCm support during the format check process.
      - Ensured cleanup of the build directory after the format check to maintain a clean workspace.
      
      * [Refactor] Remove Unused Member Variables in AtomicAddNode and CopyNode
      
      - Removed the `args_` member variable from both `AtomicAddNode` and `CopyNode` classes to streamline the code and eliminate unnecessary data members.
      - This change enhances code clarity and maintainability by focusing on relevant attributes for each class.
      
      * [Refactor] Update Clang-Tidy Integration and Code Improvements
      
      - Modified the format script to include the `-fix` option in the clang-tidy command for automatic code fixes.
      - Refactored the `AtomicAddVectorizePlanner` class to improve variable handling and consistency, including changes to member variable types and function signatures.
      - Enhanced code clarity by removing unnecessary `std::move` calls and ensuring consistent usage of types across the class.
      - General code cleanup to adhere to best practices and improve maintainability.
      
      * [Refactor] Improve Parameter Handling and Consistency in AtomicAddVectorize
      
      - Updated function signatures in `AtomicAddVectorizePlanResult` and `AtomicAddVectorizeRewriter` to use `const` references and `std::move` for better performance and clarity.
      - Enhanced the `UpdateVectorSize` method to accept `const Array<PrimExpr>&` for improved efficiency.
      - General code cleanup to maintain consistency and adhere to best practices.
      
      * [CI] Add Git Submodule Initialization to CI Workflow
      
      - Included a step to initialize and update git submodules recursively in the CI workflow.
      - This change ensures that all necessary submodules are available during the format check process, improving build reliability.
      
      * [CI] Add Git Submodule Update Step to Format Check
      
      - Included a command to initialize and update git submodules recursively in the CI workflow during the format check process.
      - This enhancement ensures that all required submodules are available, contributing to improved build reliability.
      
      * [Refactor] Update Function Signatures in AtomicAddVectorize
      
      - Modified the `VectorizeAtomicAdd` function signature to use `const` references for `thread_var` and `thread_bounds`, enhancing performance and code clarity.
      - This change aligns with previous refactoring efforts to improve parameter handling and consistency across the codebase.
      cdc5d8d3
  4. 31 Aug, 2025 1 commit
  5. 29 Aug, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Refactor `Operator` into `TileOperator` and with tvm reflection (#763) · b38bd69e
      Lei Wang authored
      * Refactor operator classes to inherit from TileOperator and update layout inference methods
      
      - Changed base class of several operator classes (AtomicAdd, Copy, Gemm, etc.) from Operator to TileOperator for better alignment with tile operations.
      - Updated InferLayout and Lower methods to use 'override' specifier for clarity and consistency.
      - Adjusted header inclusions to replace "op.h" with "operator.h" across multiple files for improved organization.
      - Added missing layout inference implementations for Fill and Conv2DIm2ColOp.
      - Removed deprecated op.cc and op.h files to streamline the codebase.
      
      * lint fix
      
      * Refactor operator classes to use Node pattern and improve memory management
      
      - Updated several operator classes (AtomicAdd, Copy, Gemm, etc.) to utilize the Node pattern for better memory management and encapsulation.
      - Changed constructors to initialize member variables through a node object, enhancing clarity and reducing direct member access.
      - Updated Clone methods to return TileOperator instances instead of unique pointers, aligning with the new design.
      - Refactored InferLayout and Lower methods to ensure consistency across operator implementations.
      - Adjusted header files to reflect the new class structure and removed deprecated code for a cleaner codebase.
      
      * Enhance Clone methods in AtomicAdd and Copy classes to support parallel operation cloning
      
      - Updated the Clone methods in AtomicAddNode and CopyNode to ensure that the parallel operation (par_op_) is properly cloned when defined, improving the integrity of cloned objects.
      - Refactored the FillNode class to use ParallelOp directly instead of std::make_unique, streamlining the creation of parallel operations.
      - Made minor adjustments in layout inference and other related methods for consistency and clarity.
      
      * Refactor FillNode::Lower method to remove unused global function call
      
      - Eliminated the call to the global function "tl.fill.lower" in the FillNode::Lower method, streamlining the code and improving clarity.
      - Retained the core functionality of the method while enhancing maintainability by reducing unnecessary dependencies.
      b38bd69e
  6. 22 Aug, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Merge bulk copy into copy and improve layout inference for bulk copy (#746) · 5c11d245
      Lei Wang authored
      * [Refactor] Merge bulk copy into copy and refactor layout inference for bulk copy
      
      * Deleted the `bulk_copy` operator implementation and its header file as it is no longer needed.
      * Introduced a new function `cuTensorMapType()` to return the data type for CUDA tensor mapping.
      * Updated related files to reflect these changes, ensuring that the codebase remains clean and maintainable.
      
      * lint fix
      
      * Fix typos in intrinsic names and remove unused print statement in block_sparse_attn_tilelang.py. Updated references from `ptx_ldmatirx` to `ptx_ldmatrix` across multiple files for consistency.
      
      * remove bulk copy
      
      * Refactor copy and atomic add operations to support TMA lower configuration
      
      - Updated `GetCopyInst` to accept a `disable_tma_lower` parameter, allowing for conditional usage of TMA in bulk load/store operations.
      - Modified `Lower` method in `Copy` to incorporate the new TMA configuration.
      - Refactored `AtomicAdd::Lower` to streamline layout inference and vectorization logic.
      - Removed unused `disable_tma_lower` field from `LowerArgs` structure for clarity.
      - Enhanced atomic add vectorization by replacing the buggy implementation with a more robust loop vectorization approach.
      
      * Enhance TMA bulk copy logic in `LowerBulkCopy` method
      
      - Added a condition to set `desc.swizzle` to `CU_TENSOR_MAP_SWIZZLE_NONE` when `shared_layout` matches `linear_layout`, improving clarity in layout handling.
      - Updated warning log to provide more detailed information about fallback scenarios, including source and destination buffer names and shapes, enhancing debugging capabilities.
      
      * lint fix
      
      * Remove fallback logging for non-swizzled global layout in `LowerBulkCopy` method to streamline the bulk copy logic. This change enhances code clarity by eliminating unnecessary warning messages related to inner box dimensions.
      
      * Enhance reshape kernel compilation in `run_reshape` and `run_reshape_smem_1d_2_2d` functions
      
      - Updated the `tl.compile` method to include `pass_configs` that disable TMA lower and warp specialization, addressing shared memory layout transformation limitations.
      - Added TODO comments to indicate the need for further improvements in shared memory handling.
      
      * Update `native_sparse_attention` function to include TMA configuration options
      
      - Added `pass_configs` to the JIT decorator to disable TMA lower and warp specialization, addressing potential issues with shared memory layout transformations.
      - Updated comments to clarify modifications in tensor shapes for inference, specifically setting `q` sequence length to 1.
      
      * Refactor JIT decorator formatting in `native_sparse_attention` function
      
      - Improved readability by reformatting the JIT decorator parameters for `native_sparse_attention`, ensuring consistent style across the codebase.
      - No functional changes were made; this update focuses on code clarity and maintainability.
      
      * Enhance thread management and logging in TileLang compilation
      
      - Added a method to check if printing is enabled during compilation, improving control over logging behavior.
      - Updated the JIT kernel class to utilize the new method for logging compilation status, ensuring consistent and clear output.
      - Added comments to clarify the purpose of changes and improve code readability.
      
      * Add warp specialization scope and refactor register management in TileLang
      
      - Introduced a new constant `kWarpSpecializationScope` in `builtin.h` for better attribute management.
      - Removed the `SetMaxNRegCollector` class and its related logic from `warp_specialized_rewriter.cc`, streamlining the warp specialization process.
      - Added functions `annotate_producer_reg_dealloc` and `annotate_consumer_reg_alloc` in `builtin.py` to facilitate register management.
      - Implemented `AnnotateWarpGroupRegAlloc` in `__init__.py` to inject register allocation calls into warp-specialized functions, enhancing the overall register handling in the compilation process.
      
      * Refactor test for InjectSetMaxNReg pass in TileLang
      
      - Improved readability by restructuring conditional checks and assertions in the test cases.
      - Enhanced clarity in the collection of `set_max_nreg` calls by simplifying the logic.
      - Ensured consistent formatting and spacing throughout the test functions for better maintainability.
      
      * Enhance bulk copy and store checks in `Copy` class
      
      - Updated scope validation for source and destination tensors in `CheckBulkLoad` and `CheckBulkStore` methods to include both `shared.dyn` and `shared` as valid options.
      - Modified `CheckLDSMCopy` and `CheckSTSMCopy` methods to accommodate the new scope validation, ensuring compatibility with shared memory configurations.
      - Improved logging in `LowerBulkCopy` to provide clearer warnings regarding unsupported swizzle layouts, including source and destination names for better debugging.
      
      * lint fix
      5c11d245