1. 26 Nov, 2025 1 commit
    • [Refactor] Phaseout vmap for Tile Operators (#1334) · f5d9da46
      Lei Wang authored
      
      
      * Refactor GEMM and Reduce operations by moving NormalizeToBufferRegion and MakeAccessPtrFromRegion to utils.{h,cc} for better code organization and reuse.
      
      * lint fix
      
      * Refactor region handling by removing the RegionOp and updating NormalizeToBufferRegion to only accept BufferLoad and BufferRegion. This change improves code organization and simplifies the handling of memory regions across various operations.
      
      * fix
      
      * Refactor memory region handling by introducing `tl.region` calls across various operations, including GEMM and fill functions. This change enhances the consistency of region management and improves code organization by utilizing utility functions for buffer region conversions.
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * Refactor GEMM operations to improve memory region handling by replacing `mbarPtr_` with `mbarRegion_` and updating related logic in both C++ and Python implementations. This change enhances the clarity and consistency of buffer region management.
      
      * fix
      
      * lint fix
      
      * fix
      
      * fix
      
      * test fix
      
      * lint fix
      
      * lint fix
      
      * minor fix
      
      * fix
      
      ---------
      Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
  2. 04 Nov, 2025 1 commit
    • [Refactor] Improve Python3.9 compatibility for ParamSpec and Self (#1190) · 7d961892
      Lei Wang authored
      * [Feature] Enhance fill operation to support various buffer types
      
      - Added support for `BufferLoad` in the `fill` function to handle different buffer types.
      - Updated `Fill` class to process region descriptors and buffer regions, improving flexibility in buffer handling.
      - Introduced checks for static bounds in region definitions to ensure safety during operations.
      - Refactored loop induction variable handling in `FillNode` to accommodate sliced regions.
      
      * lint fix
      
      * [Refactor] Improve Python compatibility for ParamSpec and Self
      
      - Added compatibility handling for ParamSpec and Self to support Python versions below 3.10 and 3.11 respectively.
      - Updated type annotations across multiple files to ensure consistent usage of typing features.
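The compatibility handling for `ParamSpec` and `Self` described above is commonly written as a guarded import. The following is a minimal sketch of that pattern, not the code from this commit: it assumes the `typing_extensions` backport is installed on older interpreters, and `traced`, `Builder`, and `add` are hypothetical names used only for illustration.

```python
import functools
from typing import Callable, TypeVar

try:  # ParamSpec is in the standard library from Python 3.10
    from typing import ParamSpec
except ImportError:  # assumption: the typing_extensions backport is installed
    from typing_extensions import ParamSpec

try:  # Self is in the standard library from Python 3.11
    from typing import Self
except ImportError:  # assumption: the typing_extensions backport is installed
    from typing_extensions import Self

P = ParamSpec("P")
R = TypeVar("R")


def traced(fn: Callable[P, R]) -> Callable[P, R]:
    """Hypothetical decorator that counts calls while preserving the signature."""

    @functools.wraps(fn)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        wrapper.calls += 1
        return fn(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


class Builder:
    """Hypothetical fluent builder whose methods return Self."""

    def __init__(self) -> None:
        self.parts = []

    def add(self, part: str) -> Self:
        self.parts.append(part)
        return self


@traced
def add(a: int, b: int) -> int:
    return a + b
```

Trying the standard-library import first keeps newer interpreters free of the extra dependency, and only older ones fall back to the backport.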
      
      * [Update] Require Python 3.9 and enhance type annotations
      
      - Updated the minimum required Python version from 3.8 to 3.9 in `pyproject.toml`.
      - Removed references to Python 3.8 in classifiers.
      - Changed type annotations from `int | None` to `Optional[int]` in multiple example files for better clarity and compatibility.
      - Improved import statements to use `collections.abc` for `Iterable` and `contextlib` for `AbstractContextManager` in relevant files.
      
      * [Refactor] Update import statements to enhance type annotations
      
      - Replaced imports from `typing` with `collections.abc` for `Iterable` and `Mapping` in relevant files to improve compatibility and clarity.
      - Updated the caching decorator from `functools.lru_cache` to `functools.cache` for better performance in the C++ compiler retrieval function.
      - Adjusted import statements in the language proxy file to maintain consistency in type annotations.
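The import and caching changes listed above can be illustrated with a short sketch. It is not taken from the commit: `find_cxx_compiler` and `count_elements` are hypothetical helpers, and the list of compiler names is an assumption. It shows `Optional[int]`-style annotations, abstract container types imported from `collections.abc`, and `functools.cache`, which exists from Python 3.9 and behaves like `functools.lru_cache(maxsize=None)`.

```python
import functools
import shutil
from collections.abc import Iterable, Mapping
from typing import Optional


@functools.cache  # unbounded memoization; requires Python 3.9+
def find_cxx_compiler() -> Optional[str]:
    """Return the first C++ compiler found on PATH, or None (hypothetical helper)."""
    for name in ("c++", "g++", "clang++"):  # assumed candidate names
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def count_elements(shapes: Mapping[str, Iterable[int]]) -> int:
    """Sum the element counts of every named shape."""
    total = 0
    for dims in shapes.values():
        n = 1
        for d in dims:
            n *= d
        total += n
    return total
```

Repeated calls to `find_cxx_compiler()` return the cached result without searching PATH again, which can be checked through `find_cxx_compiler.cache_info()`.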
      
      * disable rocm rs nt test.
      
      * lint fix
  3. 31 Oct, 2025 1 commit
    • [FFI] Rebase tvm to v0.22.0 to utilize tvm-ffi (#1108) · 10911e28
      Lei Wang authored
      
      
      * 3rdparty tvm bump
      
      * bump tvm into v0.22.0
      
      * lint fix
      
      * rebase tvm
      
      * Update submodule tvm to latest commit 3085bc4
      
      * Refactor: Update configuration retrieval in CopyNode and adjust test registration in tilelang
      
      * test fix
      
      * add requirement
      
      * atomic_fix
      
      * atomic_fix
      
      * phaseout py39
      
      * optimize
      
      * optimize
      
      * lint fix
      
      * do not clean cache
      
      * do not clean cache
      
      * [Minor] Minor update for Python versions and dependencies
      
      * [Lint] fix lint for py39
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Sync CI changes from upstream/sdist
      
      * [Lint] fix lint for ROCm
      
      * [Build][CI] Update `repair-wheel-command`
      
      * [Minor] update abi3audit result format
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] fix build
      
      * [Lint] fix lint for ROCm
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * [Deps] pin apache-tvm-ffi version
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Build] set Python 3.9 Limited API for Cython target
      
      * [Deps] Restore Python 3.8 support
      
      * [Build] use `apache-tvm-ffi`'s `libtvm_ffi`
      
      * [BugFix] use `;` as delimiter for RPATH on macOS
      
      * [BugFix] use `--ignore-missing-dependencies` for `delocate-wheel`
      
      * [Build] support `sccache` if available
      
      * [Build] add CIBW import test
      
      * [Build][CI] enable ccache for CIBW on Linux
      
      * [BugFix] set rpath for libtvm and libtvm_runtime
      
      * Revert "[Build][CI] enable ccache for CIBW on Linux"
      
      This reverts commit cd9ab57bb5ddd2572c60bcbbebde81480a658fd3.
      
      * [CI] fix perfbench bot
      
      * [BugFix] use Python 3.9 to build wheel
      
      * [Minor] update perfbench bot envs
      
      * [BugFix] fix CIBW environment on Linux
      
      * [CI] skip import test on CentOS 7
      
      * [CI] use Python urllib to download file instead of Wget
      
      ---------
      Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
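One item in the commit above replaces Wget with Python's `urllib` for downloading a file in CI. A minimal sketch of that approach using only the standard library is shown here; the `download` helper, its parameters, and the timeout value are assumptions for illustration and do not reproduce the actual CI script.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Stream `url` to `dest` without relying on an external wget binary."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # Copy in chunks so large files are not held in memory at once.
        shutil.copyfileobj(response, out)
    return dest
```

Because this depends only on the interpreter, it also works on build images where Wget is not installed.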
  4. 22 Oct, 2025 1 commit
  5. 26 Sep, 2025 1 commit
    • [Layout] Introduce Flexible Parallel to Support T.serial and local buffers inside T.Parallel loop (#844) · c382dcbc
      Lei Wang authored
      
      
      * Support T.serial and local buffers inside T.Parallel loop.
      
      * Fix reducer layout in T.Parallel nested inside other loops
      
      * Debug output with LOG(INFO)
      
      * Add disable option for WGMMA.
      
      * fix
      
      * Use DLOG; fix missing registration for new pass config
      
      * bug fix
      
      * lint fix
      
      * Enhance GEMM instruction set with UTCMMA and improve local buffer handling in casting example
      
      * Update format.sh shebang, improve logging in layout inference, and enhance buffer store wrapper with detailed comments
      
      * Enhance GEMM instantiation logic and improve layout inference for local buffer detection
      
      - Updated the GEMM instantiation logic to include a check for WGMMA compatibility, ensuring that the conditions for using WGMMA are more robust.
      - Refined the layout inference process to better identify when loops manipulate only local buffers, improving the accuracy of thread binding decisions in parallel loops.
      
      ---------
      Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>
  6. 04 Sep, 2025 1 commit
    • [Refactor] Support python reflection for tile operators (#783) · 3cfefc8e
      Lei Wang authored
      * Implement Fill operator and related reflection methods in TileLang
      
      - Added Fill operator implementation in `fill.cc` and `fill.h` for element-wise filling of buffers.
      - Introduced reflection methods for Fill, AtomicAdd, Copy, Conv2DIm2Col, FinalizeReducer, Gemm, and Parallel operators to enhance introspection capabilities.
      - Updated relevant files to register reflection methods and ensure proper initialization in static blocks.
      - Removed outdated comments and unnecessary code in various operator files to improve clarity and maintainability.
      - Added new Python bindings for the Fill operator in `tilelang/ir/fill.py` and updated the module imports accordingly.
      
      * Refactor operator reflection methods and improve code clarity
      
      - Updated reflection methods for AtomicAdd, Copy, FinalizeReducer, Gemm, and Parallel operators to enhance readability by using `empty()` instead of size checks.
      - Consolidated static initialization blocks for various operators to a single line for improved consistency.
      - Cleaned up whitespace and formatting in multiple files to adhere to coding standards and improve maintainability.
      - Added new Python bindings for operators in the `tilelang/ir` module, ensuring proper registration and organization of imports.
      
      * Refactor GEMM and AtomicAdd operations for improved clarity
      
      - Updated the `GetArchInt` function in `atomic_add.cc` to use `std::string` and `std::stoi` for better readability and type safety.
      - Removed unnecessary variables and comments in `gemm_sp.cc` and `gemm.cc` to streamline the `ComputeWarpPartition` method.
      - Cleaned up the `layout_reducer.cc` file by removing unused variable declarations, enhancing code clarity.
      - Added import for the `ir` module in `tilelang/__init__.py` to ensure proper organization of module imports.
      
      * Remove deprecated operator files from the tilelang IR module
      
      - Deleted files for Fill, AtomicAdd, Copy, Gemm, GemmSP, FinalizeReducer, Parallel, Reduce, and Region operators to streamline the codebase.
      - This cleanup enhances maintainability by removing unused code and improving overall organization of the module.
      
      * Refactor imports in tilelang IR module for improved organization
      
      - Updated import statements in `tilelang/ir.py` to reflect changes in the TVM library structure, enhancing clarity and maintainability of the codebase.
      
      * lint fix
      
      * Refactor GEMM and GEMM-SP operations to enhance clarity and maintainability
      
      - Updated the `Gemm` and `GemmSP` classes to utilize a new `GemmWarpPolicy` object for warp partitioning, improving encapsulation and readability.
      - Removed deprecated `ComputeWarpPartition` methods and replaced them with calls to the new policy object, streamlining the code.
      - Cleaned up comments and unnecessary code in `gemm.cc`, `gemm_sp.cc`, and related header files to enhance overall clarity.
      - Introduced a new `GemmWarpPolicyNode` class to manage warp policy attributes and methods, facilitating better organization of related functionalities.
      - Updated reflection methods to include the new policy structure, ensuring proper registration and introspection capabilities.
      
      * Refactor Reduce operation to utilize ReduceType class for improved clarity and maintainability
      
      - Replaced multiple conditional checks for reduce types with a single ReduceType object, simplifying the code structure.
      - Introduced a new ReduceTypeNode class to encapsulate reduce type logic and methods, enhancing organization.
      - Updated MakeInitValue, MakeReduce, and Lower methods to leverage the new ReduceType class, improving readability.
      - Added Python bindings for the ReduceType class in tilelang IR module to ensure proper registration and usability.
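The idea of replacing scattered string checks with a single reduce-type object can be sketched in Python. This is an illustration only: the real `ReduceType`/`ReduceTypeNode` lives in C++, and the set of kinds, the initial values, and the `reduce_values` helper below are assumptions. The `abssum` and `absmax` spellings follow the naming this commit settles on; `sum`, `max`, and `min` are assumed.

```python
import math
from dataclasses import dataclass

# Assumed set of reduce kinds for this sketch.
_KINDS = ("sum", "abssum", "max", "min", "absmax")


@dataclass(frozen=True)
class ReduceType:
    kind: str

    def __post_init__(self) -> None:
        if self.kind not in _KINDS:
            raise ValueError(f"unknown reduce type: {self.kind!r}")

    def init_value(self) -> float:
        """Identity element the accumulator starts from."""
        if self.kind == "max":
            return -math.inf
        if self.kind == "min":
            return math.inf
        return 0.0  # sum, abssum, absmax

    def combine(self, acc: float, x: float) -> float:
        """Fold one element into the accumulator."""
        if self.kind == "sum":
            return acc + x
        if self.kind == "abssum":
            return acc + abs(x)
        if self.kind == "max":
            return max(acc, x)
        if self.kind == "min":
            return min(acc, x)
        return max(acc, abs(x))  # absmax


def reduce_values(kind: str, values) -> float:
    """Hypothetical driver: validate the kind once, then fold all values."""
    rt = ReduceType(kind)
    acc = rt.init_value()
    for v in values:
        acc = rt.combine(acc, v)
    return acc
```

Validating the kind once at construction means callers no longer repeat string comparisons, and a misspelled kind such as `abs_sum` fails immediately instead of falling through silently.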
      
      * comment
      
      * Refactor operator header files for improved readability
      
      - Cleaned up formatting and whitespace in `atomic_add.h`, `copy.h`, `fill.h`, `reduce.cc`, and `reduce.h` to enhance code clarity.
      - Consolidated comments and adjusted line breaks for better organization and maintainability across multiple operator definitions.
      
      * Refactor MakeReduce method in ReduceOpNode for clarity
      
      - Updated the parameter name in the MakeReduce method from `rhs` to `b` and assigned it to `rhs` for improved readability.
      - This change enhances the clarity of the method's purpose and aligns with the overall refactoring efforts in the Reduce operation.
      
      * Update Reduce operation type checks for consistency
      
      - Changed string comparisons for reduce types in the MakeReduce method from "abs_sum" to "abssum" and "abs_max" to "absmax" for uniformity.
      - This adjustment enhances the clarity and consistency of the reduce type handling in the codebase.
  7. 31 Aug, 2025 2 commits
    • 📝 Add docstrings to `reducer_0825` (#772) · 9a869396
      coderabbitai[bot] authored
      * 📝 Add docstrings to `reducer_0825`
      
      Docstrings generation was requested by @LeiWang1999.
      
      * https://github.com/tile-ai/tilelang/pull/757#issuecomment-3219088118
      
      
      
      The following files were modified:
      
      * `setup.py`
      * `src/op/builtin.h`
      * `src/op/finalize_reducer.cc`
      * `src/op/finalize_reducer.h`
      * `src/op/parallel.cc`
      * `src/op/parallel.h`
      * `src/op/reduce.cc`
      * `src/target/codegen_cuda.cc`
      * `src/tl_templates/cuda/common.h`
      * `src/transform/layout_inference.cc`
      * `src/transform/layout_reducer.cc`
      * `src/transform/layout_reducer.h`
      * `src/transform/merge_shared_memory_allocations.cc`
      * `src/transform/storage_access.cc`
      * `src/transform/warp_specialized_rewriter.cc`
      * `testing/python/autotune/test_tilelang_autotune_with_inputs.py`
      * `tilelang/engine/phase.py`
      * `tilelang/language/customize.py`
      * `tilelang/language/reduce.py`
      * `tilelang/transform/__init__.py`
      
      * lint fix
      
      * lint fix
      
      ---------
      Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
    • [Reducer] Introduce `alloc_reducer` to separate inter and intra warp reduction (#757) · 8eab7755
      Lei Wang authored
      
      
      * [Enhancement] Introduce finalize_reducer operator and layout reducer support
      
      - Added `FinalizeReducer` operator to handle reduction finalization in the TileLang framework, allowing for efficient reduction operations.
      - Implemented layout inference for local.reducer buffers, enhancing the handling of layout mappings and reducing complexity in buffer management.
      - Updated `setup.py` to include logging for build directory paths, improving build process visibility.
      - Enhanced atomic operations with new functions for atomic max, min, load, and store, providing more robust atomicity control in memory operations.
      - Refactored parallel loop handling to incorporate reducer information, ensuring proper management of reduction operations in parallel contexts.
      - Cleaned up test cases by removing unnecessary cache disabling and optimizing test parameters for better performance.
      
      * Refactor code formatting and improve readability in multiple files
      
      - Cleaned up whitespace in `setup.py` to enhance logging clarity.
      - Reformatted `AtomicMax` and `AtomicMin` functions in `common.h` for better alignment and readability.
      - Adjusted `debug_print_var` function in `debug.h` to improve code structure and maintainability.
      - Enhanced readability of the `atomic_add` function in `customize.py` by breaking long lines for better clarity.
      
      * Remove debug print statements from `copy.cc` and `inject_tma_barrier.cc` to enhance code clarity and maintainability.
      
      * [Enhancement] Disable reuse of small arrays in shared memory allocation
      
      - Added logic to prevent the reuse of small arrays (<= 32 bits) in `merge_shared_memory_allocations.cc`, ensuring they are lowered to registers in LLVM for improved performance and memory management.
      
      * Refactor `setup.py` to remove duplicate logging statements and enhance clarity. Update `finalize_reducer` function documentation in `reduce.py` to include detailed parameter and return descriptions, improving code readability and maintainability.
      
      * Refactor `finalize_reducer` and `reduce` functions to remove redundant target checks. Simplified conditionals by retaining only the `TargetIsHopper` check, enhancing code clarity and maintainability.
      
      * bug fix
      
      * Add thread checks workaround for replicated cases
      
      * Remove the is_one check
      
      * fix lint error
      
      * lint fix
      
      * Update autotune tests to use smaller matrix sizes for improved performance and reliability
      
      * [Refactor] Update FinalizeReducer to FinalizeReducerOp and adjust related methods
      
      - Refactored FinalizeReducer class to FinalizeReducerOp, updating constructor and method signatures for consistency with the new TileOperator structure.
      - Enhanced layout inference and cloning methods in FinalizeReducerOpNode.
      - Updated test_example_flash_attention.py to call test_example_gqa_bwd instead of tilelang.testing.main.
      - Adjusted header inclusions for improved organization and clarity across multiple files.
      
      * [Refactor] Update atomic operations in common.h and modify test_example_flash_attention.py
      
      - Enhanced atomic operations (Add, Min, Max) in common.h to handle half and bfloat16 types more efficiently.
      - Updated test_example_flash_attention.py to call test_example_gqa_bwd instead of tilelang.testing.main, improving test organization.
      
      * [Refactor] Simplify CopyNode::LowerBulkCopy logic and update test execution
      
      - Removed redundant checks for contiguous memory access in CopyNode::LowerBulkCopy, streamlining the logic for TMA copy operations.
      - Updated test_tilelang_kernel_gemm.py to comment out the main testing function and call a specific test for i8i8i32 tensor operations instead, improving test focus.
      
      ---------
      Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>
      Co-authored-by: Freebase6912 <amid-gauze-racing@duck.com>