- 26 May, 2025 1 commit
-
-
Lei Wang authored
* Updated `get_tilelang_version` to include an optional commit ID in the version string. * Enhanced the `TileLangBuilPydCommand` to write the version with commit ID to the VERSION file during the build process. * Introduced a new function `get_git_commit_id` in `version.py` to retrieve the current git commit hash. * Refactored logger initialization in `autotuner/__init__.py` to ensure handlers are set up only once, improving performance and clarity. * Minor fixes in `flatten_buffer.cc` and `kernel_cache.py` for better handling of versioning and logging.
-
- 24 May, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Enhance GEMM Warp Partitioning Logic and Introduce Buffer Remapping (#516) * Improved the warp partitioning logic in `Gemm::ComputeWarpPartition` to better accommodate various GEMM policies, including FullRow, FullCol, and Square, ensuring optimal performance based on matrix dimensions. * Introduced a new `RemapBufferRewriter` class to handle buffer reference updates and padding annotations during statement transformations, enhancing memory access safety and clarity. * Updated the `OptimizeForTarget` function to include a new step for configuring index bitwidth, improving the overall optimization process. * Refactored existing code to utilize constants for warp sizes, enhancing maintainability and readability. * Added checks to ensure correct warp allocation and padding map handling, improving robustness in memory management strategies. * [Refactor] Update ConfigIndexBitwidthRewriter to Support Auto-Check Feature * Modified the constructor of `ConfigIndexBitwidthRewriter` to include an `auto_check` parameter, allowing for dynamic bitwidth adjustments based on input conditions. * Enhanced the `VisitExpr_` methods to apply the new auto-check logic, ensuring that integer types are upgraded to 64 bits when necessary, or to a specified index bitwidth otherwise. * Updated the `ConfigIndexBitwidth` pass to determine the index bitwidth based on the presence of configuration, improving flexibility in handling different scenarios. * Add dynamic matrix multiplication example and corresponding test * Introduced `example_dynamic.py` to demonstrate dynamic matrix multiplication using TileLang and PyTorch, including a main function for execution and performance profiling. * Added `test_example_dynamic.py` to validate the functionality of the dynamic matrix multiplication example. * The example includes detailed parameter configurations and checks against PyTorch's implementation for correctness. * lint fix * Add get_num_sms function to retrieve the number of streaming multiprocessors on the CUDA device * Implemented the `get_num_sms` function in `cuda_driver.py` to return the count of streaming multiprocessors for a specified CUDA device. * Updated the `__init__.py` file to include the new function in the module exports. * lint fix
-
- 12 Apr, 2025 1 commit
-
-
Lei Wang authored
* fix * Update submodule TVM to latest commit and enhance FlattenBuffer pass in TileLang engine. Added boolean handling in buffer loading and improved address_of detection in flattening logic. * lint fix
-
- 11 Apr, 2025 1 commit
-
-
Lei Wang authored
* [Enhancement] Introduce logical operations `any_of` and `all_of` for buffer checks - Added new logical operations `any_of` and `all_of` to the TileLang language interface, allowing users to check conditions across buffer elements. - Implemented corresponding intrinsic calls for CUDA, enhancing the functionality of the TileLang framework. - Updated the `allocate.py` to handle boolean types correctly in shared memory allocations. - Introduced tests for the new logical operations to ensure correctness and performance. Co-authored-by:
Zhiwen Mo <zhiwen.mo25@ic.ac.uk> * lint fix --------- Co-authored-by:
Zhiwen Mo <zhiwen.mo25@ic.ac.uk>
-