1. 13 Jun, 2025 1 commit
    • [Enhancement] Include Metadata (LayoutMap etc.) into hashing (#570) · 9247a879
      Lei Wang authored
      - Modified the serialization of function scripts in both KernelCache and AutoTunerCache to include metadata: the script is generated with `show_meta=True` before it is passed to `cloudpickle.dumps()`. The hash key for a kernel configuration now reflects metadata such as the LayoutMap, improving cache accuracy and consistency.
      9247a879
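The idea of folding metadata into the hash key can be sketched as follows. This is a minimal illustration, not the project's implementation: `make_cache_key` and the `layout_map` field are hypothetical names, and JSON plus SHA-256 stand in for whatever serialization the cache actually uses.

```python
import hashlib
import json


def make_cache_key(script_text: str, metadata: dict) -> str:
    """Hypothetical helper: hash the function script together with its metadata."""
    # sort_keys makes the serialized form independent of dict insertion order.
    payload = json.dumps({"script": script_text, "metadata": metadata}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same script text, different (made-up) layout metadata.
key_a = make_cache_key("def main(): ...", {"layout_map": "row_major"})
key_b = make_cache_key("def main(): ...", {"layout_map": "swizzled"})
```

Because the metadata is part of the hashed payload, two functions whose scripts are textually identical but whose metadata differs get distinct keys.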
  2. 11 Jun, 2025 1 commit
    • [Bugfix] Add `__tune_params` into key hash for autotuning (#565) · ae386a7b
      Lei Wang authored
      * [Enhancement] Update AutoTuner and Profiler for improved kernel handling and output validation
      
      - Modified AutoTuner to store cache in a dedicated "autotuner" directory.
      - Enhanced kernel source code saving logic in AutotuneResult and AutoTunerCache to check for None before writing.
      - Updated Profiler to handle None outputs gracefully during tensor comparisons, improving robustness in output validation.
      
      * lint fix
      ae386a7b
  3. 09 Jun, 2025 1 commit
    • [Enhancement] Optimize debug info for auto tuning (#560) · e5e36dbf
      Lei Wang authored
      
      
      * [Enhancement] Update AutoTuner and JIT compilation arguments
      
      * Added functionality to return compile arguments in the JIT implementation, enhancing the autotuner's caching capabilities.
      * Modified `CompileArgs` and `AutotuneResult` classes to support optional `out_idx` parameter, improving flexibility in compile argument handling.
      * Refactored the `_AutoTunerImplementation` to utilize the new compile arguments, ensuring better integration and performance during tuning processes.
      
      * Update tilelang/autotuner/param.py
      Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
      
      * remove redundant comments
      
      * Refactor kernel source retrieval and logging levels
      
      * Updated `AutotuneResult` to use `kernel.get_kernel_source()` instead of `kernel.adapter.get_kernel_source()`.
      * Changed logging level in `KernelCache` from `ERROR` to `DEBUG` for improved verbosity during kernel caching operations.
      * Removed unnecessary verbose logging in JIT compilation process to streamline output.
      
      * Merge branch 'main' of https://github.com/tile-ai/tilelang into bugfix_autotune_0604
      
      * lint fix
      
      ---------
      Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
      e5e36dbf
  4. 05 Jun, 2025 1 commit
    • [Enhancement] Add nvrtc execution backend (#461) · 17f7394f
      Gabriel Wu authored
      
      
      * [wip] feat: add nvrtc backend
      
      * [wip] fix: handle out_idx
      
      * [wip] refactor: move lib logic to libgen
      
      * feat: cache for nvrtc backend
      
      * fmt: run format
      
      * fix: handle cuda bindings import error
      
      * fix: handle cuda bindings import error
      
      * fix: handle cuda bindings import error
      
      * fix: handle cuda bindings import error
      
      * fix: get kernel source
      
      * refactor: speedup pyimport
      
      * Improve error handling for missing cuda-python dependency in nvrtc backend. Raise ImportError with detailed installation instructions instead of logging a warning.
      
      * Enhance nvrtc backend error handling by introducing a flag to check for cuda-python availability. Raise ImportError with detailed installation instructions during initialization if the nvrtc backend is unavailable, improving user experience and clarity.
      
      * Update README.md to include recent NVRTC Backend addition, highlighting reduced compilation time for CUDA templates.
      
      * fix tl_templates
      
      * ensure CUDA context
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
      17f7394f
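The availability-flag pattern described in this commit (probe for an optional dependency, then raise `ImportError` with installation instructions at initialization) might look roughly like the sketch below. `is_module_available` and `OptionalBackend` are hypothetical names; the real backend's class and error text are not shown in the log.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


class OptionalBackend:
    """Hypothetical backend that needs an optional dependency at construction time."""

    def __init__(self, module_name: str, install_hint: str) -> None:
        if not is_module_available(module_name):
            # Fail loudly at initialization instead of logging a warning at import time.
            raise ImportError(
                f"Backend requires '{module_name}', which is not installed. {install_hint}"
            )
        self.module_name = module_name
```

A caller would pass the top-level package name and a hint string, for example `OptionalBackend("cuda", "Try: pip install cuda-python")`; the hint text here is an assumption, not the message the project emits.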
  5. 28 May, 2025 1 commit
    • [Autotune] Introduce cache mechanism for auto tuner (#527) · 7171aff6
      Lei Wang authored
      * [Enhancement] Add commit ID to versioning and improve logging initialization
      
      * Updated `get_tilelang_version` to include an optional commit ID in the version string.
      * Enhanced the `TileLangBuilPydCommand` to write the version with commit ID to the VERSION file during the build process.
      * Introduced a new function `get_git_commit_id` in `version.py` to retrieve the current git commit hash.
      * Refactored logger initialization in `autotuner/__init__.py` to ensure handlers are set up only once, improving performance and clarity.
      * Minor fixes in `flatten_buffer.cc` and `kernel_cache.py` for better handling of versioning and logging.
      
      * [Refactor] Enhance AutoTuner and JITKernel for improved performance and caching
      
      * Refactored the AutoTuner class to include new methods for setting compilation and profiling arguments, enhancing configurability.
      * Introduced caching mechanisms for tuning results, allowing for faster retrieval of previously computed configurations.
      * Updated JITKernel to store tuning results, including latency and configuration details, improving the kernel's performance tracking.
      * Added new methods for generating cache keys and saving/loading results to/from disk, streamlining the tuning process.
      * Enhanced the overall structure and readability of the autotuning logic, ensuring better maintainability and clarity.
      * Minor adjustments in related modules to support the new caching and profiling features.
      
      * [Refactor] Clean up code formatting and improve readability in AutoTuner and related modules
      
      * Consolidated import statements and removed unnecessary line breaks for better readability.
      * Standardized function argument formatting across the AutoTuner and CompileArgs classes.
      * Enhanced consistency in the use of whitespace and indentation throughout the codebase.
      * Minor adjustments in the Profiler and JITKernel classes to improve clarity and maintainability.
      * Ensured that all changes adhere to the project's coding style guidelines.
      
      * [Refactor] Remove redundant type hints in AutoTuner modules
      
      * Simplified import statements in `__init__.py` and `param.py` by removing unnecessary duplicate type hints for `Any`.
      * Improved code readability and maintainability by streamlining type imports across the AutoTuner module.
      
      * [Refactor] Update AutoTuner configuration for improved profiling and target detection
      
      * Enhanced the AutoTuner configuration across multiple examples by adding `set_profile_args` to better manage profiling settings.
      * Standardized the use of `target="auto"` in compile arguments to ensure automatic target detection.
      * Removed redundant target specifications in certain instances to streamline the configuration process.
      * Improved overall clarity and maintainability of the autotuning logic in various example scripts.
      
      * [Refactor] Simplify code formatting and improve readability in example scripts
      
      * Consolidated function argument formatting in `benchmark_mla_decode_amd_tilelang.py`, `example_elementwise_add.py`, and `performance.py` for better clarity.
      * Removed unnecessary line breaks and standardized argument placement across multiple files.
      * Enhanced overall code readability and maintainability in autotuning examples and performance scripts.
      
      * [Refactor] Update JIT decorator usage across multiple files
      
      * Removed redundant parameters from the JIT decorator in various benchmark and example scripts, simplifying the code.
      * Standardized the import of the JIT decorator from `tilelang`, enhancing consistency across the codebase.
      * Improved overall readability and maintainability by consolidating import statements and cleaning up function definitions.
      
      * [Refactor] Standardize JIT decorator formatting across benchmark and example scripts
      
      * Simplified the formatting of the JIT decorator in multiple files by removing unnecessary line breaks.
      * Enhanced code readability and consistency in the usage of the JIT decorator across benchmark and example scripts.
      * Improved overall maintainability by ensuring uniformity in function definitions and decorator usage.
      7171aff6
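A disk-backed cache for tuning results, with generated cache keys and save/load helpers as described above, could be sketched like this. All names (`tuning_key`, `save_result`, `load_result`) and the JSON file layout are assumptions made for illustration; the log does not show the actual on-disk format.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def tuning_key(kernel_name: str, configs: list) -> str:
    """Hypothetical key: hash of the kernel name and the candidate configurations."""
    payload = json.dumps({"kernel": kernel_name, "configs": configs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def save_result(cache_dir: Path, key: str, result: dict) -> None:
    """Write one tuning result to <cache_dir>/<key>.json."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.json").write_text(json.dumps(result))


def load_result(cache_dir: Path, key: str) -> Optional[dict]:
    """Return the stored result, or None on a cache miss."""
    path = cache_dir / f"{key}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "autotuner"
    key = tuning_key("matmul", [{"block_M": 128}, {"block_M": 64}])
    first_lookup = load_result(cache_dir, key)  # miss: nothing stored yet
    save_result(cache_dir, key, {"latency_ms": 0.42, "config": {"block_M": 128}})
    second_lookup = load_result(cache_dir, key)  # hit: read back from disk
```

The latency value and configuration fields are made-up sample data, not measurements.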
  6. 26 May, 2025 1 commit
    • [Enhancement] Add commit ID to versioning and improve logging initialization (#524) · 62a8d7f0
      Lei Wang authored
      * Updated `get_tilelang_version` to include an optional commit ID in the version string.
      * Enhanced the `TileLangBuilPydCommand` to write the version with commit ID to the VERSION file during the build process.
      * Introduced a new function `get_git_commit_id` in `version.py` to retrieve the current git commit hash.
      * Refactored logger initialization in `autotuner/__init__.py` to ensure handlers are set up only once, improving performance and clarity.
      * Minor fixes in `flatten_buffer.cc` and `kernel_cache.py` for better handling of versioning and logging.
      62a8d7f0
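Appending an optional commit ID to a version string can be sketched as below. `read_commit_id` and `version_with_commit` are hypothetical stand-ins, and the `+<hash>` suffix format and 8-character truncation are assumptions, not the format the project writes to its VERSION file.

```python
import subprocess
from typing import Optional


def read_commit_id(cwd: str = ".") -> Optional[str]:
    """Return the current git commit hash, or None if git or the repo is unavailable."""
    try:
        out = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], cwd=cwd, stderr=subprocess.DEVNULL
        )
    except (subprocess.CalledProcessError, OSError):
        return None
    return out.decode("utf-8").strip()


def version_with_commit(base_version: str, commit_id: Optional[str]) -> str:
    """Append a shortened commit ID when one is available (assumed '+hash' format)."""
    if not commit_id:
        return base_version
    return f"{base_version}+{commit_id[:8]}"
```

Keeping the commit ID optional means a source tarball without git metadata still yields a valid version string.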
  7. 06 May, 2025 2 commits
    • [Feature] Add cache directory management functions in tilelang.cache (#453) · 0aaef97d
      Lei Wang authored
      * [Feature] Add cache directory management functions in tilelang.cache
      
      * Introduced `get_cache_dir` and `set_cache_dir` functions to manage the kernel cache directory.
      * Updated `KernelCache` class to store cache directory as a `Path` object for improved path handling.
      * Enhanced documentation with examples for new cache directory functions.
      
      * [Refactor] Update cache imports in tilelang.__init__.py
      
      * Added `set_cache_dir` and `get_cache_dir` functions to the import statement for improved cache directory management.
      * This change enhances the accessibility of cache directory management functions within the module.
      0aaef97d
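A getter/setter pair that keeps the cache directory as a `Path` can be sketched as follows. `CacheDirHolder` is a hypothetical class and the default path is a placeholder; this is not the code behind `tilelang.cache`.

```python
from pathlib import Path
from typing import Union


class CacheDirHolder:
    """Hypothetical holder that normalizes the cache directory to a Path."""

    def __init__(self, default: Union[str, Path]) -> None:
        self._cache_dir = Path(default)

    def set_cache_dir(self, cache_dir: Union[str, Path]) -> None:
        # Accept str or Path, always store a Path.
        self._cache_dir = Path(cache_dir)

    def get_cache_dir(self) -> Path:
        return self._cache_dir


holder = CacheDirHolder("/tmp/example_cache")  # placeholder default
holder.set_cache_dir("/tmp/another_cache")
```

Storing a `Path` rather than a string lets callers build sub-paths with the `/` operator without worrying about separators.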
    • [Enhancement] Introduce pass_configs parameter for kernel Caching (#452) · b1ba0cc8
      Lei Wang authored
      * [Enhancement] Introduce pass_configs parameter for kernel compilation
      
      * Added a new `pass_configs` parameter to the `tilelang.compile` function to allow for more flexible kernel compilation configurations.
      * Updated related classes and methods to accommodate the new parameter, ensuring compatibility across the codebase.
      * Enhanced the `torch_assert_close` function to include customizable tensor names for better debugging output.
      * Refactored input handling in example scripts to streamline the process of obtaining inputs for kernel execution.
      
      * lint fix
      b1ba0cc8
  8. 27 Apr, 2025 1 commit
  9. 31 Mar, 2025 1 commit
    • [Cache] Implement in-memory cache (#308) · 5802c01b
      Lei Wang authored
      * [Enhancement] Add support for CUDA architecture 8.9 in GEMM template
      
      - Introduced conditional inclusion of "gemm_sm89.h" for CUDA architectures 8.9 and above, enhancing compatibility with newer hardware.
      - This change ensures that the GEMM template can leverage optimizations specific to the 8.9 architecture, improving performance for users with compatible GPUs.
      
      * lintfix
      
      * [Refactor] Clean up includes in gemm_sm89.h
      
      - Removed duplicate inclusion of "common.h" and added "cuda_fp8.h" for improved clarity and organization.
      - This change enhances the maintainability of the code by ensuring that header files are included only once and in a logical order.
      
      * [Enhancement] Improve KernelCache with in-memory caching and detailed docstrings
      
      - Added an in-memory cache to the KernelCache class to enhance performance by reducing disk access.
      - Updated the __new__ method to initialize the memory cache and added logic to check the cache before loading from disk.
      - Enhanced docstrings across multiple methods to provide clearer explanations of parameters and return values, improving code readability and maintainability.
      - Implemented a clear_cache method to clear both in-memory and disk caches, ensuring efficient cache management.
      
      * lint fix
      5802c01b
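The in-memory layer in front of the disk cache, together with a `clear_cache` that empties both, can be sketched like this. `TwoLevelCache` and its text-file layout are assumptions for illustration only; the `disk_reads` counter exists just to make the effect visible.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Optional


class TwoLevelCache:
    """Hypothetical cache: check memory first, fall back to disk, then remember."""

    def __init__(self, cache_dir: Path) -> None:
        self._dir = Path(cache_dir)
        self._memory: Dict[str, str] = {}
        self.disk_reads = 0  # instrumentation for the demo

    def put(self, key: str, source: str) -> None:
        self._dir.mkdir(parents=True, exist_ok=True)
        (self._dir / f"{key}.txt").write_text(source)
        self._memory[key] = source

    def get(self, key: str) -> Optional[str]:
        if key in self._memory:  # fast path, no disk access
            return self._memory[key]
        path = self._dir / f"{key}.txt"
        if not path.exists():
            return None
        self.disk_reads += 1
        source = path.read_text()
        self._memory[key] = source
        return source

    def clear_cache(self) -> None:
        """Drop both the in-memory entries and the files on disk."""
        self._memory.clear()
        shutil.rmtree(self._dir, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    writer = TwoLevelCache(Path(tmp) / "kernels")
    writer.put("k1", "__global__ void k() {}")
    reader = TwoLevelCache(Path(tmp) / "kernels")  # stands in for a fresh process
    first = reader.get("k1")   # served from disk
    second = reader.get("k1")  # served from memory
    reads_after_two_gets = reader.disk_reads
    reader.clear_cache()
    after_clear = reader.get("k1")
```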
  10. 28 Mar, 2025 1 commit
    • [Refactor] Improve documentation and add detailed docstrings across multiple modules (#298) · 3f294650
      Lei Wang authored
      * [Enhancement] Update AtomicAdd functions for BFLOAT16 in common.h
      
      - Added conditional compilation for BFLOAT16 atomic operations to ensure compatibility with CUDA architectures greater than 7.5.
      - Improved code clarity by organizing the AtomicAdd functions and adding relevant comments for better understanding.
      
      * [Enhancement] Improve documentation and add detailed docstrings across multiple modules
      
      - Updated the `__init__.py` file to enhance module documentation, providing clarity on auto-tuning functionalities.
      - Added comprehensive docstrings to the `JITContext`, `AutotuneResult`, and `AutoTuner` classes, detailing their attributes and methods.
      - Enhanced memory allocation utilities in `allocate.py` with detailed descriptions for each allocation function.
      - Improved documentation for various intrinsic operations in `builtin.py`, `copy.py`, `customize.py`, `frame.py`, `gemm.py`, `memscope.py`, and `reduce.py`, ensuring clear explanations of parameters and return values.
      - Refactored the `KernelCache` class to improve clarity and maintainability, including detailed comments and docstrings for methods.
      - Overall, these changes aim to enhance code readability and provide better guidance for future developers and users of the Tile-AI framework.
      3f294650
  11. 25 Mar, 2025 1 commit
  12. 23 Mar, 2025 1 commit
    • [Language] Enhance alias to support blockwise memory load (#261) · 927e50d9
      Lei Wang authored
      * [Enhancement] Introduce caching control and frame management in TileLang
      
      - Added cache control functions (`enable_cache`, `disable_cache`, `is_cache_enabled`) in `env.py` to manage kernel caching behavior.
      - Updated `kernel_cache.py` to utilize the cache state, preventing unnecessary kernel compilation when caching is disabled.
      - Introduced a new `frame.py` module to manage LetFrame instances, including a stack for variable-value mapping and enhanced frame management.
      - Updated imports in various modules to accommodate new caching and frame functionalities, improving overall organization and clarity.
      
      * [Refactor] Clean up and enhance caching and frame management in TileLang
      
      - Added spacing for improved readability in `env.py` and `frame.py`.
      - Refactored `LetFrame` class to enhance clarity in buffer region assignment.
      - Ensured consistent formatting and organization across caching control and frame management functions.
      
      * [Feature] Add matrix multiplication functionality in TileLang
      
      - Introduced a new test file `test_tilelang_language_alias.py` that implements a matrix multiplication function using TileLang's primitives.
      - The `matmul` function defines a kernel for performing tile-level GEMM operations, with support for customizable block sizes and data types.
      - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation.
      - Updated `gemm.py` to allow `tir.Buffer` or `tir.Var` as valid argument types for the `gemm` function, enhancing flexibility in argument handling.
      
      * [Refactor] Improve formatting and readability in test_tilelang_language_alias.py
      
      - Adjusted spacing and alignment in the `matmul` and `run_matmul` functions for better readability.
      - Cleaned up unnecessary blank lines and ensured consistent formatting throughout the file.
      - Enhanced overall code clarity without altering functionality.
      927e50d9
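The cache-control switch from this commit (`enable_cache`, `disable_cache`, `is_cache_enabled`) can be illustrated with a small stand-alone sketch. The function names mirror those in the commit message, but the bodies and the `cached_compile` helper are assumptions, not the contents of `env.py` or `kernel_cache.py`.

```python
from typing import Callable, Dict

_cache_enabled = True
_store: Dict[str, str] = {}


def enable_cache() -> None:
    global _cache_enabled
    _cache_enabled = True


def disable_cache() -> None:
    global _cache_enabled
    _cache_enabled = False


def is_cache_enabled() -> bool:
    return _cache_enabled


def cached_compile(key: str, compile_fn: Callable[[str], str]) -> str:
    """Hypothetical lookup: bypass the store entirely when caching is disabled."""
    if not is_cache_enabled():
        return compile_fn(key)
    if key not in _store:
        _store[key] = compile_fn(key)
    return _store[key]
```

With caching enabled, repeated calls for the same key compile once; with it disabled, every call compiles again and nothing is stored.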
  13. 22 Mar, 2025 2 commits
    • [Refactor] Move compilation outside critical section (#260) · 001e7b2a
      You Jiacheng authored
      
      
      * move compilation outside critical section
      
      * lint fix
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
      001e7b2a
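Moving a slow compile step out of a lock-protected region generally follows the pattern below: look up under the lock, compile without holding it, then publish under the lock again. `KernelStore` is a hypothetical class written to show the pattern; it is not the code changed in this commit.

```python
import threading
from typing import Any, Callable, Dict


class KernelStore:
    """Hypothetical thread-safe store that compiles outside the critical section."""

    def __init__(self, compile_fn: Callable[[str], Any]) -> None:
        self._lock = threading.Lock()
        self._kernels: Dict[str, Any] = {}
        self._compile = compile_fn

    def get(self, key: str) -> Any:
        # Critical section 1: cheap dictionary lookup only.
        with self._lock:
            if key in self._kernels:
                return self._kernels[key]
        # Slow work happens without the lock, so other threads are not blocked.
        kernel = self._compile(key)
        # Critical section 2: publish, keeping the first result if another thread won.
        with self._lock:
            return self._kernels.setdefault(key, kernel)
```

The trade-off is that two threads missing on the same key may both compile; `setdefault` ensures they still end up sharing one stored kernel.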
    • [Example] Implement Kernel Example cumsum (#258) · cd9ec62e
      Lei Wang authored
      * Add GPU kernel for 2D continuous cumulative sum in TileLang example
      
      - Introduced a new example script `example_tilelang_cumsum.py` that generates a GPU kernel for 2D continuous cumulative sum.
      - Implemented functions to handle kernel configuration, memory allocation, and inclusive scan operations.
      - Added a main execution block to demonstrate the kernel's functionality using PyTorch for tensor operations.
      - Enhanced the example with error handling for power-of-two configurations and validation of results against PyTorch's built-in cumulative sum function.
      
      * Refactor TileLang examples and enhance kernel compilation
      
      - Updated `example_tilelang_cumsum.py` to improve GPU kernel generation for 2D continuous cumulative sum, including better parameter handling and error checking.
      - Refactored `example_mha_bwd.py` to enhance kernel compilation readability and maintainability.
      - Modified `kernel_cache.py` to prevent saving kernels to disk when using the DLPack backend, ensuring proper cache management.
      - Added `get_block_bindings` function to `kernel.py` for improved access to block bindings in kernel launch frames.
      - Cleaned up import statements in `__init__.py` for better organization and clarity.
      
      * Enhance GPU kernel for 2D continuous cumulative sum in TileLang example
      
      - Added additional spacing for improved readability in `example_tilelang_cumsum.py`.
      - Refined kernel structure to enhance clarity and maintainability during GPU kernel generation for cumulative sum operations.
      cd9ec62e
  14. 20 Mar, 2025 1 commit
    • [Refactor] Phaseout LLVM Dependency by Making it Optional (#247) · f2e99180
      Lei Wang authored
      * remove llvm build
      
      * [Refactor] Update kernel compilation and profiling in examples
      
      - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
      - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
      - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
      
      * lint fix
      
      * License Update
      
      * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
      
      - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
      - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
      
      * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
      
      - Improved comment alignment and readability in `cuda.h`.
      - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix
      
      * License update
      
      * [Enhancement] Update JITKernel to use artifact for kernel source
      
      - Assigned the generated artifact to `self.artifact` for better management.
      - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
      
      * lint fix
      
      * Add @tilelang.testing.requires_llvm decorator to vectorization tests
      
      * Enhance setup.py and env.py for library management
      
      - Added functionality to remove original files after copying in CMakeBuild.
      - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
      
      * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
      
      * Refactor CMakeBuild file handling in setup.py
      
      - Added a check to ensure the target library directory exists before copying .so files.
      - Improved the logic for creating the target directory and copying files to enhance robustness.
      
      * bugfix
      
      * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
      
      * lint fix
      
      * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
      
      * lint fix
      
      * Add support for C target in device code generation
      
      - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
      
      * [Enhancement] Implement auto-clear cache feature based on environment variable
      
      * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
      * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
      * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
      
      * [Refactor] Update kernel invocation and import paths in tests and cache
      
      * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
      * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
      * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
      
      * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
      
      * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
      * Enhanced overall code formatting to align with project standards.
      
      * [Enhancement] Add bfloat16 test case and improve kernel caching logic
      
      * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
      * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
      * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
      * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
      * Improved code formatting and readability across several files.
      
      * lint fix
      
      * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
      f2e99180
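The environment-variable switch for clearing the cache at initialization can be sketched as follows. `TILELANG_CLEAR_CACHE` is the variable named in the commit; `should_clear_cache` and the set of accepted spellings are assumptions, since the log only says the cache is cleared when the variable "is set to true".

```python
import os
from typing import Mapping, Optional


def should_clear_cache(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check: treat a few common truthy spellings as 'clear the cache'."""
    env = os.environ if environ is None else environ
    value = env.get("TILELANG_CLEAR_CACHE", "0")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

A cache initializer would call this once at startup and invoke its clear routine when it returns `True`, which is convenient in CI where stale artifacts from earlier runs should not be reused.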
  15. 19 Mar, 2025 1 commit
    • [Feature] Add database storage for JITKernel cache with Cython and Ctypes adapters (#213) · e789808b
      alex_xiao authored
      
      
      * [Dev] Add database mechanism to cache
      
      * [Dev] Fix database cache and test for it
      
      * [Dev] Refactor env.py to use TILELANG_CACHE_DIR and remove extra comment.
      
      * [Refactor] Improve code formatting and readability in multiple files
      
      * [Enhancement] Add execution backend options and improve kernel adapter initialization
      
      * [Refactor] Rename cached function to cached_kernel and update related references
      
      * [Enhancement] Enable target and target_host parameters in kernel loading and improve gemm test case
      
      * [Enhancement] Update kernel compilation to specify execution backend as "cython"
      
      * [Refactor] Rename cached_kernel to cached and update references in the codebase
      
      * [Enhancement] Un-comment and add test cases for matrix multiplication correctness; improve kernel caching logic and remove redundant code
      
      * [Refactor] Clean up code formatting and improve readability in cache and adapter modules
      
      * [Refactor] Remove unused imports
      
      * [Refactor] Update cached function signature to use PrimFunc and Optional types for improved type safety
      
      * [Refactor] Update cached function calls to use PrimFunc and improve parameter handling
      
      * [Refactor] Clean up import statements and improve code formatting in cache and kernel test files
      
      * Update tilelang/jit/kernel.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
      e789808b