1. 18 Sep, 2025 1 commit
    • [Refactor] Turn off `ENABLE_FAST_MATH` by default (#846) · e7e38355
      Lei Wang authored
      * [Enhancement] Enable fast math optimization in tilelang JIT configurations
      
      - Updated multiple examples and kernel functions to include `pass_configs` for enabling fast math optimization.
      - Added support for the `TL_ENABLE_FAST_MATH` configuration option in the built-in operations.
      - Enhanced the `LibraryGenerator` to handle the new fast math configuration, ensuring compatibility with existing settings.
      - Updated documentation to reflect the changes in fast math handling and deprecation of the `TL_DISABLE_FAST_MATH` option.
      
      * lint fix
      
      * [Refactor] Introduce deprecated_warning utility for improved deprecation handling
      
      - Added a new `deprecated_warning` function to streamline deprecation messages.
      - Updated the `LibraryGenerator` to utilize the new function for warning about the deprecated `TL_DISABLE_FAST_MATH` configuration.
      - Enhanced the `deprecated` decorator to support phaseout version messaging, improving clarity for users.
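
The deprecation flow described in this commit can be sketched in plain Python. This is a minimal illustration, not the project's actual implementation. The key strings, the `resolve_fast_math` helper, the signature of `deprecated_warning`, and the rule that the new option takes precedence are all assumptions made for this example. Only the option names and the off-by-default behavior come from the commit message.

```python
import warnings

# Hypothetical key strings, chosen to mirror the option names in the commit message.
ENABLE_KEY = "tl.enable_fast_math"
DISABLE_KEY = "tl.disable_fast_math"  # deprecated option


def deprecated_warning(old_name, new_name, phaseout_version=None):
    """Emit a DeprecationWarning that points at the replacement option."""
    message = f"{old_name} is deprecated; use {new_name} instead."
    if phaseout_version is not None:
        message += f" It is scheduled for removal in version {phaseout_version}."
    # stacklevel=2 attributes the warning to the caller of this helper.
    warnings.warn(message, DeprecationWarning, stacklevel=2)


def resolve_fast_math(pass_configs):
    """Return True if fast math should be enabled for this compilation."""
    if ENABLE_KEY in pass_configs:
        # Assumption: the new option wins whenever it is set explicitly.
        return bool(pass_configs[ENABLE_KEY])
    if DISABLE_KEY in pass_configs:
        # The old option still works, but the user is told to migrate.
        deprecated_warning(DISABLE_KEY, ENABLE_KEY)
        return not bool(pass_configs[DISABLE_KEY])
    # Neither option is set: fast math stays off by default.
    return False
```

With this sketch, an empty configuration leaves fast math off, `{"tl.enable_fast_math": True}` turns it on, and `{"tl.disable_fast_math": False}` also turns it on while emitting a `DeprecationWarning`.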
  2. 15 Jul, 2025 1 commit
  3. 25 Jun, 2025 1 commit
    • [Example] Update examples to use @tilelang.jit (#597) · 3db18726
      Cunxiao Ni authored
      
      
      * [Example] Update kernel compilation in examples to use @tilelang.jit
      
      - Refactored multiple examples to eliminate the use of `tilelang.compile` for kernel creation, directly invoking the functions instead.
      - Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability.
      - Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed.
      
      * format
      
      * Update example_tilelang_sparse_gqa_decode_varlen_indice.py
      
      * Update example_dequant_gemm_fine_grained.py
      
      * Update example_gemm_autotune.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
  4. 11 Jun, 2025 1 commit
    • [Feature] Introduce Persistent Loop and Update GEMM Example (#563) · e7b97be2
      Yu Cheng authored
      * [Feature] Added Support for Synchronizing Grids and Persistent Threadblock Transformation
      
      - Defined the sync_grid operation in builtin.cc and builtin.h, allowing synchronization of all threads within a grid.
      - Implemented support for sync_grid in codegen_cuda.cc, ensuring proper handling of this operation in the generated CUDA code.
      - Added the PersistThreadblock transformation, enabling the conversion of thread blocks to persistent thread blocks, enhancing support for persistent kernels.
      - Updated relevant documentation and comments to reflect the addition of new features and usage instructions.
      
      * [Example] Add MLA Decode With Persistent Threadblock Example
      
      * [Feature] Introduce Persistent Loop and Update GEMM Example
      
      - Added a new persistent loop construct in the TIR framework, enabling more efficient kernel execution.
      - Updated the GEMM example to utilize the new persistent primitive, enhancing performance for matrix multiplication.
      - Introduced a `loop_break` intrinsic for better control flow within persistent loops.
      - Updated relevant files to support the new features, including changes in code generation and language interface.
      
      * lint fix
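
The idea behind a persistent loop can be illustrated with a small pure-Python model. This is a conceptual sketch only: it does not use the TIR construct or the `loop_break` intrinsic added by this commit, and the function names and the strided tile assignment are assumptions chosen for illustration. A fixed number of blocks stays resident and each one walks over several output tiles, instead of one block being launched per tile.

```python
def persistent_tiles(block_id, num_blocks, num_tiles):
    """Yield the flat tile indices handled by one persistent block."""
    tile = block_id
    while True:
        if tile >= num_tiles:
            # Plays the role of a loop break: no work is left for this block.
            break
        yield tile
        # Stride by the number of resident blocks to reach the next tile.
        tile += num_blocks


def schedule(num_blocks, tiles_m, tiles_n):
    """Map each block id to the (row, col) output tiles it would process."""
    num_tiles = tiles_m * tiles_n
    return {
        block_id: [
            divmod(tile, tiles_n)  # flat index -> (tile row, tile column)
            for tile in persistent_tiles(block_id, num_blocks, num_tiles)
        ]
        for block_id in range(num_blocks)
    }
```

For a 3 x 3 grid of output tiles and 4 resident blocks, `schedule(4, 3, 3)[0]` is `[(0, 0), (1, 1), (2, 2)]`, and every tile is assigned to exactly one block.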