1. 12 Dec, 2025 1 commit
  2. 18 Oct, 2025 1 commit
  3. 10 Jul, 2025 1 commit
    • [Enhancement] Support more flexible layout host pythonic expr (#623) · 22aed721
      Lei Wang authored
      * [Refactor] Enhance expression handling in utils.py and update wrapper to use pythonic_expr
      
      - Added support for additional TIR expressions (FloorDiv, Min, Max, Add, Sub, FloorMod) in the pythonic_expr function, so that more expressions can be rendered as Python-style strings.
      - Replaced the deprecated legalize_c function calls in TLCUDASourceWrapper and TLCPUSourceWrapper with pythonic_expr for better expression handling in kernel launch code.
      
      * [Refactor] Simplify expression handling in pythonic_expr function
      
      - Consolidated binary and min/max operation handling in the pythonic_expr function to improve readability and maintainability.
      - Replaced the individual checks for each binary operation with a single mapping, streamlining the code and making expression printing more efficient.
      
      * [Enhancement] Improve expression representation in pythonic_expr function
      
      - Added operator precedence handling to the pythonic_expr function, enhancing the conversion of TVM PrimExpr to Python-style strings.
      - Updated the visitor logic to insert parentheses only where operator precedence requires them, so the printed string preserves the evaluation order of the original expression.
      - Added a docstring explaining the function's purpose and usage.
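
A minimal sketch of the precedence-based parenthesization described above. It uses hypothetical stand-in node classes (`Var`, `Const`, `BinOp`) and a hypothetical `OP_INFO` table instead of TVM's `PrimExpr` types, so it illustrates the idea rather than reproducing `pythonic_expr` itself: a mapping from operator kinds to Python symbols, with parentheses added only when a child binds more loosely than its parent.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-in node types; the real function walks tvm.tir PrimExpr nodes.
@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str  # "add", "sub", "mul", "floordiv", "floormod", "min" or "max"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Const, BinOp]

# Operator kind -> (Python symbol, precedence); a higher number binds tighter.
OP_INFO = {
    "add": ("+", 1),
    "sub": ("-", 1),
    "mul": ("*", 2),
    "floordiv": ("//", 2),
    "floormod": ("%", 2),
}
CALL_OPS = {"min", "max"}  # printed as calls, which never need parentheses
ATOM = 3  # precedence of variables, constants and calls


def precedence(e: Expr) -> int:
    if isinstance(e, BinOp) and e.op in OP_INFO:
        return OP_INFO[e.op][1]
    return ATOM


def pythonic(e: Expr) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Const):
        return str(e.value)
    if e.op in CALL_OPS:
        return f"{e.op}({pythonic(e.lhs)}, {pythonic(e.rhs)})"
    symbol, prec = OP_INFO[e.op]
    lhs, rhs = pythonic(e.lhs), pythonic(e.rhs)
    # Left operand: wrap only if it binds more loosely than this operator.
    if precedence(e.lhs) < prec:
        lhs = f"({lhs})"
    # Right operand: also wrap at equal precedence, because these operators are
    # left-associative (a - (b - c) differs from a - b - c).
    if precedence(e.rhs) <= prec:
        rhs = f"({rhs})"
    return f"{lhs} {symbol} {rhs}"


n = Var("n")
print(pythonic(BinOp("floordiv", BinOp("add", n, Const(127)), Const(128))))  # (n + 127) // 128
print(pythonic(BinOp("max", BinOp("mul", n, Const(2)), Const(1))))  # max(n * 2, 1)
```

Wrapping a right operand at equal precedence keeps subtraction, floor division, and modulo correct at the cost of occasionally redundant parentheses such as `a + (b + c)`.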
      
      * test fix
  4. 25 Jun, 2025 1 commit
    • [Example] Update examples to use @tilelang.jit (#597) · 3db18726
      Cunxiao Ni authored
      * [Example] Update kernel compilation in examples to use @tilelang.jit
      
      - Refactored multiple examples to stop using `tilelang.compile` for kernel creation; the kernel functions are now invoked directly.
      - Added `@tilelang.jit` decorators with the appropriate output indices to these functions, for better performance and maintainability.
      - Simplified kernel invocation across the examples, so that kernels are defined and executed consistently.
      
      * format
      
      * Update example_tilelang_sparse_gqa_decode_varlen_indice.py
      
      * Update example_dequant_gemm_fine_grained.py
      
      * Update example_gemm_autotune.py
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
  5. 24 May, 2025 1 commit
    • [Refactor] Support auto index bitwidth casting (#517) · 6ad73f6f
      Lei Wang authored
      * [Refactor] Enhance GEMM Warp Partitioning Logic and Introduce Buffer Remapping (#516)
      
      * Improved the warp partitioning logic in `Gemm::ComputeWarpPartition` to better accommodate the FullRow, FullCol, and Square GEMM policies, choosing the warp layout based on the matrix dimensions.
      * Introduced a new `RemapBufferRewriter` class that updates buffer references and padding annotations during statement transformations, making memory accesses safer and the transformation easier to follow.
      * Updated the `OptimizeForTarget` function to include a new step that configures the index bitwidth.
      * Refactored existing code to use named constants for warp sizes, improving maintainability and readability.
      * Added checks to ensure correct warp allocation and padding map handling, improving robustness in memory management strategies.
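
A hypothetical sketch of policy-driven warp partitioning. The policy semantics are assumptions made for illustration (FullRow stacks all warps along M, FullCol spreads them along N, Square picks the divisor split whose per-warp tile is closest to square), as are the 16x8 minimum warp tile and the name `compute_warp_partition`; this is not the C++ `Gemm::ComputeWarpPartition`.

```python
from enum import Enum
from typing import List, Tuple


class Policy(Enum):
    FULL_ROW = "FullRow"  # assumed: all warps stacked along M
    FULL_COL = "FullCol"  # assumed: all warps spread along N
    SQUARE = "Square"     # assumed: most balanced split that divides the tile


# Hypothetical minimum per-warp tile (for example one 16x8 MMA fragment).
WARP_M, WARP_N = 16, 8


def compute_warp_partition(M: int, N: int, num_warps: int, policy: Policy) -> Tuple[int, int]:
    """Return (m_warp, n_warp) with m_warp * n_warp == num_warps."""

    def fits(m_warp: int, n_warp: int) -> bool:
        return M % (m_warp * WARP_M) == 0 and N % (n_warp * WARP_N) == 0

    if policy is Policy.FULL_ROW:
        candidates: List[Tuple[int, int]] = [(num_warps, 1)]
    elif policy is Policy.FULL_COL:
        candidates = [(1, num_warps)]
    else:
        candidates = [(m, num_warps // m) for m in range(1, num_warps + 1) if num_warps % m == 0]
        # Prefer the split whose per-warp tile (M / m_warp) x (N / n_warp) is closest to square.
        candidates.sort(key=lambda mn: abs(M / mn[0] - N / mn[1]))

    for m_warp, n_warp in candidates:
        if fits(m_warp, n_warp):
            return m_warp, n_warp
    raise ValueError(f"cannot split a {M}x{N} tile over {num_warps} warps with {policy.value}")


print(compute_warp_partition(128, 128, 4, Policy.SQUARE))    # (2, 2)
print(compute_warp_partition(128, 128, 4, Policy.FULL_ROW))  # (4, 1)
```

Raising when no candidate divides the tile mirrors the kind of allocation check the commit mentions, rather than silently producing a partition that leaves some warps without work.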
      
      * [Refactor] Update ConfigIndexBitwidthRewriter to Support Auto-Check Feature
      
      * Modified the constructor of `ConfigIndexBitwidthRewriter` to take an `auto_check` parameter, allowing the bitwidth to be adjusted automatically.
      * Enhanced the `VisitExpr_` methods to apply the new auto-check logic, so that integer types are promoted to 64 bits when necessary and cast to the specified index bitwidth otherwise.
      * Updated the `ConfigIndexBitwidth` pass to choose the index bitwidth based on whether one is explicitly configured, improving flexibility across scenarios.
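
A minimal sketch of the bitwidth decision described above, with hypothetical names (`resolve_index_bits`, `index_dtype`). It assumes that an explicit configuration wins and disables the auto-check, and it models "when necessary" as the largest flattened buffer offset exceeding the int32 range; the real pass rewrites TIR expressions rather than inspecting shapes like this.

```python
import math
from typing import Optional, Sequence, Tuple

INT32_MAX = 2**31 - 1


def resolve_index_bits(configured_bits: Optional[int]) -> Tuple[int, bool]:
    """Return (bitwidth, auto_check): an explicit setting wins and disables the
    auto-check; otherwise start from 32 bits and let the auto-check widen."""
    if configured_bits is not None:
        return configured_bits, False
    return 32, True


def index_dtype(buffer_shapes: Sequence[Tuple[int, ...]],
                configured_bits: Optional[int] = None) -> str:
    bits, auto_check = resolve_index_bits(configured_bits)
    if auto_check:
        # The largest flattened offset into any buffer must fit the index type.
        largest = max((math.prod(shape) - 1 for shape in buffer_shapes), default=0)
        if largest > INT32_MAX:
            bits = 64  # widen rather than let int32 offsets overflow
    return f"int{bits}"


print(index_dtype([(1024, 1024)]))                        # int32
print(index_dtype([(65536, 65536)]))                      # int64
print(index_dtype([(65536, 65536)], configured_bits=32))  # int32
```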
      
      * Add dynamic matrix multiplication example and corresponding test
      
      * Introduced `example_dynamic.py` to demonstrate dynamic matrix multiplication using TileLang and PyTorch, including a main function for execution and performance profiling.
      * Added `test_example_dynamic.py` to validate the functionality of the dynamic matrix multiplication example.
      * The example includes parameter configurations and checks its results against PyTorch's implementation for correctness.
      
      * lint fix
      
      * Add get_num_sms function to retrieve the number of streaming multiprocessors on the CUDA device
      
      * Implemented the `get_num_sms` function in `cuda_driver.py` to return the count of streaming multiprocessors for a specified CUDA device.
      * Updated the `__init__.py` file to include the new function in the module exports.
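
One way to obtain the SM count, sketched with `ctypes` against the CUDA driver API (`cuInit`, `cuDeviceGet`, and `cuDeviceGetAttribute` with `CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT`, whose enum value is 16). This is an assumption about approach, not necessarily how `cuda_driver.py` implements `get_num_sms`, and the library lookup assumes a Linux-style `libcuda`.

```python
import ctypes
import ctypes.util

# Value of CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT in the driver API's CUdevice_attribute enum.
CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT = 16


def get_num_sms(device_id: int = 0) -> int:
    lib_name = ctypes.util.find_library("cuda")
    if lib_name is None:
        raise RuntimeError("CUDA driver library (libcuda) not found")
    cuda = ctypes.CDLL(lib_name)

    def check(status: int, what: str) -> None:
        if status != 0:  # 0 == CUDA_SUCCESS
            raise RuntimeError(f"{what} failed with CUresult {status}")

    check(cuda.cuInit(0), "cuInit")
    device = ctypes.c_int()
    check(cuda.cuDeviceGet(ctypes.byref(device), device_id), "cuDeviceGet")
    count = ctypes.c_int()
    check(
        cuda.cuDeviceGetAttribute(ctypes.byref(count), CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT, device),
        "cuDeviceGetAttribute",
    )
    return count.value
```

On a machine with a CUDA driver and GPU, `get_num_sms(0)` returns the SM count of device 0; otherwise it raises `RuntimeError` (or `OSError` if the library cannot be loaded).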
      
      * lint fix
  6. 09 May, 2025 1 commit
    • [CI] Add BlocksparseGemm, Dynamic, and Cast examples to CI (#467) · 46eb4589
      Zhengju Tang authored
      * [Refactor] Enhance TMA barrier validation and support for additional architectures (#463)
      
      * Updated the TMA barrier validation in `inject_tma_barrier.cc` to check for non-empty `barrier_id_to_range_` before raising an error for missing `create_list_of_mbarrier`.
      * Refactored architecture checks in `phase.py` to use a new constant `SUPPORTED_TMA_ARCHS`, so the set of supported target architectures is defined in one place and the validation logic is easier to read.
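
The two checks above, rendered as a small Python sketch for illustration (the originals live in C++ and in `phase.py`). The contents of `SUPPORTED_TMA_ARCHS`, the helper names `allow_tma` and `validate_tma_barriers`, and the map's value type are assumptions.

```python
from typing import Dict, Optional, Tuple

# Assumed contents (Hopper targets); the commit names the constant,
# but this sketch does not claim to reproduce its exact value.
SUPPORTED_TMA_ARCHS = frozenset({"sm_90", "sm_90a"})


def allow_tma(arch: str) -> bool:
    # One membership test against the shared constant instead of scattered comparisons.
    return arch in SUPPORTED_TMA_ARCHS


def validate_tma_barriers(barrier_id_to_range: Dict[int, Tuple[int, int]],
                          create_list_of_mbarrier: Optional[object]) -> None:
    # A missing create_list_of_mbarrier is only an error when barriers are in use.
    if barrier_id_to_range and create_list_of_mbarrier is None:
        raise ValueError("TMA barriers are used but create_list_of_mbarrier is missing")


print(allow_tma("sm_90a"), allow_tma("sm_80"))  # True False
validate_tma_barriers({}, None)  # no barriers in use, so no error
```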
      
      * [CI] Add BlocksparseGemm, Dynamic, and Cast examples to CI.
      
      * Lint
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
  7. 13 Apr, 2025 1 commit