1. 07 May, 2026 2 commits
  2. 06 May, 2026 2 commits
  3. 27 Apr, 2026 4 commits
  4. 24 Apr, 2026 1 commit
  5. 22 Apr, 2026 4 commits
  6. 21 Apr, 2026 1 commit
  7. 20 Apr, 2026 1 commit
  8. 16 Apr, 2026 1 commit
  9. 09 Apr, 2026 1 commit
  10. 03 Apr, 2026 2 commits
  11. 17 Mar, 2026 2 commits
  12. 22 Dec, 2025 4 commits
  13. 21 Dec, 2025 1 commit
    • [Refactor] Phaseout PassConfig `kDisableDynamicTailSplit` and `kDynamicAlignment` as they are legacy (#1486) · a874e4e8
      Lei Wang authored
      
      * [Cleanup] Remove dynamic shape example and related tests
      
      * Deleted the dynamic shape example script `example_dynamic.py` and its corresponding test file `test_example_dynamic.py` to streamline the codebase.
      * Removed unused dynamic tail split and dynamic alignment configurations from `builtin.h` and `pass_config.py`.
      * Cleaned up the dynamic shape testing files to eliminate redundancy and improve maintainability.
      
      * build fix
  14. 20 Dec, 2025 1 commit
    • [Enhancement] Enhance let binding handling in layout inference and warp specialized pass (#1484) · 7e8d1f82
      Lei Wang authored
      * [Feature] Add FullyReplicated Fragment Layout and Enhance Layout Inference
      
      * Introduced a new static method `FullyReplicated` in the `Fragment` class to create fully replicated fragment layouts, ensuring all threads hold identical copies of the buffer.
      * Updated `CopyNode` to collect fragment layouts and mark them as fully replicated during layout inference.
      * Enhanced `ParallelOpNode` to expand let bindings for fragment buffer accesses, improving layout inference accuracy.
      * Added documentation for new methods and updated existing methods to support the new layout features.
      
      * lint fix
      
      * Remove debug logging statements from layout inference process to streamline output and improve performance.
  15. 19 Dec, 2025 7 commits
    • [Enhancement] Use static Z3 context (#1482) · 168aec7b
      Lei Wang authored
      * use static Z3 context
      
      * Update submodule reference for TVM to indicate a dirty state
    • [News] update with latest news (#1475) · 2217eb74
      Lei Wang authored
      * Update README.md with latest news, including CuTeDSL backend support, Z3 theorem prover integration, and migration to apache-tvm-ffi for improved compatibility.
      
      * Update README.md to enhance CuTeDSL backend announcement with a link to related issue and clarify migration benefits to apache-tvm-ffi, reducing CPU overhead.
    • [Language] Enhance T.dtype.as_torch conversion for compatibility (#1473) · 3516f1ee
      Lei Wang authored
      * [Language] Enhance dtype conversion for PyTorch compatibility
      
      - Added support for new float8 and float4 data types in the __dtype_as_torch__ method.
      - Implemented backend-specific handling for float8_e4m3 based on HIP or CUDA.
      - Included assertions to ensure compatibility with the required PyTorch versions for each dtype.
      - Improved error handling for unsupported dtypes.
      
      * Fix test script execution and improve error messages for dtype assertions
      
      - Commented out the main execution call in the test script and replaced it with a direct call to the test function `test_divmod()`.
      - Enhanced error messages in the dtype conversion assertions to improve clarity and readability, ensuring proper guidance for required PyTorch versions.
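The backend-specific dtype resolution described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `as_torch_dtype`, the `is_hip` flag, and the lookup table are hypothetical, and the choice of the `fnuz` variant for HIP and the `fn` variant for CUDA is an assumption. `torch.float8_e4m3fn` and `torch.float8_e4m3fnuz` are real PyTorch dtype names; a stand-in namespace is used in place of `torch` so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace

# Hypothetical mapping from a TileLang-style dtype name to a torch attribute name.
_TORCH_ATTR = {
    "float16": "float16",
    "float8_e5m2": "float8_e5m2",
}


def as_torch_dtype(name: str, torch_module, is_hip: bool = False):
    """Resolve `name` to an attribute of `torch_module`, failing with a clear message."""
    if name == "float8_e4m3":
        # Assumption: HIP targets use the "fnuz" variant, CUDA targets use "fn".
        attr = "float8_e4m3fnuz" if is_hip else "float8_e4m3fn"
    elif name in _TORCH_ATTR:
        attr = _TORCH_ATTR[name]
    else:
        raise ValueError(f"unsupported dtype: {name!r}")
    if not hasattr(torch_module, attr):
        # The installed PyTorch is too old to provide this dtype.
        raise AssertionError(
            f"torch.{attr} is not available; a newer PyTorch is required for {name!r}"
        )
    return getattr(torch_module, attr)


# Stand-in for an old PyTorch build that lacks float8 types (illustration only).
old_torch = SimpleNamespace(float16="torch.float16")
print(as_torch_dtype("float16", old_torch))
```

Checking for the attribute rather than comparing version strings keeps the sketch independent of exactly which PyTorch release introduced each dtype.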
    • [Refactor] Remove triton dependence in testing & move triton baseline into examples (#1470) · 95e3b5a7
      silentCoder-dev authored
      * remove triton dependence in testing & move triton baseline into example
      
      * use ceildiv and handles arbitrary M correctly for triton
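The `ceildiv` change in this commit can be illustrated with a small pure-Python sketch. The names `ceildiv` and `tile_ranges` and the block size of 128 are illustrative assumptions rather than code from the commit. The launch grid is sized by rounding up, and the last tile is clipped so that an `M` that is not a multiple of the block size is still covered exactly once.

```python
def ceildiv(a: int, b: int) -> int:
    """Integer division rounded up, for positive b."""
    return -(-a // b)


def tile_ranges(m: int, block_m: int = 128) -> list[tuple[int, int]]:
    """Return [start, stop) row ranges, one per program in a 1-D launch grid."""
    grid = ceildiv(m, block_m)  # m // block_m would drop the partial tail tile
    return [(i * block_m, min((i + 1) * block_m, m)) for i in range(grid)]


print(tile_ranges(300))  # [(0, 128), (128, 256), (256, 300)]
```

Inside a real kernel the clipping usually takes the form of a per-row mask or bounds check rather than a shortened range.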
    • [Language] Make TL scripts friendly to Python syntax highlights (#1466) · 1a3a64fb
      Chaofan Lin authored
      * [Language] Make TL scripts friendly to Python syntax highlights
      
      * add comments
      
      * fix submodule
    • [ArgBinder] Enhance shape variable handling and assertions (#1467) · f6db2014
      Lei Wang authored
      * feat(arg_binder): enhance shape variable handling and assertions
      
      - Implemented special handling for comparing if_then_else expressions to simplify conditions involving NULL checks.
      - Added methods to set shared shape variables and finalize deferred bindings, generating cascading if_then_else expressions and runtime assertions for non-NULL buffers.
      - Updated the binding logic to defer shape variable bindings for shared variables, ensuring proper handling across multiple nullable buffers.
      
      * refactor(arg_binder): clean up shape variable handling and remove unused code
      
      - Removed deprecated methods for setting shared shape variables and finalizing deferred bindings, streamlining the argument binding process.
      - Simplified the logic for handling shape values in the `BindDLTensor` function, ensuring immediate binding for normal shape variables.
      - Enhanced clarity by eliminating unnecessary comments and code related to cascading if_then_else expressions for shared variables.
      
      * refactor(arg_binder): enhance DLTensor binding with improved shape handling
      
      - Replaced the single `BindDLTensor` method with `BindDLTensors` to support multiple buffers, improving flexibility in handling DLTensor bindings.
      - Introduced a two-pass approach for shape variable handling, allowing for better management of symbolic dimensions and null checks.
      - Updated the logic to assert non-null conditions at runtime and utilize cascaded if_then_else expressions for shape retrieval, enhancing robustness.
      - Removed deprecated code and streamlined the binding process for clarity and maintainability.
      
      * fix(test_nullable_buffer_params): improve formatting and consistency in test output
      
      - Updated string formatting for better readability in the `test_nullable_shared_shape` function.
      - Ensured consistent use of double quotes for string literals.
      - Added a missing newline at the end of the file for proper formatting.
      
      * refactor(arg_binder): simplify allocation size calculation in BindDLTensors
      
      - Streamlined the calculation of allocation size by replacing a lambda function with a direct loop, enhancing readability and maintainability.
      - Improved clarity in the null check message for data pointers, ensuring better understanding of the binding process.
      
      * Remove debug prints from phase.py
      
      Removed debug print statements after MakePackedAPI transformation.
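The shared shape variable handling described in this commit can be modeled conceptually in plain Python. This is a sketch under stated assumptions, not the `BindDLTensors` implementation: the helper `resolve_shared_dim` is hypothetical, buffer shapes are modeled as tuples, and `None` stands in for a NULL buffer. The loop plays the role of the cascaded `if_then_else` expression, and the assertion plays the role of the runtime non-NULL check.

```python
from typing import Optional, Sequence


def resolve_shared_dim(buffers: Sequence[Optional[Sequence[int]]], axis: int) -> int:
    """Pick a shared symbolic dimension from the first non-null buffer shape.

    Models a cascade such as
    if_then_else(a != NULL, a.shape[axis], if_then_else(b != NULL, b.shape[axis], ...)).
    """
    # Runtime assertion: at least one buffer must be present to define the dimension.
    assert any(b is not None for b in buffers), "all buffers sharing this dimension are NULL"
    for shape in buffers:
        if shape is not None:
            return shape[axis]
    raise AssertionError("unreachable")


# Buffer `a` is NULL, so the shared dimension comes from `b`.
m = resolve_shared_dim([None, (64, 32)], axis=0)
print(m)  # 64
```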
    • [Refactor] Rename test for curand & add triton baseline in `test_tilelang_language_rand.py` (#1464) · f0672603
      silentCoder-dev authored
      * rename test for curand & add triton baseline
      
      * add a comment for calling T.rng_rand() four times
      
      * refactor tilelang&triton kernel
      
      * Add boundary checks for M not divisible by 128
  16. 18 Dec, 2025 4 commits
    • [Bugfix] Fix tvm_mmac not found error · d6dd2ddf
      qisan authored
    • feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810
      Gabriel Wu authored
      
      
      * feat: CuTeDSL backend
      
      * fix: clang-tidy
      
      * fix: clang-format
      
      * fix: ci
      
      * fix: revert example gemm fp8
      
      * fix: remove duplicate code
      
      * fix: switch-case
      
      * fix: fp16 silence
      
      * fix: TVM IR print
      
      * fix: useless tir
      
      * fix: clang-format
      
      * fix: remove tilelang/contrib/cutedsl/.gitignore
      
      * fix: use hexfloat
      
      * fix: gsym guard
      
      * fix: unknown storage sync type
      
      * fix: string literal
      
      * fix: add args guard
      
      * fix: name hint dedup
      
      * fix: better find_kernel_by_pattern
      
      * fix: set libpath for from_database path
      
      * fix: guard buffer.strides
      
      * fix: from guard
      
      * fix: eviction guard
      
      * fix: use thread local tma descs
      
      * fix: ruff
      
      * fix: drop tma_init_cpp
      
      * fix: exc_info
      
      * fix: negative unmatch early return
      
      * fix: rename postproc func and add test
      
      * fix: handle fast math according to pass config
      
      * fix: dyn_sym parse
      
      * fix: wrap_forward
      
      * fix: use tvm_ffi.libinfo instead of cli
      
      * fix: keep signature
      
      * fix: C++ string safety
      
      * fix: mark tma_store_add as unsupported
      
      * fix: tvm version
      
      * resolve ldsm and cpasync issues.
      
      * fix: minor fixes
      
      * fix: parse signature using ast
      
      * fix: guard global_addr
      
      * fix: create tempfile only when necessary
      
      * fix: use logger.exception for exceptions
      
      * fix: guard lib_path and host_func
      
      * fix: remove tma_cpp_init and add timeout for cpp compile
      
      * add timeout for mbarrier_wait.
      
      * fix: _load_kernel_from_disk signature
      
      * resolve codegen issues.
      
      * fix: logger.exception
      
      * add comment for div_by=1
      
      * merge
      
      * fix: reserve cutlass,cute,tl
      
      * fix: guard tma_store
      
      * fix: allow int64 offset in make_tensor_at_offset
      
      * fix: guard barrier
      
      * fix: add comments for div_by=16
      
      * fix: div_by=1 issue
      
      * delete div_by when offset is 0
      
      * use tl.make_tensor when offset is 0
      
      * fix: explicitly check cutedsl target
      
      * fix: use param.torch_dtype()
      
      ---------
      Co-authored-by: yuxic <yuxic@nvidia.com>
      Co-authored-by: Yong <yong@local>
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
    • remove unused duplicated type check (#1462) · a6f59f31
      Jinjie Liu authored
      
      Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
    • [Language] Adds a random number generation capability through curand_kernel (#1461) · cae06edd
      silentCoder-dev authored
      
      
      * add curand.{curand_init, curand}
      
      * run format.sh
      
      * add default value for curand_init & add test for curand
      
      * Update testing/python/language/test_rand.py
      
      Remove unused thread binding
      Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
      
      * remove unused library
      
      * enable tilelang cache for testing
      
      * run format.sh
      
      * Revert "run format.sh"
      
      This reverts commit 5afaff782f31cdf653e2c45b469da8dead228b8a.
      
      * Revert "enable tilelang cache for testing"
      
      This reverts commit c277a43e77938bd88d47a108dd1bd65734d4a1ae.
      
      * Revert "remove unused library"
      
      This reverts commit 568ad20611f039380113937fd131151a2bffd801.
      
      * run format.sh
      
      * ensure FreshName for __philox_state
      
      * ensure FreshName for __philox_state
      
      * change the return type of T.rng_init
      
      ---------
      Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
  17. 17 Dec, 2025 2 commits