Commits · b14f201ef5d5db28accd001dccbaf799253a2741 · OpenDAS / tilelang

07 May, 2026 2 commits
- Feats: Add async, pipeline and ds_read · b14f201e
  qisan authored May 07, 2026
  
  b14f201e
- Feats: add register pipeline · 44cc93c7
  qisan authored May 07, 2026
  
  44cc93c7
06 May, 2026 2 commits
- fix ds_read pass · eff4082d
  wangziyang authored May 06, 2026
  
  eff4082d
- add B_local layout transformation with loop optimization · dd95e41b
  wangziyang authored May 06, 2026
  
  dd95e41b
27 Apr, 2026 4 commits
- [Bugfix] B share to local warps num · bba13746
  wangziyang authored Apr 27, 2026
  
  bba13746
- Merge remote ds_read: resolve conflicts in phase.py, mmac_macro_generator.py, gemm_mmac.py · 64179eaf
  qisan authored Apr 27, 2026
  
  64179eaf
- Feats: vectorize async copy · dd91b1e0
  qisan authored Apr 27, 2026
  
  dd91b1e0
- print B warp layout & B s_to_l padding · ad92620d
  wangziyang authored Apr 27, 2026
  
  ad92620d
24 Apr, 2026 1 commit
- add B mmac layout · b8213492
  wangziyang authored Apr 24, 2026
  
  b8213492
22 Apr, 2026 4 commits
- [Bugfix] A share to local padding · bb2f5e4f
  wangziyang authored Apr 22, 2026
  
  bb2f5e4f
- add multi-round intra-warp offset, inter-warp offset · 41887aed
  wangziyang authored Apr 22, 2026
  
  41887aed
- Merge: async_copy & lds_copy · c6e888bd
  qisan authored Apr 22, 2026
  
  c6e888bd
- Feats: support async_copy pass! · 32d0b3cb
  qisan authored Apr 21, 2026
  
  32d0b3cb
21 Apr, 2026 1 commit
- add ldmatrix warp_interval_idx mapping · a0ec0f57
  wangziyang authored Apr 21, 2026
  
  a0ec0f57
20 Apr, 2026 1 commit
- add simple mmac_layout · f3f31091
  wangziyang authored Apr 20, 2026
  
  f3f31091
16 Apr, 2026 1 commit
- add load share offset from load indices · 8443d88e
  wangziyang authored Apr 16, 2026
  
  8443d88e
09 Apr, 2026 1 commit
- add inject_blocal_layout · 74e57416
  wangziyang authored Apr 09, 2026
  
  74e57416
03 Apr, 2026 2 commits
- print MatrixCore init local size · 15599a93
  wangziyang authored Apr 03, 2026
  
  15599a93
- update cp_async & init inject_ds_read · 3852d58b
  wangziyang authored Apr 03, 2026
  
  3852d58b
17 Mar, 2026 2 commits
- feat(dcu): update installation_dcu.md · 19cdf0ca
  qisan authored Mar 17, 2026
  
  19cdf0ca
- feat(dcu): switch to gemm_v1 instead of gemm_v2 · ae295a4a
  qisan authored Mar 17, 2026
  
  ae295a4a
22 Dec, 2025 4 commits
- Merge branch 'dcu' of github.com:Lukinon/tilelang into dcu · d0436b7b
  qisan authored Dec 22, 2025
  
  d0436b7b
- [Bugfix] Pass pre commit check · bb62f6bf
  qisan authored Dec 22, 2025
  
  bb62f6bf
- [Bugfix] Pass pre commit check · e942c054
  qisan authored Dec 22, 2025
  
  e942c054
- Merge branch 'main' into dcu · 667632cc
  guchaoyang authored Dec 22, 2025
  
  667632cc
21 Dec, 2025 1 commit

[Refactor] Phaseout PassConfig `kDisableDynamicTailSplit` and... · a874e4e8

Lei Wang authored Dec 21, 2025

[Refactor] Phaseout PassConfig `kDisableDynamicTailSplit` and `kDynamicAlignment` as they are legacy (#1486)

* [Cleanup] Remove dynamic shape example and related tests

* Deleted the dynamic shape example script `example_dynamic.py` and its corresponding test file `test_example_dynamic.py` to streamline the codebase.
* Removed unused dynamic tail split and dynamic alignment configurations from `builtin.h` and `pass_config.py`.
* Cleaned up the dynamic shape testing files to eliminate redundancy and improve maintainability.

* build fix

a874e4e8

20 Dec, 2025 1 commit

[Enhancement] Enhance let binding handling in layout inference and warp specialized pass (#1484) · 7e8d1f82

Lei Wang authored Dec 21, 2025

* [Feature] Add FullyReplicated Fragment Layout and Enhance Layout Inference

* Introduced a new static method `FullyReplicated` in the `Fragment` class to create fully replicated fragment layouts, ensuring all threads hold identical copies of the buffer.
* Updated `CopyNode` to collect fragment layouts and mark them as fully replicated during layout inference.
* Enhanced `ParallelOpNode` to expand let bindings for fragment buffer accesses, improving layout inference accuracy.
* Added documentation for new methods and updated existing methods to support the new layout features.

* lint fix

* Remove debug logging statements from layout inference process to streamline output and improve performance.

7e8d1f82

19 Dec, 2025 7 commits

[Enhancement] Use static Z3 context (#1482) · 168aec7b

Lei Wang authored Dec 20, 2025

* use static Z3 context

* Update submodule reference for TVM to indicate a dirty state

168aec7b

[News] update with latest news (#1475) · 2217eb74

Lei Wang authored Dec 19, 2025

* Update README.md with latest news, including CuTeDSL backend support, Z3 theorem prover integration, and migration to apache-tvm-ffi for improved compatibility.

* Update README.md to enhance CuTeDSL backend announcement with a link to related issue and clarify migration benefits to apache-tvm-ffi, reducing CPU overhead.

2217eb74

[Language] Enhance T.dtype.as_torch conversion for compatibility (#1473) · 3516f1ee

Lei Wang authored Dec 19, 2025

* [Language] Enhance dtype conversion for PyTorch compatibility

- Added support for new float8 and float4 data types in the __dtype_as_torch__ method.
- Implemented backend-specific handling for float8_e4m3 based on HIP or CUDA.
- Included assertions to ensure compatibility with the required PyTorch versions for each dtype.
- Improved error handling for unsupported dtypes.

* Fix test script execution and improve error messages for dtype assertions

- Commented out the main execution call in the test script and replaced it with a direct call to the test function `test_divmod()`.
- Enhanced error messages in the dtype conversion assertions to improve clarity and readability, ensuring proper guidance for required PyTorch versions.

3516f1ee
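
The dtype conversion described in this commit (a backend-dependent choice for `float8_e4m3`, assertions on PyTorch support, and an error for unsupported dtypes) can be sketched roughly as follows. This is an illustrative sketch, not the code from the commit: the function name `as_torch_dtype`, the table `_TORCH_ATTR`, and the `is_hip` flag are hypothetical, and the mapping of HIP to the `fnuz` variant is an assumption. A stand-in namespace replaces the real `torch` module so the example runs without PyTorch installed.

```python
from types import SimpleNamespace

# Hypothetical lookup table: dtype string -> attribute name on the torch module.
_TORCH_ATTR = {
    "float16": "float16",
    "float32": "float32",
    "float8_e4m3": "float8_e4m3fn",
    "float8_e5m2": "float8_e5m2",
}


def as_torch_dtype(name, torch_module, is_hip=False):
    """Map a dtype string to an attribute of `torch_module` (illustrative only)."""
    if name not in _TORCH_ATTR:
        # Unsupported dtypes fail loudly instead of returning a silent default.
        raise ValueError(f"unsupported dtype: {name!r}")
    attr = _TORCH_ATTR[name]
    # Assumption: HIP builds use the "fnuz" float8 variant, CUDA builds the "fn" one.
    if name == "float8_e4m3" and is_hip:
        attr = "float8_e4m3fnuz"
    dtype = getattr(torch_module, attr, None)
    # Older PyTorch builds may not define the attribute at all.
    assert dtype is not None, (
        f"torch.{attr} is not available; a newer PyTorch is required for {name}"
    )
    return dtype


# Stand-in for a torch build that lacks float8_e5m2 (strings play the role of dtypes).
fake_torch = SimpleNamespace(
    float16="f16",
    float32="f32",
    float8_e4m3fn="e4m3fn",
    float8_e4m3fnuz="e4m3fnuz",
)
```

With the real library, `torch_module` would simply be the imported `torch` module; passing it as a parameter here only keeps the sketch self-contained.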

[Refactor] Remove triton dependence in testing & move triton baseline into examples (#1470) · 95e3b5a7

silentCoder-dev authored Dec 19, 2025

* remove triton dependence in testing & move triton baseline into example

* use ceildiv and handle arbitrary M correctly for triton

95e3b5a7
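
The "ceildiv" change mentioned in this commit refers to a common pattern: launch enough blocks to cover `M` even when `M` is not a multiple of the block size, and clip the last block. A minimal sketch of that arithmetic in plain Python follows; the helper names `ceildiv` and `block_ranges` are hypothetical and are not taken from the commit.

```python
def ceildiv(a: int, b: int) -> int:
    """Ceiling division for positive b, using only integer arithmetic."""
    return -(-a // b)


def block_ranges(m: int, block: int):
    """Return (start, end) row ranges covering [0, m), clipping the tail block."""
    return [
        (i * block, min((i + 1) * block, m))
        for i in range(ceildiv(m, block))
    ]
```

For example, `block_ranges(300, 128)` yields three ranges, the last of which covers only rows 256 to 299. A kernel would apply the same clipping as a mask or bounds check on its final block.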

[Language] Make TL scripts friendly to Python syntax highlights (#1466) · 1a3a64fb

Chaofan Lin authored Dec 19, 2025

* [Language] Make TL scripts friendly to Python syntax highlights

* add comments

* fix submodule

1a3a64fb

[ArgBinder] Enhance shape variable handling and assertions (#1467) · f6db2014

Lei Wang authored Dec 19, 2025

* feat(arg_binder): enhance shape variable handling and assertions

- Implemented special handling for comparing if_then_else expressions to simplify conditions involving NULL checks.
- Added methods to set shared shape variables and finalize deferred bindings, generating cascading if_then_else expressions and runtime assertions for non-NULL buffers.
- Updated the binding logic to defer shape variable bindings for shared variables, ensuring proper handling across multiple nullable buffers.

* refactor(arg_binder): clean up shape variable handling and remove unused code

- Removed deprecated methods for setting shared shape variables and finalizing deferred bindings, streamlining the argument binding process.
- Simplified the logic for handling shape values in the `BindDLTensor` function, ensuring immediate binding for normal shape variables.
- Enhanced clarity by eliminating unnecessary comments and code related to cascading if_then_else expressions for shared variables.

* refactor(arg_binder): enhance DLTensor binding with improved shape handling

- Replaced the single `BindDLTensor` method with `BindDLTensors` to support multiple buffers, improving flexibility in handling DLTensor bindings.
- Introduced a two-pass approach for shape variable handling, allowing for better management of symbolic dimensions and null checks.
- Updated the logic to assert non-null conditions at runtime and utilize cascaded if_then_else expressions for shape retrieval, enhancing robustness.
- Removed deprecated code and streamlined the binding process for clarity and maintainability.

* fix(test_nullable_buffer_params): improve formatting and consistency in test output

- Updated string formatting for better readability in the `test_nullable_shared_shape` function.
- Ensured consistent use of double quotes for string literals.
- Added a missing newline at the end of the file for proper formatting.

* refactor(arg_binder): simplify allocation size calculation in BindDLTensors

- Streamlined the calculation of allocation size by replacing a lambda function with a direct loop, enhancing readability and maintainability.
- Improved clarity in the null check message for data pointers, ensuring better understanding of the binding process.

* Remove debug prints from phase.py

Removed debug print statements after MakePackedAPI transformation.

f6db2014
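
The shape handling described in this commit (a symbolic dimension shared by several nullable buffers, resolved through cascaded `if_then_else` expressions plus a runtime non-null assertion) can be illustrated with a small model in plain Python. This is a sketch under assumptions, not the `BindDLTensors` implementation: buffers are modeled as shape tuples or `None`, and the function name `resolve_shared_dim` is hypothetical.

```python
from typing import Optional, Sequence


def resolve_shared_dim(
    buffers: Sequence[Optional[Sequence[int]]], axis: int
) -> int:
    """Pick the value of a shared shape variable from the first non-null buffer."""
    # Runtime assertion: at least one buffer carrying the variable must be non-null.
    assert any(shape is not None for shape in buffers), (
        "all buffers sharing this shape variable are NULL"
    )
    # Pass 1: cascaded selection, equivalent to
    #   if_then_else(b0 != NULL, b0.shape[axis],
    #                if_then_else(b1 != NULL, b1.shape[axis], ...))
    value = None
    for shape in reversed(buffers):
        if shape is not None:
            value = shape[axis]
    # Pass 2: every non-null buffer must agree with the resolved value.
    for shape in buffers:
        if shape is not None:
            assert shape[axis] == value, "shape mismatch across buffers"
    return value
```

Iterating in reverse makes the first non-null buffer win, which mirrors how the outermost `if_then_else` in a cascade takes priority.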

[Refactor] Rename test for curand & add triton baseline in `test_tilelang_language_rand.py` (#1464) · f0672603

silentCoder-dev authored Dec 19, 2025

* rename test for curand & add triton baseline

* add a comment for calling T.rng_rand() four times

* refactor tilelang&triton kernel

* Add boundary checks for M not divisible by 128

f0672603

18 Dec, 2025 4 commits

[Bugfix] Fix tvm_mmac not found error · d6dd2ddf
qisan authored Dec 18, 2025

d6dd2ddf

feat(cutedsl): add CuTeDSL backend (#1421) · 7248a810

Gabriel Wu authored Dec 18, 2025

* feat: CuTeDSL backend

* fix: clang-tidy

* fix: clang-format

* fix: ci

* fix: revert example gemm fp8

* fix: remove duplicate code

* fix: switch-case

* fix: fp16 silence

* fix: TVM IR print

* fix: useless tir

* fix: clang-format

* fix: remove tilelang/contrib/cutedsl/.gitignore

* fix: use hexfloat

* fix: gsym guard

* fix: unknown storage sync type

* fix: string literal

* fix: add args guard

* fix: name hint dedup

* fix: better find_kernel_by_pattern

* fix: set libpath for from_database path

* fix: guard buffer.strides

* fix: from guard

* fix: eviction guard

* fix: use thread local tma descs

* fix: ruff

* fix: drop tma_init_cpp

* fix: exc_info

* fix: negative unmatch early return

* fix: rename postproc func and add test

* fix: handle fast math according to pass config

* fix: dyn_sym parse

* fix: wrap_forward

* fix: use tvm_ffi.libinfo instead of cli

* fix: keep signature

* fix: C++ string safety

* fix: mark tma_store_add as unsupported

* fix: tvm version

* resolve ldsm and cpasync issues.

* fix: minor fixes

* fix: parse signature using ast

* fix: guard global_addr

* fix: create tempfile only when necessary

* fix: use logger.exception for exceptions

* fix: guard lib_path and host_func

* fix: remove tma_cpp_init and add timeout for cpp compile

* add timeout for mbarrier_wait.

* fix: _load_kernel_from_disk signature

* resolve codegen issues.

* fix: logger.exception

* add comment for div_by=1

* merge

* fix: reserve cutlass,cute,tl

* fix: guard tma_store

* fix: allow int64 offset in make_tensor_at_offset

* fix: guard barrier

* fix: add comments for div_by=16

* fix: div_by=1 issue

* delete div_by when offset is 0

* use tl.make_tensor when offset is 0

* fix: explicitly check cutedsl target

* fix: use param.torch_dtype()

---------
Co-authored-by: yuxic <yuxic@nvidia.com>
Co-authored-by: Yong <yong@local>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>

7248a810

remove unused duplicated type check (#1462) · a6f59f31

Jinjie Liu authored Dec 18, 2025

Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>

a6f59f31

[Language] Adds a random number generation capability through curand_kernel (#1461) · cae06edd

silentCoder-dev authored Dec 18, 2025

* add curand.{curand_init, curand}

* run format.sh

* add default value for curand_init & add test for curand

* Update testing/python/language/test_rand.py

Remove unused thread binding
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* remove unused library

* enable tilelang cache for testing

* run format.sh

* Revert "run format.sh"

This reverts commit 5afaff782f31cdf653e2c45b469da8dead228b8a.

* Revert "enable tilelang cache for testing"

This reverts commit c277a43e77938bd88d47a108dd1bd65734d4a1ae.

* Revert "remove unused library"

This reverts commit 568ad20611f039380113937fd131151a2bffd801.

* run format.sh

* ensure FreshName for __philox_state

* ensure FreshName for __philox_state

* change the return type of T.rng_init

---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

cae06edd

17 Dec, 2025 2 commits

[Cache] Rename sparse compress cache directory (#1460) · 48e70e68

Lei Wang authored Dec 17, 2025

* Enhance cache directory structure by including version information in sparse.py to ensure separate caches for different versions.

* Fix formatting in sparse.py by adding a newline for improved readability and consistency.

48e70e68

[Analyzer] Fix floordiv & floormod bug in z3 prover (#1458) · 91cf7966

Kuris authored Dec 17, 2025

* fix floordiv & floormod in z3 prover

* fix lint error

91cf7966
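
The log does not say what the floordiv and floormod bug was. As background only, the sketch below shows how floor semantics (round toward negative infinity) differ from truncating semantics (round toward zero) once an operand is negative, which is a frequent source of mismatches when integer expressions are translated between systems. The helper names are hypothetical and this is not the fix from the commit.

```python
def floordiv(a: int, b: int) -> int:
    """Quotient rounded toward negative infinity."""
    return a // b


def floormod(a: int, b: int) -> int:
    """Remainder paired with floordiv; takes the sign of the divisor."""
    return a - floordiv(a, b) * b


def truncdiv(a: int, b: int) -> int:
    """Quotient rounded toward zero, as in C integer division."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def truncmod(a: int, b: int) -> int:
    """Remainder paired with truncdiv; takes the sign of the dividend."""
    return a - truncdiv(a, b) * b
```

The two conventions agree when both operands are non-negative, so a discrepancy only shows up on inputs such as `(-7, 2)` or `(7, -2)`.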