- 07 May, 2026 2 commits
- 06 May, 2026 2 commits
-
-
wangziyang authored
-
wangziyang authored
-
- 27 Apr, 2026 4 commits
-
-
wangziyang authored
-
qisan authored
-
qisan authored
-
wangziyang authored
-
- 24 Apr, 2026 1 commit
-
-
wangziyang authored
-
- 22 Apr, 2026 4 commits
-
-
wangziyang authored
-
wangziyang authored
-
qisan authored
-
qisan authored
-
- 21 Apr, 2026 1 commit
-
-
wangziyang authored
-
- 20 Apr, 2026 1 commit
-
-
wangziyang authored
-
- 16 Apr, 2026 1 commit
-
-
wangziyang authored
-
- 09 Apr, 2026 1 commit
-
-
wangziyang authored
-
- 03 Apr, 2026 2 commits
-
-
wangziyang authored
-
wangziyang authored
-
- 17 Mar, 2026 2 commits
- 22 Dec, 2025 4 commits
-
-
qisan authored
-
qisan authored
-
qisan authored
-
guchaoyang authored
-
- 21 Dec, 2025 1 commit
-
-
Lei Wang authored
[Refactor] Phase out PassConfig `kDisableDynamicTailSplit` and `kDynamicAlignment` as they are legacy (#1486)
* [Cleanup] Remove dynamic shape example and related tests
* Deleted the dynamic shape example script `example_dynamic.py` and its corresponding test file `test_example_dynamic.py` to streamline the codebase.
* Removed unused dynamic tail split and dynamic alignment configurations from `builtin.h` and `pass_config.py`.
* Cleaned up the dynamic shape testing files to eliminate redundancy and improve maintainability.
* build fix
-
- 20 Dec, 2025 1 commit
-
-
Lei Wang authored
* [Feature] Add FullyReplicated Fragment Layout and Enhance Layout Inference
* Introduced a new static method `FullyReplicated` in the `Fragment` class to create fully replicated fragment layouts, ensuring all threads hold identical copies of the buffer.
* Updated `CopyNode` to collect fragment layouts and mark them as fully replicated during layout inference.
* Enhanced `ParallelOpNode` to expand let bindings for fragment buffer accesses, improving layout inference accuracy.
* Added documentation for new methods and updated existing methods to support the new layout features.
* lint fix
* Remove debug logging statements from layout inference process to streamline output and improve performance.
-
- 19 Dec, 2025 7 commits
-
-
Lei Wang authored
* use static Z3 context
* Update submodule reference for TVM to indicate a dirty state
-
Lei Wang authored
* Update README.md with latest news, including CuTeDSL backend support, Z3 theorem prover integration, and migration to apache-tvm-ffi for improved compatibility.
* Update README.md to enhance CuTeDSL backend announcement with a link to related issue and clarify migration benefits to apache-tvm-ffi, reducing CPU overhead.
-
Lei Wang authored
* [Language] Enhance dtype conversion for PyTorch compatibility
  - Added support for new float8 and float4 data types in the __dtype_as_torch__ method.
  - Implemented backend-specific handling for float8_e4m3 based on HIP or CUDA.
  - Included assertions to ensure compatibility with the required PyTorch versions for each dtype.
  - Improved error handling for unsupported dtypes.
* Fix test script execution and improve error messages for dtype assertions
  - Commented out the main execution call in the test script and replaced it with a direct call to the test function `test_divmod()`.
  - Enhanced error messages in the dtype conversion assertions to improve clarity and readability, ensuring proper guidance for required PyTorch versions.
-
silentCoder-dev authored
* remove triton dependency in testing & move triton baseline into example
* use ceildiv and handle arbitrary M correctly for triton
-
Chaofan Lin authored
* [Language] Make TL scripts friendly to Python syntax highlighting
* add comments
* fix submodule
-
Lei Wang authored
* feat(arg_binder): enhance shape variable handling and assertions
  - Implemented special handling for comparing if_then_else expressions to simplify conditions involving NULL checks.
  - Added methods to set shared shape variables and finalize deferred bindings, generating cascading if_then_else expressions and runtime assertions for non-NULL buffers.
  - Updated the binding logic to defer shape variable bindings for shared variables, ensuring proper handling across multiple nullable buffers.
* refactor(arg_binder): clean up shape variable handling and remove unused code
  - Removed deprecated methods for setting shared shape variables and finalizing deferred bindings, streamlining the argument binding process.
  - Simplified the logic for handling shape values in the `BindDLTensor` function, ensuring immediate binding for normal shape variables.
  - Enhanced clarity by eliminating unnecessary comments and code related to cascading if_then_else expressions for shared variables.
* refactor(arg_binder): enhance DLTensor binding with improved shape handling
  - Replaced the single `BindDLTensor` method with `BindDLTensors` to support multiple buffers, improving flexibility in handling DLTensor bindings.
  - Introduced a two-pass approach for shape variable handling, allowing for better management of symbolic dimensions and null checks.
  - Updated the logic to assert non-null conditions at runtime and utilize cascaded if_then_else expressions for shape retrieval, enhancing robustness.
  - Removed deprecated code and streamlined the binding process for clarity and maintainability.
* fix(test_nullable_buffer_params): improve formatting and consistency in test output
  - Updated string formatting for better readability in the `test_nullable_shared_shape` function.
  - Ensured consistent use of double quotes for string literals.
  - Added a missing newline at the end of the file for proper formatting.
* refactor(arg_binder): simplify allocation size calculation in BindDLTensors
  - Streamlined the calculation of allocation size by replacing a lambda function with a direct loop, enhancing readability and maintainability.
  - Improved clarity in the null check message for data pointers, ensuring better understanding of the binding process.
* Remove debug prints from phase.py
  Removed debug print statements after MakePackedAPI transformation.
-
silentCoder-dev authored
* rename test for curand & add triton baseline
* add a comment for calling T.rng_rand() four times
* refactor tilelang&triton kernel
* Add boundary checks for M not divisible by 128
-
- 18 Dec, 2025 4 commits
-
-
qisan authored
-
Gabriel Wu authored
* feat: CuTeDSL backend
* fix: clang-tidy
* fix: clang-format
* fix: ci
* fix: revert example gemm fp8
* fix: remove duplicate code
* fix: switch-case
* fix: fp16 silence
* fix: TVM IR print
* fix: useless tir
* fix: clang-format
* fix: remove tilelang/contrib/cutedsl/.gitignore
* fix: use hexfloat
* fix: gsym guard
* fix: unknown storage sync type
* fix: string literal
* fix: add args guard
* fix: name hint dedup
* fix: better find_kernel_by_pattern
* fix: set libpath for from_database path
* fix: guard buffer.strides
* fix: from guard
* fix: eviction guard
* fix: use thread local tma descs
* fix: ruff
* fix: drop tma_init_cpp
* fix: exc_info
* fix: negative unmatch early return
* fix: rename postproc func and add test
* fix: handle fast math according to pass config
* fix: dyn_sym parse
* fix: wrap_forward
* fix: use tvm_ffi.libinfo instead of cli
* fix: keep signature
* fix: C++ string safety
* fix: mark tma_store_add as unsupported
* fix: tvm version
* resolve ldsm and cpasync issues.
* fix: minor fixes
* fix: parse signature using ast
* fix: guard global_addr
* fix: create tempfile only when necessary
* fix: use logger.exception for exceptions
* fix: guard lib_path and host_func
* fix: remove tma_cpp_init and add timeout for cpp compile
* add timeout for mbarrier_wait.
* fix: _load_kernel_from_disk signature
* resolve codegen issues.
* fix: logger.exception
* add comment for div_by=1
* merge
* fix: reserve cutlass,cute,tl
* fix: guard tma_store
* fix: allow int64 offset in make_tensor_at_offset
* fix: guard barrier
* fix: add comments for div_by=16
* fix: div_by=1 issue
* delete div_by when offset is 0
* use tl.make_tensor when offset is 0
* fix: explicitly check cutedsl target
* fix: use param.torch_dtype()
---------
Co-authored-by: yuxic <yuxic@nvidia.com>
Co-authored-by: Yong <yong@local>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
-
Jinjie Liu authored
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
-
silentCoder-dev authored
* add curand.{curand_init, curand}
* run format.sh
* add default value for curand_init & add test for curand
* Update testing/python/language/test_rand.py
  Remove unused thread binding
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* remove unused library
* enable tilelang cache for testing
* run format.sh
* Revert "run format.sh"
  This reverts commit 5afaff782f31cdf653e2c45b469da8dead228b8a.
* Revert "enable tilelang cache for testing"
  This reverts commit c277a43e77938bd88d47a108dd1bd65734d4a1ae.
* Revert "remove unused library"
  This reverts commit 568ad20611f039380113937fd131151a2bffd801.
* run format.sh
* ensure FreshName for __philox_state
* ensure FreshName for __philox_state
* change the return type of T.rng_init
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
-
- 17 Dec, 2025 2 commits
-
-
Lei Wang authored
* Enhance cache directory structure by including version information in sparse.py to ensure separate caches for different versions.
* Fix formatting in sparse.py by adding a newline for improved readability and consistency.
-
Kuris authored
* fix floordiv & floormod in z3 prover
* fix lint error
-