- 28 Nov, 2025 3 commits
-
-
LJC00118 authored
* disable overflow-prone floordiv optimization in lower_intrin.cc * disable overflow-prone floordiv optimization in lower_intrin.cc
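The commit message does not say which rewrite was disabled. As an illustration only, the sketch below simulates 32-bit wraparound in Python and shows how one common way of lowering floor division can go wrong: the dividend is shifted by a multiple of the divisor so that truncating division can be used, and the shifted value may no longer fit in int32. The helper names and the specific rewrite are assumptions, not code taken from lower_intrin.cc.

```python
INT32_BITS = 32


def wrap_int32(x: int) -> int:
    """Reduce a Python int to a two's-complement int32 value."""
    x &= (1 << INT32_BITS) - 1
    return x - (1 << INT32_BITS) if x >= 1 << (INT32_BITS - 1) else x


def trunc_div(a: int, b: int) -> int:
    """C-style integer division that rounds toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def floordiv_by_shift(a: int, b: int, a_min: int) -> int:
    """Hypothetical rewrite: floordiv(a, b) == truncdiv(a + b*c, b) - c,
    with c chosen so that a + b*c >= 0 for every a >= a_min (requires b > 0)."""
    c = -(a_min // b)  # equals ceil(-a_min / b)
    shifted = wrap_int32(a + b * c)  # this addition is where int32 can overflow
    return wrap_int32(trunc_div(shifted, b) - c)


# Small values: the rewrite agrees with true floor division.
small_ok = floordiv_by_shift(-7, 4, a_min=-8) == -7 // 4

# A loose lower bound makes the shift huge, so a + b*c wraps around.
large_ok = floordiv_by_shift(1_000_000_000, 4, a_min=-2_000_000_000) == 1_000_000_000 // 4
```

Here `small_ok` is true and `large_ok` is false, which is the kind of silent miscompilation that disabling an overflow-prone rewrite avoids.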
-
Lei Wang authored
[Enhancement] Improve error handling and assertion messages across runtime and argument binding (#1356) This commit enhances the error handling mechanisms in the runtime by introducing CPU-safe runtime helpers and refining assertion messages in the CodeGenCHost and ArgBinder. It includes structured packed error messages for various conditions, improving clarity in diagnostics. Additionally, the CMake configuration is updated to always include necessary runtime helpers, ensuring consistent error reporting. The changes aim to provide clearer feedback during runtime errors and improve the overall robustness of the argument binding process.
-
Lei Wang authored
This commit refines the logic for determining the sign state of indices in the LegalizeNegativeIndex transformation. It prioritizes vector patterns, specifically Ramp and Broadcast nodes, to avoid compile-time lane queries. The handling of scalar indices is also streamlined, ensuring clearer diagnostics when non-negativity cannot be proven. These changes enhance the robustness and clarity of index handling in the transformation pass.
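As a rough sketch of the idea described above, the following Python models a three-way sign state derived from integer bounds and treats a Ramp index by bounding its first and last lane. It assumes that a provably negative index is legalized by adding the buffer extent (Python-style wraparound); that rule, and every name in the block, is an assumption for illustration rather than the pass's actual code.

```python
from enum import Enum
from typing import Tuple


class Sign(Enum):
    NON_NEGATIVE = "non_negative"
    NEGATIVE = "negative"
    UNKNOWN = "unknown"


def sign_of_range(lo: int, hi: int) -> Sign:
    """Classify an index whose value is known to lie in [lo, hi]."""
    if lo >= 0:
        return Sign.NON_NEGATIVE
    if hi < 0:
        return Sign.NEGATIVE
    return Sign.UNKNOWN


def ramp_range(base_lo: int, base_hi: int, stride: int, lanes: int) -> Tuple[int, int]:
    """Bounds of base + stride * k for k in [0, lanes).
    A Broadcast is the special case stride == 0: its sign is the scalar's sign."""
    last = stride * (lanes - 1)
    return base_lo + min(0, last), base_hi + max(0, last)


def legalize_index(index: int, lo: int, hi: int, extent: int) -> int:
    """Return a non-negative index, or fail with a diagnostic if the sign is unknown."""
    state = sign_of_range(lo, hi)
    if state is Sign.NON_NEGATIVE:
        return index
    if state is Sign.NEGATIVE:
        return index + extent  # assumed wraparound rule
    raise ValueError(f"cannot prove index in [{lo}, {hi}] is non-negative or negative")


# A 4-lane ramp starting at -4 with stride 1 covers [-4, -1]: provably negative.
ramp_sign = sign_of_range(*ramp_range(-4, -4, stride=1, lanes=4))
```

The unknown case raises instead of guessing, mirroring the commit's point about clearer diagnostics when non-negativity cannot be proven.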
-
- 27 Nov, 2025 2 commits
-
-
Lei Wang authored
* [Refactor] Improve assertion handling in CodeGenCHost and ArgBinder This commit refines the assertion message generation in CodeGenCHost by optimizing the handling of equality checks and reducing buffer size for error messages. Additionally, it enhances the ArgBinder by introducing a nullable guard mechanism for assertions, allowing for more precise error handling when binding arguments. The changes improve the clarity and efficiency of assertion handling across the codebase. * [Enhancement] Update matmul kernel and optimize argument binding This commit enhances the matmul kernel by introducing additional tensor parameters and refining the pipeline stages for improved performance. It also updates the argument binding mechanism to include a flag indicating whether buffers are used, enhancing the efficiency of buffer management. Furthermore, the optimization phase in the engine is improved by adding a simplification step, ensuring better performance and clarity in the generated code. * lint fix * [Enhancement] Add tensor checks documentation and improve argument binding assertions This commit introduces a new documentation page for host-side tensor checks, detailing the automatic validations performed by TileLang on kernel arguments. It enhances the ArgBinder by adding assertions for non-null pointers when arguments are used, improving error handling. Additionally, the optimization phase in the engine is updated to include a simplification step, ensuring better performance and clarity in the generated code. * [Enhancement] Update .gitignore and refine matmul kernel for improved performance This commit adds host checks logs to the .gitignore file to prevent unnecessary log files from being tracked. Additionally, it refines the matmul kernel by adjusting pipeline stages, updating tensor parameters, and enhancing argument handling for better performance. The changes also include improved error messages in the argument binding process, ensuring clearer diagnostics for users. * lint fix * lint fix * [Refactor] Simplify tensor_null_test function and remove ptr_null_test This commit refactors the tensor_null_test function by adding a with_bias parameter and removing the ptr_null_test function, which was previously unused. The run_test function is updated to reflect these changes, streamlining the testing process for tensor operations. * lint fix * fix
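To make the host-side checks described in this commit more concrete, here is a minimal sketch of what validating one kernel argument could look like. It assumes the checks cover dtype, rank, shape, and a non-null pointer that is only enforced when the argument is actually used. `TensorArg` and `check_tensor_arg` are hypothetical names, not TileLang or ArgBinder APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TensorArg:
    name: str
    data_ptr: Optional[int]  # None models a null pointer
    dtype: str
    shape: Tuple[int, ...]


def check_tensor_arg(
    arg: TensorArg,
    expected_dtype: str,
    expected_shape: Tuple[int, ...],
    is_used: bool = True,
) -> List[str]:
    """Return a list of diagnostics; an empty list means the argument passed."""
    errors: List[str] = []
    if arg.data_ptr is None:
        # Nullable guard: a null pointer is only an error if the kernel uses the buffer.
        if is_used:
            errors.append(f"{arg.name}: expected a non-null pointer")
        return errors
    if arg.dtype != expected_dtype:
        errors.append(f"{arg.name}: dtype {arg.dtype}, expected {expected_dtype}")
    if len(arg.shape) != len(expected_shape):
        errors.append(f"{arg.name}: rank {len(arg.shape)}, expected {len(expected_shape)}")
        return errors
    for axis, (got, want) in enumerate(zip(arg.shape, expected_shape)):
        if got != want:
            errors.append(f"{arg.name}: shape[{axis}] is {got}, expected {want}")
    return errors


a = TensorArg("A", 0x1000, "float16", (128, 64))
bias = TensorArg("Bias", None, "float16", ())
unused_bias_errors = check_tensor_arg(bias, "float16", (64,), is_used=False)
used_bias_errors = check_tensor_arg(bias, "float16", (64,), is_used=True)
```

Collecting messages in a list, rather than failing on the first mismatch, is a design choice of this sketch; the commit itself only states that the diagnostics became clearer.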
-
Yuxuan Hu authored
* [EXAMPLE] add example for dsa sparse finetuning * [Refactor]
-
- 26 Nov, 2025 7 commits
-
-
Gongen-Ali authored
* add support for k_pack * support benchmark on ROCm * fix format
-
Kuris authored
-
Lei Wang authored
* [Refactor] Enhance CopyNode's IterVar Creation and Range Handling This commit refines the `MakeIterVars` method in `CopyNode` to select base ranges based on memory scope levels, ensuring that the chosen ranges are not smaller than the original source ranges. Additionally, it updates the Python `copy` function to clarify range handling, including broadcasting logic and extent alignment. These changes improve the robustness and clarity of the copy operation's implementation. * test fix
-
Yunqian Fan authored
* feat: add fp8 variants; add placeholder for fp6/fp4 in meta; support ld with pack for fp32 dtype; add dump; add template expand; remove unused dtype and change to rebased apis * fix: when atom-m!=128, enable_ws * fix: typo in tcgen05 meta; dispatch in gemm sm100
-
Lei Wang authored
* Refactor GEMM and Reduce operations by moving NormalizeToBufferRegion and MakeAccessPtrFromRegion to utils.{h,cc} for better code organization and reuse. * lint fix * Refactor region handling by removing the RegionOp and updating NormalizeToBufferRegion to only accept BufferLoad and BufferRegion. This change improves code organization and simplifies the handling of memory regions across various operations. * fix * Refactor memory region handling by introducing `tl.region` calls across various operations, including GEMM and fill functions. This change enhances the consistency of region management and improves code organization by utilizing utility functions for buffer region conversions. * fix * fix * test fix * lint fix * Refactor GEMM operations to improve memory region handling by replacing `mbarPtr_` with `mbarRegion_` and updating related logic in both C++ and Python implementations. This change enhances the clarity and consistency of buffer region management. * fix * lint fix * fix * fix * test fix * lint fix * lint fix * minor fix * fix --------- Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
-
ConvolutedDog authored
This commit enhances the LegalizeNegativeIndex transformation pass to handle both buffer load and store operations with negative indices and adds some test cases.
-
LJC00118 authored
* Add test for T.assume * Add unit test for T.assume * Add unit test for T.assume * Add unit tests for T.assume * Remove debug print for kernel source Remove print statement for kernel source in tests. * Update test_tilelang_language_assume.py --------- Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 25 Nov, 2025 5 commits
-
-
Chaofan Lin authored
-
Kuris authored
* [Fix] fix copy from or to local buffer (#1304) * fix lint error * minor fix testing script
-
Lei Wang authored
* Refactor GEMM and Reduce operations by moving NormalizeToBufferRegion and MakeAccessPtrFromRegion to utils.{h,cc} for better code organization and reuse. * lint fix
-
Kuris authored
-
Kuris authored
-
- 24 Nov, 2025 8 commits
-
-
Chaofan Lin authored
* [BugFix] Use BufferRegion in tl.cumsum to infer buffer shape * remove debug lines * remove rubbish * Fix decorator syntax for atomic_different_memory_orders_program --------- Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
Wenhao Xie authored
* [Enhancement] Support more dtype in `T.print` * upd * upd
-
Tong WU authored
* [Feat] Support warp reduce * lint * add test * lint
-
Yichen Yan authored
-
Chaofan Lin authored
-
dependabot[bot] authored
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Lei Wang authored
This reverts commit 0d101c11. Co-authored-by: Zhiwen Mo <zm125@ic.ac.uk>
-
- 23 Nov, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Update Vectorization Functions to Accept Analyzer Parameter - Modified `VectorizeLoop` and related functions to accept an `arith::Analyzer` parameter, enhancing their capability to perform analysis during vectorization. - Updated multiple instances in `copy.cc`, `fill.cc`, `parallel.cc`, and layout inference files to utilize the new analyzer parameter for improved performance and correctness. - Ensured consistency across vectorization logic by integrating the analyzer into existing workflows, facilitating better optimization opportunities. * [Fix] Corrected PostOrderVisit call in loop_vectorize.cc - Updated the PostOrderVisit function to analyze the body of the loop node instead of the node itself, ensuring proper handling of nested loops during vectorization analysis. * fix * lint fix * fix
-
- 22 Nov, 2025 2 commits
-
-
Lei Wang authored
-
LJC00118 authored
* Improve memory access safety and T.assume handling * Improve memory access safety and T.assume handling * bugfix * lint fix * bugfix * bugfix * refactor legalize safe memory access pass --------- Co-authored-by: Lei Wang <leiwang1999@outlook.com>
-
- 21 Nov, 2025 4 commits
-
-
Yunqian Fan authored
support ld with pack for fp32 dtype; add dump; add template expand; remove unused dtype and change to rebased apis
-
Kuris authored
* [Fix] Fix #1307 by adding macro inside function * fix lint error * add comments and fix lint error * Remove debug print from enter_frame method Removed debug print statement from enter_frame method. --------- Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
Lei Wang authored
-
Kuris authored
* [Feat] add missing support of uint32x2 * [Feat] Add `T.Ref` annotation and tests * fix lint error * minor update for error message on twice decl * Remove unused let_bindings_ in CodeGenC to fix #1300
-
- 20 Nov, 2025 4 commits
-
-
Lei Wang authored
* bugfix * lint fix * test * lint fix * increase procs * recover
-
Kuris authored
-
Kuris authored
* [Feature] Add support for A: T.Tensor(n + 1) and A: T.Tensor(2*n) * issue fix * fix * fix * decrease nproc for debugging --------- Co-authored-by: Lei Wang <leiwang1999@outlook.com>
-
Lei Wang authored
-
- 19 Nov, 2025 4 commits
-
-
Chaofan Lin authored
* [Language][UX] Nested loop checker in pre-lowering stage * rename * comment * address comments
-
liu yuhao authored
Co-authored-by: cheeryBloosm <liu_yu_hao@126.com>
-
Lei Wang authored
- Updated the `tilelang_callback_cuda_compile` function to accept a `pass_config` parameter, allowing for more flexible compilation options. - Introduced handling for fast math and PTXAS options based on the provided pass configuration. - Modified the CUDA build process in `rt_mod_cuda.cc` to utilize the current pass context, improving the integration of compilation settings. - Refactored NVCC command construction to use a dedicated function for better clarity and maintainability.
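The last two points above can be pictured as a single helper that turns a pass configuration into an NVCC argument list. The sketch below is an assumption-laden illustration: the function name and the config keys `enable_fast_math` and `ptxas_options` are hypothetical and are not TileLang's real option names. Only the NVCC flags themselves (`--ptx`, `-arch`, `--use_fast_math`, `-Xptxas`, `-o`) are standard.

```python
from typing import Dict, List, Optional, Union

ConfigValue = Union[bool, List[str]]


def build_nvcc_command(
    src_path: str,
    out_path: str,
    arch: str,
    pass_config: Optional[Dict[str, ConfigValue]] = None,
) -> List[str]:
    """Assemble an nvcc invocation as an argument list (no shell quoting needed)."""
    cfg = pass_config or {}
    cmd = ["nvcc", "--ptx", f"-arch={arch}"]
    if cfg.get("enable_fast_math", False):  # hypothetical config key
        cmd.append("--use_fast_math")
    ptxas_options = cfg.get("ptxas_options", [])  # hypothetical config key
    if ptxas_options:
        # -Xptxas takes a comma-separated list of options forwarded to ptxas.
        cmd += ["-Xptxas", ",".join(ptxas_options)]
    cmd += ["-o", out_path, src_path]
    return cmd


cmd = build_nvcc_command(
    "kernel.cu",
    "kernel.ptx",
    "sm_90",
    {"enable_fast_math": True, "ptxas_options": ["-v", "-O3"]},
)
```

Returning a list keeps the command easy to test and can be passed directly to `subprocess.run` without going through a shell.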
-
Kuris authored
* add typing stub for tir.ir * remove idents * minor update * [Refactor] add numpy conversion for dtype * fix lint error * remove unused np.float_ in dtype conversion * fix type in np.int_ * fix typo * minor fix * remove debug files * fix memory leak bug * fix lint error * add comments * fix lint error * remove duplicated, because tilelang doesn't depend on deprecated
-