"...resnet50_tensorflow.git" did not exist on "c787baad4fcf3e008107be0662a4138194b24522"
  1. 15 Dec, 2025 11 commits
    • [Feature] Support region as input of T.cumsum (#1426) · 869f021b
      Dayuxiaoshui authored
      
      
      * [Feature] Support region as input of T.cumsum
      
      - Extend T.cumsum to accept BufferRegion and BufferLoad inputs in addition to Buffer
      - This enables operations on buffer slices/regions like:
        T.cumsum(InputG_fragment[i * chunk_size:(i + 1) * chunk_size], dim=0)
      - Update cumsum_fragment to handle region inputs properly
      - Add comprehensive tests for 1D and 2D region inputs including normal and reverse modes
      
      Fixes #879
      
      * Fix formatting and add docstring for cumsum_fragment
      
      - Add comprehensive docstring for cumsum_fragment function
      - Format code according to ruff style guidelines
      
      * Fix CodeRabbit review issues
      
      - Fix negative dimension bounds check (dim < -len(shape) instead of dim <= -len(shape))
      - Add src/dst shape compatibility validation for out-of-place cumsum
      - Update copy() type annotation to accept BufferRegion as dst parameter
      - Fix test in-place mutation issues by using out-of-place cumsum operations
      - Add non-divisible size test cases for tail region coverage
      
      * Fix out-of-bounds access in region tests
      
      - Add bounds clamping using T.min() for chunk_end calculations
      - Prevents accessing beyond tensor bounds for non-divisible sizes
      - Matches reference implementation behavior
      - Fixes both 1D and 2D region test cases
      
      * Fix region test: use simple slice expressions instead of T.min()
      
      - Remove T.min() which cannot be used directly in slice indices
      - Use chunk_start + chunk_size form instead
      - Rely on system's automatic bounds checking for non-divisible sizes
      - Update comments to reflect this approach
      
      * Fix cumsum region: use region extents in lowering and update tests for shared memory
      
      * Simplify fragment scope check using is_fragment()
      
      ---------
      Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
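To make the region semantics in this commit message concrete, here is a minimal pure-Python sketch, not the TileLang implementation. The helper names `normalize_dim`, `cumsum_region`, and `chunked_cumsum` are hypothetical. The sketch assumes 1D inputs and relies on Python's slice clamping to stand in for the bounds handling of a non-divisible tail chunk.

```python
from itertools import accumulate


def normalize_dim(dim, ndim):
    """Map a possibly negative axis into [0, ndim), rejecting out-of-range values."""
    # dim == -ndim is valid (it names axis 0), so the lower bound check is strict.
    if dim < -ndim or dim >= ndim:
        raise ValueError(f"dim {dim} out of range for {ndim}-D input")
    return dim + ndim if dim < 0 else dim


def cumsum_region(src, start, stop, reverse=False):
    """Out-of-place cumulative sum over the 1D region src[start:stop]."""
    region = src[start:stop]  # Python slicing clamps stop to len(src)
    if reverse:
        return list(accumulate(region[::-1]))[::-1]
    return list(accumulate(region))


def chunked_cumsum(src, chunk_size, reverse=False):
    """Apply cumsum_region to consecutive chunks, including a shorter tail chunk."""
    out = []
    for chunk_start in range(0, len(src), chunk_size):
        out.extend(cumsum_region(src, chunk_start, chunk_start + chunk_size, reverse))
    return out
```

With a length-5 input and `chunk_size=2`, the last chunk holds a single element, which is the non-divisible tail case the added tests are described as covering.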
    • Xiangwen Wang's avatar
      bcae814e
    • [Enhancement] Refactor vectorization checks in loop_vectorize (#1440) · e387102c
      Lei Wang authored
      * Introduced a new function, IsExprInvariantInVectorBoundary, to encapsulate the logic for checking if an expression is invariant within vector boundaries, improving code clarity and reusability.
      * Updated the existing vectorization logic to utilize this new function, streamlining the process of determining vectorization feasibility based on boundary conditions.
      * Enhanced comments for better understanding of the vectorization criteria and mathematical rationale behind the checks.
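The property named above can be illustrated with a brute-force check. This is only a sketch under assumptions: the real `IsExprInvariantInVectorBoundary` lives in the C++ pass and presumably reasons symbolically, whereas the hypothetical helper below enumerates every iteration and treats "invariant within a vector boundary" as "takes one value across each aligned group of `vector_size` iterations".

```python
def is_invariant_in_vector_boundary(index_fn, extent, vector_size):
    """Return True if index_fn is constant across each aligned group of lanes."""
    for base in range(0, extent, vector_size):
        # Lanes that would be packed into one vector access.
        lanes = range(base, min(base + vector_size, extent))
        values = {index_fn(i) for i in lanes}
        if len(values) > 1:
            return False
    return True
```

For example, `i // 4` is invariant for vector sizes 2 and 4 but not 8, since an 8-wide group spans two distinct quotients.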
    • [CI] Update lint dependencies and fix lint on trunk (#1433) · 4dbc910d
      Xuehai Pan authored
      * [CI] Update pre-commit hooks
      
      * [Lint] Pass correct `exclude-header-filter` to `clang-tidy`
      
      * [Lint] Download latest `run-clang-tidy` script
      
      * [CI] Show compile commands
      
      * [CI] Add output grouping to GHA
      
      * [Lint] Re-order pre-commit hooks
    • [Enhancement] Include PrimFunc name in memory cache logs for better debugging (#1437) · b8003a28
      Lei Wang authored
      * Added the `get_prim_func_name` utility to extract human-readable function names from TVM PrimFuncs.
      * Updated memory cache logging in `AutoTuner` and `KernelCache` classes to include the kernel name, improving clarity during cache hits.
      * Enhanced debug logging to provide more informative messages when checking disk cache for kernels.
    • [Enhancement] Improve InjectAssumes logic and make assumes work after SplitHostDevice (#1405) · 2feaa41e
      Chaofan Lin authored
      * [Refactor] Refactor InjectAssumes logic and make assumes work after SplitHostDevice
      
      * address comments
      
      * fix
      
      * fix submodule
      
      * fix
      
      * fix 3rdparty
    • [Enhancement] Improve buffer usage tracking in MakePackedAPI (#1435) · 0788feb8
      Lei Wang authored
      * Added detailed logging for data and shape variable parameters during buffer usage detection in the MakePackedAPI function.
      * Refactored the UsedBufferDetector to differentiate between used parameters by data and shape variables, enhancing clarity in buffer management.
      * Updated logic to ensure minimal carrier buffers are selected for shape symbols, improving the efficiency of parameter handling.
    • [Bugfix] Convey `compile_flags` to ffi compilation path with pass_configs (#1434) · fba12a5f
      Lei Wang authored
      * [Enhancement] Add device compile flags support in pass configuration
      
      * Introduced `kDeviceCompileFlags` option in the pass configuration to allow additional device compiler flags for CUDA compilation.
      * Updated the `tilelang_callback_cuda_compile` function to merge extra flags from the pass configuration, enhancing flexibility in compiler options.
      * Modified the `JITKernel` class to handle device compile flags appropriately, ensuring they are included during compilation.
      * Documented the new pass configuration key for clarity on usage and expected input formats.
      
      * lint fix
      
      * [Refactor] Simplify compile_flags handling in JIT functions
      
      * Removed redundant string check for compile_flags in the compile, jit, and lazy_jit functions, ensuring compile_flags is consistently treated as a list.
      * Updated the JITKernel class to handle compile_flags as a list when a string is provided, enhancing code clarity and maintainability.
      
      * lint fix
      
      * fix
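The flag handling described in this commit can be sketched as follows. The helper names are hypothetical. Wrapping a bare string into a one-element list (rather than splitting it on whitespace) and skipping duplicate flags during the merge are both assumptions, not confirmed behavior of `JITKernel` or `tilelang_callback_cuda_compile`.

```python
def normalize_compile_flags(compile_flags):
    """Return compile flags as a list, accepting None, a single string, or a sequence."""
    if compile_flags is None:
        return []
    if isinstance(compile_flags, str):
        # Assumption: a bare string is one flag and is not split on whitespace.
        return [compile_flags]
    return list(compile_flags)


def merge_device_flags(base_flags, extra_flags):
    """Append extra flags (e.g. from a pass config) to the base flags, keeping order."""
    merged = normalize_compile_flags(base_flags)
    for flag in normalize_compile_flags(extra_flags):
        if flag not in merged:  # assumption: duplicates are dropped
            merged.append(flag)
    return merged
```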
    • [CI]: Bump actions/download-artifact from 6 to 7 (#1432) · 87e9e170
      dependabot[bot] authored
      Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 6 to 7.
      - [Release notes](https://github.com/actions/download-artifact/releases)
      - [Commits](https://github.com/actions/download-artifact/compare/v6...v7)
      
      ---
      updated-dependencies:
      - dependency-name: actions/download-artifact
        dependency-version: '7'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: dependabot[bot] <support@github.com>
      Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    • [CI]: Bump actions/upload-artifact from 5 to 6 (#1431) · 3aa6938f
      dependabot[bot] authored
      Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5 to 6.
      - [Release notes](https://github.com/actions/upload-artifact/releases)
      - [Commits](https://github.com/actions/upload-artifact/compare/v5...v6)
      
      ---
      updated-dependencies:
      - dependency-name: actions/upload-artifact
        dependency-version: '6'
        dependency-type: direct:production
        update-type: version-update:semver-major
      ...
      Signed-off-by: dependabot[bot] <support@github.com>
      Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    • [Refactor] Phase out the primitives folder since its design has been merged into tileop (#1429) · 89521e63
      Lei Wang authored
      * Phase out primitives
      
      * revert changes
      
      * Refactor GemmWarpPolicy method signature for clarity
      
      Updated the `from_warp_partition` method in the `GemmWarpPolicy` class to return the type `GemmWarpPolicy` instead of a string, enhancing type safety and clarity in the codebase. Removed an unnecessary blank line for improved readability.
      
      * fix
  2. 13 Dec, 2025 2 commits
    • [CUDA] Add read-only parameter annotation for CUDA codegen (#1416) · 00dd7388
      Lei Wang authored
      * [Enhancement] Add read-only parameter annotation for CUDA codegen
      
      * Introduced the `AnnotateReadOnlyParams` transformation to annotate read-only handle parameters in PrimFuncs, enabling the generation of `const` qualifiers in CUDA codegen.
      * Updated `PrintFunctionSignature` and `AddFunction` methods to utilize the new attribute `tl.readonly_param_indices`, enhancing performance by allowing read-only cache loads.
      * Modified the optimization pipeline to include the new annotation step, improving the overall efficiency of the code generation process.
      
      * lint fix
      
      * [Dependency] Update apache-tvm-ffi version to >=0.1.3
      
      * Updated the version of apache-tvm-ffi in pyproject.toml, requirements.txt, and requirements-dev.txt to ensure compatibility with the latest features and fixes.
      * Made adjustments in CUDA and HIP template files to use `const` qualifiers for global pointer parameters, enhancing code safety and clarity.
      
      * lint fix
      
      * [Enhancement] Refactor ReadWriteMarker for improved parameter handling
      
      * Updated the ReadWriteMarker class to accept a set of parameter or data variables, enhancing its ability to track written variables.
      * Introduced a new method, ResolveDataVarFromPtrArg, to resolve underlying buffer data from pointer-like arguments, improving accuracy in identifying written variables.
      * Modified the MarkReadOnlyParams function to gather handle parameters and their corresponding buffer data variables, streamlining the process of determining read-only parameters.
      * Enhanced the logic for identifying written variables to account for aliased data variables, ensuring comprehensive tracking of modifications.
      
      * lint fix
      
      * Update tma_load function to use const qualifier for global memory pointer
      
      * Changed the parameter type of gmem_ptr in the tma_load function from void* to void const* to enhance type safety and clarity in memory operations.
      * This modification ensures that the function correctly handles read-only global memory pointers, aligning with best practices in CUDA programming.
      
      * Remove commented-out code and reorder transformations in OptimizeForTarget function for clarity
      
      * Refactor buffer marking logic in annotate_read_only_params.cc to improve accuracy in identifying written variables. Update OptimizeForTarget function to reorder transformations for better clarity.
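The read-only analysis described in this commit amounts to collecting every written variable, resolving aliases back to the owning parameter, and reporting the parameters that were never written. Below is a simplified sketch of that idea, not the code in `annotate_read_only_params.cc`. The function name, the flat `writes` list, and the `aliases` mapping are all hypothetical stand-ins for the IR traversal.

```python
def readonly_param_indices(params, writes, aliases=None):
    """Return indices of parameters that are never written, directly or via an alias.

    params:  ordered handle-parameter names
    writes:  names of variables that appear as store destinations
    aliases: maps a buffer data var or pointer alias to the variable it refers to
    """
    aliases = aliases or {}

    def resolve(name):
        # Follow alias links until reaching a name with no further mapping.
        seen = set()
        while name in aliases and name not in seen:
            seen.add(name)
            name = aliases[name]
        return name

    written = {resolve(name) for name in writes}
    return [i for i, param in enumerate(params) if param not in written]
```

In this sketch, a store through `C_view`, which aliases `C_data`, which in turn belongs to parameter `C`, marks `C` as written, so only the indices of `A` and `B` would be eligible for a `const` qualifier.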
    • [Atomic] Use ptr for atomicAdd dst instead of reference (#1425) · 3546e2ee
      Lei Wang authored
      * [Enhancement] Update AtomicAdd function signature to accept pointer to destination
      
      * Modified AtomicAdd in CUDA to take a pointer instead of a reference for the destination argument.
      * Updated related code in atomicadd_vectorize.cc to ensure compatibility with the new signature.
      * Adjusted Python interface in atomic.py to pass the destination by pointer, aligning with device function requirements.
      
      * [Enhancement] Refactor AtomicAddRet function signature to accept pointer
      
      * Updated AtomicAddRet in both CUDA and HIP to take a pointer instead of a reference for the address argument, improving consistency with the AtomicAdd function.
      * Adjusted the implementation to ensure proper reinterpretation of the address type for atomic operations.
      
      * lint fix
      
      * [Enhancement] Refactor AtomicAddNode::MakeSIMTLoop to use destination pointer
      
      * Updated the MakeSIMTLoop function to build a pointer to the destination element using tvm_access_ptr instead of loading the destination value directly.
      * Simplified the handling of source and destination predicates, improving clarity and maintainability of the code.
      * Ensured compatibility with the new pointer-based approach for atomic operations.
      
      * lint fix
      
      * test fix
      
      * lint fix
  3. 12 Dec, 2025 5 commits
    • Lei Wang's avatar
      29051439
    • [Enhancement] Improve vectorization invariant check (#1398) · e84b24bc
      Xiangwen Wang authored
      * Improve loop vectorize
      
      * Improve loop vectorize
      
      * Improve loop vectorize
      
      * Improve loop vectorize
      
      * Improve loop vectorize
      
      * Add some vectorize tests and comments
    • [Enhancement] Introduce `T.__ldg` (#1414) · 6f67da84
      Lei Wang authored
      * [Enhancement] Add __ldg intrinsic for CUDA read-only cache loads
      
      * Introduced the __ldg intrinsic to enable explicit read-only cached loads from global memory in CUDA.
      * Updated the corresponding documentation and added support in both CUDA and HIP code generation.
      * Enhanced the Python interface for __ldg to accept BufferLoad and Buffer types, improving usability.
      
      * [Enhancement] Update formatting and linting rules in pyproject.toml; minor test adjustment
      
      * Added new formatting rules in pyproject.toml to enforce consistent code style, including hanging indents and argument splitting.
      * Updated test_tilelang_language_intrinsics_codegen.py to improve readability by adding a blank line before the main execution block.
      * Refactored error messages in builtin.py for better clarity and consistency, ensuring proper formatting in function definitions and raising ValueErrors.
      
      * lint fix
    • Lei Wang's avatar
    • [Dependency] Add torch-c-dlpack-ext to project requirements (#1403) · ba2c1856
      Lei Wang authored
      
      
      * [Dependency] Add torch-c-dlpack-ext to project requirements
      
      * Added torch-c-dlpack-ext to both pyproject.toml and requirements.txt to provide prebuilt torch extensions, which may prevent JIT compilation on first import of TVM FFI.
      
      * [Build] Update manylinux images in project configuration
      
      * Changed the manylinux image for x86_64 from "manylinux2014" to "manylinux_2_28" in both pyproject.toml and the Dockerfile to align with updated standards for compatibility and performance.
      
      * [Build] Update CUDA repository configuration in pyproject.toml
      
      * Changed the package manager command from `yum-config-manager` to `dnf config-manager` for adding the CUDA repository, ensuring compatibility with newer systems.
      
      * fix
      
      * [Build] Update CUDA repository to RHEL 8
      
      * Changed the CUDA repository configuration in both pyproject.toml and the manylinux Dockerfile from RHEL 7 to RHEL 8, ensuring compatibility with newer systems.
      
      * test: run out of space
      
      * use cu130 to reduce size
      
      * upd
      
      * upd comment
      
      * upd
      
      ---------
      Co-authored-by: Your Name <wenji.yyc@alibaba-inc.com>
  4. 11 Dec, 2025 5 commits
  5. 10 Dec, 2025 4 commits
    • danielhua23's avatar
      d19142f6
    • [Enhancement] Refactor inflight computing to support dynamic pipeline extents (#1399) · f2858fa1
      Lei Wang authored
      * [Build] Update CMake configuration for tilelang_cython_wrapper installation
      
      - Adjusted output directories for the tilelang_cython_wrapper to ensure that development builds place the extension in build/lib.
      - Updated installation paths to place the extension in tilelang/lib within the wheel, improving organization and avoiding potential conflicts with other modules.
      - Modified the internal library path exposure in env.py to prevent shadowing of common module names, enhancing compatibility and usability in user projects.
      
      * [Build] Standardize output directories for tilelang libraries
      
      - Set output directories for both tilelang and tilelang_module libraries to "${CMAKE_BINARY_DIR}/lib" for consistency in development builds.
      - This change enhances organization and ensures that all build artifacts are located in a unified directory structure.
      
      * [Refactor] Update TVM subproject and enhance pipeline loop handling
      
      - Updated the TVM subproject to commit 90581fe9e5287bbcf1844ad14255a1e1e8cdf7f0.
      - Added new fields to `PipelineAnnotation` and `RewrittenBlockInfo` structures to track original statement indices and improve async state management.
      - Refactored `EmitImpl` and `PopulateWaitCounts` methods to enhance clarity and functionality, including better handling of commit groups and wait counts.
      - Simplified access index calculations and strengthened analyzer constraints for loop bounds.
      
      * [Cleanup] Remove license block and unused includes from inject_pipeline.cc
      
      - Eliminated the Apache license block from the top of the file to streamline the code.
      - Removed unused include directives for memory and stringstream to enhance code clarity and reduce unnecessary dependencies.
      
      * [Refactor] Enhance transformation pipeline and test execution
      
      - Added an additional Simplify transformation in the InjectSoftwarePipeline to improve optimization.
      - Updated the test file to call `test_trival_pipeline()` directly, commenting out the previous main execution for better test isolation.
    • [Doc] Update logging docs (#1395) · bc084aa4
      Chaofan Lin authored
    • Kuris's avatar
  6. 08 Dec, 2025 2 commits
    • [BugFix] Fix split kernel layout bug of GQA decode (#1386) · 242b43bb
      Zhengju Tang authored
      * [BugFix] Fix split kernel layout bug of GQA decode
      
      * [BugFix] Avoid local with Parallel; use robust fragment instead
    • [Bugfix][Build] Update CMake configuration to remove project root injection for sys.path (#1385) · d933d65b
      Lei Wang authored
      * [Build] Update CMake configuration for tilelang_cython_wrapper installation
      
      - Adjusted output directories for the tilelang_cython_wrapper to ensure that development builds place the extension in build/lib.
      - Updated installation paths to place the extension in tilelang/lib within the wheel, improving organization and avoiding potential conflicts with other modules.
      - Modified the internal library path exposure in env.py to prevent shadowing of common module names, enhancing compatibility and usability in user projects.
      
      * [Build] Standardize output directories for tilelang libraries
      
      - Set output directories for both tilelang and tilelang_module libraries to "${CMAKE_BINARY_DIR}/lib" for consistency in development builds.
      - This change enhances organization and ensures that all build artifacts are located in a unified directory structure.
  7. 07 Dec, 2025 2 commits
    • [Typing] Enhance compatibility for advanced typing features in Python (#1382) · 305c854b
      Lei Wang authored
      - Updated `allocate.py` and `annot.py` to improve compatibility with Python 3.9 and later by conditionally importing advanced typing features such as `TypeVarTuple`, `Unpack`, and `ParamSpec`.
      - Added fallback imports from `typing_extensions` for environments using earlier Python versions.
      - Improved handling of generic alias detection to ensure consistent behavior across different Python versions.
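A common shape for this kind of compatibility shim is sketched below. It is an illustration rather than the code in `allocate.py` or `annot.py`: whether those files use `try`/`except` or a version check is not stated in the commit message, and the helper `is_generic_alias` is a hypothetical name.

```python
import typing

try:
    # Available in the standard library on newer interpreters
    # (ParamSpec since Python 3.10, TypeVarTuple and Unpack since 3.11).
    from typing import ParamSpec, TypeVarTuple, Unpack
except ImportError:
    # Older interpreters such as Python 3.9 need the backport package.
    from typing_extensions import ParamSpec, TypeVarTuple, Unpack

P = ParamSpec("P")
Ts = TypeVarTuple("Ts")


def is_generic_alias(tp):
    """True for generic aliases such as list[int] or typing.Dict[str, int]."""
    # get_origin returns None for plain classes and the origin type otherwise,
    # which avoids depending on private alias classes that differ across versions.
    return typing.get_origin(tp) is not None
```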
    • [Release] Bump Version into 0.1.7 (#1377) · ce16e479
      Lei Wang authored
      * Update VERSION to 0.1.7
      
      * Update Python version in distribution scripts to support CPython 3.9 and log output
  8. 06 Dec, 2025 8 commits
    • Lei Wang's avatar
      6021f863
    • [Fix] typo in cuda attr (#1380) · 8f50c122
      Yunqian Fan authored
      * [Bugfix] make cuda driver api compat with cuda12/13, along with tests
      
      * fix typo in cudaDevAttr
    • Yunqian Fan's avatar
    • [Builder] Enhance variable name binding and scope management (#1378) · 3f8e6b59
      Lei Wang authored
      - Improved handling of TVM Var/Buffer names to prevent out-of-scope errors when reusing Python names across different for-frames.
      - Added assertions to ensure variables are defined within the correct control flow frame, enhancing error checking and code reliability.
    • [Language] Tilelang LazyJIT Experimental Version (#1337) · 0921328d
      Kuris authored
      
      
      * initial step
      
      * modify builder
      
      * scratch version of new frontend
      
      * write some tests
      
      * add many tests
      
      * add typing stub for tir.ir
      
      * remove idents
      
      * minor update
      
      * minor update
      
      * First version of jitv2 (renamed to LazyJIT)
      
      * fix pre-commit error
      
      * minor fix
      
      * fix lint error
      
      * fix lint error
      
      * Fix conditional check for PrimFunc instance
      
      ---------
      Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
    • Yichen Yan's avatar
    • [Tool] Provide layout visualization tool (#1353) · 924225ed
      Cunxiao Ni authored
      * Provide layout visualization tool
      
      Adds a layout visualization tool to TileLang, which helps users understand and debug the layout transformations applied during compilation.
      
      This tool visualizes the memory layout of tensors at different stages of the compilation process, allowing developers to identify potential inefficiencies and optimize their code for better performance.
      
      The visualization can be enabled via a pass config option.
      
      * format
      
      * add layout visual example
      
      * Adds vis extra with matplotlib dependency
      
      * refactor pass config name
      
      * fix lint
      
      * Enables configurable layout visualization formats
      
      Allows users to specify the output formats (png, pdf, svg) for layout visualization through a pass config option.
      
      This change provides more flexibility in how layout visualizations are generated, allowing users to choose the formats that best suit their needs.
      
      It also fixes a bug where layout visualization was not correctly disabled when the config option was set to "false".
      
      * Adds visual layout inference tool docs
      
      * fix lint
      
      * fix lint
      
      * Refactor configurable layout visualization formats
      
      * fix lint
      
      * fix typo
      
      * add some comments
      
      * fix lints
      
      * add some warnings for user
      
      * Moves layout visualization
      
      * Refactors layout visualization pass configuration
      
      Updates the layout visualization pass configuration to use boolean flag for enabling and a string for specifying formats.
      
      * Enables multiple layout visualization formats
      
      * Updates layout visualization docs
      
      * Moves layout visualization to analysis
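The final configuration shape described above, a boolean enable flag plus a string naming the output formats, could be parsed along the following lines. This is a hypothetical sketch: the function name is invented, the real pass config keys are not shown here, and the comma separator is an assumption. Only the format names png, pdf, and svg come from the commit message.

```python
SUPPORTED_FORMATS = ("png", "pdf", "svg")


def parse_layout_visualization_formats(enabled, formats):
    """Return the list of output formats to emit, or [] when visualization is off."""
    if not enabled:
        return []
    # Assumption: formats are given as a comma-separated string such as "png,svg".
    requested = [name.strip().lower() for name in formats.split(",") if name.strip()]
    unknown = [name for name in requested if name not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported layout visualization formats: {unknown}")
    # Preserve the requested order while dropping duplicates.
    return list(dict.fromkeys(requested))
```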
    • [Enhancement] Introduce buffer var lca analysis for pass plan buffer allocations (#1376) · f8e7fef5
      Lei Wang authored
      * Update submodule TVM to latest commit and add PlanAndUpdateBufferAllocationLocation function to transform module
      
      - Updated the TVM submodule to commit 3a32b763.
      - Added a new function `PlanAndUpdateBufferAllocationLocation` in the transform module to facilitate buffer allocation planning within PrimFuncs.
      
      * Refactor buffer allocation code for improved readability and consistency
      
      - Updated formatting and spacing in `plan_update_buffer_allocation_location.cc` for better code clarity.
      - Standardized the use of pointer and reference syntax across various class methods.
      - Enhanced comments for better understanding of buffer allocation logic.
      - Removed unnecessary lines and improved overall code structure.
      
      * Refactor buffer allocation checks for improved clarity
      
      - Replaced size checks with empty checks for `ffi::Array<Buffer>` in `plan_update_buffer_allocation_location.cc` to enhance code readability.
      - Updated conditions in multiple methods to use `empty()` instead of comparing size to zero, streamlining the logic.
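The "lca" in the commit title refers to a lowest-common-ancestor analysis: a buffer can be allocated at the innermost scope that encloses every statement accessing it. The sketch below shows that idea on a plain parent map. It is not the code in `plan_update_buffer_allocation_location.cc`, and the scope names in the usage note are made up.

```python
def lowest_common_ancestor(parent, nodes):
    """Return the deepest scope that encloses all given nodes, or None if there is none.

    parent maps each scope name to its enclosing scope (None for the root).
    """
    def path_from_root(node):
        path = [node]
        while parent.get(node) is not None:
            node = parent[node]
            path.append(node)
        return path[::-1]

    paths = [path_from_root(node) for node in nodes]
    lca = None
    # Walk all root-to-node paths in lockstep until they diverge.
    for level in zip(*paths):
        if all(scope == level[0] for scope in level):
            lca = level[0]
        else:
            break
    return lca
```

For a buffer touched by two blocks inside the same inner loop, the result is that inner loop. If one of the accesses sits in a sibling of the inner loop, the result moves up to the enclosing outer loop.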
  9. 05 Dec, 2025 1 commit
    • [Layout] Enhance Free Layout Inference (#1375) · 6654064d
      Lei Wang authored
      * [Refactor] Update condition for benchmarking in example_gemv.py and simplify cached library path handling in sparse.py
      
      * [Enhancement] Extend support for float8 data types in GEMM operations
      
      - Updated GEMM operations to recognize additional float8 data types: `float8_e4m3fn` and `float8_e5m2fnuz`.
      - Refactored condition checks in `checkWgmma` methods to simplify float8 type handling.
      - Adjusted test cases to ensure compatibility with the new float8 types in tile language examples.
      
      * lint fix
      
      * [Enhancement] Add injective layout detection and exception handling
      
      - Introduced `DetectInjective` method in `FragmentNode` to check for injective layouts.
      - Added `LoopLayoutInjectiveException` to handle errors related to non-injective layouts.
      - Updated `InferLayout` methods in `ParallelOpNode` to utilize injective checks and log relevant information.
      - Refactored layout inference queue management to use `std::deque` for improved performance and added prioritization logic for buffer layouts.
      
      * remove debug print
      
      * remove debug print
      
      * remove debug print
      
      * minor layout fix
      
      * fix for T.view
      
      * [Enhancement] Improve injective layout detection in FragmentNode
      
      - Updated the `DetectInjective` method to handle symbolic dimensions more effectively by introducing a mechanism to collect symbolic shapes and adjust the detection level accordingly.
      - Added logging for cases where the layout detection falls back to NoCheck due to symbolic dimensions.
      - Minor update to the test file to include the tilelang testing module.
      
      * [Refactor] Simplify layout inference for bulk copy operations
      
      - Removed unnecessary conditions for bulk load/store operations in the layout inference logic.
      - Streamlined the handling of layout application for bulk copy instances to enhance clarity and maintainability.
      
      * remove debug print
      
      * [Enhancement] Introduce layout-related exceptions and improve error handling
      
      - Added `LayoutConflictException` and `LoopLayoutInjectiveException` classes for better exception management in layout operations.
      - Updated `InferLayout` method in `ParallelOpNode` to throw `LoopLayoutInjectiveException` with detailed error information when injective layout checks fail.
      - Removed redundant exception class definitions from `parallel.h` to streamline code organization.
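An injective layout maps distinct loop indices to distinct targets. The brute-force sketch below illustrates that property and the fallback for symbolic dimensions. It is an assumption-laden illustration, not `FragmentNode::DetectInjective`: the function name, the `(thread, local)` target tuple, and returning `None` to model the NoCheck fallback are all hypothetical.

```python
from itertools import product


def detect_injective(layout_fn, shape):
    """Return True/False for concrete shapes, or None when a dimension is symbolic."""
    # Symbolic (non-integer) extents cannot be enumerated, so skip the check.
    if not all(isinstance(extent, int) for extent in shape):
        return None
    seen = set()
    for index in product(*(range(extent) for extent in shape)):
        target = layout_fn(*index)
        if target in seen:
            return False  # two loop indices collide on the same target
        seen.add(target)
    return True


def example_layout(i, j):
    """Map a 4x4 tile to (thread, local) pairs by linearizing the index."""
    linear = i * 4 + j
    return (linear % 8, linear // 8)
```

Here `example_layout` is injective on a 4x4 tile, while a layout that ignores `j`, such as `lambda i, j: (i, 0)`, is not. That second case is the kind of failure the commit describes reporting through `LoopLayoutInjectiveException`.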