1. 12 Dec, 2025 1 commit
    • Lei Wang's avatar
      [Dependency] Add torch-c-dlpack-ext to project requirements (#1403) · ba2c1856
      Lei Wang authored
      
      
      * [Dependency] Add torch-c-dlpack-ext to project requirements
      
      * Added torch-c-dlpack-ext to both pyproject.toml and requirements.txt to provide prebuilt torch extensions, which may prevent JIT compilation on first import of TVM FFI.
      
      * [Build] Update manylinux images in project configuration
      
      * Changed the manylinux image for x86_64 from "manylinux2014" to "manylinux_2_28" in both pyproject.toml and the Dockerfile to align with updated standards for compatibility and performance.
      
      * [Build] Update CUDA repository configuration in pyproject.toml
      
      * Changed the package manager command from `yum-config-manager` to `dnf config-manager` for adding the CUDA repository, ensuring compatibility with newer systems.
      
      * fix
      
      * [Build] Update CUDA repository to RHEL 8
      
      * Changed the CUDA repository configuration in both pyproject.toml and the manylinux Dockerfile from RHEL 7 to RHEL 8, ensuring compatibility with newer systems.
      
      * test: run out of space
      
      * use cu130 to reduce size
      
      * upd
      
      * upd comment
      
      * upd
      
      ---------
      Co-authored-by: default avatarYour Name <wenji.yyc@alibaba-inc.com>
      ba2c1856
  2. 07 Dec, 2025 1 commit
  3. 05 Nov, 2025 1 commit
  4. 31 Oct, 2025 1 commit
    • Lei Wang's avatar
      [Release] Bump version to v0.1.6.post2 (#1160) · c37621c5
      Lei Wang authored
      * [Release] Update README and VERSION for v0.1.6.post2 compatibility with Python 3.8
      
      * [Enhancement] Update packaging configuration and Docker scripts for multi-architecture support
      
      * Add allowlist for TVM, CUTLASS, and Composable Kernel items in pyproject.toml
      * Enhance docker_local_distribute.sh to support cross-architecture builds using docker buildx
      * Modify pypi.manylinux.Dockerfile to accept TARGETARCH argument for better architecture handling
      
      * [Enhancement] Improve Docker scripts and build process for multi-architecture support
      
      * Update .gitignore to include dist directories
      * Refactor docker_local_distribute.sh for better cross-architecture handling and error management
      * Enhance docker_pypi_distribute.sh to support multi-architecture builds with docker buildx
      * Modify pypi_distribution.sh to clean up additional directories
      * Update pypi.manylinux.Dockerfile for improved environment configuration and architecture handling
      
      * fix
      
      * Remove outdated classifier for Artificial Intelligence from pyproject.toml
      
      * Update pyproject.toml classifiers and modify Docker distribution scripts for clarity
      
      * Add new classifier for Artificial Intelligence in pyproject.toml
      * Rename output directories in docker_local_distribute.sh and docker_pypi_distribute.sh for better context
      c37621c5
  5. 14 Oct, 2025 1 commit
  6. 13 Oct, 2025 1 commit
    • Yichen Yan's avatar
      [Build] Migrate to scikit-build-core (#939) · d89ba5b8
      Yichen Yan authored
      
      
      * cleanup
      
      * init
      
      * build first wheel that may not work
      
      * build cython ext
      
      * fix tvm build
      
      * use sabi
      
      * update rpath to support auditwheel
      
      * pass editible build
      
      * update ci
      
      * fix warnings
      
      * do not use ccache in self host runner
      
      * test local uv cache
      
      * test pip index
      
      * update lib search to respect new lib location
      
      * fix
      
      * update ci
      
      * enable cuda by default
      
      * update src map
      
      * fix
      
      * fix
      
      * fix
      
      * Generate version with backend and git information at build time
      
      * copy tvm_cython to wheels
      
      * fix tvm lib search
      
      * fmt
      
      * remove unused
      
      * auto detect ccache
      
      * add back backend-related files
      
      * remove jit cython adaptor to simplify code
      
      * fmt
      
      * fix ci
      
      * ci fix 2
      
      * ci fix 3
      
      * workaround metal
      
      * ci fix 4
      
      * fmt
      
      * fmt
      
      * Revert "ci fix 4"
      
      This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.
      
      * tmp
      
      * fix metal
      
      * trivial cleanup
      
      * add detailed build-time version for cuda
      
      * add back mlc
      
      * Restore wheel info and other trivial updates
      
      * update
      
      * fix cuda
      
      * upd
      
      * fix metal ci
      
      * test for ga build
      
      * test for nvidia/cuda
      
      * test ubuntu 20
      
      * fix
      
      * fix
      
      * Do not use `uv build`
      
      * fix
      
      * fix
      
      * log toolchain version
      
      * merge wheel
      
      * update
      
      * debug
      
      * fix
      
      * update
      
      * skip rocm
      
      * update artifacts each
      
      * fix
      
      * fix
      
      * add mac
      
      * fix cache
      
      * fix cache
      
      * fix cache
      
      * reset and add comment
      
      * upd
      
      * fix git version
      
      * update deps
      
      * trivial update
      
      * use in-tree build dir and install to src to speedup editable build
      
      * Revert "use in-tree build dir and install to src to speedup editable build"
      
      This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.
      
      * add build-dir
      
      * update docs
      
      * remove old scrips
      
      * [1/n] cleanup scripts
      
      * [Lint]: [pre-commit.ci] auto fixes [...]
      
      * fix and update
      
      * wait for tvm fix
      
      * revert some tmp fix
      
      * fix
      
      * fix
      
      * spell
      
      * doc update
      
      * test cibuildwheel
      
      * fix and test macos on ci
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ga event
      
      * cleanup
      
      * bump tvm to support api3
      
      * test final version
      
      * add cron
      
      * Update .github/workflows/dist.yml
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      
      * fix
      
      * test ccache for metal cibuildwheel
      
      * test newer macos
      
      * finish
      
      ---------
      Co-authored-by: default avatarpre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
      Co-authored-by: default avatarXuehai Pan <XuehaiPan@outlook.com>
      d89ba5b8
  7. 26 Sep, 2025 1 commit
    • Lei Wang's avatar
      [Dist] Provide an option to include commit ID in version (#884) · c861d8a2
      Lei Wang authored
      * Update MANIFEST.in and setup.py to include commit ID in versioning and adjust included files
      
      - Modified MANIFEST.in to include shared library files `libtvm.so` and `libtvm_runtime.so`.
      - Updated setup.py to conditionally include the commit ID in the package version based on the `WITH_COMMITID` environment variable.
      - Enhanced versioning logic in version.py to use a truncated commit ID for better compatibility.
      
      * Update setup.py and related scripts to enable commit ID inclusion in package metadata
      
      - Changed the default value of the `WITH_COMMITID` environment variable in setup.py to "True".
      - Updated tox.ini to set `WITH_COMMITID` to "TRUE" for the testing environment and "FALSE" for the build environment.
      - Modified pypi_distribution.sh to pass `WITH_COMMITID=FALSE` during the wheel build process.
      
      * Update MANIFEST.in to include additional files and directories for packaging
      
      - Added VERSION, CMakeLists.txt, and various requirements files to the package.
      - Included recursive inclusion of source files and third-party libraries, while excluding specific clang and llvm directories.
      c861d8a2
  8. 21 Sep, 2025 1 commit
    • Lei Wang's avatar
      [PATCH] Static libg++ linking fix (#854) · a3497ebc
      Lei Wang authored
      * bump version to 0.1.6
      
      * phaseout py38
      
      * py39
      
      * Update submodule 'tvm' to latest commit adc0e48
      
      * [Build] Update CMake and Python environment settings
      
      - Added static linking flags for GCC and libstdc++ in CMakeLists.txt to enhance library linking.
      - Removed the cmake version requirement from pyproject.toml to allow for broader compatibility.
      - Updated the tox command in the Docker distribution script to include Python 3.8 for testing environments.
      
      * [Build] Update Python version requirements in scripts and documentation
      
      - Changed Python version requirement in README.md from 3.9+ to 3.8+.
      - Updated installation and testing scripts to use Python 3.8 instead of 3.9, ensuring compatibility with the new minimum version.
      - Adjusted tox commands in local and PyPI distribution scripts to include Python 3.8 in the testing environments.
      
      * [Build] Update Python and CMake requirements in Dockerfile and pyproject.toml
      
      - Added CMake version requirement (>=3.26) to pyproject.toml for build compatibility.
      - Created a Python 3.8 environment in the Dockerfile and added a symlink for easier access to the Python 3.8 executable.
      
      * [Build] Update CMake and Dockerfile for improved compatibility
      
      - Removed static linking flags from CMakeLists.txt to simplify build configuration.
      - Updated Dockerfile to use Ubuntu 20.04 and streamlined the installation of dependencies, removing gcc-9 and g++-9.
      - Adjusted symlink creation for Python environments to use the `-sf` option for safer linking.
      
      * [Build] Bump version to 0.1.6.post1 for post-release updates
      
      * [Build] Remove static linking flags from CMakeLists.txt
      
      - Eliminated static linking flags for GCC and libstdc++ to simplify build configuration and avoid potential conflicts with Python extensions.
      
      * [Build] Update Docker distribution scripts for manylinux compatibility
      
      - Changed base image from `tilelang-builder:18.04` to `tilelang-builder:manylinux` in both local and PyPI distribution scripts.
      - Updated Dockerfile references to use `pypi.manylinux.Dockerfile`.
      - Added `--gpus all` flag to the Docker run command to enable GPU support during execution.
      
      * lint fix
      
      * add cmake
      a3497ebc
  9. 19 Sep, 2025 1 commit
    • Lei Wang's avatar
      [Release] Bump Version to 0.1.6 (#818) · 1ad6e461
      Lei Wang authored
      * bump version to 0.1.6
      
      * phaseout py38
      
      * py39
      
      * Update submodule 'tvm' to latest commit adc0e48
      
      * [Build] Update CMake and Python environment settings
      
      - Added static linking flags for GCC and libstdc++ in CMakeLists.txt to enhance library linking.
      - Removed the cmake version requirement from pyproject.toml to allow for broader compatibility.
      - Updated the tox command in the Docker distribution script to include Python 3.8 for testing environments.
      
      * [Build] Update Python version requirements in scripts and documentation
      
      - Changed Python version requirement in README.md from 3.9+ to 3.8+.
      - Updated installation and testing scripts to use Python 3.8 instead of 3.9, ensuring compatibility with the new minimum version.
      - Adjusted tox commands in local and PyPI distribution scripts to include Python 3.8 in the testing environments.
      
      * [Build] Update Python and CMake requirements in Dockerfile and pyproject.toml
      
      - Added CMake version requirement (>=3.26) to pyproject.toml for build compatibility.
      - Created a Python 3.8 environment in the Dockerfile and added a symlink for easier access to the Python 3.8 executable.
      1ad6e461
  10. 18 Sep, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Refactor some build related configurations (#827) · 232782dd
      Lei Wang authored
      * bugfix
      
      * [Build] Update build dependencies and Dockerfile configuration
      
      - Updated `pyproject.toml` and `requirements-build.txt` to specify Cython version as `Cython>=3.0.0`.
      - Removed unnecessary dependencies from the build system.
      - Enhanced `pypi.Dockerfile` to install gcc-9 and g++-9, and added ninja-build for improved build performance.
      - Updated conda environment creation to include Python 3.9 to 3.12, while removing the Python 3.8 environment.
      
      * cmake fix
      
      * fix
      
      * fix
      232782dd
  11. 19 Aug, 2025 1 commit
    • Zhengju Tang's avatar
      [Feature] Low-bit twiddling dequantization and FP4 GEMM (#725) · 24603e4a
      Zhengju Tang authored
      
      
      * [Dequant] Add bit-twiddling dequantize cuda for fp4-->bf16
      
      * [Dequant] Add extern call and serial dequantization
      
      * [Dequant] Parallel Dequant wait for fence debug.
      
      * [Scale] Add scale matrix to mxfp4 gemm
      
      * [Remove] Remove fence-buggy example and some generated source cuda code
      
      * [MXFP4] Update initial version of MXFP4 GEMM
      
      * [Scale] Add scale to latest mxfp4 gemm
      
      * [Lint]
      
      * [BugFix] Load Scale, disabe TMA to recover performance
      
      * [Lint]
      
      * [Lint]
      
      * [Scale] Use L2 to hold Scale and enable TMA will slightly boost performance
      
      * [Lint]
      
      * Update example_dequant_gemm_bf16_fp4_hopper_serial.py
      
      * Remove deprecated dequantization examples for BF16 and MXFP4 in the dequantize_gemm directory.
      
      * Refactor dequantization examples for improved readability and consistency. Adjusted formatting in matmul function and added spacing for clarity. Updated function signatures and comments for better understanding.
      
      * Refactor index_to_coordinates usage in bitnet example and update dequantization example configurations. Removed the custom index_to_coordinates function and replaced it with the built-in version. Adjusted block_K parameter in dequantization example for consistency.
      
      * lint fix
      
      * ci fix
      
      * Remove non-existent example
      
      * [BugFix] Add smem swizzle to recover performance of TMA
      
      * [BugFix] Enough reg for producer when threads=512
      
      ---------
      Co-authored-by: default avatarLei Wang <34334180+LeiWang1999@users.noreply.github.com>
      Co-authored-by: default avatarLeiWang1999 <leiwang1999@outlook.com>
      24603e4a
  12. 24 Jul, 2025 1 commit
  13. 05 Jun, 2025 1 commit
    • Lei Wang's avatar
      [Release] Bump Version to 0.1.5 (#551) · cc5e9f7c
      Lei Wang authored
      * Update VERSION to 0.1.5
      
      * Add DEBUG_MODE support in setup.py and update CMake build type; enhance pypi.Dockerfile with git installation
      cc5e9f7c
  14. 28 May, 2025 1 commit
    • Lei Wang's avatar
      [Autotune] Introduce cache mechanism for auto tuner (#527) · 7171aff6
      Lei Wang authored
      * [Enhancement] Add commit ID to versioning and improve logging initialization
      
      * Updated `get_tilelang_version` to include an optional commit ID in the version string.
      * Enhanced the `TileLangBuilPydCommand` to write the version with commit ID to the VERSION file during the build process.
      * Introduced a new function `get_git_commit_id` in `version.py` to retrieve the current git commit hash.
      * Refactored logger initialization in `autotuner/__init__.py` to ensure handlers are set up only once, improving performance and clarity.
      * Minor fixes in `flatten_buffer.cc` and `kernel_cache.py` for better handling of versioning and logging.
      
      * [Refactor] Enhance AutoTuner and JITKernel for improved performance and caching
      
      * Refactored the AutoTuner class to include new methods for setting compilation and profiling arguments, enhancing configurability.
      * Introduced caching mechanisms for tuning results, allowing for faster retrieval of previously computed configurations.
      * Updated JITKernel to store tuning results, including latency and configuration details, improving the kernel's performance tracking.
      * Added new methods for generating cache keys and saving/loading results to/from disk, streamlining the tuning process.
      * Enhanced the overall structure and readability of the autotuning logic, ensuring better maintainability and clarity.
      * Minor adjustments in related modules to support the new caching and profiling features.
      
      * [Refactor] Clean up code formatting and improve readability in AutoTuner and related modules
      
      * Consolidated import statements and removed unnecessary line breaks for better readability.
      * Standardized function argument formatting across the AutoTuner and CompileArgs classes.
      * Enhanced consistency in the use of whitespace and indentation throughout the codebase.
      * Minor adjustments in the Profiler and JITKernel classes to improve clarity and maintainability.
      * Ensured that all changes adhere to the project's coding style guidelines.
      
      * [Refactor] Remove redundant type hints in AutoTuner modules
      
      * Simplified import statements in `__init__.py` and `param.py` by removing unnecessary duplicate type hints for `Any`.
      * Improved code readability and maintainability by streamlining type imports across the AutoTuner module.
      
      * [Refactor] Update AutoTuner configuration for improved profiling and target detection
      
      * Enhanced the AutoTuner configuration across multiple examples by adding `set_profile_args` to better manage profiling settings.
      * Standardized the use of `target="auto"` in compile arguments to ensure automatic target detection.
      * Removed redundant target specifications in certain instances to streamline the configuration process.
      * Improved overall clarity and maintainability of the autotuning logic in various example scripts.
      
      * [Refactor] Simplify code formatting and improve readability in example scripts
      
      * Consolidated function argument formatting in `benchmark_mla_decode_amd_tilelang.py`, `example_elementwise_add.py`, and `performance.py` for better clarity.
      * Removed unnecessary line breaks and standardized argument placement across multiple files.
      * Enhanced overall code readability and maintainability in autotuning examples and performance scripts.
      
      * [Refactor] Update JIT decorator usage across multiple files
      
      * Removed redundant parameters from the JIT decorator in various benchmark and example scripts, simplifying the code.
      * Standardized the import of the JIT decorator from `tilelang`, enhancing consistency across the codebase.
      * Improved overall readability and maintainability by consolidating import statements and cleaning up function definitions.
      
      * [Refactor] Standardize JIT decorator formatting across benchmark and example scripts
      
      * Simplified the formatting of the JIT decorator in multiple files by removing unnecessary line breaks.
      * Enhanced code readability and consistency in the usage of the JIT decorator across benchmark and example scripts.
      * Improved overall maintainability by ensuring uniformity in function definitions and decorator usage.
      7171aff6
  15. 27 Mar, 2025 1 commit
    • Wenhao Xie's avatar
      [Doc] Python API docs generation (#278) · 5501b31c
      Wenhao Xie authored
      * fix bug
      
      * update performance.py
      
      * update python api docs
      
      * test workflow
      
      * fix dependency
      
      * fix bug
      
      * fix
      
      * update correct git config
      
      * test workflow
      
      * clear cache
      
      * lint fix
      
      * fix exclude path
      5501b31c
  16. 26 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Deprecated `T.Buffer` as arguments and rename related calls into `T.Tensor` (#281) · bf8a6fc1
      Lei Wang authored
      * [Refactor] Improve flash attention example and layout comparison logic
      
      - Removed unnecessary annotation for `lse_local_split` in the flash attention example to streamline the code.
      - Updated the handling of `lse_local_split` to utilize parallel processing for better performance.
      - Refactored kernel compilation and profiling logic to enhance clarity and maintainability in the flash attention example.
      - Added a condition in `FragmentNode::IsEqual` to handle broadcast cases, improving the robustness of layout comparisons.
      
      * lint fix
      
      * [Enhancement] Add support for shared memory scope in Fill operation
      
      - Introduced handling for `shared.dyn` and `shared` memory scopes in the Fill operation.
      - Implemented parallel operation and layout inference for improved performance in shared memory scenarios.
      - Updated thread loop partitioning and vectorization logic to accommodate new memory scope handling.
      
      * [Refactor] Remove deprecated decorator and enhance Cython kernel handling
      
      - Removed the deprecated decorator from the main module and added a new implementation in the utils module for better organization.
      - Introduced a pointer map in the Cython kernel adapter to manage pointer arguments, improving runtime shape resolution.
      - Updated the Cython kernel wrapper to utilize the new pointer map for handling kernel arguments.
      - Enhanced error checking in the tensor utility functions to ensure static shapes are enforced.
      - Added a new proxy module for buffer and tensor handling, streamlining the interface for TIR programs.
      
      * [Feature] Add matrix multiplication test and kernel implementation
      
      - Introduced a new test file `test_tilelang_language_ptr.py` that implements a matrix multiplication function using TileLang's primitives.
      - The `matmul_test` function defines a kernel for performing tile-level GEMM operations with customizable block sizes and data types.
      - Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation.
      - Updated the `proxy.py` file to enhance type handling for buffer and tensor proxies, ensuring compatibility with TIR programs.
      - Minor formatting improvements in `deprecated.py` for better readability.
      
      * lint fix
      
      * [Refactor] Update tensor creation in matrix multiplication test
      
      - Replaced `T.Tensor.from_ptr` with `T.make_tensor` in `matmul_test` for improved clarity and consistency.
      - Updated imports in `__init__.py` to include `make_tensor`.
      - Added `make_tensor` function in `proxy.py` to streamline tensor creation from pointers.
      
      * [Refactor] Update tensor definitions across multiple files
      
      - Replaced instances of `T.Tensor` with updated tensor definitions in various benchmark and example files to enhance consistency and clarity.
      - Adjusted tensor shapes and types in functions related to matrix multiplication, attention mechanisms, and other operations.
      - Improved documentation in README and example files to reflect changes in tensor usage.
      
      * lint fix
      
      * [Refactor] Update tensor types in attention and matrix multiplication examples
      
      - Replaced instances of `T.Tensor` with `T.SharedTensor` and `T.FragmentTensor` in various attention and matrix multiplication functions to improve consistency and clarity.
      - Adjusted tensor definitions in benchmark and example files to align with the new tensor types.
      - Enhanced the overall structure and readability of the code by standardizing tensor usage across multiple files.
      
      * lint fix
      
      * [Refactor] Update tensor types in GEMM example and test files
      
      - Replaced instances of `T.Tensor` with `T.LocalTensor` and `T.Buffer` in the GEMM example and related test functions to improve consistency and clarity.
      - Enhanced the overall structure of the code by standardizing tensor usage across multiple files, aligning with recent updates in tensor definitions.
      
      * [Refactor] Update tensor usage in customize.py
      
      - Replaced instances of `T.Tensor` with `T.Buffer` in the `reshape` and `view` functions to enhance consistency with recent tensor definitions.
      - Improved code clarity by standardizing buffer usage across the file.
      
      * [Refactor] Update tensor types in test_tilelang_transform_annotate_device_regions.py
      
      - Replaced instances of `T.Tensor` with `T.Buffer` in the `before` and `expected` methods of the `TestAnnotateThreadExtent` and `TestAnnotateDeviceScope` classes to enhance consistency with recent tensor definitions.
      - Improved code clarity by standardizing buffer usage across the test file.
      
      * [Refactor] Update tensor types to SharedBuffer and FragmentBuffer
      
      - Replaced instances of `T.SharedTensor` and `T.FragmentTensor` with `T.SharedBuffer` and `T.FragmentBuffer` across multiple benchmark, example, and test files to enhance consistency with recent tensor definitions.
      - Improved code clarity and structure by standardizing buffer usage in attention and matrix multiplication functions.
      
      * [Refactor] Introduce Tensor alias for Buffer in proxy.py
      
      - Added a new alias `Tensor` for `Buffer` in `proxy.py` to facilitate JIT compilation, ensuring that inputs and outputs are mapped with `torch.Tensor`.
      - This change enhances clarity and consistency in tensor usage across the codebase.
      bf8a6fc1
  17. 25 Mar, 2025 1 commit
    • Wenhao Xie's avatar
      [CI] Add gemm performance test (#274) · 18f29277
      Wenhao Xie authored
      * [Typo] Fix formatting in installation instructions in README.md
      
      * [Enhancement] Improve CUDA path detection and update configuration handling
      
      * fix typo
      
      * remove IS_WINDOWS constant
      
      * lint fix
      
      * Improve error messages for CUDA detection failure
      
      * lint fix
      
      * lint fix
      
      * Fix .gitignore to correctly include venv directory
      
      * [Doc] Add instructions for installing nightly version of TileLang
      
      * update installation instructions
      
      * update install instruction
      
      * update performance ci
      
      * update
      
      * update
      
      * update
      
      * update ci workflow
      
      * delete test.yml
      
      * lint fix
      
      * update bot.yml
      
      * update bot.yml
      
      * remove changes in ci.yml
      18f29277
  18. 23 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Release] Bump version to 0.1.3 (#264) · 9981ac59
      Lei Wang authored
      * Bump version to 0.1.3
      
      * Refactor Docker script to streamline installation commands
      
      - Removed the installation of the Python environment and CMake from the Docker run command, simplifying the execution process.
      - Updated the command to focus on pip installation and running tox for testing across multiple Python versions.
      9981ac59
  19. 22 Mar, 2025 1 commit
  20. 20 Mar, 2025 1 commit
    • Lei Wang's avatar
      [Refactor] Phaseout LLVM Dependency by Making it Optional (#247) · f2e99180
      Lei Wang authored
      * remove llvm build
      
      * [Refactor] Update kernel compilation and profiling in examples
      
      - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
      - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
      - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
      
      * lint fix
      
      * License Update
      
      * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
      
      - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
      - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
      
      * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
      
      - Improved comment alignment and readability in `cuda.h`.
      - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * lint fix
      
      * fix
      
      * License update
      
      * [Enhancement] Update JITKernel to use artifact for kernel source
      
      - Assigned the generated artifact to `self.artifact` for better management.
      - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
      
      * lint fix
      
      * Add @tilelang.testing.requires_llvm decorator to vectorization tests
      
      * Enhance setup.py and env.py for library management
      
      - Added functionality to remove original files after copying in CMakeBuild.
      - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
      
      * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
      
      * Refactor CMakeBuild file handling in setup.py
      
      - Added a check to ensure the target library directory exists before copying .so files.
      - Improved the logic for creating the target directory and copying files to enhance robustness.
      
      * bugfix
      
      * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
      
      * lint fix
      
      * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
      
      * lint fix
      
      * Add support for C target in device code generation
      
      - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
      
      * [Enhancement] Implement auto-clear cache feature based on environment variable
      
      * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
      * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
      * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
      
      * [Refactor] Update kernel invocation and import paths in tests and cache
      
      * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
      * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
      * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
      
      * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
      
      * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
      * Enhanced overall code formatting to align with project standards.
      
      * [Enhancement] Add bfloat16 test case and improve kernel caching logic
      
      * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
      * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
      * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
      * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
      * Improved code formatting and readability across several files.
      
      * lint fix
      
      * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
      f2e99180
  21. 07 Mar, 2025 1 commit
  22. 22 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Wheel] Provide a bare docker scripts to help build wheels for manylinux (#105) · b4bd2a56
      Lei Wang authored
      * [Build] Improve build configuration and package distribution support
      
      - Add `build` to requirements-build.txt for package building
      - Update MANIFEST.in to include Cython wrapper source file
      - Enhance setup.py to improve Cython file copying logic
      - Update build scripts to support multi-Python version distribution
      
      * [Build] Improve Cython file handling in setup.py and MANIFEST.in
      
      - Remove Cython wrapper from MANIFEST.in
      - Enhance setup.py to create target directory if it doesn't exist when copying Cython files
      - Improve file copying logic for Cython source files during build process
      
      * [Build] Remove Cython file copying logic in setup.py
      
      - Comment out Cython file copying code in TileLangBuilPydCommand
      - Simplify setup.py build process by removing redundant Cython file handling
      
      * [Build] Enhance Docker distribution scripts for multi-Python version support
      
      - Refactor local and PyPI distribution Docker scripts
      - Replace hardcoded Python installation with Miniconda-based multi-version Python environment
      - Improve Docker image setup with dynamic Python version creation
      - Simplify build process by using Miniconda for Python environment management
      
      * [Build] Separate lint requirements into a dedicated file
      
      - Create new requirements-lint.txt for formatting and linting tools
      - Update format.sh to use requirements-lint.txt instead of requirements-dev.txt
      - Update requirements-dev.txt and requirements-test.txt to reference requirements-lint.txt
      - Improve dependency management by isolating lint-specific requirements
      
      * [Build] Restore Cython file copying logic in setup.py
      
      - Re-add Cython file copying mechanism in TileLangBuilPydCommand
      - Implement robust file search across multiple potential directories
      - Add warning for cases where Cython source files cannot be found
      - Improve build process reliability for Cython source files
      
      * [Build] Refactor Cython file copying logic in setup.py
      
      - Simplify Cython file copying mechanism in TileLangBuilPydCommand
      - Improve directory creation and file copying for Cython source files
      - Relocate potential directories list to a more logical position
      - Enhance robustness of file and directory handling during build process
      
      * [Build] Refine Cython file copying logic in setup.py
      
      - Improve file existence check when copying Cython source files
      - Use os.path.join to construct full path for more robust file checking
      - Enhance file copying mechanism in TileLangBuilPydCommand
      b4bd2a56
  23. 18 Feb, 2025 1 commit
    • Lei Wang's avatar
      [Wheel] Support pypi build scripts for different python via tox (#93) · fa9a19b0
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      
      * [Build] Add tox configuration and local distribution script for multi-Python version support
      
      * [Build] Improve git submodule update function with better error handling
      
      * [Build] Update LLVM configuration path in ROCm installation script
      
      * [Build] Add .tox/ to .gitignore for tox testing environment
      
      * [Build] Add support for TVM prebuild path configuration in CMakeLists.txt
      
      * [Cleanup] Remove unused TVM runtime error codes header
      
      * [Cleanup] Fix TVM grid constant type reference in CUDA module
      
      * [Cleanup] Remove unused customized_code function from IR module
      
      * [Feature] Add TileLang thread synchronization and storage access analysis passes
      
      * [Build] Reorder DLL search path directories for more flexible library loading
      
      * [Refactor] Improve thread synchronization and library path handling
      
      - Rename ThreadSync and TileLangThreadSync functions in C++ code
      - Update Python docstring for ThreadSync with more detailed description
      - Reorder library path detection in tilelang environment setup
      - Minor comment and code cleanup in CUDA and warp specialization modules
      
      * [Refactor] Improve thread synchronization code style and formatting
      
      - Standardize pointer type spacing in storage_access.h and storage_access.cc
      - Update whitespace and indentation in thread_storage_sync.cc
      - Reorder include statements in thread_partial_sync.cc
      - Minor code formatting improvements across thread synchronization files
      
      * [Refactor] Fix global function registration for ThreadSync
      
      - Correct global function registration to use ThreadSync instead of TileLangThreadSync
      - Update TVM global registration to match recent refactoring efforts
      
      * [Refactor] Simplify ThreadSync global function registration
      
      - Remove unnecessary whitespace in global function registration
      - Compact the TVM global registration line for ThreadSync
      
      * [Feature] Add WebGPU code generation support in TileLang
      
      - Implement WebGPU code generator (codegen_webgpu.cc and codegen_webgpu.h)
      - Add WebGPU target support in lower.py and target.py
      - Update CMakeLists.txt to include WebGPU codegen source files
      - Introduce WebGPU-specific code generation for WGSL shader language
      
      * [Refactor] Improve WebGPU code generation formatting and readability
      
      - Enhance code formatting in codegen_webgpu.cc and codegen_webgpu.h
      - Standardize pointer type spacing and indentation
      - Improve line breaks and reduce line length for better readability
      - Minor code style improvements in WebGPU code generation
      
      * [Test] Add WebGPU matrix multiplication code generation test
      
      - Implement test_webgpu_codegen.py for WebGPU matrix multiplication
      - Add assert_gemm_codegen function to validate WebGPU code generation
      - Include basic matrix multiplication kernel test case
      
      * Update README with WebGPU codegen support announcement
      
      * Support multi version pypi package build via tox
      fa9a19b0
  24. 13 Feb, 2025 1 commit
    • Lei Wang's avatar
      [WHL] Support whl building for different python versions via tox (#83) · f55defac
      Lei Wang authored
      * bump version into v0.1.0
      
      * [Enhancement] Add custom develop command for editable installs and update .gitignore
      
      * [Documentation] Update README to include system dependencies installation instructions
      
      * [Build] Update setup.py to support library file copying for both release and develop modes
      
      * [Build] Refactor library file copying logic in setup.py
      
      * [Documentation] Remove unnecessary install section header in Installation.md
      
      * [Build] Add tox configuration and local distribution script for multi-Python version support
      
      * [Build] Improve git submodule update function with better error handling
      f55defac
  25. 24 Jan, 2025 1 commit
  26. 20 Jan, 2025 1 commit
  27. 11 Jan, 2025 1 commit
    • Lei Wang's avatar
      [Initialization] Migration of Codebase from Dev Branch into Main (#10) · 57ab687c
      Lei Wang authored
      
      
      * Add format.sh script for code formatting and linting
      
      * docs update
      
      * center align the title
      
      * lint fix
      
      * add ignore
      
      * Add .gitignore for 3rdparty directory
      
      * Add requirements-dev.txt, requirements-test.txt, and requirements.txt
      
      * 3rdparty
      
      * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h
      
      * Refactor CMakeLists.txt and include statements
      
      - Update CMakeLists.txt to use a newer version of CMake and add project name
      - Remove unnecessary include directories
      
      Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc
      
      - Update include paths to use relative paths instead of absolute paths
      
      * Update submodule for 3rdparty/tvm
      
      * update
      
      * load dll first
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * git keep update
      
      * Refactor CMakeLists.txt and include statements
      
      * Refactor CMakeLists.txt and include statements
      
      * refactor code structure
      
      * Update Readme
      
      * CMakeLists Customized
      
      * update readme
      
      * update README
      
      * update readme
      
      * update usage
      
      * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import
      
      * annotate lower transform global func with `transform` prefix
      
      * Migrate Simplify Pass from tilelang tvm branch
      
      * enhance system environment handling with __init__ and CMake
      
      * Initial commit
      
      * CODE_OF_CONDUCT.md committed
      
      * LICENSE committed
      
      * README.md committed
      
      * SECURITY.md committed
      
      * SUPPORT.md committed
      
      * CODE_OF_CONDUCT Commit
      
      * LICENSE Commit
      
      * SECURITY Commit
      
      * SUPPORT Commit
      
      * Modify Support
      
      * Update README.md
      
      * security ci update
      
      * remove examples
      
      * Update and implement clang-format
      
      * add composable kernel components
      
      * Migrate from latest update
      
      * submodule update
      
      * Test update
      
      * Update License
      
      * Spell check
      
      * lint fix
      
      * add clang-tidy to apply static analysis for c source
      
      * update tilelang examples
      
      * Update Install Docs
      
      * Refactor filetree
      
      * Enhance Install
      
      * conflict resloved
      
      * annotate_version
      
      * Initial Update
      
      * test fix
      
      * install
      
      * Implement setup.py
      
      * lint fix
      
      * Separate Init
      
      * Separate test
      
      * docker file commit
      
      * add logo
      
      * Update Readme and Examples
      
      * update readme
      
      * update logo
      
      * Implement AMD Installation
      
      * Add License
      
      * Update AMD MI300x Benchmark
      
      * update README
      
      * update mi300 benchmark scripts
      
      * update ignore
      
      * enhance build scirpt
      
      * update image
      
      * enhance setup.py to remove duplicated libraries
      
      * remove debug files
      
      * update readme
      
      * update image
      
      * update gemm examples
      
      * update flashattention README
      
      * readme update
      
      * add cmake into requirements
      
      * libinfo fix
      
      * auto update submodule
      
      * lint fix
      
      * Fix AMD Build and Test
      
      * Update check for transpose attribute for CDNA Arch
      
      * typo fix for amd
      
      * Implement Matmul Benchmark
      
      * Refactor Code
      
      * [TypoFix] Fix GEMM Example
      
      * [Docs] Init Linear Attention README
      
      * [TYPO] Typo fix
      
      * [Lint] Lint Fix
      
      * enhance example with intrinsics
      
      * [Enhancement] Improve Buffer Collection during IR Parser
      
      * [Dev] Introduce Current classmethod to get current frame
      
      * submodule update
      
      * fake test pass update
      
      * support thread_extent_api
      
      * code optimize
      
      * Add GEMM function implementation for matrix multiplication
      
      * Update logging format to reflect TileLang in logger messages
      
      * Refactor CMakeLists.txt for improved readability and set default build type to Release
      
      * Support Gemm SS Primitives Implementation
      
      * [README] Upload Tile Language Logo (#5)
      
      * update logo
      
      * Update README.md to enhance formatting and center the title
      
      ---------
      Co-authored-by: default avatarmicrosoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
      Co-authored-by: default avatarMicrosoft Open Source <microsoftopensource@users.noreply.github.com>
      Co-authored-by: default avatarYu Cheng <yu.cheng@pku.edu.cn>
      57ab687c