- 13 Oct, 2025 1 commit
Yichen Yan authored
* cleanup
* init
* build first wheel that may not work
* build cython ext
* fix tvm build
* use sabi
* update rpath to support auditwheel
* pass editable build
* update ci
* fix warnings
* do not use ccache in self host runner
* test local uv cache
* test pip index
* update lib search to respect new lib location
* fix
* update ci
* enable cuda by default
* update src map
* fix
* fix
* fix
* Generate version with backend and git information at build time
* copy tvm_cython to wheels
* fix tvm lib search
* fmt
* remove unused
* auto detect ccache
* add back backend-related files
* remove jit cython adaptor to simplify code
* fmt
* fix ci
* ci fix 2
* ci fix 3
* workaround metal
* ci fix 4
* fmt
* fmt
* Revert "ci fix 4"
  This reverts commit d1de8291c3e40927955f3ad3cf87a75c78813676.
* tmp
* fix metal
* trivial cleanup
* add detailed build-time version for cuda
* add back mlc
* Restore wheel info and other trivial updates
* update
* fix cuda
* upd
* fix metal ci
* test for ga build
* test for nvidia/cuda
* test ubuntu 20
* fix
* fix
* Do not use `uv build`
* fix
* fix
* log toolchain version
* merge wheel
* update
* debug
* fix
* update
* skip rocm
* update artifacts each
* fix
* fix
* add mac
* fix cache
* fix cache
* fix cache
* reset and add comment
* upd
* fix git version
* update deps
* trivial update
* use in-tree build dir and install to src to speedup editable build
* Revert "use in-tree build dir and install to src to speedup editable build"
  This reverts commit 6ab87b05c5eed811210136b8dca4fc3677dd51f2.
* add build-dir
* update docs
* remove old scripts
* [1/n] cleanup scripts
* [Lint]: [pre-commit.ci] auto fixes [...]
* fix and update
* wait for tvm fix
* revert some tmp fix
* fix
* fix
* spell
* doc update
* test cibuildwheel
* fix and test macos on ci
* Update .github/workflows/dist.yml
  Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
* fix
* test ga event
* cleanup
* bump tvm to support api3
* test final version
* add cron
* Update .github/workflows/dist.yml
  Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
* fix
* test ccache for metal cibuildwheel
* test newer macos
* finish

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Xuehai Pan <XuehaiPan@outlook.com>
- 10 Oct, 2025 1 commit
Xuehai Pan authored
* chore: misc cleanup
* feat: add pre-commit config
* chore: update lint dependencies
* style: fix lint issues
* feat: add pre-commit hooks
* fix: fix typos
* chore: update .gitattributes
* [Lint]: [pre-commit.ci] auto fixes [...]
* docs: update CONTRIBUTING.md
* chore: update default venv name
* chore: revert and exclude CUDA files

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 30 Sep, 2025 1 commit
Lei Wang authored
[Example] Specify a fixed commit for the flash-linear-attention repository and optimize nsa examples (#913)

- Updated the requirements.txt to specify a fixed commit for the flash-linear-attention repository.
- Refactored import paths in benchmark_nsa_fwd.py for better organization.
- Added a new function to generate configurations for autotuning.
- Modified the tilelang_sparse_attention function to accept parameters for block size, number of stages, and threads, enhancing flexibility.
- Changed allocation of shared memory for accumulators to optimize performance.
- 13 Apr, 2025 1 commit
Zhengju Tang authored
* [Dynamic Symbolic] Add pass_config to customize vectorization and tail split
* Lint
* Only check for vectorized dimension. Add docs.
* Lint
* Update comment for cache directory in .gitignore
* Use CUTLASS convention to represent dynamic alignment. Fix bugs
* Add benchmark examples
* Add more benchmarks. Fix accumulate type bug.
* Lint
* Lint
* Test Lint
* Lint
* Test Lint
* Lint
* Fix typo
* Lint
* Lint

---------

Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
- 22 Mar, 2025 1 commit
Yichen Yan authored
* use auditwheel to get correct manylinux wheels
* fix
* make py3.8 happy
* trivial updates
* Add typing.Tuple import and update annotations
* fmt
* Remove unused import and update type hints
* lint fix

---------

Co-authored-by: Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
Co-authored-by: LeiWang1999 <leiwang1999@outlook.com>
- 21 Feb, 2025 1 commit
Lei Wang authored
* [Feature] Add CTypes JIT kernel support for dynamic shapes and multi-stream execution
  - Enhance CtypesKernelAdapter to handle dynamic symbolic shapes
  - Add support for multi-stream kernel execution in CTypes backend
  - Implement dynamic shape handling in test_tilelang_jit_gemm_ctypes.py
  - Add symbolic shape utility function in tilelang.language
  - Update profiler to improve flexibility in benchmark selection
* Remove redundant thread binding in GEMM kernel implementations
  - Remove unnecessary `thread_binding` line in GEMM kernel functions
  - Clean up code in `examples/gemm/README.md` and `testing/python/kernel/test_tilelang_kernel_int4_gemm_mma.py`
  - Enhance code readability by removing redundant thread binding annotation
* Fix indentation in int4 GEMM kernel test file
  - Correct indentation for function calls in `test_tilelang_kernel_int4_gemm_mma.py`
  - Remove extra indentation in `mma_emitter.ldmatrix_a()` and `mma_emitter.ldmatrix_b()` calls
  - Improve code formatting for better readability
* [Feature] Add Cython JIT kernel support for dynamic shapes and multi-stream execution
  - Implement CythonKernelAdapter to handle dynamic symbolic shapes
  - Add support for multi-stream kernel execution in Cython backend
  - Create comprehensive test suite for Cython GEMM kernel in test_tilelang_jit_gemm_cython.py
  - Update JITKernel to include "cython" as a valid execution backend
  - Add Cython-specific wrapper and library generation modules
  - Update .gitignore to exclude Cython cache directory
  - Modify setup.py to include Cython source files in package data
* lint fix
* [Refactor] Replace JITKernel with compile() function for kernel compilation
  - Add new `compile()` function in tilelang/jit/__init__.py as a wrapper for JITKernel
  - Update multiple test files and examples to use `tilelang.compile()` instead of `tilelang.JITKernel()`
  - Modify kernel adapters to support optional kernel-only source retrieval
  - Update `__init__.py` to import the new `compile()` function
  - Improve kernel source retrieval for different execution backends
* lint fix
* remove debug print
* Add C/C++ compiler utility module and update Cython JIT kernel support
  - Introduce new `tilelang/contrib/cc.py` module with cross-platform C/C++ compiler utilities
  - Add functions to detect and retrieve system C/C++ compilers
  - Implement cross-compilation and shared library creation support
  - Update Cython JIT kernel to validate C++ compiler availability
  - Modify Cython adapter to use detected C++ compiler for library generation
* Refactor float8 dtype mapping in tensor utility module
  - Move float8_dtype_map inside adapt_torch2tvm function
  - Simplify global scope by localizing the dtype mapping
  - Maintain existing functionality for converting torch float8 tensors to TVM ndarray
* Refactor float8 dtype mapping in tensor utility module
  - Move float8_dtype_map inside adapt_torch2tvm function
  - Simplify global scope by localizing the dtype mapping
  - Maintain existing functionality for converting torch float8 tensors to TVM ndarray
* revert
* Enhance Cython JIT adapter with Cython compiler detection
  - Add `get_cython_compiler()` function to dynamically locate Cython executable
  - Update Cython adapter to use detected Cython compiler instead of hardcoded command
  - Raise an exception if no Cython compiler is found
  - Update requirements.txt to specify minimum PyTorch version (>=2.2.0)
* Fix Cython kernel wrapper stream handling and type annotations
  - Update stream parameter type to int64_t for better compatibility
  - Directly use torch.cuda.current_stream().cuda_stream instead of casting
  - Improve type safety and precision in Cython kernel wrapper
- 14 Feb, 2025 1 commit
Lei Wang authored
* bump version into v0.1.0
* [Enhancement] Add custom develop command for editable installs and update .gitignore
* [Documentation] Update README to include system dependencies installation instructions
* [Build] Update setup.py to support library file copying for both release and develop modes
* [Build] Refactor library file copying logic in setup.py
* [Documentation] Remove unnecessary install section header in Installation.md
* [Build] Add tox configuration and local distribution script for multi-Python version support
* [Build] Improve git submodule update function with better error handling
* [Build] Update LLVM configuration path in ROCm installation script
* [Build] Add .tox/ to .gitignore for tox testing environment
* [Build] Add support for TVM prebuild path configuration in CMakeLists.txt
* [Cleanup] Remove unused TVM runtime error codes header
* [Cleanup] Fix TVM grid constant type reference in CUDA module
* [Cleanup] Remove unused customized_code function from IR module
* [Feature] Add TileLang thread synchronization and storage access analysis passes
* [Build] Reorder DLL search path directories for more flexible library loading
* [Refactor] Improve thread synchronization and library path handling
  - Rename ThreadSync and TileLangThreadSync functions in C++ code
  - Update Python docstring for ThreadSync with more detailed description
  - Reorder library path detection in tilelang environment setup
  - Minor comment and code cleanup in CUDA and warp specialization modules
* [Refactor] Improve thread synchronization code style and formatting
  - Standardize pointer type spacing in storage_access.h and storage_access.cc
  - Update whitespace and indentation in thread_storage_sync.cc
  - Reorder include statements in thread_partial_sync.cc
  - Minor code formatting improvements across thread synchronization files
* [Refactor] Fix global function registration for ThreadSync
  - Correct global function registration to use ThreadSync instead of TileLangThreadSync
  - Update TVM global registration to match recent refactoring efforts
* [Refactor] Simplify ThreadSync global function registration
  - Remove unnecessary whitespace in global function registration
  - Compact the TVM global registration line for ThreadSync
- 13 Feb, 2025 1 commit
Lei Wang authored
* bump version into v0.1.0
* [Enhancement] Add custom develop command for editable installs and update .gitignore
* [Documentation] Update README to include system dependencies installation instructions
* [Build] Update setup.py to support library file copying for both release and develop modes
* [Build] Refactor library file copying logic in setup.py
* [Documentation] Remove unnecessary install section header in Installation.md
- 24 Jan, 2025 1 commit
Lei Wang authored
* [Doc] Update documentation structure and content: add overview section, revise project name, and change theme to Furo
* [Feature] Add device-side debug printing functions and integrate into kernel interface
* lint fix
* remove debug print
* implement test for debug
* lint fix
* add some comments
* Enhance fragment design and assert fragment print
* enhance debug print
* add test for msg
* lint fix
- 11 Jan, 2025 1 commit
Lei Wang authored
* Add format.sh script for code formatting and linting
* docs update
* center align the title
* lint fix
* add ignore
* Add .gitignore for 3rdparty directory
* Add requirements-dev.txt, requirements-test.txt, and requirements.txt
* 3rdparty
* Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h
* Refactor CMakeLists.txt and include statements
  - Update CMakeLists.txt to use a newer version of CMake and add project name
  - Remove unnecessary include directories
  Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc
  - Update include paths to use relative paths instead of absolute paths
* Update submodule for 3rdparty/tvm
* update
* load dll first
* Refactor CMakeLists.txt and include statements
* Refactor CMakeLists.txt and include statements
* git keep update
* Refactor CMakeLists.txt and include statements
* Refactor CMakeLists.txt and include statements
* refactor code structure
* Update Readme
* CMakeLists Customized
* update readme
* update README
* update readme
* update usage
* with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import
* annotate lower transform global func with `transform` prefix
* Migrate Simplify Pass from tilelang tvm branch
* enhance system environment handling with __init__ and CMake
* Initial commit
* CODE_OF_CONDUCT.md committed
* LICENSE committed
* README.md committed
* SECURITY.md committed
* SUPPORT.md committed
* CODE_OF_CONDUCT Commit
* LICENSE Commit
* SECURITY Commit
* SUPPORT Commit
* Modify Support
* Update README.md
* security ci update
* remove examples
* Update and implement clang-format
* add composable kernel components
* Migrate from latest update
* submodule update
* Test update
* Update License
* Spell check
* lint fix
* add clang-tidy to apply static analysis for c source
* update tilelang examples
* Update Install Docs
* Refactor filetree
* Enhance Install
* conflict resolved
* annotate_version
* Initial Update
* test fix
* install
* Implement setup.py
* lint fix
* Separate Init
* Separate test
* docker file commit
* add logo
* Update Readme and Examples
* update readme
* update logo
* Implement AMD Installation
* Add License
* Update AMD MI300x Benchmark
* update README
* update mi300 benchmark scripts
* update ignore
* enhance build script
* update image
* enhance setup.py to remove duplicated libraries
* remove debug files
* update readme
* update image
* update gemm examples
* update flashattention README
* readme update
* add cmake into requirements
* libinfo fix
* auto update submodule
* lint fix
* Fix AMD Build and Test
* Update check for transpose attribute for CDNA Arch
* typo fix for amd
* Implement Matmul Benchmark
* Refactor Code
* [TypoFix] Fix GEMM Example
* [Docs] Init Linear Attention README
* [TYPO] Typo fix
* [Lint] Lint Fix
* enhance example with intrinsics
* [Enhancement] Improve Buffer Collection during IR Parser
* [Dev] Introduce Current classmethod to get current frame
* submodule update
* fake test pass update
* support thread_extent_api
* code optimize
* Add GEMM function implementation for matrix multiplication
* Update logging format to reflect TileLang in logger messages
* Refactor CMakeLists.txt for improved readability and set default build type to Release
* Support Gemm SS Primitives Implementation
* [README] Upload Tile Language Logo (#5)
* update logo
* Update README.md to enhance formatting and center the title

---------

Co-authored-by: microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com>
Co-authored-by: Microsoft Open Source <microsoftopensource@users.noreply.github.com>
Co-authored-by: Yu Cheng <yu.cheng@pku.edu.cn>