- 13 Apr, 2025 1 commit
-
-
Zhengju Tang authored
* [Dynamic Symbolic] Add pass_config to customize vectorization and tail split * Lint * Only check for vectorized dimension. Add docs. * Lint * Update comment for cache directory in .gitignore * Use CUTLASS convention to represent dynamic alignment. Fix bugs * Add benchmark examples * Add more benchmarks. Fix accumulate type bug. * Lint * Lint * Test Lint * Lint * Test Lint * Lint * Fix typo * Lint * Lint --------- Co-authored-by:Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 08 Apr, 2025 1 commit
-
-
Lei Wang authored
[Enhancement] Support pass config `disable_warp_specialize` to disable auto specialization on hopper (#357) * [Enhancement] Add warp specialization configuration option and update related functionality * [Add] Introduced a new pass configuration option `kDisableWarpSpecialized` to control warp specialization behavior. * [Refactor] Updated `WarpSpecializedRewriter` and `WSCodeEmitter` to utilize the new configuration option, allowing for more flexible optimization strategies. * [Update] Modified the optimization pipeline in `phase.py` to include pipeline planning when warp specialization is disabled, enhancing performance with async copy. * [Documentation] Updated JIT compilation parameters to reflect the new configuration option for better clarity. * lint fix * [Add] Implement test for GEMM with warp specialization configuration * Introduced a new test file `test_tilelang_pass_config_disable_warp_specialized.py` to validate the functionality of the warp specialization configuration option. * Added a `run_gemm` function to execute matrix multiplication tests with and without warp specialization, ensuring correctness through profiling against reference results. * Included a specific test case for GEMM with float16 data types, enhancing test coverage for the new configuration feature. * [Refactor] Improve formatting in test_tilelang_pass_config_disable_warp_specialized.py * Reformatted the `tilelang.compile` call in the `run_gemm` function for better readability by breaking it into multiple lines. * Added a blank line for improved code structure and clarity in the `test_gemm_f16f16f16_nn` function.
-
- 06 Apr, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Clean up whitespace in CUDA-related files - Removed unnecessary blank lines in `cuda.py`, `__init__.py`, and `cuda_driver.py` to improve code readability and maintainability. - This change enhances the overall organization of the codebase without altering functionality. * [Benchmark] Add FP8 Matrix Multiplication Benchmark Script - Introduced a new benchmark script for FP8 matrix multiplication in `benchmark/matmul_fp8/benchmark_matmul.py`. - The script includes functions for reference matrix multiplication, configuration generation for autotuning, and an autotuned kernel for performance measurement. - Added command-line argument parsing for matrix dimensions and the option to enable BitBLAS roller for search space exploration. - The benchmark computes and prints the best latency and performance metrics, enhancing the benchmarking capabilities for FP8 operations. * lint fix * Enhance variable creation by associating data types in IR and layout files, and introduce ExpandIndexDataType transformation - Updated variable creation in `ir.cc`, `gemm_layouts.cc`, and `elem.cc` to include data types for better type safety. - Added a new transformation `ExpandIndexDataType` to promote integer types to int64 where necessary, improving compatibility and performance. - Integrated the new transformation into the optimization pipeline in `phase.py`. - Documented the new transformation in `__init__.py` for clarity. * lint fix * Add configuration option for index bitwidth and remove ExpandIndexDataType transformation - Introduced a new pass configuration option `kConfigIndexBitwidth` to allow customization of index bitwidth. - Updated the optimization pipeline in `phase.py` to utilize the new configuration option instead of the removed `ExpandIndexDataType` transformation. - Documented the new configuration option in the JIT compilation function's parameters for clarity. - Removed the `ExpandIndexDataType` transformation implementation from the codebase to streamline the transformation process. * lint fix * Refactor index bitwidth configuration handling - Updated the `ConfigIndexBitwidth` pass to only apply the bitwidth transformation if the configuration option is defined, preventing potential errors with undefined values. - Changed the default value of `tl.config_index_bitwidth` in the JIT compilation function's parameters from 32 to None for better clarity and flexibility. * lint fix * lint fix --------- Co-authored-by:LeiWang1999 <wyatuestc@gmail.com>
-
- 26 Mar, 2025 1 commit
-
-
Yu Cheng authored
- Added NoSetMaxNReg as a new TIR built-in to indicate no register hint for warp-specialized branches. - Updated the warp specialization rewriter to handle the new NoSetMaxNReg operation, allowing for improved register management. - Enhanced the Python interface to include NoSetMaxNReg for consistency with TIR operations.
-
- 12 Mar, 2025 1 commit
-
-
Yu Cheng authored
- Introduce TMAStoreArrive and TMAStoreWait operations for CUDA TMA store synchronization - Add new builtin operations in op/builtin.cc and op/builtin.h - Implement TMAStoreSyncInjector to automatically inject TMA store synchronization calls - Update CUDA codegen to support new TMA store synchronization intrinsics - Add Python language bindings for new TMA store synchronization operations
-
- 09 Mar, 2025 1 commit
-
-
Lei Wang authored
* Add TMA lowering configuration option and update copyright notices This commit introduces a new configuration option to disable TMA (Tensor Memory Access) lowering and updates copyright notices across multiple files. Key changes include: - Add `kDisableTMALower` configuration option in builtin.h and builtin.cc - Update copyright notices from Microsoft Corporation to Tile-AI Corporation - Modify `LowerArgs` struct to include `disable_tma_lower` flag - Update JIT compilation interfaces to support pass configuration - Enhance error reporting in bulk copy lowering - Propagate pass configuration through various adapter layers * lint fix
-
- 11 Jan, 2025 2 commits
-
-
Lei Wang authored
* README.md fixed * update test ci * Lint and Typo Fix * Clang Format Lint Fix
-
Lei Wang authored
* Add format.sh script for code formatting and linting * docs update * center align the title * lint fix * add ignore * Add .gitignore for 3rdparty directory * Add requirements-dev.txt, requirements-test.txt, and requirements.txt * 3rdparty * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h * Refactor CMakeLists.txt and include statements - Update CMakeLists.txt to use a newer version of CMake and add project name - Remove unnecessary include directories Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc - Update include paths to use relative paths instead of absolute paths * Update submodule for 3rdparty/tvm * update * load dll first * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * git keep update * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * refactor code structure * Update Readme * CMakeLists Customized * update readme * update README * update readme * update usage * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import * annotate lower transform global func with `transform` prefix * Migrate Simplify Pass from tilelang tvm branch * enhance system environment handling with __init__ and CMake * Initial commit * CODE_OF_CONDUCT.md committed * LICENSE committed * README.md committed * SECURITY.md committed * SUPPORT.md committed * CODE_OF_CONDUCT Commit * LICENSE Commit * SECURITY Commit * SUPPORT Commit * Modify Support * Update README.md * security ci update * remove examples * Update and implement clang-format * add composable kernel components * Migrate from latest update * submodule update * Test update * Update License * Spell check * lint fix * add clang-tidy to apply static analysis for c source * update tilelang examples * Update Install Docs * Refactor filetree * Enhance Install * conflict resloved * annotate_version * Initial Update * test fix * install * Implement setup.py * lint fix * Separate Init * Separate test * docker file commit * add logo * Update Readme and Examples * update readme * update logo * Implement AMD Installation * Add License * Update AMD MI300x Benchmark * update README * update mi300 benchmark scripts * update ignore * enhance build scirpt * update image * enhance setup.py to remove duplicated libraries * remove debug files * update readme * update image * update gemm examples * update flashattention README * readme update * add cmake into requirements * libinfo fix * auto update submodule * lint fix * Fix AMD Build and Test * Update check for transpose attribute for CDNA Arch * typo fix for amd * Implement Matmul Benchmark * Refactor Code * [TypoFix] Fix GEMM Example * [Docs] Init Linear Attention README * [TYPO] Typo fix * [Lint] Lint Fix * enhance example with intrinsics * [Enhancement] Improve Buffer Collection during IR Parser * [Dev] Introduce Current classmethod to get current frame * submodule update * fake test pass update * support thread_extent_api * code optimize * Add GEMM function implementation for matrix multiplication * Update logging format to reflect TileLang in logger messages * Refactor CMakeLists.txt for improved readability and set default build type to Release * Support Gemm SS Primitives Implementation * [README] Upload Tile Language Logo (#5) * update logo * Update README.md to enhance formatting and center the title --------- Co-authored-by:
microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com> Co-authored-by:
Microsoft Open Source <microsoftopensource@users.noreply.github.com> Co-authored-by:
Yu Cheng <yu.cheng@pku.edu.cn>
-