- 14 Apr, 2025 1 commit
-
-
Yu Cheng authored
Updated SyncPatternMap to use vectors for acquire and release, enhancing flexibility in handling synchronization patterns. Improved barrier handling logic for both producer and consumer cases, ensuring accurate synchronization in the pipeline.
-
- 08 Apr, 2025 1 commit
-
-
Lei Wang authored
[Enhancement] Support pass config `disable_warp_specialize` to disable auto specialization on hopper (#357) * [Enhancement] Add warp specialization configuration option and update related functionality * [Add] Introduced a new pass configuration option `kDisableWarpSpecialized` to control warp specialization behavior. * [Refactor] Updated `WarpSpecializedRewriter` and `WSCodeEmitter` to utilize the new configuration option, allowing for more flexible optimization strategies. * [Update] Modified the optimization pipeline in `phase.py` to include pipeline planning when warp specialization is disabled, enhancing performance with async copy. * [Documentation] Updated JIT compilation parameters to reflect the new configuration option for better clarity. * lint fix * [Add] Implement test for GEMM with warp specialization configuration * Introduced a new test file `test_tilelang_pass_config_disable_warp_specialized.py` to validate the functionality of the warp specialization configuration option. * Added a `run_gemm` function to execute matrix multiplication tests with and without warp specialization, ensuring correctness through profiling against reference results. * Included a specific test case for GEMM with float16 data types, enhancing test coverage for the new configuration feature. * [Refactor] Improve formatting in test_tilelang_pass_config_disable_warp_specialized.py * Reformatted the `tilelang.compile` call in the `run_gemm` function for better readability by breaking it into multiple lines. * Added a blank line for improved code structure and clarity in the `test_gemm_f16f16f16_nn` function.
-
- 26 Mar, 2025 1 commit
-
-
Yu Cheng authored
- Added NoSetMaxNReg as a new TIR built-in to indicate no register hint for warp-specialized branches. - Updated the warp specialization rewriter to handle the new NoSetMaxNReg operation, allowing for improved register management. - Enhanced the Python interface to include NoSetMaxNReg for consistency with TIR operations.
-
- 24 Mar, 2025 1 commit
-
-
Yu Cheng authored
- Introduced TMAFinder and ProducerUsedBufferFinder classes to analyze TMA loads and identify buffers used in producer conditions. - Enhanced WarpSpecializedRoleMarker to prepare and utilize the identified buffers during role marking. - Updated VisitStmt methods to incorporate new analysis logic for IfThenElse and For nodes, improving the handling of TMA loads in the warp specialization process.
-
- 18 Mar, 2025 1 commit
-
-
Yu Cheng authored
-
- 14 Mar, 2025 1 commit
-
-
Yu Cheng authored
* [Dev] Implement IfStmtBinding and MergeIfStmt transformations - Add IfStmtBinding to bind If statements to each statement in SeqStmt, enhancing the handling of conditional statements. - Introduce MergeIfStmt to merge consecutive If statements within SeqStmt, optimizing the structure of conditional logic. - Update phase.py to apply IfStmtBinding and MergeIfStmt transformations for the "sm_90" target. - Enhance __init__.py with new functions for IfStmtBinding and MergeIfStmt, providing a clear interface for these transformations. * Update license header in if_stmt_binding.cc * Update license header in merge_if_stmt.cc --------- Co-authored-by:Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 12 Mar, 2025 1 commit
-
-
Yu Cheng authored
[Refactor] Add SetMaxNRegCollector to Improve Register Hint Handling in Warp Specialized Rewriter (#194) * [Refactor] Add SetMaxNRegCollector to Improve Register Hint Handling in Warp Specialized Rewriter - Introduce `SetMaxNRegCollector` to collect register hints from SetMaxNReg calls - Modify `WarpSpecializedRewriter` to use collected register hints for producer and consumer code - Add validation checks for register hint values in the collector - Remove SetMaxNReg calls during code transformation - Enhance flexibility of register allocation in warp specialized rewriting * temporary remove check in lower_hopper_intrin
-
- 28 Feb, 2025 1 commit
-
-
Yu Cheng authored
[Dev][Bugfix] Fix bug in ThreadTagChecker; Add WgmmaSync rewriter and add MHA WGMMA pipelined example (#128) * [Dev] Add RetNet Linear Attention example * [Dev] Add WgmmaSync rewriter for pipelined WGMMA operations and add MHA WGMMA pipelined example (FA3-like scheduling) This commit introduces a new transformation pass `RewriteWgmmaSync` to optimize warp group matrix multiply accumulate (WGMMA) operations in the TileLang compiler: - Implemented `WgmmaSyncRewriter` in `src/transform/wgmma_sync_rewriter.cc` - Added pass registration for `RewriteWgmmaSync` - Updated `tilelang/engine/phase.py` to include the new transformation pass - Updated `tilelang/transform/__init__.py` to expose the new pass The rewriter intelligently manages synchronization and dependencies between WGMMA operations, improving pipeline efficiency for complex matrix multiplication kernels. * [Bugfix] Fix bug in ThreadTagChecker for warp specialization Improve thread tag validation in warp specialized rewriter to prevent unintended transformations: - Add more precise checks for threadIdx.y and threadIdx.z - Validate thread extent to ensure only single-extent thread bindings are allowed - Prevent warp specialization for multi-extent thread bindings in y and z dimensions * lint * [CI] Add TMA descriptor attribute to transformed module in test case
-
- 27 Feb, 2025 1 commit
-
-
Lei Wang authored
* refactor code * enhance tutorial * Enhance error handling and code generation in CUDA and TileLang components This commit introduces several improvements across multiple files: - Added more informative error messages in GEMM layout checks - Updated CUDA codegen to support more flexible function signature generation - Improved TMA descriptor initialization and kernel dispatch logic - Refined library generation and source code parsing utilities - Enhanced error handling in various adapter and wrapper classes * Add thread tag validation for warp specialization Introduce a ThreadTagChecker to validate that a PrimFunc only uses threadIdx.x before applying warp specialization. This prevents unintended transformations on kernels with complex thread binding and provides a clear warning to users about potential issues with warp specialization. * Update TileLang Profiling and Compilation in Flash Decoding Examples Refactor the profiling and compilation workflow in two flash decoding example scripts: - Replace `tilelang.lower()` and `tilelang.Profiler()` with `tilelang.compile()` - Simplify profiler initialization using `get_profiler()` - Update method calls to use the new profiler and compiled kernel objects - Maintain existing performance benchmarking and validation logic * Refactor and clean up code formatting in TileLang testing and adapter modules This commit includes several code style and formatting improvements: - Adjust whitespace and line breaks in test files - Improve code formatting in CUDA source wrapper and adapter utilities - Enhance readability of function calls and argument handling - Remove unnecessary whitespace and standardize indentation - Simplify function signatures and argument parsing * Refactor CUDA codegen and improve code formatting This commit includes several improvements to CUDA code generation and formatting: - Enhance function signature generation in CodeGenTileLangCUDA - Improve code formatting and readability in CUDA-related files - Simplify parameter handling and type annotations - Clean up whitespace and line breaks in codegen and layout files --------- Co-authored-by:Ubuntu <dlisuser@h100testl730RPS.xu5snccwrbtejcqqalluoku5hb.xx.internal.cloudapp.net>
-
- 14 Feb, 2025 1 commit
-
-
Lei Wang authored
* bump version into v0.1.0 * [Enhancement] Add custom develop command for editable installs and update .gitignore * [Documentation] Update README to include system dependencies installation instructions * [Build] Update setup.py to support library file copying for both release and develop modes * [Build] Refactor library file copying logic in setup.py * [Documentation] Remove unnecessary install section header in Installation.md * [Build] Add tox configuration and local distribution script for multi-Python version support * [Build] Improve git submodule update function with better error handling * [Build] Update LLVM configuration path in ROCm installation script * [Build] Add .tox/ to .gitignore for tox testing environment * [Build] Add support for TVM prebuild path configuration in CMakeLists.txt * [Cleanup] Remove unused TVM runtime error codes header * [Cleanup] Fix TVM grid constant type reference in CUDA module * [Cleanup] Remove unused customized_code function from IR module * [Feature] Add TileLang thread synchronization and storage access analysis passes * [Build] Reorder DLL search path directories for more flexible library loading * [Refactor] Improve thread synchronization and library path handling - Rename ThreadSync and TileLangThreadSync functions in C++ code - Update Python docstring for ThreadSync with more detailed description - Reorder library path detection in tilelang environment setup - Minor comment and code cleanup in CUDA and warp specialization modules * [Refactor] Improve thread synchronization code style and formatting - Standardize pointer type spacing in storage_access.h and storage_access.cc - Update whitespace and indentation in thread_storage_sync.cc - Reorder include statements in thread_partial_sync.cc - Minor code formatting improvements across thread synchronization files * [Refactor] Fix global function registration for ThreadSync - Correct global function registration to use ThreadSync instead of TileLangThreadSync - Update TVM global registration to match recent refactoring efforts * [Refactor] Simplify ThreadSync global function registration - Remove unnecessary whitespace in global function registration - Compact the TVM global registration line for ThreadSync
-
- 11 Jan, 2025 2 commits
-
-
Lei Wang authored
* README.md fixed * update test ci * Lint and Typo Fix * Clang Format Lint Fix
-
Lei Wang authored
* Add format.sh script for code formatting and linting * docs update * center align the title * lint fix * add ignore * Add .gitignore for 3rdparty directory * Add requirements-dev.txt, requirements-test.txt, and requirements.txt * 3rdparty * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h * Refactor CMakeLists.txt and include statements - Update CMakeLists.txt to use a newer version of CMake and add project name - Remove unnecessary include directories Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc - Update include paths to use relative paths instead of absolute paths * Update submodule for 3rdparty/tvm * update * load dll first * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * git keep update * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * refactor code structure * Update Readme * CMakeLists Customized * update readme * update README * update readme * update usage * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import * annotate lower transform global func with `transform` prefix * Migrate Simplify Pass from tilelang tvm branch * enhance system environment handling with __init__ and CMake * Initial commit * CODE_OF_CONDUCT.md committed * LICENSE committed * README.md committed * SECURITY.md committed * SUPPORT.md committed * CODE_OF_CONDUCT Commit * LICENSE Commit * SECURITY Commit * SUPPORT Commit * Modify Support * Update README.md * security ci update * remove examples * Update and implement clang-format * add composable kernel components * Migrate from latest update * submodule update * Test update * Update License * Spell check * lint fix * add clang-tidy to apply static analysis for c source * update tilelang examples * Update Install Docs * Refactor filetree * Enhance Install * conflict resloved * annotate_version * Initial Update * test fix * install * Implement setup.py * lint fix * Separate Init * Separate test * docker file commit * add logo * Update Readme and Examples * update readme * update logo * Implement AMD Installation * Add License * Update AMD MI300x Benchmark * update README * update mi300 benchmark scripts * update ignore * enhance build scirpt * update image * enhance setup.py to remove duplicated libraries * remove debug files * update readme * update image * update gemm examples * update flashattention README * readme update * add cmake into requirements * libinfo fix * auto update submodule * lint fix * Fix AMD Build and Test * Update check for transpose attribute for CDNA Arch * typo fix for amd * Implement Matmul Benchmark * Refactor Code * [TypoFix] Fix GEMM Example * [Docs] Init Linear Attention README * [TYPO] Typo fix * [Lint] Lint Fix * enhance example with intrinsics * [Enhancement] Improve Buffer Collection during IR Parser * [Dev] Introduce Current classmethod to get current frame * submodule update * fake test pass update * support thread_extent_api * code optimize * Add GEMM function implementation for matrix multiplication * Update logging format to reflect TileLang in logger messages * Refactor CMakeLists.txt for improved readability and set default build type to Release * Support Gemm SS Primitives Implementation * [README] Upload Tile Language Logo (#5) * update logo * Update README.md to enhance formatting and center the title --------- Co-authored-by:
microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com> Co-authored-by:
Microsoft Open Source <microsoftopensource@users.noreply.github.com> Co-authored-by:
Yu Cheng <yu.cheng@pku.edu.cn>
-