"vscode:/vscode.git/clone" did not exist on "ccbd8d907be06fa585e5298824760a959829936c"
- 02 Sep, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Update Clang-Tidy Checks and Improve Code Consistency - Enhanced .clang-tidy configuration by adding specific checks for better bug detection and performance optimization. - Refactored function signatures across multiple files to use `const` references for parameters, improving performance and code clarity. - Updated various methods to ensure consistent handling of parameters, particularly in `AddPredicate`, `Substitute`, and `PlanLoopPartition` functions. - Improved readability by replacing size checks with `empty()` method calls in several locations, ensuring clearer intent in the code. - General code cleanup and adherence to best practices for better maintainability. * [Refactor] Enhance Code Consistency and Clang-Tidy Configuration - Updated .clang-tidy configuration to include additional checks for improved code quality and performance. - Refactored function signatures across multiple files to use `const` references, enhancing performance and clarity. - Replaced size checks with `empty()` method calls in various locations for clearer intent. - Improved handling of parameters in several functions, ensuring consistent usage of `std::move` where applicable. - General code cleanup to adhere to best practices and improve maintainability. * [Refactor] Integrate Clang-Tidy Checks and Enhance Code Consistency - Added clang-tidy checks to the format script for improved code quality assurance. - Refactored function signatures across multiple files to consistently use `const` references, enhancing performance and clarity. - Updated the requirements-lint.txt file to include clang-tidy as a dependency. - General code cleanup to adhere to best practices and improve maintainability. * [CI] Update AMD CI Workflow to Include Build Directory Creation - Added steps to create a build directory and configure CMake with ROCm support during the format check process. - Ensured cleanup of the build directory after the format check to maintain a clean workspace. * [Refactor] Remove Unused Member Variables in AtomicAddNode and CopyNode - Removed the `args_` member variable from both `AtomicAddNode` and `CopyNode` classes to streamline the code and eliminate unnecessary data members. - This change enhances code clarity and maintainability by focusing on relevant attributes for each class. * [Refactor] Update Clang-Tidy Integration and Code Improvements - Modified the format script to include the `-fix` option in the clang-tidy command for automatic code fixes. - Refactored the `AtomicAddVectorizePlanner` class to improve variable handling and consistency, including changes to member variable types and function signatures. - Enhanced code clarity by removing unnecessary `std::move` calls and ensuring consistent usage of types across the class. - General code cleanup to adhere to best practices and improve maintainability. * [Refactor] Improve Parameter Handling and Consistency in AtomicAddVectorize - Updated function signatures in `AtomicAddVectorizePlanResult` and `AtomicAddVectorizeRewriter` to use `const` references and `std::move` for better performance and clarity. - Enhanced the `UpdateVectorSize` method to accept `const Array<PrimExpr>&` for improved efficiency. - General code cleanup to maintain consistency and adhere to best practices. * [CI] Add Git Submodule Initialization to CI Workflow - Included a step to initialize and update git submodules recursively in the CI workflow. - This change ensures that all necessary submodules are available during the format check process, improving build reliability. * [CI] Add Git Submodule Update Step to Format Check - Included a command to initialize and update git submodules recursively in the CI workflow during the format check process. - This enhancement ensures that all required submodules are available, contributing to improved build reliability. * [Refactor] Update Function Signatures in AtomicAddVectorize - Modified the `VectorizeAtomicAdd` function signature to use `const` references for `thread_var` and `thread_bounds`, enhancing performance and code clarity. - This change aligns with previous refactoring efforts to improve parameter handling and consistency across the codebase.
-
- 30 Apr, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Update KernelLaunch to clarify CPU and GPU kernel launch logic * Added comments to distinguish between CPU and GPU kernel launch sections for better code readability. * Changed the creation of empty blocks to use a consistent "root" identifier, enhancing clarity in frame management. * [Refactor] Rename operations for consistency in lower_hopper_intrin and related files * Updated function names from CamelCase to snake_case for better consistency across the codebase. * Refactored calls to `CreateTMADescriptorOp`, `CreateListofMBarrierOp`, and similar functions to their new names: `create_tma_descriptor`, `create_list_of_mbarrier`, etc. * Adjusted corresponding test cases to reflect these changes, ensuring compatibility with the new naming conventions. * [Refactor] Rename operations to snake_case for consistency * Updated function names from CamelCase to snake_case across various files, including `CreateTMADescriptorOp` to `create_tma_descriptor`, `GetMBarrierOp` to `get_mbarrier`, and others. * Adjusted corresponding calls and definitions in the codebase to reflect these naming changes, ensuring uniformity and improved readability. * Enhanced layout inference and loop partitioning logic to accommodate the new naming conventions. * [Feature] Introduce Warp Specialization and Eliminate Storage Sync for MBarrier * Added a new example `gemm_ws.py` demonstrating matrix multiplication with warp specialization using TileLang. * Implemented `WarpSpecializeFrame` and `WarpSpecialize` functionality to manage warp group indices in TIR frames. * Introduced `EliminateStorageSyncForMBarrier` transformation to optimize storage synchronization in mbarrier regions. * Enhanced the TileLang API with new methods for retrieving block and thread extents. * Updated the `LowerAndLegalize` and `OptimizeForTarget` functions to incorporate the new transformation. * Improved layout inference and kernel launch logic for better performance and clarity. * [Refactor] Clean up code formatting and improve readability * Added blank lines for better separation of code blocks in `gemm_ws.py`, `phase.py`, `kernel.py`, and `warpgroup.py`. * Reformatted the `tilelang.compile` call in `gemm_ws.py` for improved clarity. * Updated comments in `warpgroup.py` to clarify the availability of the `WarpSpecialize` function for NVIDIA GPUs. * Ensured consistent spacing and formatting across multiple files to enhance overall code readability. * lint fix * [Refactor] Update mbarrier functions for improved clarity and consistency * Refactored `mbarrier_wait_parity` and `mbarrier_arrive` functions in `builtin.py` to accept explicit parameters for better readability. * Updated calls in `gemm_ws.py` to use the new function signatures, enhancing code clarity. * Adjusted `warpgroup.py` to remove unused thread extent variable, streamlining the code. * Added detailed docstrings to clarify usage examples for memory barrier functions. * Added blank lines in `mbarrier_wait_parity` and `mbarrier_arrive` functions in `builtin.py` for improved code readability and separation of logical sections.
-
- 20 Mar, 2025 1 commit
-
-
Lei Wang authored
* remove llvm build * [Refactor] Update kernel compilation and profiling in examples - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation. - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency. - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations. * lint fix * License Update * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields. - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability. * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files - Improved comment alignment and readability in `cuda.h`. - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability. * lint fix * lint fix * lint fix * lint fix * fix * License update * [Enhancement] Update JITKernel to use artifact for kernel source - Assigned the generated artifact to `self.artifact` for better management. - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling. * lint fix * Add @tilelang.testing.requires_llvm decorator to vectorization tests * Enhance setup.py and env.py for library management - Added functionality to remove original files after copying in CMakeBuild. - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration. * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py * Refactor CMakeBuild file handling in setup.py - Added a check to ensure the target library directory exists before copying .so files. - Improved the logic for creating the target directory and copying files to enhance robustness. * bugfix * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement. * lint fix * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility. * lint fix * Add support for C target in device code generation - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function. * [Enhancement] Implement auto-clear cache feature based on environment variable * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing. * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing. * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true. * [Refactor] Update kernel invocation and import paths in tests and cache * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result. * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`. * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability. * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class. * Enhanced overall code formatting to align with project standards. * [Enhancement] Add bfloat16 test case and improve kernel caching logic * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`. * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading. * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management. * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database. * Improved code formatting and readability across several files. * lint fix * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
-
- 11 Jan, 2025 2 commits
-
-
Lei Wang authored
* README.md fixed * update test ci * Lint and Typo Fix * Clang Format Lint Fix
-
Lei Wang authored
* Add format.sh script for code formatting and linting * docs update * center align the title * lint fix * add ignore * Add .gitignore for 3rdparty directory * Add requirements-dev.txt, requirements-test.txt, and requirements.txt * 3rdparty * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h * Refactor CMakeLists.txt and include statements - Update CMakeLists.txt to use a newer version of CMake and add project name - Remove unnecessary include directories Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc - Update include paths to use relative paths instead of absolute paths * Update submodule for 3rdparty/tvm * update * load dll first * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * git keep update * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * refactor code structure * Update Readme * CMakeLists Customized * update readme * update README * update readme * update usage * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import * annotate lower transform global func with `transform` prefix * Migrate Simplify Pass from tilelang tvm branch * enhance system environment handling with __init__ and CMake * Initial commit * CODE_OF_CONDUCT.md committed * LICENSE committed * README.md committed * SECURITY.md committed * SUPPORT.md committed * CODE_OF_CONDUCT Commit * LICENSE Commit * SECURITY Commit * SUPPORT Commit * Modify Support * Update README.md * security ci update * remove examples * Update and implement clang-format * add composable kernel components * Migrate from latest update * submodule update * Test update * Update License * Spell check * lint fix * add clang-tidy to apply static analysis for c source * update tilelang examples * Update Install Docs * Refactor filetree * Enhance Install * conflict resloved * annotate_version * Initial Update * test fix * install * Implement setup.py * lint fix * Separate Init * Separate test * docker file commit * add logo * Update Readme and Examples * update readme * update logo * Implement AMD Installation * Add License * Update AMD MI300x Benchmark * update README * update mi300 benchmark scripts * update ignore * enhance build scirpt * update image * enhance setup.py to remove duplicated libraries * remove debug files * update readme * update image * update gemm examples * update flashattention README * readme update * add cmake into requirements * libinfo fix * auto update submodule * lint fix * Fix AMD Build and Test * Update check for transpose attribute for CDNA Arch * typo fix for amd * Implement Matmul Benchmark * Refactor Code * [TypoFix] Fix GEMM Example * [Docs] Init Linear Attention README * [TYPO] Typo fix * [Lint] Lint Fix * enhance example with intrinsics * [Enhancement] Improve Buffer Collection during IR Parser * [Dev] Introduce Current classmethod to get current frame * submodule update * fake test pass update * support thread_extent_api * code optimize * Add GEMM function implementation for matrix multiplication * Update logging format to reflect TileLang in logger messages * Refactor CMakeLists.txt for improved readability and set default build type to Release * Support Gemm SS Primitives Implementation * [README] Upload Tile Language Logo (#5) * update logo * Update README.md to enhance formatting and center the title --------- Co-authored-by:
microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com> Co-authored-by:
Microsoft Open Source <microsoftopensource@users.noreply.github.com> Co-authored-by:
Yu Cheng <yu.cheng@pku.edu.cn>
-