"templates/vscode:/vscode.git/clone" did not exist on "78f81fc0e598fdb066e41608a8384d493308728e"
- 31 Aug, 2025 1 commit
-
-
Lei Wang authored
* [Enhancement] Introduce finalize_reducer operator and layout reducer support - Added `FinalizeReducer` operator to handle reduction finalization in the TileLang framework, allowing for efficient reduction operations. - Implemented layout inference for local.reducer buffers, enhancing the handling of layout mappings and reducing complexity in buffer management. - Updated `setup.py` to include logging for build directory paths, improving build process visibility. - Enhanced atomic operations with new functions for atomic max, min, load, and store, providing more robust atomicity control in memory operations. - Refactored parallel loop handling to incorporate reducer information, ensuring proper management of reduction operations in parallel contexts. - Cleaned up test cases by removing unnecessary cache disabling and optimizing test parameters for better performance. * Refactor code formatting and improve readability in multiple files - Cleaned up whitespace in `setup.py` to enhance logging clarity. - Reformatted `AtomicMax` and `AtomicMin` functions in `common.h` for better alignment and readability. - Adjusted `debug_print_var` function in `debug.h` to improve code structure and maintainability. - Enhanced readability of the `atomic_add` function in `customize.py` by breaking long lines for better clarity. * Remove debug print statements from `copy.cc` and `inject_tma_barrier.cc` to enhance code clarity and maintainability. * [Enhancement] Disable reuse of small arrays in shared memory allocation - Added logic to prevent the reuse of small arrays (<= 32 bits) in `merge_shared_memory_allocations.cc`, ensuring they are lowered to registers in LLVM for improved performance and memory management. * Refactor `setup.py` to remove duplicate logging statements and enhance clarity. Update `finalize_reducer` function documentation in `reduce.py` to include detailed parameter and return descriptions, improving code readability and maintainability. * Refactor `finalize_reducer` and `reduce` functions to remove redundant target checks. Simplified conditionals by retaining only the `TargetIsHopper` check, enhancing code clarity and maintainability. * bug fix * Add thread checks workaround for replicated cases * Remove the is_one check * fix lint error * lint fix * Update autotune tests to use smaller matrix sizes for improved performance and reliability * [Refactor] Update FinalizeReducer to FinalizeReducerOp and adjust related methods - Refactored FinalizeReducer class to FinalizeReducerOp, updating constructor and method signatures for consistency with the new TileOperator structure. - Enhanced layout inference and cloning methods in FinalizeReducerOpNode. - Updated test_example_flash_attention.py to call test_example_gqa_bwd instead of tilelang.testing.main. - Adjusted header inclusions for improved organization and clarity across multiple files. * [Refactor] Update atomic operations in common.h and modify test_example_flash_attention.py - Enhanced atomic operations (Add, Min, Max) in common.h to handle half and bfloat16 types more efficiently. - Updated test_example_flash_attention.py to call test_example_gqa_bwd instead of tilelang.testing.main, improving test organization. * [Refactor] Simplify CopyNode::LowerBulkCopy logic and update test execution - Removed redundant checks for contiguous memory access in CopyNode::LowerBulkCopy, streamlining the logic for TMA copy operations. - Updated test_tilelang_kernel_gemm.py to comment out the main testing function and call a specific test for i8i8i32 tensor operations instead, improving test focus. --------- Co-authored-by:
Huanqi Cao <caohuanqi@deepseek.com> Co-authored-by:
Freebase6912 <amid-gauze-racing@duck.com>
-
- 29 Aug, 2025 1 commit
-
-
Lei Wang authored
* Refactor operator classes to inherit from TileOperator and update layout inference methods - Changed base class of several operator classes (AtomicAdd, Copy, Gemm, etc.) from Operator to TileOperator for better alignment with tile operations. - Updated InferLayout and Lower methods to use 'override' specifier for clarity and consistency. - Adjusted header inclusions to replace "op.h" with "operator.h" across multiple files for improved organization. - Added missing layout inference implementations for Fill and Conv2DIm2ColOp. - Removed deprecated op.cc and op.h files to streamline the codebase. * lint fix * Refactor operator classes to use Node pattern and improve memory management - Updated several operator classes (AtomicAdd, Copy, Gemm, etc.) to utilize the Node pattern for better memory management and encapsulation. - Changed constructors to initialize member variables through a node object, enhancing clarity and reducing direct member access. - Updated Clone methods to return TileOperator instances instead of unique pointers, aligning with the new design. - Refactored InferLayout and Lower methods to ensure consistency across operator implementations. - Adjusted header files to reflect the new class structure and removed deprecated code for a cleaner codebase. * Enhance Clone methods in AtomicAdd and Copy classes to support parallel operation cloning - Updated the Clone methods in AtomicAddNode and CopyNode to ensure that the parallel operation (par_op_) is properly cloned when defined, improving the integrity of cloned objects. - Refactored the FillNode class to use ParallelOp directly instead of std::make_unique, streamlining the creation of parallel operations. - Made minor adjustments in layout inference and other related methods for consistency and clarity. * Refactor FillNode::Lower method to remove unused global function call - Eliminated the call to the global function "tl.fill.lower" in the FillNode::Lower method, streamlining the code and improving clarity. - Retained the core functionality of the method while enhancing maintainability by reducing unnecessary dependencies.
-
- 08 Aug, 2025 1 commit
-
-
Lei Wang authored
* Implement new free stage layout inference. * Fix bug * Make replication upcasting and unnormalizable iterators safe. * Better handling of updating with more replica * Remove unnecessary check. * Fix compilation. * Fix setup.py. * Simplify development mode. * Allow ParallelOp layout when there's already a compatible layout specified * lint fix * Add ProveFragmentContains function to validate thread access between small and large fragments This function checks if the threads accessing elements of a smaller fragment are a subset of those accessing a larger fragment, ensuring valid access during updates. The implementation includes deriving thread indices, computing logical indices, and verifying thread mappings. * Update dependencies in requirements files * Remove 'thefuzz' from requirements-dev.txt * Specify exact versions for 'torch' and add 'flash_attn' in requirements-test.txt * Update CI workflow to use SHA256 hash for requirements file * Update requirements and CI workflow for flash attention * Removed specific version for 'torch' in requirements-test.txt * Added installation of 'flash_attn==2.5.8' in CI workflow to ensure compatibility * Refactor flash attention import handling in examples * Removed availability checks for 'flash_attn' in multiple example scripts. * Simplified import statements for 'flash_attn' to ensure consistent usage across examples. --------- Co-authored-by:Huanqi Cao <caohuanqi@deepseek.com>
-
- 20 Mar, 2025 1 commit
-
-
Lei Wang authored
* remove llvm build * [Refactor] Update kernel compilation and profiling in examples - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation. - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency. - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations. * lint fix * License Update * [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields. - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability. * [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files - Improved comment alignment and readability in `cuda.h`. - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability. * lint fix * lint fix * lint fix * lint fix * fix * License update * [Enhancement] Update JITKernel to use artifact for kernel source - Assigned the generated artifact to `self.artifact` for better management. - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling. * lint fix * Add @tilelang.testing.requires_llvm decorator to vectorization tests * Enhance setup.py and env.py for library management - Added functionality to remove original files after copying in CMakeBuild. - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration. * Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py * Refactor CMakeBuild file handling in setup.py - Added a check to ensure the target library directory exists before copying .so files. - Improved the logic for creating the target directory and copying files to enhance robustness. * bugfix * Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement. * lint fix * Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility. * lint fix * Add support for C target in device code generation - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function. * [Enhancement] Implement auto-clear cache feature based on environment variable * Added TILELANG_CLEAR_CACHE environment variable to control cache clearing. * Updated CI workflow to set TILELANG_CLEAR_CACHE during testing. * Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true. * [Refactor] Update kernel invocation and import paths in tests and cache * Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result. * Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`. * Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability. * [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py * Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class. * Enhanced overall code formatting to align with project standards. * [Enhancement] Add bfloat16 test case and improve kernel caching logic * Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`. * Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading. * Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management. * Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database. * Improved code formatting and readability across several files. * lint fix * Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
-
- 11 Jan, 2025 2 commits
-
-
Lei Wang authored
* README.md fixed * update test ci * Lint and Typo Fix * Clang Format Lint Fix
-
Lei Wang authored
* Add format.sh script for code formatting and linting * docs update * center align the title * lint fix * add ignore * Add .gitignore for 3rdparty directory * Add requirements-dev.txt, requirements-test.txt, and requirements.txt * 3rdparty * Add gemm.h, CMakeLists.txt, _ffi_api.py, __init__.py, runtime.h, reduce.h, loop_partition.h, utils.h, and loop_vectorize.h * Refactor CMakeLists.txt and include statements - Update CMakeLists.txt to use a newer version of CMake and add project name - Remove unnecessary include directories Fix include paths in layout.cc, codegen.cc, codegen.h, rt_mod.cc, frontend_legalize.cc, inject_pipeline.cc, layout_inference.cc, loop_vectorize.cc, and lower_tile_op.cc - Update include paths to use relative paths instead of absolute paths * Update submodule for 3rdparty/tvm * update * load dll first * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * git keep update * Refactor CMakeLists.txt and include statements * Refactor CMakeLists.txt and include statements * refactor code structure * Update Readme * CMakeLists Customized * update readme * update README * update readme * update usage * with TVM_IMPORT_PYTHON_PATH to handle own tvm build python import * annotate lower transform global func with `transform` prefix * Migrate Simplify Pass from tilelang tvm branch * enhance system environment handling with __init__ and CMake * Initial commit * CODE_OF_CONDUCT.md committed * LICENSE committed * README.md committed * SECURITY.md committed * SUPPORT.md committed * CODE_OF_CONDUCT Commit * LICENSE Commit * SECURITY Commit * SUPPORT Commit * Modify Support * Update README.md * security ci update * remove examples * Update and implement clang-format * add composable kernel components * Migrate from latest update * submodule update * Test update * Update License * Spell check * lint fix * add clang-tidy to apply static analysis for c source * update tilelang examples * Update Install Docs * Refactor filetree * Enhance Install * conflict resloved * annotate_version * Initial Update * test fix * install * Implement setup.py * lint fix * Separate Init * Separate test * docker file commit * add logo * Update Readme and Examples * update readme * update logo * Implement AMD Installation * Add License * Update AMD MI300x Benchmark * update README * update mi300 benchmark scripts * update ignore * enhance build scirpt * update image * enhance setup.py to remove duplicated libraries * remove debug files * update readme * update image * update gemm examples * update flashattention README * readme update * add cmake into requirements * libinfo fix * auto update submodule * lint fix * Fix AMD Build and Test * Update check for transpose attribute for CDNA Arch * typo fix for amd * Implement Matmul Benchmark * Refactor Code * [TypoFix] Fix GEMM Example * [Docs] Init Linear Attention README * [TYPO] Typo fix * [Lint] Lint Fix * enhance example with intrinsics * [Enhancement] Improve Buffer Collection during IR Parser * [Dev] Introduce Current classmethod to get current frame * submodule update * fake test pass update * support thread_extent_api * code optimize * Add GEMM function implementation for matrix multiplication * Update logging format to reflect TileLang in logger messages * Refactor CMakeLists.txt for improved readability and set default build type to Release * Support Gemm SS Primitives Implementation * [README] Upload Tile Language Logo (#5) * update logo * Update README.md to enhance formatting and center the title --------- Co-authored-by:
microsoft-github-operations[bot] <55726097+microsoft-github-operations[bot]@users.noreply.github.com> Co-authored-by:
Microsoft Open Source <microsoftopensource@users.noreply.github.com> Co-authored-by:
Yu Cheng <yu.cheng@pku.edu.cn>
-