- 09 Oct, 2025 1 commit
-
-
Lei Wang authored
* [Feature] Introduce WGMMA support and enhance GEMM layout handling - Added support for the WGMMA intrinsic in the TileLang framework, enabling efficient matrix multiplication on newer architectures. - Refactored GEMM layout functions to accept a boolean parameter for K dimension handling, improving flexibility in layout generation. - Updated layout inference logic to accommodate new WGMMA configurations and ensure compatibility with existing GEMM operations. - Enhanced Python bindings for layout functions, allowing for better integration and usability in user-defined operations. - Improved documentation for layout functions and GEMM operations to clarify usage and parameters. These changes enhance the performance and usability of GEMM operations, particularly for advanced architectures, while maintaining backward compatibility with existing implementations. * [Refactor] Clean up code formatting and enhance layout function readability - Improved code formatting across multiple files for better readability, including consistent indentation and line breaks. - Updated layout function signatures to enhance clarity, particularly in `gemm_layouts.cc`, `layout.cc`, and `layout.h`. - Refactored lambda functions in `builtin.cc` and `gemm_py.cc` for improved structure and maintainability. - Enhanced comments and documentation in layout-related files to clarify usage and parameters. These changes contribute to a cleaner codebase and improved maintainability of layout functions in the TileLang framework. * [Feature] Add descriptor initialization and offset manipulation for WGMMA - Introduced new TileLang builtins `initialize_descriptor` and `increase_descriptor_offset` to facilitate descriptor management for WGMMA operations. - Updated `builtin.cc` and `builtin.h` to define and document the new builtins, enhancing the framework's capabilities for descriptor handling. - Modified `codegen_cuda.cc` and `ptx.cc` to integrate the new builtins into the code generation process, ensuring proper assembly generation for WGMMA operations. - Enhanced the `GemmWGMMA` class to utilize the new descriptor functionalities, improving the efficiency of matrix multiplication operations. - Updated related tests and documentation to reflect the new features and ensure comprehensive coverage. These changes enhance the TileLang framework's support for advanced matrix operations on newer architectures, improving performance and usability. * [Refactor] Improve code formatting and readability in various files - Enhanced code formatting across multiple files for better readability, including consistent indentation and line breaks. - Updated function signatures and comments in `builtin.h`, `codegen_cuda.cc`, and `ptx.cc` to improve clarity. - Refactored descriptor initialization and offset manipulation functions in `builtin.py` and `wgmma_macro_generator.py` for improved structure. - Cleaned up unnecessary whitespace and improved alignment in `common.h` and `allocate.py`. These changes contribute to a cleaner and more maintainable codebase in the TileLang framework. * [Update] Update subproject commit and refactor layout function call - Updated the subproject commit for `cutlass` to indicate a dirty state. - Refactored the `UpdateAnalyzer` function in `layout.cc` to call `LayoutNode::getVarMap()` instead of `getVarMap()`, improving clarity and ensuring proper context for variable mapping. These changes enhance the maintainability and clarity of the layout handling in the TileLang framework. * support more data types * gemm_rs support * lint fix * wgmma wrapper * Remove debug logging for wgmma assembly code and refactor swizzle byte size calculations in wgmma macro generator. Enhanced handling of leading and stride byte offsets based on swizzle mode, improving clarity and performance in tensor core intrinsic emissions. * Refactor GEMM layout functions to replace 'kfactor' with 'k_inner' for improved clarity and consistency. Update includes necessary changes in error messages for Hopper and Sm100 layouts. Additionally, include a new header for CUTE utilities in common.h. * Comprehensively support WGMMA GEMM SS * remove debug print * lint fix * remove debug print * reduce bwd test shape * lint fix * clear cache for pytest * lint fix * Update sparse MLA examples to support SKV adjustment and correctness checks - Changed SKV parameter from 32768 to 8192 in sparse MLA backward and forward tests. - Added check_correctness parameter to test functions for validation of outputs. - Updated test cases to reflect new SKV values and correctness checks. * test fix * adjust test case * test fix * skip some test currently
-
- 04 Oct, 2025 1 commit
-
-
Tong WU authored
* [Enhancement] Enhance the GQA backward kernel by calculating `dq` and `dv` via copy&sum * [Example] Implement GQA backward example for Hopper with customized tiling and pipeline * [Example] Add relevant tests * Fix all typos of wrong shape of `V_shared` in macros
-
- 26 Sep, 2025 1 commit
-
-
Tong WU authored
* [Example] Add a new example to support attention sink for MHA - Introduced a new example script for multi-head attention (MHA) with sliding window attention and sink tokens. - Added a reference attention function to validate the implementation against PyTorch. - Included argument parsing for command-line execution of the example. * [Example] Replace MHA sink forward example with updated implementation - Removed the old example script for multi-head attention (MHA) with sliding window attention and sink tokens. - Introduced a new example script that modifies the attention mechanism to enhance performance and maintainability. - Updated argument parsing and reference functions to align with the new implementation. * Enhance MHA sink example with sliding window support - Added a `window_size` parameter to the `flashattn` function to enable sliding window attention. - Implemented assertions to ensure `window_size` is compatible with `block_N`. - Updated the main function to include a `tune` option for performance tuning. - Introduced a new test file to validate both full attention and sliding window scenarios. - Adjusted FLOPS calculation to account for the sliding window configuration. * lint * [Fix] Add checkinf process to fix the bug of swa * Migrate to BSHD layout to align with triton baselines * lint * fix typo * Refactor MHA sink example to use seq_q and seq_kv parameters to accommodate the new sequence length parameters. * Add GQA sink example for optimized attention mechanism & lint fix * fix several typos and bugs * lint * fix speed issues of swa * Add flash attention example with backward pass for BHSD layout and corresponding test cases * Add backward pass implementation for flash attention with sinks and corresponding test case * fix lint and typo * Optimze the calculation of `dsinks` * Add support for swa backward and update examples * fix previous typos * Add example for GQA sink backward pass and update tests for both MHA and GQA sinks * fix lint * fix previous typos * typo
-
- 23 Jul, 2025 1 commit
-
-
Wenhao Xie authored
* fix CI bugs in hopper * lint fix * Update bulk_copy.cc * Refactor bulk copy logic in LowerBulkCopy function - Removed unnecessary blank lines for improved code readability. - Enhanced stride validation by checking for null pointers in global stride calculations, ensuring robustness against symbolic strides. - Updated pass configuration handling in dynamic tile language tests to streamline dynamic alignment and TMA lower pass settings. * test fix * ci fix * Update flash-attention dependencies and clean up example code - Downgraded `flash-attn` dependency version in `requirements-test.txt` to `<=2.2.0`. - Removed unused imports and commented-out code in various example files to enhance readability and maintainability. - Updated the `flashattn` function signature to include default parameters for `block_M`, `block_N`, `num_stages`, and `threads`. - Cleaned up the `example_mha_fwd_varlen.py` and `example_mha_bwd_wgmma_pipelined.py` files by removing unnecessary comments and improving code clarity. - Deleted the `example_mha_inference.py` file as it is no longer needed. * Update CI workflow to remove `--user` flag from pip install commands - Removed the `--user` flag from the pip install commands in both the development and testing sections of the CI workflow to ensure proper installation of dependencies in the virtual environment. * Update CI workflow to include `--no-user` flag in pip install commands - Added the `--no-user` flag to the pip install commands in both the development and testing sections of the CI workflow to ensure dependencies are installed correctly within the virtual environment. * Update CI workflow to include `--no-user` flag in pip install command for wheel mode - Added the `--no-user` flag to the pip install command in the wheel mode section of the CI workflow to ensure dependencies are installed correctly within the virtual environment. * test fix * avoid conflict with system environments * test fix * add commnets --------- Co-authored-by:
Lei Wang <34334180+LeiWang1999@users.noreply.github.com> Co-authored-by:
LeiWang1999 <leiwang1999@outlook.com>
-
- 06 Jun, 2025 1 commit
-
-
xs-keju authored
* [CI] Add CI test for flash_attention examples * Update example_gqa_fwd_bshd.py * Update example_mha_fwd_bshd_wgmma_pipelined.py * [CI] Added conditional annotations for tests in flash_attention * [CI] Added conditional annotations for tests in flash_attention --------- Co-authored-by:Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-