"ts/webui/vscode:/vscode.git/clone" did not exist on "e83fa1761590df215041f205acb64466e248e926"
- 23 Jul, 2025 1 commit
-
-
Wenhao Xie authored
* fix CI bugs in hopper * lint fix * Update bulk_copy.cc * Refactor bulk copy logic in LowerBulkCopy function - Removed unnecessary blank lines for improved code readability. - Enhanced stride validation by checking for null pointers in global stride calculations, ensuring robustness against symbolic strides. - Updated pass configuration handling in dynamic tile language tests to streamline dynamic alignment and TMA lower pass settings. * test fix * ci fix * Update flash-attention dependencies and clean up example code - Downgraded `flash-attn` dependency version in `requirements-test.txt` to `<=2.2.0`. - Removed unused imports and commented-out code in various example files to enhance readability and maintainability. - Updated the `flashattn` function signature to include default parameters for `block_M`, `block_N`, `num_stages`, and `threads`. - Cleaned up the `example_mha_fwd_varlen.py` and `example_mha_bwd_wgmma_pipelined.py` files by removing unnecessary comments and improving code clarity. - Deleted the `example_mha_inference.py` file as it is no longer needed. * Update CI workflow to remove `--user` flag from pip install commands - Removed the `--user` flag from the pip install commands in both the development and testing sections of the CI workflow to ensure proper installation of dependencies in the virtual environment. * Update CI workflow to include `--no-user` flag in pip install commands - Added the `--no-user` flag to the pip install commands in both the development and testing sections of the CI workflow to ensure dependencies are installed correctly within the virtual environment. * Update CI workflow to include `--no-user` flag in pip install command for wheel mode - Added the `--no-user` flag to the pip install command in the wheel mode section of the CI workflow to ensure dependencies are installed correctly within the virtual environment. * test fix * avoid conflict with system environments * test fix * add commnets --------- Co-authored-by:
Lei Wang <34334180+LeiWang1999@users.noreply.github.com> Co-authored-by:
LeiWang1999 <leiwang1999@outlook.com>
-
- 08 Jul, 2025 1 commit
-
-
Lei Wang authored
* [Refactor] Update tilelang kernel functions and remove unused imports - Refactored the `flashattn_fwd`, `flashattn_bwd_preprocess`, and `flashattn_bwd_postprocess` functions to utilize direct kernel calls instead of cached versions, improving clarity and performance. - Added `@tilelang.jit` decorators with specified output indices to enhance kernel compilation. - Removed unused import of `cached` from `tilelang`, streamlining the code. - Commented out the main testing function call in `test_tilelang_kernel_mha_bwd.py` for potential future use. * [Refactor] Simplify configuration generation in benchmark and example scripts - Refactored the `get_configs` functions in multiple benchmark and example scripts to utilize a dictionary-based approach for parameter configuration, improving readability and maintainability. - Updated the `flashattn` and `chunk_scan_fwd` functions to directly accept configuration parameters, enhancing flexibility in kernel tuning. - Removed redundant code and streamlined the configuration generation process across various files, ensuring consistency in how configurations are defined and utilized. * [Refactor] Update configuration handling in benchmark scripts - Refactored the `get_configs` functions in benchmark scripts to accept a variable argument list, improving flexibility in configuration management. - Enhanced the `matmul` and `flashattn` functions to utilize the updated configuration approach, streamlining parameter handling for kernel tuning. - Added `@autotune` decorators to relevant functions, ensuring consistent autotuning behavior across benchmarks. - Cleaned up redundant code and improved overall readability in the affected files. * [Refactor] Clean up formatting and update subproject commit - Updated the subproject commit reference in the TVM directory to indicate a dirty state. - Removed unnecessary blank lines and improved formatting in the `benchmark_matmul` and `benchmark_matmul_fp8` scripts for better readability. - Streamlined the function definitions in the `flashattn` example script to enhance clarity and maintainability. * [Refactor] Update AutoTuner configuration handling - Modified the AutoTuner class to check if kernel parameters are set before processing tunable arguments, improving robustness in configuration handling. - Enhanced the logic for skipping compilation when tunable parameters are already provided, ensuring efficient use of resources. - Updated comments for clarity and maintainability. * lint fix * Update TVM subproject commit to indicate dirty state and modify MHA backward test cases - Updated the subproject commit reference in the TVM directory to reflect a dirty state. - Adjusted the `test_mha_bwd` function to use a new configuration for the MHA backward tests, changing the context size from 128 to 256. - Uncommented the main testing function call for potential execution.
-
- 25 Jun, 2025 1 commit
-
-
Cunxiao Ni authored
* [Example] Update kernel compilation in examples to use @tilelang.jit - Refactored multiple examples to eliminate the use of `tilelang.compile` for kernel creation, directly invoking the functions instead. - Added `@tilelang.jit` decorators with appropriate output indices to enhance performance and maintainability. - Improved code clarity by simplifying the kernel invocation process across various examples, ensuring consistency in how kernels are defined and executed. * format * Update example_tilelang_sparse_gqa_decode_varlen_indice.py * Update example_dequant_gemm_fine_grained.py * Update example_gemm_autotune.py --------- Co-authored-by:Lei Wang <34334180+LeiWang1999@users.noreply.github.com>
-
- 28 May, 2025 1 commit
-
-
Lei Wang authored
* [Enhancement] Add commit ID to versioning and improve logging initialization * Updated `get_tilelang_version` to include an optional commit ID in the version string. * Enhanced the `TileLangBuilPydCommand` to write the version with commit ID to the VERSION file during the build process. * Introduced a new function `get_git_commit_id` in `version.py` to retrieve the current git commit hash. * Refactored logger initialization in `autotuner/__init__.py` to ensure handlers are set up only once, improving performance and clarity. * Minor fixes in `flatten_buffer.cc` and `kernel_cache.py` for better handling of versioning and logging. * [Refactor] Enhance AutoTuner and JITKernel for improved performance and caching * Refactored the AutoTuner class to include new methods for setting compilation and profiling arguments, enhancing configurability. * Introduced caching mechanisms for tuning results, allowing for faster retrieval of previously computed configurations. * Updated JITKernel to store tuning results, including latency and configuration details, improving the kernel's performance tracking. * Added new methods for generating cache keys and saving/loading results to/from disk, streamlining the tuning process. * Enhanced the overall structure and readability of the autotuning logic, ensuring better maintainability and clarity. * Minor adjustments in related modules to support the new caching and profiling features. * [Refactor] Clean up code formatting and improve readability in AutoTuner and related modules * Consolidated import statements and removed unnecessary line breaks for better readability. * Standardized function argument formatting across the AutoTuner and CompileArgs classes. * Enhanced consistency in the use of whitespace and indentation throughout the codebase. * Minor adjustments in the Profiler and JITKernel classes to improve clarity and maintainability. * Ensured that all changes adhere to the project's coding style guidelines. * [Refactor] Remove redundant type hints in AutoTuner modules * Simplified import statements in `__init__.py` and `param.py` by removing unnecessary duplicate type hints for `Any`. * Improved code readability and maintainability by streamlining type imports across the AutoTuner module. * [Refactor] Update AutoTuner configuration for improved profiling and target detection * Enhanced the AutoTuner configuration across multiple examples by adding `set_profile_args` to better manage profiling settings. * Standardized the use of `target="auto"` in compile arguments to ensure automatic target detection. * Removed redundant target specifications in certain instances to streamline the configuration process. * Improved overall clarity and maintainability of the autotuning logic in various example scripts. * [Refactor] Simplify code formatting and improve readability in example scripts * Consolidated function argument formatting in `benchmark_mla_decode_amd_tilelang.py`, `example_elementwise_add.py`, and `performance.py` for better clarity. * Removed unnecessary line breaks and standardized argument placement across multiple files. * Enhanced overall code readability and maintainability in autotuning examples and performance scripts. * [Refactor] Update JIT decorator usage across multiple files * Removed redundant parameters from the JIT decorator in various benchmark and example scripts, simplifying the code. * Standardized the import of the JIT decorator from `tilelang`, enhancing consistency across the codebase. * Improved overall readability and maintainability by consolidating import statements and cleaning up function definitions. * [Refactor] Standardize JIT decorator formatting across benchmark and example scripts * Simplified the formatting of the JIT decorator in multiple files by removing unnecessary line breaks. * Enhanced code readability and consistency in the usage of the JIT decorator across benchmark and example scripts. * Improved overall maintainability by ensuring uniformity in function definitions and decorator usage.
-
- 09 May, 2025 1 commit
-
-
Cunxiao Ni authored
* [CI] Add elementwise and gemv examples to CI. * fix lint * test * fix gemv lint * fix lint
-
- 31 Mar, 2025 1 commit
-
-
Lei Wang authored
* [Enhancement] Add support for CUDA architecture 8.9 in GEMM template - Introduced conditional inclusion of "gemm_sm89.h" for CUDA architectures 8.9 and above, enhancing compatibility with newer hardware. - This change ensures that the GEMM template can leverage optimizations specific to the 8.9 architecture, improving performance for users with compatible GPUs. * lintfix * [Refactor] Clean up includes in gemm_sm89.h - Removed duplicate inclusion of "common.h" and added "cuda_fp8.h" for improved clarity and organization. - This change enhances the maintainability of the code by ensuring that header files are included only once and in a logical order. * [Enhancement] Improve KernelCache with in-memory caching and detailed docstrings - Added an in-memory cache to the KernelCache class to enhance performance by reducing disk access. - Updated the __new__ method to initialize the memory cache and added logic to check the cache before loading from disk. - Enhanced docstrings across multiple methods to provide clearer explanations of parameters and return values, improving code readability and maintainability. - Implemented a clear_cache method to clear both in-memory and disk caches, ensuring efficient cache management. * lint fix * typofix * [Refactor] Update matmul and flashattn function calls to return structured results - Modified the matmul and flashattn function calls to return a single object containing latency, configuration, and reference latency, improving code clarity and reducing the number of returned variables. - Updated all relevant instances in benchmark and example scripts to accommodate the new return structure, ensuring consistent usage across the codebase. * lint fix
-
- 28 Mar, 2025 1 commit
-
-
botbw authored
* [doc/example] init gemv doc and examples * [example] add vectorized read * [example] use local register instead of smem * [example] add bench * [doc] update doc * [doc] refine doc * [lint] format code * [doc] add tips * [doc/example] fix typo * [example] use tmv_all_reduce * [doc] update doc accordingly * [doc] add benchmark table * [lint] format code
-