Commits · f005db9f6b78bc9031d169437076fc4d5eac45a1 · OpenDAS / tilelang


07 Apr, 2025 1 commit

[AutoTune] Refactor AutoTuneArtifact to utilize kernel as context instead of profiler (#344) · f005db9f

Lei Wang authored Apr 07, 2025

* [Enhancement] Update GEMM examples and autotuner for improved performance

- Modified `example_gemm_intrinsics.py` to enhance matrix multiplication configurations, increasing warp sizes and adjusting data types for better performance.
- Updated the kernel compilation process to utilize the new `tilelang.compile` method and improved latency measurement with the profiler.
- Refactored `example_gemm.py` to include a new autotuning configuration and ensure consistency in latency checks against reference results.
- Adjusted tensor supply generation in `tilelang/utils/tensor.py` to use `torch.randn` for better randomness in tensor initialization.
- Enhanced the `JITContext` in `tilelang/autotuner/__init__.py` to replace the profiler with a kernel instance for performance measurement, improving the overall structure of the autotuner.
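
The last item, keeping a compiled kernel in the tuning context instead of a profiler, can be pictured with the sketch below. It is illustrative only: `FakeKernel`, `TuneContext`, and `pick_best` are hypothetical names and do not reproduce TileLang's `JITContext` or its API, and the latency is simulated rather than measured on a device.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple

Config = Dict[str, int]


@dataclass
class FakeKernel:
    """Hypothetical stand-in for a compiled kernel (not a TileLang class)."""
    config: Config

    def measure_latency(self) -> float:
        # Simulated latency in ms: this toy model pretends larger tiles run faster.
        return 100.0 / (self.config["block_M"] * self.config["block_N"])


@dataclass
class TuneContext:
    """Hypothetical tuning context that stores the kernel itself."""
    kernel: FakeKernel
    config: Config


def pick_best(configs: Iterable[Config],
              build: Callable[[Config], FakeKernel]) -> Tuple[TuneContext, float]:
    """Build one kernel per config and return the context with the lowest latency."""
    best: Optional[Tuple[TuneContext, float]] = None
    for cfg in configs:
        ctx = TuneContext(kernel=build(cfg), config=cfg)
        latency = ctx.kernel.measure_latency()  # timing is reached through the kernel
        if best is None or latency < best[1]:
            best = (ctx, latency)
    if best is None:
        raise ValueError("no configurations supplied")
    return best
```

Under this assumption, the winning context carries the kernel that produced the best latency, so the caller can reuse that kernel directly without rebuilding it from the configuration.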

* bug fix

* fix

* [Enhancement] Update convolution tests and profiling assertions

- Added a random seed setting for reproducibility in convolution tests.
- Removed several redundant convolution test cases to streamline the testing process.
- Updated the assertion in the matrix multiplication profiling to include a maximum mismatched ratio for improved accuracy in results.
- Enabled the main testing function for better test execution.
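
A fixed seed and a tolerated mismatch ratio, as mentioned in the items above, might be combined as in this sketch. The helper names (`mismatch_ratio`, `assert_mostly_close`) and the default tolerances are assumptions made for illustration and are not TileLang's profiler API; plain Python lists stand in for tensors.

```python
import random
from typing import Sequence


def mismatch_ratio(actual: Sequence[float], expected: Sequence[float],
                   rtol: float = 1e-2, atol: float = 1e-2) -> float:
    """Fraction of elements whose error exceeds atol + rtol * |expected|."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    if not actual:
        return 0.0
    bad = sum(1 for a, e in zip(actual, expected)
              if abs(a - e) > atol + rtol * abs(e))
    return bad / len(actual)


def assert_mostly_close(actual: Sequence[float], expected: Sequence[float],
                        max_mismatched_ratio: float = 0.01,
                        rtol: float = 1e-2, atol: float = 1e-2) -> None:
    """Pass if at most `max_mismatched_ratio` of the elements are out of tolerance."""
    ratio = mismatch_ratio(actual, expected, rtol, atol)
    assert ratio <= max_mismatched_ratio, (
        f"mismatched ratio {ratio:.4f} exceeds {max_mismatched_ratio}")


random.seed(0)  # fixed seed so the generated inputs are identical on every run
reference = [random.gauss(0.0, 1.0) for _ in range(1000)]
result = list(reference)
for i in range(5):
    result[i] += 1.0  # perturb 5 of 1000 elements well beyond the tolerance

assert_mostly_close(result, reference, max_mismatched_ratio=0.01)
```

The check passes here because 5 of 1000 elements (0.5%) are out of tolerance, which is below the 1% limit; a stricter limit such as 0.001 would fail.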

* lint fix


26 Mar, 2025 1 commit

[Refactor] Deprecated `T.Buffer` as arguments and rename related calls into `T.Tensor` (#281) · bf8a6fc1

Lei Wang authored Mar 26, 2025

* [Refactor] Improve flash attention example and layout comparison logic

- Removed unnecessary annotation for `lse_local_split` in the flash attention example to streamline the code.
- Updated the handling of `lse_local_split` to utilize parallel processing for better performance.
- Refactored kernel compilation and profiling logic to enhance clarity and maintainability in the flash attention example.
- Added a condition in `FragmentNode::IsEqual` to handle broadcast cases, improving the robustness of layout comparisons.

* lint fix

* [Enhancement] Add support for shared memory scope in Fill operation

- Introduced handling for `shared.dyn` and `shared` memory scopes in the Fill operation.
- Implemented parallel operation and layout inference for improved performance in shared memory scenarios.
- Updated thread loop partitioning and vectorization logic to accommodate new memory scope handling.

* [Refactor] Remove deprecated decorator and enhance Cython kernel handling

- Removed the deprecated decorator from the main module and added a new implementation in the utils module for better organization.
- Introduced a pointer map in the Cython kernel adapter to manage pointer arguments, improving runtime shape resolution.
- Updated the Cython kernel wrapper to utilize the new pointer map for handling kernel arguments.
- Enhanced error checking in the tensor utility functions to ensure static shapes are enforced.
- Added a new proxy module for buffer and tensor handling, streamlining the interface for TIR programs.

* [Feature] Add matrix multiplication test and kernel implementation

- Introduced a new test file `test_tilelang_language_ptr.py` that implements a matrix multiplication function using TileLang's primitives.
- The `matmul_test` function defines a kernel for performing tile-level GEMM operations with customizable block sizes and data types.
- Added a `run_matmul` function to compile and execute the kernel, along with a test function to validate the implementation.
- Updated the `proxy.py` file to enhance type handling for buffer and tensor proxies, ensuring compatibility with TIR programs.
- Minor formatting improvements in `deprecated.py` for better readability.

* lint fix

* [Refactor] Update tensor creation in matrix multiplication test

- Replaced `T.Tensor.from_ptr` with `T.make_tensor` in `matmul_test` for improved clarity and consistency.
- Updated imports in `__init__.py` to include `make_tensor`.
- Added `make_tensor` function in `proxy.py` to streamline tensor creation from pointers.

* [Refactor] Update tensor definitions across multiple files

- Replaced instances of `T.Tensor` with updated tensor definitions in various benchmark and example files to enhance consistency and clarity.
- Adjusted tensor shapes and types in functions related to matrix multiplication, attention mechanisms, and other operations.
- Improved documentation in README and example files to reflect changes in tensor usage.

* lint fix

* [Refactor] Update tensor types in attention and matrix multiplication examples

- Replaced instances of `T.Tensor` with `T.SharedTensor` and `T.FragmentTensor` in various attention and matrix multiplication functions to improve consistency and clarity.
- Adjusted tensor definitions in benchmark and example files to align with the new tensor types.
- Enhanced the overall structure and readability of the code by standardizing tensor usage across multiple files.

* lint fix

* [Refactor] Update tensor types in GEMM example and test files

- Replaced instances of `T.Tensor` with `T.LocalTensor` and `T.Buffer` in the GEMM example and related test functions to improve consistency and clarity.
- Enhanced the overall structure of the code by standardizing tensor usage across multiple files, aligning with recent updates in tensor definitions.

* [Refactor] Update tensor usage in customize.py

- Replaced instances of `T.Tensor` with `T.Buffer` in the `reshape` and `view` functions to enhance consistency with recent tensor definitions.
- Improved code clarity by standardizing buffer usage across the file.

* [Refactor] Update tensor types in test_tilelang_transform_annotate_device_regions.py

- Replaced instances of `T.Tensor` with `T.Buffer` in the `before` and `expected` methods of the `TestAnnotateThreadExtent` and `TestAnnotateDeviceScope` classes to enhance consistency with recent tensor definitions.
- Improved code clarity by standardizing buffer usage across the test file.

* [Refactor] Update tensor types to SharedBuffer and FragmentBuffer

- Replaced instances of `T.SharedTensor` and `T.FragmentTensor` with `T.SharedBuffer` and `T.FragmentBuffer` across multiple benchmark, example, and test files to enhance consistency with recent tensor definitions.
- Improved code clarity and structure by standardizing buffer usage in attention and matrix multiplication functions.

* [Refactor] Introduce Tensor alias for Buffer in proxy.py

- Added a new alias `Tensor` for `Buffer` in `proxy.py` to facilitate JIT compilation, ensuring that inputs and outputs are mapped with `torch.Tensor`.
- This change enhances clarity and consistency in tensor usage across the codebase.
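
In plain Python, an alias of this kind is simply a second name bound to the same class object. The sketch below uses a hypothetical stand-in `Buffer` class, not the actual contents of `proxy.py`, to show that both names then refer to one type.

```python
from typing import Sequence, Tuple


class Buffer:
    """Hypothetical stand-in for a buffer proxy (not TileLang's implementation)."""

    def __init__(self, shape: Sequence[int], dtype: str = "float32") -> None:
        self.shape: Tuple[int, ...] = tuple(shape)
        self.dtype = dtype


# The alias creates no new type: `Tensor` and `Buffer` are the same class object.
Tensor = Buffer

a = Tensor((128, 128), "float16")
print(isinstance(a, Buffer), Tensor is Buffer)
```

Because the alias adds no new type, code that checks for `Buffer` continues to accept values annotated or constructed as `Tensor`.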


16 Mar, 2025 1 commit

[Refactor] Introduce KernelParam integration across modules (#223) · 3de9f13c

Lei Wang authored Mar 16, 2025

* [Refactor] Update KernelParam integration across modules

- Replaced instances of TensorType with KernelParam in various modules to standardize parameter handling.
- Updated JITKernel, BaseKernelAdapter, and CythonKernelAdapter to utilize KernelParam for improved type consistency.
- Enhanced Profiler class to include KernelParam in its parameters, ensuring better integration with the new parameter structure.
- Adjusted tensor handling in utility functions to accommodate the new KernelParam type, improving overall code clarity and maintainability.
- Updated copyright headers to reflect the correct organization.

* [Refactor] Clean up whitespace in kernel, profiler, and tensor modules

- Added blank lines for improved readability in kernel.py, __init__.py, and tensor.py.
- Enhanced code clarity by ensuring consistent formatting across these modules.

* [Enhancement] Add detailed docstrings to KernelParam and Profiler classes

- Enhanced KernelParam class with comprehensive docstrings for better understanding of its purpose and methods.
- Updated Profiler class to include detailed docstrings for its attributes and methods, improving code documentation and usability.
- Removed unused do_bench function to streamline the profiler module and improve clarity.

* [Refactor] Update type hints in do_bench function and clean up whitespace in profiler module

- Changed type hints for grad_to_none and quantiles parameters in do_bench function to use Optional for better clarity.
- Added a blank line in __init__.py for improved readability and consistency in the profiler module.

* [Refactor] Update type hint in do_bench function for consistency

- Changed the return type hint in the do_bench function from a union type to a more explicit List type for better clarity and consistency in type annotations.

* [Refactor] Update return type hint in do_bench function for clarity

- Changed the return type hint in the do_bench function from a union type to Union[float, List[float]] for improved clarity and consistency in type annotations.
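
The type hints described in the `do_bench` items above (an `Optional` quantile list and a `Union[float, List[float]]` return) can be illustrated with a simplified CPU-only timer. `bench_sketch` is a hypothetical function written for this illustration; its parameters and behavior are assumptions and do not reproduce the real `do_bench`.

```python
import time
from typing import Callable, List, Optional, Union


def bench_sketch(fn: Callable[[], object],
                 rep: int = 20,
                 quantiles: Optional[List[float]] = None) -> Union[float, List[float]]:
    """Time `fn` `rep` times; return the mean in ms, or the requested quantiles."""
    if rep <= 0:
        raise ValueError("rep must be positive")
    times_ms: List[float] = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1e3)
    if quantiles is None:
        # No quantiles requested: a single float.
        return sum(times_ms) / len(times_ms)
    ordered = sorted(times_ms)
    # Simple index-based lookup, clamped to the last sample; one value per quantile.
    return [ordered[min(len(ordered) - 1, int(q * len(ordered)))] for q in quantiles]


mean_ms = bench_sketch(lambda: sum(range(1000)))
p20_p80 = bench_sketch(lambda: sum(range(1000)), quantiles=[0.2, 0.8])
```

The `Union` return type makes the two call shapes explicit: callers that pass `quantiles` receive a list, and all others receive a single float.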

* [Enhancement] Add func property to Profiler class for adapter access

- Introduced a new property `func` in the Profiler class to provide access to the adapter, ensuring that the adapter is set before retrieval. This enhancement improves the usability of the Profiler class by allowing easier access to the adapter functionality.
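
A guarded property of this kind might look like the sketch below. `ProfilerSketch` and `with_adapter` are hypothetical names, and raising `RuntimeError` is an assumption; the commit message does not say how the real `Profiler` reports a missing adapter.

```python
from typing import Callable, Optional


class ProfilerSketch:
    """Hypothetical profiler holding an optional adapter (not TileLang's Profiler)."""

    def __init__(self) -> None:
        self.adapter: Optional[Callable[..., object]] = None

    def with_adapter(self, adapter: Callable[..., object]) -> "ProfilerSketch":
        # Hypothetical setter; returns self so calls can be chained.
        self.adapter = adapter
        return self

    @property
    def func(self) -> Callable[..., object]:
        # Refuse to hand out the adapter until one has been attached.
        if self.adapter is None:
            raise RuntimeError("adapter must be set before accessing `func`")
        return self.adapter


profiler = ProfilerSketch().with_adapter(lambda x: x * 2)
doubled = profiler.func(21)
```

Failing at access time surfaces a missing adapter immediately, rather than as an error about calling `None` later on.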

* [Refactor] Update kernel compilation and profiling in tests

- Replaced instances of `TL.lower` and `TL.Profiler` with `tilelang.compile` and the new profiler interface across multiple test files.
- Enhanced the kernel compilation process to utilize the updated API, improving consistency and maintainability in the testing framework.
- Updated assertions to use the new profiler methods for better clarity and functionality in performance testing.

* [Refactor] Simplify kernel invocation and remove unused parameters in tests

- Updated the kernel invocation in `test_tilelang_dynamic_symbolic.py` to directly assign the result to `C`, improving clarity.
- Removed the `execution_backend` parameter from `tilelang.compile` calls in `test_tilelang_jit_callback.py` and `test_tilelang_jit_gemm.py` for consistency with the updated API.
- Commented out the call to `tilelang.testing.main()` in `test_tilelang_jit_callback.py` and replaced it with a direct call to `test_gemm_jit_kernel()` to streamline test execution.
- Adjusted the dtype mapping in `TorchDLPackKernelAdapter` to use the parameter's dtype directly, enhancing code simplicity.

* [Refactor] Remove unused imports in test files for cleaner code

- Eliminated unnecessary imports of `tilelang` as `TL` in various test files to enhance code clarity and maintainability.
- Updated multiple test files to streamline the codebase and reduce potential confusion from unused references.

* [Refactor] Simplify kernel invocation in tilelang kernel test

- Updated the kernel invocation in `test_tilelang_kernel_bf16_gemm_mma.py` to directly assign the result to `C`, enhancing code clarity and consistency with recent changes in the API.

* [Refactor] Simplify kernel invocation in tilelang kernel tests

- Updated kernel invocations in multiple test files to directly assign the result to `C`, improving code clarity and consistency with the updated API.
- Removed unnecessary initialization of `C` as a zero tensor, streamlining the code further.

* [Refactor] Update kernel invocation in tilelang transform tests

- Replaced the use of `TL.Profiler` with `tilelang.compile` in `test_tilelang_transform_simplify.py`, enhancing code clarity and consistency with the updated API.
- Streamlined the kernel invocation process by directly assigning the result to `C`, improving readability and maintainability of the test code.


25 Jan, 2025 1 commit

[CI][Test] Add test cases for tilelang kernel convolution (#51) · 34de04a6

Yu Cheng authored Jan 25, 2025

* [CI][Test] Add test cases for tilelang kernel convolution