Commits · 166a9585f8e3a169690107929c1a36c84b3f9807 · OpenDAS / tilelang


07 Mar, 2025 1 commit

[Refactor] Replace `T.thread_binding` with `T.get_thread_binding` in examples and test cases (#163) · de1ba1e4

Lei Wang authored Mar 07, 2025

* [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation

- Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
- Modify roller hints generation using new TileLang Carver template and utility functions
- Update get_roller_hints_from_func to handle None cases and improve return logic
- Adjust DefaultPolicy to handle different codegen dictionary formats

* [Refactor] Update Thread Binding and Import Statements in TileLang Kernels

- Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
- Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
- Move map_torch_type utility function to tilelang.utils.tensor
- Remove unnecessary imports and improve code organization


05 Mar, 2025 1 commit

[Enhancement] Enable runtime tensor data type validation (#146) · d0434c3e

Lei Wang authored Mar 05, 2025

* Fix debug print buffer template for unsigned char type

- Update debug_print_buffer_value template specialization for unsigned char
- Modify test_tilelang_debug_print.py to include additional dtype tests
- Add test case for uint8 dtype in debug print buffer function

* Refactor debug print buffer template formatting for unsigned char

- Improve code formatting for debug_print_buffer_value template specialization
- Adjust line breaks and indentation for better readability
- Maintain consistent code style with other template specializations

* Extract map_torch_type utility function to tilelang.utils.tensor

- Move map_torch_type function from multiple test files to a centralized location
- Import map_torch_type from tilelang.utils.tensor in kernel test files
- Improve code reusability by creating a shared utility function for type mapping
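A shared type-mapping helper of this kind might look like the sketch below. It is not the code in tilelang.utils.tensor: the name `map_dtype_name`, the alias table, and the `backend` parameter are assumptions made for illustration, and a `SimpleNamespace` stands in for the `torch` module so the snippet runs without PyTorch installed.

```python
from types import SimpleNamespace

# Hypothetical aliases: dtype spellings whose attribute name differs in the backend.
_ALIASES = {"e4m3_float8": "float8_e4m3fn", "e5m2_float8": "float8_e5m2"}


def map_dtype_name(name, backend):
    """Resolve a dtype string to the attribute of `backend` with the matching name."""
    attr = _ALIASES.get(name, name)
    try:
        return getattr(backend, attr)
    except AttributeError:
        raise ValueError(f"unsupported dtype string: {name!r}") from None


# Stand-in for the torch module (assumption; keeps the sketch dependency-free).
fake_torch = SimpleNamespace(float16="f16", float8_e4m3fn="f8e4m3")
```

With PyTorch available, the same function could be called as `map_dtype_name("float16", torch)`. Keeping the lookup in one place means every test resolves dtype strings the same way.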

* Add buffer dtype mapping for Cython kernel adapter

- Introduce buffer_dtype_map in CythonKernelAdapter to track buffer variable dtypes
- Add _process_buffer_dtype method to extract dtype information from TIR function
- Update CythonKernelWrapper to support setting and validating buffer dtypes
- Enhance type checking during kernel execution with dtype verification
- Improve logging message for Cython JIT adapter compilation

* Add static shape mapping for Cython kernel adapter

- Introduce static_shape_map in CythonKernelAdapter to track buffer variable static shapes
- Add _process_static_shape method to extract static shape information from TIR function
- Update CythonKernelWrapper to support setting and validating static shapes
- Enhance type checking during kernel execution with static shape verification
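The dtype and static-shape checks described in the two items above can be pictured with the following sketch. It only illustrates the idea: `FakeTensor`, `validate_args`, and the layout of the two maps are assumptions, not the actual CythonKernelAdapter or CythonKernelWrapper code.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a framework tensor; only dtype and shape matter here."""
    dtype: str
    shape: tuple


def validate_args(tensors, buffer_dtype_map, static_shape_map):
    """Raise if a tensor disagrees with the dtype or static dims recorded for it.

    buffer_dtype_map: {arg_index: expected dtype}              (assumed layout)
    static_shape_map: {arg_index: [(dim_index, extent), ...]}  (assumed layout)
    Dimensions absent from static_shape_map are treated as dynamic and skipped.
    """
    for i, expected in buffer_dtype_map.items():
        if tensors[i].dtype != expected:
            raise TypeError(
                f"arg {i}: expected dtype {expected}, got {tensors[i].dtype}")
    for i, dims in static_shape_map.items():
        for dim, extent in dims:
            if tensors[i].shape[dim] != extent:
                raise ValueError(
                    f"arg {i}: expected shape[{dim}] == {extent}, "
                    f"got {tensors[i].shape[dim]}")
```

Checking before launch turns a silent wrong result or an out-of-bounds access into an immediate, readable error on the host side.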

* Add Multi-Head Attention (MHA) Backward Pass Test for TileLang Kernel

- Implement comprehensive test for Multi-Head Attention backward pass
- Support both causal and non-causal attention scenarios
- Add reference implementation for comparing kernel outputs
- Test different batch sizes, head counts, sequence lengths, and head dimensions
- Verify forward and backward pass correctness using torch.testing.assert_close

* Set random seed for MHA backward pass test

- Add random seed initialization for consistent test reproducibility
- Use tilelang.testing.set_random_seed(42) to ensure deterministic test results


04 Mar, 2025 1 commit

[Bugfix] Add missing definition for AtomicAdd (#138) · 3960d3d0

Lei Wang authored Mar 04, 2025

* Change default log level from WARNING to INFO in TileLang initialization

* Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support

- Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
- Remove unused imports and simplify function signature
- Modify `flashattn` function to handle max sequence length as a separate argument
- Update kernel call to include max sequence length parameter
- Improve code readability and remove commented-out code
- Add print statement to confirm successful assertion

* Refactor code formatting in TileLang lowering and example files

- Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
- Simplify line breaks and reduce unnecessary whitespace
- Enhance code readability by adjusting indentation and line breaks
- Update example MHA forward pass script with cleaner tensor initialization

* Update TileLang kernel test with import path changes for MMA layout and macro generator

- Modify import statements in test_tilelang_kernel_dequantize_gemm.py
- Replace bitblas imports with tilelang.intrinsics imports for MMA-related utilities
- Update main function to use tilelang.testing.main()

* Add Block Sparse Attention Examples for TileLang and Triton

- Implement block sparse attention kernels for both TileLang and Triton
- Add utility functions for generating sparse attention masks using top-k and threshold methods
- Support causal and variable-length attention scenarios
- Include test cases for different sequence length configurations
- Demonstrate block-level sparse attention with configurable parameters
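The two mask-generation strategies named above (top-k and threshold) can be sketched in plain Python as follows. The function names and the list-of-lists score format are assumptions for illustration; the example files in the commit operate on tensors and are not reproduced here.

```python
def topk_block_mask(scores, k, causal=False):
    """Keep the k highest-scoring key blocks for every query block."""
    mask = []
    for q, row in enumerate(scores):
        # With causal masking, a query block may only see key blocks <= its own index.
        candidates = [j for j in range(len(row)) if not causal or j <= q]
        keep = set(sorted(candidates, key=lambda j: row[j], reverse=True)[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def threshold_block_mask(scores, threshold, causal=False):
    """Keep every key block whose score exceeds the threshold."""
    return [
        [s > threshold and (not causal or j <= q) for j, s in enumerate(row)]
        for q, row in enumerate(scores)
    ]
```

Here `scores[q][j]` is a per-block importance score for query block `q` and key block `j`. Top-k gives every query block the same budget of key blocks, while the threshold variant lets the number of kept blocks vary per row.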

* Refactor Block Sparse Attention Examples with Code Style Improvements

- Improve code formatting in block_sparse_attn_tilelang.py and block_sparse_attn_triton.py
- Enhance readability by adjusting line breaks and indentation
- Simplify kernel and function calls with better formatting
- Add whitespace and line break improvements for better code clarity

* Enhance Layout Plotting with Multi-Replication and Dynamic Visualization

- Update plot_layout function to support multiple replications in thread and value mapping
- Improve thread and value mapping to handle replicated layouts
- Dynamically adjust figure size and legend positioning
- Add print statements for saved plot file paths
- Modify example fragment_mma_load_a.py to uncomment and enable warp and block layout plotting

* Refactor AtomicAdd functions in CUDA common header

- Implement a generic template for AtomicAdd function
- Specialize templates for half_t, bfloat16_t, and pointer types
- Reorganize and clean up existing AtomicAdd implementations
- Improve type handling and conversion in atomic operations

* Remove unused import in MHA backward test file

- Remove unnecessary argparse import from test_tilelang_kenrel_mha_bwd.py
- Add blank line for improved code formatting
- Minor code cleanup in test file


11 Feb, 2025 1 commit

[Dev] Add mha backward example (#77) · a6fe61e2

Yu Cheng authored Feb 12, 2025

* [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized

* Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests

* [Dev] Add mha backward example
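A mismatch-ratio check of the kind relaxed in this commit could be sketched as below. The helper name `assert_mostly_close`, its default tolerances, and the flat-list inputs are assumptions for illustration, not the test utilities used in the repository.

```python
def assert_mostly_close(actual, expected, rtol=1e-2, atol=1e-2,
                        max_mismatch_ratio=0.01):
    """Pass when at most max_mismatch_ratio of the elements fall outside tolerance."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    # An element mismatches when it lies outside the combined abs/rel tolerance.
    mismatched = sum(
        abs(a - e) > atol + rtol * abs(e) for a, e in zip(actual, expected)
    )
    ratio = mismatched / len(actual) if actual else 0.0
    if ratio > max_mismatch_ratio:
        raise AssertionError(
            f"{mismatched}/{len(actual)} elements mismatched "
            f"(ratio {ratio:.4f} > {max_mismatch_ratio})")
    return ratio
```

Raising `max_mismatch_ratio` tolerates a few outliers from low-precision accumulation without loosening the per-element tolerance for every value.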
