1. 09 Mar, 2025 1 commit
    • [Feat] Introduce new caching mechanism for compiled kernels (#176) · 7bde63d5
      Lei Wang authored
      * Add kernel caching mechanism to TileLang
      
      - Implement a new `cached` function in `tilelang/cache/__init__.py` to cache and reuse compiled kernels
      - Expose the `cached` function in the main `tilelang/__init__.py`
      - Add a test case for cached matrix multiplication in `testing/python/cache/test_tilelang_cache_matmul.py`
      - Provide a `clear_cache()` function to reset the kernel cache when needed
      
      * Refactor kernel caching test and implementation
      
      - Simplify the `cached` function in `tilelang/cache/__init__.py`
      - Update test script `test_tilelang_cache_matmul.py` to use `tilelang.testing.main()`
      - Remove unnecessary whitespace and improve code formatting
      
      * Update import for `cached` function in MHA examples
      
      - Modify import statement in `example_mha_bwd.py` and `test_tilelang_kernel_mha_bwd.py`
      - Change import from `tilelang.profiler import cached` to `tilelang import cached`
      - Align with recent refactoring of kernel caching mechanism
      
      * Refactor `cached` function signature in kernel caching
      
      - Update function signature to use keyword-only arguments for `target` and `target_host`
      - Improve parameter order and readability of the `cached` function
      - Maintain existing functionality while enhancing function definition
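
The caching behaviour described in this commit can be sketched as follows. This is a minimal illustration, not TileLang's actual implementation: the cache key layout, the `_compile` stand-in, and the `make_matmul` builder are all hypothetical. Only the names `cached` and `clear_cache` and the keyword-only `target` / `target_host` parameters come from the commit message.

```python
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

# Hypothetical in-memory store; the real cache may key and persist kernels differently.
_KERNEL_CACHE: Dict[Tuple[Any, ...], Any] = {}


def _compile(func: Callable[..., Any], args: Tuple[Hashable, ...],
             target: str, target_host: Optional[str]) -> Any:
    """Stand-in for an expensive compile step (assumption, not a TileLang API)."""
    return func(*args)


def cached(func: Callable[..., Any], *args: Hashable,
           target: str = "auto", target_host: Optional[str] = None) -> Any:
    """Return a previously compiled kernel for the same inputs, compiling on a miss.

    `target` and `target_host` are keyword-only because they follow `*args`.
    """
    key = (func, args, target, target_host)
    if key not in _KERNEL_CACHE:
        _KERNEL_CACHE[key] = _compile(func, args, target, target_host)
    return _KERNEL_CACHE[key]


def clear_cache() -> None:
    """Drop every cached kernel so the next call recompiles."""
    _KERNEL_CACHE.clear()


def make_matmul(M: int, N: int, K: int) -> Dict[str, int]:
    """Hypothetical kernel builder; returns a fresh object on every call."""
    return {"M": M, "N": N, "K": K}
```

With this sketch, two calls such as `cached(make_matmul, 128, 128, 128, target="cuda")` return the same object, while a different `target` or a call after `clear_cache()` triggers a new compile.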
  2. 07 Mar, 2025 2 commits
    • [Example] Implement tilelang native sparse attention varlen example (#170) · 8e1845d2
      Lei Wang authored
      * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
      
      - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
      - Modify roller hints generation using new TileLang Carver template and utility functions
      - Update get_roller_hints_from_func to handle None cases and improve return logic
      - Adjust DefaultPolicy to handle different codegen dictionary formats
      
      * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
      
      - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
      - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
      - Move map_torch_type utility function to tilelang.utils.tensor
      - Remove unnecessary imports and improve code organization
      
      * Refactor Native Sparse Attention Example with Enhanced Triton Kernel
      
      - Update parallel_nsa_fwd_kernel to support more flexible sparse attention computation
      - Add support for block counts and offsets in the Triton kernel
      - Modify kernel grid and computation logic for improved performance
      - Update example script to use naive_nsa_simple reference implementation
      - Improve type hints and kernel configuration
      
      * Add Native Sparse Attention Examples with Tilelang and Triton Implementations
      
      - Introduce new example scripts for native sparse attention:
        * example_tilelang_nsa_fwd.py: Forward pass implementation using TileLang
        * example_tilelang_nsa_decode.py: Decoding-specific sparse attention implementation
        * example_triton_nsa_fwd.py: Triton-based sparse attention forward pass
      - Update reference.py with naive implementations for sparse attention
      - Support different sparse attention scenarios including forward pass and inference
      - Add comprehensive testing and validation against reference implementations
      
      * lint fix
      
      * Add Variable-Length Native Sparse Attention Examples for TileLang and Triton
      
      - Introduce new example scripts for variable-length native sparse attention:
        * example_tilelang_nsa_fwd_varlen.py: TileLang implementation with variable sequence lengths
        * example_triton_nsa_fwd_varlen.py: Triton implementation with variable sequence lengths
      - Update reference.py to support variable-length sparse attention scenarios
      - Enhance existing sparse attention implementations to handle variable-length inputs
      - Add comprehensive testing and validation for variable-length sparse attention
      
      * Refactor Native Sparse Attention Examples: Code Style and Formatting Improvements
      
      - Standardize function and parameter formatting across NSA example files
      - Improve code readability by adjusting indentation and line breaks
      - Enhance type hints and parameter alignment
      - Remove unnecessary whitespace and optimize imports
      - Maintain consistent code style across TileLang and Triton implementations
    • [Refactor] Replace `T.thread_binding` with `T.get_thread_binding` in examples and test cases (#163) · de1ba1e4
      Lei Wang authored
      * [Refactor] Update BitBLAS Benchmark with TileLang Carver Imports and Roller Hints Generation
      
      - Replace BitBLAS imports with TileLang Carver imports in benchmark_matmul.py
      - Modify roller hints generation using new TileLang Carver template and utility functions
      - Update get_roller_hints_from_func to handle None cases and improve return logic
      - Adjust DefaultPolicy to handle different codegen dictionary formats
      
      * [Refactor] Update Thread Binding and Import Statements in TileLang Kernels
      
      - Replace T.thread_binding() with T.get_thread_binding() across multiple kernel test files
      - Update import statements for MMA layout and macro generator in dequantize GEMM and FP8 examples
      - Move map_torch_type utility function to tilelang.utils.tensor
      - Remove unnecessary imports and improve code organization
  3. 05 Mar, 2025 1 commit
    • [Enhancement] Enable runtime tensor data type validation (#146) · d0434c3e
      Lei Wang authored
      * Fix debug print buffer template for unsigned char type
      
      - Update debug_print_buffer_value template specialization for unsigned char
      - Modify test_tilelang_debug_print.py to include additional dtype tests
      - Add test case for uint8 dtype in debug print buffer function
      
      * Refactor debug print buffer template formatting for unsigned char
      
      - Improve code formatting for debug_print_buffer_value template specialization
      - Adjust line breaks and indentation for better readability
      - Maintain consistent code style with other template specializations
      
      * Extract map_torch_type utility function to tilelang.utils.tensor
      
      - Move map_torch_type function from multiple test files to a centralized location
      - Import map_torch_type from tilelang.utils.tensor in kernel test files
      - Improve code reusability by creating a shared utility function for type mapping
      
      * Add buffer dtype mapping for Cython kernel adapter
      
      - Introduce buffer_dtype_map in CythonKernelAdapter to track buffer variable dtypes
      - Add _process_buffer_dtype method to extract dtype information from TIR function
      - Update CythonKernelWrapper to support setting and validating buffer dtypes
      - Enhance type checking during kernel execution with dtype verification
      - Improve logging message for Cython JIT adapter compilation
      
      * Add static shape mapping for Cython kernel adapter
      
      - Introduce static_shape_map in CythonKernelAdapter to track buffer variable static shapes
      - Add _process_static_shape method to extract static shape information from TIR function
      - Update CythonKernelWrapper to support setting and validating static shapes
      - Enhance type checking during kernel execution with static shape verification
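
The dtype and static shape checks described in the two items above can be sketched as follows. This is an illustrative sketch only: the names `buffer_dtype_map` and `static_shape_map` come from the commit message, but their layout here (a name-to-dtype mapping and a name-to-list of `(dim, extent)` pairs), the `FakeTensor` stand-in, and the `validate_args` helper are assumptions and not the adapter's real code.

```python
from dataclasses import dataclass
from typing import List, Mapping, Tuple


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a runtime tensor: only dtype and shape matter here."""
    dtype: str
    shape: Tuple[int, ...]


def validate_args(
    tensors: Mapping[str, FakeTensor],
    buffer_dtype_map: Mapping[str, str],
    static_shape_map: Mapping[str, List[Tuple[int, int]]],
) -> None:
    """Raise before launch if a tensor disagrees with the kernel's declared buffers."""
    for name, tensor in tensors.items():
        # Dtype check: only enforced for buffers that have a recorded dtype.
        expected = buffer_dtype_map.get(name)
        if expected is not None and tensor.dtype != expected:
            raise TypeError(f"{name}: expected dtype {expected}, got {tensor.dtype}")
        # Static shape check: only dimensions with a fixed extent are listed.
        for dim, extent in static_shape_map.get(name, []):
            if dim >= len(tensor.shape) or tensor.shape[dim] != extent:
                raise ValueError(
                    f"{name}: expected extent {extent} at dim {dim}, got shape {tensor.shape}")
```

Dimensions that are dynamic are simply absent from the assumed `static_shape_map`, so they pass unchecked.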
      
      * Add Multi-Head Attention (MHA) Backward Pass Test for TileLang Kernel
      
      - Implement comprehensive test for Multi-Head Attention backward pass
      - Support both causal and non-causal attention scenarios
      - Add reference implementation for comparing kernel outputs
      - Test different batch sizes, head counts, sequence lengths, and head dimensions
      - Verify forward and backward pass correctness using torch.testing.assert_close
      
      * Set random seed for MHA backward pass test
      
      - Add random seed initialization for consistent test reproducibility
      - Use tilelang.testing.set_random_seed(42) to ensure deterministic test results
  4. 04 Mar, 2025 1 commit
    • [Bugfix] Add missing definition for AtomicAdd (#138) · 3960d3d0
      Lei Wang authored
      * Change default log level from WARNING to INFO in TileLang initialization
      
      * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
      
      - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
      - Remove unused imports and simplify function signature
      - Modify `flashattn` function to handle max sequence length as a separate argument
      - Update kernel call to include max sequence length parameter
      - Improve code readability and remove commented-out code
      - Add print statement to confirm successful assertion
      
      * Refactor code formatting in TileLang lowering and example files
      
      - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
      - Simplify line breaks and reduce unnecessary whitespace
      - Enhance code readability by adjusting indentation and line breaks
      - Update example MHA forward pass script with cleaner tensor initialization
      
      * Update TileLang kernel test with import path changes for MMA layout and macro generator
      
      - Modify import statements in test_tilelang_kernel_dequantize_gemm.py
      - Replace bitblas imports with tilelang.intrinsics imports for MMA-related utilities
      - Update main function to use tilelang.testing.main()
      
      * Add Block Sparse Attention Examples for TileLang and Triton
      
      - Implement block sparse attention kernels for both TileLang and Triton
      - Add utility functions for generating sparse attention masks using top-k and threshold methods
      - Support causal and variable-length attention scenarios
      - Include test cases for different sequence length configurations
      - Demonstrate block-level sparse attention with configurable parameters
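
The top-k and threshold mask generation mentioned above can be sketched in plain Python. This is a hedged illustration of the general technique, not the code from the example scripts: the function names, the list-of-lists score layout (one row per query block, one column per key block), and the `causal` handling are assumptions.

```python
from typing import List


def topk_block_mask(scores: List[List[float]], k: int,
                    causal: bool = False) -> List[List[bool]]:
    """Keep the k highest-scoring key blocks for each query block."""
    mask = []
    for i, row in enumerate(scores):
        # Under causal masking a query block may only attend to blocks at or before it.
        candidates = [j for j in range(len(row)) if not causal or j <= i]
        keep = set(sorted(candidates, key=lambda j: row[j], reverse=True)[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def threshold_block_mask(scores: List[List[float]], threshold: float,
                         causal: bool = False) -> List[List[bool]]:
    """Keep every key block whose score is strictly above the threshold."""
    return [
        [(s > threshold) and (not causal or j <= i) for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]
```

A kernel can then skip every key block whose mask entry is `False`, which is where the block-level sparsity saves work.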
      
      * Refactor Block Sparse Attention Examples with Code Style Improvements
      
      - Improve code formatting in block_sparse_attn_tilelang.py and block_sparse_attn_triton.py
      - Enhance readability by adjusting line breaks and indentation
      - Simplify kernel and function calls with better formatting
      - Add whitespace and line break improvements for better code clarity
      
      * Enhance Layout Plotting with Multi-Replication and Dynamic Visualization
      
      - Update plot_layout function to support multiple replications in thread and value mapping
      - Improve thread and value mapping to handle replicated layouts
      - Dynamically adjust figure size and legend positioning
      - Add print statements for saved plot file paths
      - Modify example fragment_mma_load_a.py to uncomment and enable warp and block layout plotting
      
      * Refactor AtomicAdd functions in CUDA common header
      
      - Implement a generic template for AtomicAdd function
      - Specialize templates for half_t, bfloat16_t, and pointer types
      - Reorganize and clean up existing AtomicAdd implementations
      - Improve type handling and conversion in atomic operations
      
      * Remove unused import in MHA backward test file
      
      - Remove unnecessary argparse import from test_tilelang_kernel_mha_bwd.py
      - Add blank line for improved code formatting
      - Minor code cleanup in test file
  5. 11 Feb, 2025 1 commit
    • [Dev] Add mha backward example (#77) · a6fe61e2
      Yu Cheng authored
      * [CI][Test] Add test cases for tilelang transform MultiVersionBuffer and WarpSpecialized
      
      * Relax the mismatch ratio restrictions in the flash_linear_attention and mha tests
      
      * [Dev] Add mha backward example