"vscode:/vscode.git/clone" did not exist on "4befb831202e38c239e60ddef88214c564ee1cd7"
  • Lei Wang's avatar
    [Kernel] Implement different SEQ Q/KV examples with block sparse (#133) · 159af5df
    Lei Wang authored
    * Change default log level from WARNING to INFO in TileLang initialization
    
    * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
    
    - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
    - Remove unused imports and simplify function signature
    - Modify `flashattn` function to handle max sequence length as a separate argument (see the sketch after this list)
    - Update kernel call to include max sequence length parameter
    - Improve code readability and remove commented-out code
    - Add print statement to confirm successful assertion
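
    A minimal sketch of the varlen convention implied above, assuming flash-attention-style packing in which sequences are concatenated and described by cumulative lengths; `flashattn` below stands in for the example's kernel factory, and its exact signature is an assumption, not the repository's API:

    ```python
    import torch

    # Per-sequence token counts for a packed (varlen) batch.
    seqlens = torch.tensor([5, 12, 7], dtype=torch.int32)

    # Cumulative sequence lengths delimit each sequence inside the packed Q/K/V tensors.
    cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(seqlens, dim=0)   # [0, 5, 17, 24]

    # The max length is passed to the kernel separately so the launch grid and
    # tile shapes can be sized for the worst case across the batch.
    max_seqlen = int(seqlens.max())

    # Hypothetical call shape mirroring the updated interface:
    # kernel = flashattn(batch, heads, dim, max_seqlen)
    # out = kernel(q_packed, k_packed, v_packed, cu_seqlens)
    ```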
    
    * Refactor code formatting in TileLang lowering and example files
    
    - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
    - Simplify line breaks and reduce unnecessary whitespace
    - Enhance code readability by adjusting indentation and line breaks
    - Update example MHA forward pass script with cleaner tensor initialization
    
    * Update TileLang kernel test with import path changes for MMA layout and macro generator
    
    - Modify import statements in test_tilelang_kernel_dequantize_gemm.py
    - Replace bitblas imports with tilelang.intrinsics imports for MMA-related utilities
    - Update main function to use tilelang.testing.main()
    
    * Add Block Sparse Attention Examples for TileLang and Triton
    
    - Implement block sparse attention kernels for both TileLang and Triton
    - Add utility functions for generating sparse attention masks using top-k and threshold methods (see the sketch after this list)
    - Support causal and variable-length attention scenarios
    - Include test cases for different sequence length configurations
    - Demonstrate block-level sparse attention with configurable parameters
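
    A minimal sketch of the top-k block-mask idea, not the repository's actual helper: given pooled attention scores per (query block, KV block) pair, keep the top-k KV blocks for each query block. The tensor names and causal handling are illustrative assumptions; a threshold variant would compare the scores against a cutoff instead of ranking them:

    ```python
    import torch

    def topk_block_mask(block_scores: torch.Tensor, topk: int, causal: bool = True) -> torch.Tensor:
        """block_scores: [batch, heads, num_q_blocks, num_kv_blocks] pooled scores.
        Returns a boolean mask selecting the KV blocks each query block attends to."""
        nq, nk = block_scores.shape[-2:]
        if causal:
            # Rule out future KV blocks before ranking.
            allowed = torch.tril(torch.ones(nq, nk, dtype=torch.bool, device=block_scores.device))
            block_scores = block_scores.masked_fill(~allowed, float("-inf"))
        idx = block_scores.topk(min(topk, nk), dim=-1).indices
        mask = torch.zeros_like(block_scores, dtype=torch.bool)
        mask.scatter_(-1, idx, True)
        if causal:
            mask &= allowed
        return mask

    # Threshold variant (illustrative): mask = block_scores >= tau
    ```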
    
    * Refactor Block Sparse Attention Examples with Code Style Improvements
    
    - Improve code formatting in block_sparse_attn_tilelang.py and block_sparse_attn_triton.py
    - Enhance readability by adjusting line breaks and indentation
    - Simplify kernel and function calls with better formatting
    - Add whitespace and line break improvements for better code clarity