    [Debug] Improve Memory Layout Plot (#136) · e32311b2
    Lei Wang authored
    * Change default log level from WARNING to INFO in TileLang initialization
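A minimal sketch of what lowering a package logger's default level from WARNING to INFO looks like with the standard `logging` module. The logger name `tilelang_demo` and the helper `configure_logger` are hypothetical and are not taken from TileLang's initialization code.

```python
import logging


def configure_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger whose threshold is `level` (INFO by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Attach a handler only once so repeated calls do not duplicate output.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


# Hypothetical logger name, used only for illustration.
logger = configure_logger("tilelang_demo")
logger.info("now emitted: INFO messages pass the threshold")
logger.debug("still suppressed: DEBUG is below INFO")
```

With the level at INFO, informational messages such as compilation progress become visible by default, while DEBUG output stays hidden.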
    
    * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
    
    - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
    - Remove unused imports and simplify function signature
    - Modify `flashattn` function to handle max sequence length as a separate argument
    - Update kernel call to include max sequence length parameter
    - Improve code readability and remove commented-out code
    - Add print statement to confirm successful assertion
    
    * Refactor code formatting in TileLang lowering and example files
    
    - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
    - Simplify line breaks and reduce unnecessary whitespace
    - Enhance code readability by adjusting indentation and line breaks
    - Update example MHA forward pass script with cleaner tensor initialization
    
    * Update TileLang kernel test with import path changes for MMA layout and macro generator
    
    - Modify import statements in `test_tilelang_kernel_dequantize_gemm.py`
    - Replace `bitblas` imports with `tilelang.intrinsics` imports for MMA-related utilities
    - Update main function to use `tilelang.testing.main()`
    
    * Add Block Sparse Attention Examples for TileLang and Triton
    
    - Implement block sparse attention kernels for both TileLang and Triton
    - Add utility functions for generating sparse attention masks using top-k and threshold methods
    - Support causal and variable-length attention scenarios
    - Include test cases for different sequence length configurations
    - Demonstrate block-level sparse attention with configurable parameters
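The two mask-generation strategies named above (top-k and threshold) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code from the example files: the function names, the nested-list score format, and the `causal` flag are all hypothetical, and the real examples presumably operate on GPU tensors.

```python
from typing import List


def topk_block_mask(scores: List[List[float]], k: int, causal: bool = True) -> List[List[bool]]:
    """For each query block, keep the k highest-scoring key blocks."""
    mask = []
    for q, row in enumerate(scores):
        # Under causal attention a query block only sees key blocks at or before it.
        candidates = [j for j in range(len(row)) if not causal or j <= q]
        keep = set(sorted(candidates, key=lambda j: row[j], reverse=True)[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def threshold_block_mask(scores: List[List[float]], threshold: float,
                         causal: bool = True) -> List[List[bool]]:
    """Keep every key block whose score is strictly above `threshold`."""
    return [
        [(s > threshold) and (not causal or j <= q) for j, s in enumerate(row)]
        for q, row in enumerate(scores)
    ]


# Hypothetical block-level scores: 3 query blocks by 3 key blocks.
scores = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.4, 0.6, 0.7],
]
topk_mask = topk_block_mask(scores, k=2)
thresh_mask = threshold_block_mask(scores, threshold=0.5)
```

A top-k mask fixes the number of key blocks per query block, which keeps the work per row predictable; a threshold mask lets the density vary with the scores.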
    
    * Refactor Block Sparse Attention Examples with Code Style Improvements
    
    - Improve code formatting in `block_sparse_attn_tilelang.py` and `block_sparse_attn_triton.py`
    - Enhance readability by adjusting line breaks and indentation
    - Simplify kernel and function calls with better formatting
    - Add whitespace and line break improvements for better code clarity
    
    * Enhance Layout Plotting with Multi-Replication and Dynamic Visualization
    
    - Update `plot_layout` function to support multiple replications in thread and value mapping
    - Improve thread and value mapping to handle replicated layouts
    - Dynamically adjust figure size and legend positioning
    - Add print statements for saved plot file paths
    - Uncomment the warp and block layout plotting in example `fragment_mma_load_a.py` so both plots are generated
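To illustrate what a replicated thread mapping and a dynamically sized figure might involve, here is a small sketch in plain Python. It is not the `plot_layout` implementation: the layout rule, the helper names `replicated_thread_map` and `figure_size`, and the cell size are all assumptions made for the example.

```python
from typing import List, Tuple


def replicated_thread_map(rows: int, cols: int, num_threads: int,
                          replicate: int) -> List[List[List[int]]]:
    """Hypothetical layout: each element is owned by `replicate` threads."""
    base_threads = num_threads // replicate
    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Base owner from a simple row-major rule (an assumption).
            base = (i * cols + j) % base_threads
            # Each replica shifts the owner by one block of base threads.
            row.append([base + r * base_threads for r in range(replicate)])
        grid.append(row)
    return grid


def figure_size(rows: int, cols: int, replicate: int,
                cell: float = 0.5) -> Tuple[float, float]:
    """Widen each cell so all replicated thread ids fit in its label."""
    return (cols * cell * replicate, rows * cell)


grid = replicated_thread_map(rows=2, cols=4, num_threads=8, replicate=2)
size = figure_size(rows=2, cols=4, replicate=2)
```

Each cell of the grid then carries a list of thread ids rather than a single id, and the figure width grows with the replication count so the labels stay legible.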
fragment_mma_load_a.py 3.84 KB