    [Bugfix] Add missing definition for AtomicAdd (#138) · 3960d3d0
    Lei Wang authored
    * Change default log level from WARNING to INFO in TileLang initialization
    
    * Refactor Flash Attention Variable-Length MHA Example with Cython Backend Support
    
    - Update `example_mha_fwd_varlen.py` to use Cython backend for kernel compilation
    - Remove unused imports and simplify function signature
    - Modify `flashattn` function to handle max sequence length as a separate argument
    - Update kernel call to include max sequence length parameter
    - Improve code readability and remove commented-out code
    - Add print statement to confirm successful assertion
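
A minimal sketch of the packed variable-length convention that the max-sequence-length bullet refers to: sequences are concatenated along the token dimension, cumulative offsets mark their boundaries, and the longest length is passed separately. The helper name `varlen_metadata` and its return layout are hypothetical illustrations, not code from `example_mha_fwd_varlen.py`.

```python
from itertools import accumulate


def varlen_metadata(seq_lens):
    """Return (cu_seqlens, max_seqlen) for a packed variable-length batch.

    Hypothetical helper for illustration only; not part of TileLang.
    """
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    # cu_seqlens[i] is the offset of sequence i in the packed token dimension;
    # the final entry equals the total number of tokens.
    cu_seqlens = [0, *accumulate(seq_lens)]
    # The longest sequence is kept as its own value so a kernel launch can
    # size its grid without inspecting the offsets.
    max_seqlen = max(seq_lens, default=0)
    return cu_seqlens, max_seqlen


cu_seqlens, max_seqlen = varlen_metadata([3, 5, 2])
print(cu_seqlens, max_seqlen)
```

Passing the maximum length explicitly avoids recomputing it from the offsets at every call site.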
    
    * Refactor code formatting in TileLang lowering and example files
    
    - Improve line breaks and code formatting in `lower.py`, `wrapper.py`, and `tensor.py`
    - Simplify line breaks and reduce unnecessary whitespace
    - Enhance code readability by adjusting indentation and line breaks
    - Update example MHA forward pass script with cleaner tensor initialization
    
    * Update TileLang kernel test with import path changes for MMA layout and macro generator
    
    - Modify import statements in `test_tilelang_kernel_dequantize_gemm.py`
    - Replace `bitblas` imports with `tilelang.intrinsics` imports for MMA-related utilities
    - Update main function to use `tilelang.testing.main()`
    
    * Add Block Sparse Attention Examples for TileLang and Triton
    
    - Implement block sparse attention kernels for both TileLang and Triton
    - Add utility functions for generating sparse attention masks using top-k and threshold methods
    - Support causal and variable-length attention scenarios
    - Include test cases for different sequence length configurations
    - Demonstrate block-level sparse attention with configurable parameters
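
The two mask-generation strategies named above can be sketched in plain Python as follows. This is an illustration under assumptions, not the utilities added in this commit: the function names, the nested-list score format (one row per query block, one column per key block), and the `causal` flag are all hypothetical.

```python
def topk_block_mask(scores, k, causal=False):
    """Keep the k highest-scoring key blocks for each query block.

    Hypothetical helper: scores[q][j] is the score of key block j
    for query block q.
    """
    mask = []
    for q, row in enumerate(scores):
        # Under causality, query block q may only attend to key blocks 0..q.
        allowed = [j for j in range(len(row)) if not causal or j <= q]
        keep = set(sorted(allowed, key=lambda j: row[j], reverse=True)[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask


def threshold_block_mask(scores, threshold, causal=False):
    """Keep every key block whose score is strictly above the threshold."""
    return [
        [s > threshold and (not causal or j <= q) for j, s in enumerate(row)]
        for q, row in enumerate(scores)
    ]


scores = [
    [0.9, 0.1, 0.5],
    [0.2, 0.8, 0.3],
    [0.4, 0.6, 0.7],
]
print(topk_block_mask(scores, k=2, causal=True))
print(threshold_block_mask(scores, threshold=0.45))
```

Top-k fixes the number of active blocks per query block, which keeps the work per row predictable; thresholding lets the sparsity vary with the score distribution.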
    
    * Refactor Block Sparse Attention Examples with Code Style Improvements
    
    - Improve code formatting in `block_sparse_attn_tilelang.py` and `block_sparse_attn_triton.py`
    - Enhance readability by adjusting line breaks and indentation
    - Simplify kernel and function calls with better formatting
    - Add whitespace and line break improvements for better code clarity
    
    * Enhance Layout Plotting with Multi-Replication and Dynamic Visualization
    
    - Update `plot_layout` function to support multiple replications in thread and value mapping
    - Improve thread and value mapping to handle replicated layouts
    - Dynamically adjust figure size and legend positioning
    - Add print statements for saved plot file paths
    - Enable the previously commented-out warp and block layout plotting in the example `fragment_mma_load_a.py`
    
    * Refactor AtomicAdd functions in CUDA common header
    
    - Implement a generic template for the `AtomicAdd` function
    - Specialize templates for `half_t`, `bfloat16_t`, and pointer types
    - Reorganize and clean up existing AtomicAdd implementations
    - Improve type handling and conversion in atomic operations
    
    * Remove unused import in MHA backward test file
    
    - Remove unnecessary argparse import from test_tilelang_kenrel_mha_bwd.py
    - Add blank line for improved code formatting
    - Minor code cleanup in test file
common.h 4.95 KB