    [Example] Add examples to support an efficient attention-sink forward pass (#853) · d9a171ce
    Tong WU authored
    
    
    * [Example] Add a new example to support attention sinks for MHA
    
    - Introduced a new example script for multi-head attention (MHA) with sliding window attention and sink tokens.
    - Added a reference attention function to validate the implementation against PyTorch (see the sketch after this list).
    - Included argument parsing for command-line execution of the example.
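
    The reference function mentioned above could look roughly like the following PyTorch sketch. It assumes BHSD tensors ([batch, heads, seq, dim]), one learned sink logit per head that absorbs softmax mass without contributing to the output, and an inclusive window of `window_size` positions; the name `ref_attention_sink` and the exact masking convention are illustrative, not taken from the example script.

    ```python
    import math
    import torch

    def ref_attention_sink(q, k, v, sinks, window_size=None):
        # q, k, v: [B, H, S, D]; sinks: [H] learned per-head sink logits.
        b, h, s, d = q.shape
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d)

        # Causal mask: query i may only attend to keys j <= i.
        idx = torch.arange(s, device=q.device)
        disallowed = idx[None, :] > idx[:, None]
        if window_size is not None:
            # Sliding window: also drop keys more than window_size - 1 steps back.
            disallowed = disallowed | (idx[:, None] - idx[None, :] >= window_size)
        scores = scores.masked_fill(disallowed, float("-inf"))

        # Append the sink logit as a virtual extra key, softmax over it, then
        # discard its probability mass so it only dampens the real weights.
        sink = sinks.view(1, h, 1, 1).expand(b, h, s, 1)
        probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
        return torch.matmul(probs[..., :-1], v)
    ```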
    
    * [Example] Replace MHA sink forward example with updated implementation
    
    - Removed the old example script for multi-head attention (MHA) with sliding window attention and sink tokens.
    - Introduced a new example script that modifies the attention mechanism to enhance performance and maintainability.
    - Updated argument parsing and reference functions to align with the new implementation.
    
    * Enhance MHA sink example with sliding window support
    
    - Added a `window_size` parameter to the `flashattn` function to enable sliding window attention.
    - Implemented assertions to ensure `window_size` is compatible with `block_N`.
    - Updated the main function to include a `tune` option for performance tuning.
    - Introduced a new test file to validate both full attention and sliding window scenarios.
    - Adjusted the FLOP count to account for the sliding window configuration (see the sketch after this list).
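
    A hedged sketch of how the compatibility assertion and the adjusted FLOP count might fit together. The assumption that `window_size` must divide evenly by `block_N` (so KV tiles are never partially inside the window) and the helper name `swa_flops` are mine, not confirmed by the example.

    ```python
    def swa_flops(batch, heads, seq_len, dim, window_size, block_N):
        # Tiles of block_N keys should align with the window boundary, hence
        # the divisibility requirement (assumed reading of "compatible").
        assert window_size % block_N == 0, "window_size must be a multiple of block_N"
        # Under a causal sliding window, query i attends to min(i + 1, window_size)
        # keys; summing over queries gives the attended (query, key) pair count.
        attended = sum(min(i + 1, window_size) for i in range(seq_len))
        # Two matmuls (Q @ K^T and P @ V), each costing 2 * dim FLOPs per pair.
        return 2 * 2 * batch * heads * attended * dim
    ```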
    
    * lint
    
    * [Fix] Add an inf check to fix the sliding window attention (SWA) bug
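
    Presumably the bug: with a sliding window, a whole KV tile can be fully masked, the running row maximum stays -inf, and the online softmax computes exp((-inf) - (-inf)) = NaN. A minimal sketch of the guard, assuming that failure mode (the name `safe_exp` is illustrative):

    ```python
    import torch

    def safe_exp(scores, row_max):
        # Swap -inf row maxima for 0 so fully masked rows yield exp(-inf) = 0
        # instead of NaN from (-inf) - (-inf).
        safe_max = torch.where(torch.isinf(row_max), torch.zeros_like(row_max), row_max)
        return torch.exp(scores - safe_max.unsqueeze(-1))
    ```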
    
    * Migrate to the BSHD layout to align with the Triton baselines
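
    The acronyms presumably spell out the axis order: BHSD = [batch, heads, seq, dim], BSHD = [batch, seq, heads, dim]. At the tensor level the migration is a single permute (illustrative only):

    ```python
    import torch

    q_bhsd = torch.randn(2, 8, 1024, 64)              # [batch, heads, seq, dim]
    q_bshd = q_bhsd.permute(0, 2, 1, 3).contiguous()  # [batch, seq, heads, dim]
    ```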
    
    * lint
    
    * fix typo
    
    * Refactor the MHA sink example to take separate seq_q and seq_kv sequence-length parameters (see the mask sketch below).
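
    With distinct query and KV lengths (e.g. a decode step against a longer KV cache), the causal and window masks need an offset so query i lines up with absolute position i + (seq_kv - seq_q). A sketch of that mask, assuming the usual right-aligned convention (the helper name `swa_mask` is illustrative):

    ```python
    import torch

    def swa_mask(seq_q, seq_kv, window_size=None):
        qi = torch.arange(seq_q)[:, None] + (seq_kv - seq_q)  # absolute query positions
        kj = torch.arange(seq_kv)[None, :]
        masked = kj > qi                          # causal: no future keys
        if window_size is not None:
            masked |= qi - kj >= window_size      # keys outside the window
        return masked                             # True = position is masked out
    ```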
    
    * Add a GQA sink example with an optimized attention mechanism; fix lint (see the sketch below)
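
    In the GQA variant, several query heads share each KV head; a reference path can reuse the MHA code by expanding K and V before calling it. A sketch (the helper name `expand_kv` is illustrative):

    ```python
    import torch

    def expand_kv(kv, groups):
        # kv: [B, H_kv, S, D] -> [B, H_kv * groups, S, D]; each KV head is
        # repeated for the group of query heads that shares it.
        return kv.repeat_interleave(groups, dim=1)
    ```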
    
    * fix several typos and bugs
    
    * lint
    
    * fix SWA performance issues
    
    * Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    
    * Update examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    
    ---------
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>