"src/targets/gpu/vscode:/vscode.git/clone" did not exist on "24a6fec2d45a835d4ea85eb0486ada208503b490"
  1. 23 Sep, 2025 1 commit
    • Tong WU's avatar
      [Example] Add examples to support efficient attention sink forward process (#853) · d9a171ce
      Tong WU authored
      
      
      * [Example] Add a new example to support attention sink for MHA
      
      - Introduced a new example script for multi-head attention (MHA) with sliding window attention and sink tokens.
      - Added a reference attention function to validate the implementation against PyTorch.
      - Included argument parsing for command-line execution of the example.
      
      * [Example] Replace MHA sink forward example with updated implementation
      
      - Removed the old example script for multi-head attention (MHA) with sliding window attention and sink tokens.
      - Introduced a new example script that modifies the attention mechanism to enhance performance and maintainability.
      - Updated argument parsing and reference functions to align with the new implementation.
      
      * Enhance MHA sink example with sliding window support
      
      - Added a `window_size` parameter to the `flashattn` function to enable sliding window attention.
      - Implemented assertions to ensure `window_size` is compatible with `block_N`.
      - Updated the main function to include a `tune` option for performance tuning.
      - Introduced a new test file to validate both full attention and sliding window scenarios.
      - Adjusted FLOPS calculation to account for the sliding window configuration.
      
      * lint
      
      * [Fix] Add checkinf process to fix the bug of swa
      
      * Migrate to BSHD layout to align with triton baselines
      
      * lint
      
      * fix typo
      
      * Refactor MHA sink example to use seq_q and seq_kv parameters to accommodate the new sequence length parameters.
      
      * Add GQA sink example for optimized attention mechanism & lint fix
      
      * fix several typos and bugs
      
      * lint
      
      * fix speed issues of swa
      
      * Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
      Co-authored-by: default avatarcoderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
      
      * Update examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py
      Co-authored-by: default avatarcoderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarcoderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
      d9a171ce