  1. 28 Feb, 2025 1 commit
      Yu Cheng authored
      [Dev][Bugfix] Fix bug in ThreadTagChecker; Add WgmmaSync rewriter and add MHA WGMMA pipelined example (#128)
      
      * [Dev] Add RetNet Linear Attention example
      
      * [Dev] Add WgmmaSync rewriter for pipelined WGMMA operations and add MHA WGMMA pipelined example (FA3-like scheduling)
      
      This commit introduces a new transformation pass, `RewriteWgmmaSync`, to optimize warpgroup matrix multiply-accumulate (WGMMA) operations in the TileLang compiler:
      
      - Implemented `WgmmaSyncRewriter` in `src/transform/wgmma_sync_rewriter.cc`
      - Added pass registration for `RewriteWgmmaSync`
      - Updated `tilelang/engine/phase.py` to include the new transformation pass
      - Updated `tilelang/transform/__init__.py` to expose the new pass
      
      The rewriter manages synchronization and dependencies between WGMMA operations, improving pipeline efficiency for complex matrix multiplication kernels.
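
The dependency-driven placement of waits described above can be sketched with a toy model. This is not the code in `src/transform/wgmma_sync_rewriter.cc`; `Op`, `insert_waits`, and the `wait(n)` marker are hypothetical names used only for illustration. The sketch assumes that `wait(n)` blocks until at most `n` of the most recently issued asynchronous GEMMs are still in flight, and that a wait is needed only before a non-GEMM statement that touches a buffer used by an in-flight GEMM.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical statement record: a name plus the buffers it reads and writes."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    is_async_gemm: bool = False


def insert_waits(ops):
    """Return a schedule (list of names) with wait(n) markers inserted.

    Assumption: wait(n) completes all but the n youngest in-flight GEMMs.
    """
    schedule = []
    pending = []  # in-flight async GEMMs, oldest first
    for op in ops:
        if op.is_async_gemm:
            # Issue without waiting; completion is deferred to the first consumer.
            pending.append(op)
            schedule.append(op.name)
            continue
        touched = op.reads | op.writes
        # Find the youngest in-flight GEMM that conflicts with this statement:
        # either it writes a buffer the statement touches, or the statement
        # overwrites a buffer the GEMM is still reading.
        last = -1
        for i, gemm in enumerate(pending):
            if (gemm.writes & touched) or (gemm.reads & op.writes):
                last = i
        if last >= 0:
            allowed = len(pending) - (last + 1)  # younger GEMMs may keep running
            schedule.append(f"wait({allowed})")
            pending = pending[last + 1:]
        schedule.append(op.name)
    if pending:
        schedule.append("wait(0)")  # drain whatever is still in flight
    return schedule


# Toy attention-style loop body: two GEMMs are issued back to back, and each
# consumer waits only for the GEMM it depends on.
ops = [
    Op("gemm_qk", reads=frozenset({"Q", "K"}), writes=frozenset({"S"}), is_async_gemm=True),
    Op("gemm_pv", reads=frozenset({"P", "V"}), writes=frozenset({"O"}), is_async_gemm=True),
    Op("softmax", reads=frozenset({"S"}), writes=frozenset({"P_next"})),
    Op("rescale_o", reads=frozenset({"O"}), writes=frozenset({"O"})),
]
schedule = insert_waits(ops)
print(schedule)
```

In this toy schedule, `softmax` is preceded by `wait(1)`, so `gemm_pv` can stay in flight while the softmax runs, and the full drain (`wait(0)`) is placed only before `rescale_o`.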
      
      * [Bugfix] Fix bug in ThreadTagChecker for warp specialization
      
      Improve thread tag validation in the warp-specialized rewriter to prevent unintended transformations:
      - Add more precise checks for `threadIdx.y` and `threadIdx.z`
      - Validate the thread extent so that only single-extent bindings are accepted in these dimensions
      - Prevent warp specialization when a `threadIdx.y` or `threadIdx.z` binding has an extent greater than 1
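
The validation rule in the list above can be illustrated with a minimal sketch. It is not the `ThreadTagChecker` implementation; `can_warp_specialize` and the `(tag, extent)` tuple representation are hypothetical. It assumes that `threadIdx.x` may have any extent, and that other tags such as `blockIdx.x` do not affect the decision.

```python
def can_warp_specialize(thread_bindings):
    """Return True if warp specialization is allowed for these bindings.

    thread_bindings: iterable of (thread_tag, extent) pairs, a hypothetical
    stand-in for the thread bindings the compiler pass would inspect.
    """
    for tag, extent in thread_bindings:
        # Only single-extent bindings are accepted in the y and z dimensions.
        if tag in ("threadIdx.y", "threadIdx.z") and extent != 1:
            return False
    return True


# A 1-D thread block is accepted; so is one whose y extent is exactly 1.
ok_1d = can_warp_specialize([("blockIdx.x", 64), ("threadIdx.x", 256)])
ok_unit_y = can_warp_specialize([("threadIdx.x", 128), ("threadIdx.y", 1)])
# A multi-extent y binding disables warp specialization.
rejected = can_warp_specialize([("threadIdx.x", 128), ("threadIdx.y", 2)])
print(ok_1d, ok_unit_y, rejected)
```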
      
      * lint
      
      * [CI] Add TMA descriptor attribute to transformed module in test case
      0d873fcf