    Tong WU authored
    [BugFix] Refactor attention kernel to handle OOB positions by filling with `-inf` instead of clearing accumulators. (#1222)
    
    * Refactor attention kernel to handle OOB positions by filling with `-inf` instead of clearing accumulators.
    
    * lint
    
    * pre-commit
    
    * Update imports in the flash attention test file to use the new backward and forward examples for better clarity and consistency.
    0af3fd7c
example_gqa_bwd_tma_reduce.py 25.5 KB
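
For context, the fix described above relies on the fact that `exp(-inf) = 0`: if out-of-bounds (OOB) key positions are filled with `-inf` before the softmax, they receive zero attention weight and drop out of the accumulation naturally, so the accumulators never need to be cleared afterwards. Below is a minimal PyTorch sketch of that masking idea; the function name `masked_attention`, the tensor shapes, and the `valid_len` parameter are illustrative assumptions, not code taken from `example_gqa_bwd_tma_reduce.py`.

```python
import torch

def masked_attention(q, k, v, valid_len):
    # q: (heads, d); k, v: (heads, seq, d). Positions >= valid_len are OOB padding.
    scores = torch.einsum("hd,hsd->hs", q, k) / q.shape[-1] ** 0.5
    # Fill OOB positions with -inf so softmax assigns them zero weight:
    # exp(-inf) = 0 removes them from both the numerator and the normalizing
    # sum, so the output accumulator never has to be cleared or patched up.
    oob = torch.arange(k.shape[1]) >= valid_len
    scores = scores.masked_fill(oob, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    return torch.einsum("hs,hsd->hd", probs, v)

# Usage: only the first 100 of 128 key positions are valid.
q = torch.randn(4, 64)
k = torch.randn(4, 128, 64)
v = torch.randn(4, 128, 64)
out = masked_attention(q, k, v, valid_len=100)
```

Compared with zeroing accumulators after the fact, masking the scores keeps valid partial results intact and composes cleanly with online-softmax-style kernels, which is presumably why the commit takes this approach.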