    [Enhancement] Improve flashattn function in example_gqa_decode.py (#329) · 32060ecd
    Lei Wang authored
    - Added a manual seed for reproducibility in PyTorch.
    - Refactored local variable allocations for better memory management.
    - Enhanced parallel processing in the flashattn function to improve performance.
    - Updated layout annotations for clarity and efficiency.
    
    These changes optimize the flash attention mechanism and ensure consistent behavior across runs.
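
The seeding change mentioned above can be illustrated with a minimal sketch. The commit message does not give the seed value or where the call is placed, so the seed `0`, the tensor shape, and the helper name `sample_inputs` are assumptions made for illustration only, not the code in example_gqa_decode.py.

```python
import torch


def sample_inputs(seed: int = 0) -> torch.Tensor:
    """Hypothetical helper: seed PyTorch's RNG, then draw a random tensor."""
    # torch.manual_seed fixes the global generator, so later random draws
    # are repeatable from run to run. The seed value here is an assumption.
    torch.manual_seed(seed)
    return torch.randn(2, 4)


# Two calls with the same seed yield identical tensors.
a = sample_inputs(0)
b = sample_inputs(0)
assert torch.equal(a, b)
```

Seeding once before the inputs are generated is usually enough to make a reference check against a PyTorch implementation repeatable, which is presumably what "consistent behavior across runs" refers to.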
example_gqa_decode.py 21.9 KB