examples/flash_decoding/example_gqa_decode.py · 32060ecda2ea5cfb17c6bf74788ec51bda3ff6e0 · OpenDAS / tilelang

[Enhancement] Improve flashattn function in example_gqa_decode.py (#329) · 32060ecd

Lei Wang authored Apr 04, 2025

- Added a manual seed for reproducibility in PyTorch.
- Refactored local variable allocations for better memory management.
- Enhanced parallel processing in the flashattn function to improve performance.
- Updated layout annotations for clarity and efficiency.

These changes optimize the flash attention mechanism and ensure consistent behavior across runs.
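The commit message does not include the kernel code. For readers who want to check a decode kernel's output across runs, here is a minimal NumPy sketch of one-step grouped-query attention in the split-KV ("flash decoding") style, compared against a direct reference.

Everything in the sketch is an assumption for illustration, not the tilelang example's API or layout. This includes the tensor layouts (`q` as batch × heads × dim, `k`/`v` as batch × seqlen × groups × dim), the function names `expand_kv`, `gqa_decode_reference`, and `gqa_decode_split_kv`, and the `num_split` parameter. NumPy's seeded generator plays the role that `torch.manual_seed` plays in PyTorch.

```python
import numpy as np


def expand_kv(k, v, heads):
    # Hypothetical helper: repeat each KV head so that heads // groups
    # consecutive query heads share it (grouped-query attention).
    rep = heads // k.shape[2]
    return np.repeat(k, rep, axis=2), np.repeat(v, rep, axis=2)


def gqa_decode_reference(q, k, v):
    # Plain softmax attention for one query token per sequence.
    dim = q.shape[-1]
    k_exp, v_exp = expand_kv(k, v, q.shape[1])
    s = np.einsum("bhd,bshd->bhs", q, k_exp) / np.sqrt(dim)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return np.einsum("bhs,bshd->bhd", p, v_exp)


def gqa_decode_split_kv(q, k, v, num_split=4):
    # Split the KV sequence into chunks, attend to each chunk independently,
    # then merge the partial outputs using their log-sum-exp values.
    dim = q.shape[-1]
    seqlen = k.shape[1]
    assert 1 <= num_split <= seqlen
    k_exp, v_exp = expand_kv(k, v, q.shape[1])
    outs, lses = [], []
    for idx in np.array_split(np.arange(seqlen), num_split):
        s = np.einsum("bhd,bshd->bhs", q, k_exp[:, idx]) / np.sqrt(dim)
        m = s.max(axis=-1, keepdims=True)   # per-chunk max for numerical stability
        p = np.exp(s - m)
        l = p.sum(axis=-1, keepdims=True)   # per-chunk softmax normalizer
        outs.append(np.einsum("bhs,bshd->bhd", p / l, v_exp[:, idx]))
        lses.append(m + np.log(l))          # log(sum(exp(s))) over this chunk
    lse = np.stack(lses)                    # (num_split, batch, heads, 1)
    m_all = lse.max(axis=0)
    total = m_all + np.log(np.exp(lse - m_all).sum(axis=0))
    weights = np.exp(lse - total)           # share of softmax mass per chunk
    return (weights * np.stack(outs)).sum(axis=0)


# Usage: a fixed seed makes the random inputs, and thus the check, reproducible.
rng = np.random.default_rng(0)
batch, heads, groups, seqlen, dim = 2, 8, 2, 64, 16
q = rng.standard_normal((batch, heads, dim))
k = rng.standard_normal((batch, seqlen, groups, dim))
v = rng.standard_normal((batch, seqlen, groups, dim))
out = gqa_decode_split_kv(q, k, v, num_split=4)
ref = gqa_decode_reference(q, k, v)
print(np.allclose(out, ref))
```

The log-sum-exp merge is what allows the chunks to be processed in parallel. Each chunk needs only its own running max and normalizer, and the final weights rescale the partial outputs so that their sum equals the full softmax result.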


example_gqa_decode.py 21.9 KB
