[Enhancement] Improve flashattn function in example_gqa_decode.py (#329)
- Added a manual seed for reproducibility in PyTorch.
- Refactored local variable allocations for better memory management.
- Enhanced parallel processing in the flashattn function to improve performance.
- Updated layout annotations for clarity and efficiency.

These changes optimize the flash attention mechanism and ensure consistent behavior across runs.
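For context on the first item, here is a minimal sketch of what fixing a manual seed in PyTorch typically looks like. The helper name `seeded_inputs`, the seed value, and the tensor shape are hypothetical and are not taken from example_gqa_decode.py; the only library calls used are `torch.manual_seed`, `torch.randn`, and `torch.equal`.

```python
import torch


def seeded_inputs(seed: int, shape=(2, 4)) -> torch.Tensor:
    """Hypothetical helper: draw a random tensor after fixing the global seed."""
    # torch.manual_seed resets PyTorch's default random number generator,
    # so the draw below is the same on every call with the same seed.
    torch.manual_seed(seed)
    return torch.randn(*shape)


first = seeded_inputs(0)
second = seeded_inputs(0)

# Both tensors were drawn right after seeding with the same value.
print(torch.equal(first, second))  # True
```

Seeding once at the top of a script is usually enough to make randomly generated test inputs repeatable across runs on the same machine and PyTorch version.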