-
Lei Wang authored
- Added a manual seed for reproducibility in PyTorch. - Refactored local variable allocations for better memory management. - Enhanced parallel processing in the flashattn function to improve performance. - Updated layout annotations for clarity and efficiency. These changes optimize the flash attention mechanism and ensure consistent behavior across runs.
32060ecd