Commit 2ed471ec authored by Tri Dao

Add tests for numerical error

parent 42f54d88
@@ -116,6 +116,17 @@ T4 GPUs are commonly used for inference, so we also measure speedup on the forward pass.
We see speedups between 2.5x and 4.5x on the forward pass.
## Tests
We test that FlashAttention produces the same output and gradient as a reference
implementation, up to some numerical tolerance. In particular, we check that the
maximum numerical error of FlashAttention is at most twice the numerical error
of a baseline implementation in PyTorch (across different head dimensions, input
dtypes, sequence lengths, and causal/non-causal settings).
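In spirit, each test follows the pattern sketched below. This is a minimal illustration, not the actual test code: `flash_attn_func` is an assumed entry point (the alpha API may differ), `attention_ref` is a hypothetical baseline, and the real tests in `tests/test_flash_attn.py` additionally parametrize shapes, dtypes, and causality, and check gradients as well as outputs.
```
import torch
from flash_attn import flash_attn_func  # assumed import; the alpha API may differ

def attention_ref(q, k, v, causal=False):
    """Plain PyTorch attention (hypothetical baseline for illustration)."""
    softmax_scale = q.shape[-1] ** (-0.5)
    scores = torch.einsum("bthd,bshd->bhts", q, k) * softmax_scale
    if causal:
        mask = torch.triu(
            torch.ones(scores.shape[-2:], dtype=torch.bool, device=scores.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    return torch.einsum("bhts,bshd->bthd", attn, v)

batch, seqlen, nheads, headdim = 2, 128, 4, 64  # illustrative shapes
q, k, v = [
    torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    for _ in range(3)
]

out = flash_attn_func(q, k, v)        # FlashAttention output in fp16
out_ref = attention_ref(              # reference computed in fp32, cast back
    q.float(), k.float(), v.float()
).to(q.dtype)
out_pt = attention_ref(q, k, v)       # same baseline attention, run in fp16

# FlashAttention's max error (vs. the fp32 reference) should be at most
# twice the max error of the fp16 PyTorch baseline.
assert (out - out_ref).abs().max() <= 2 * (out_pt - out_ref).abs().max()
```
Comparing against a baseline's error, rather than a fixed tolerance, keeps the check meaningful across dtypes and sequence lengths, since the unavoidable floating-point error itself varies with those parameters.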
To run the tests:
```
pytest -q -s tests/test_flash_attn.py
```
## When you encounter issues
This alpha release of FlashAttention contains code written for a research