- 30 Jul, 2024 1 commit
jayhshah authored
* base version
* restructure pipelines, add special fp8 epilogue
* add variants
* add fp8 causal and modify dynamic tile scheduler
* better causal schedule
* maintain two schedules for non causal and causal
* removing macros
* fix regression
* clean up unneeded methods and variants
* fix mistake with NumProducerThreads
* base version
* restructure pipelines, add special fp8 epilogue
* add variants
* add fp8 causal and modify dynamic tile scheduler
* better causal schedule
* maintain two schedules for non causal and causal
* removing macros
* fix regression
* clean up unneeded methods and variants
* fix mistake with NumProducerThreads
* use seqlen traits
* add fp8 .cu files and benchmark script
* fix merge issue
* fix merge issue
* fix merge issue
* remove duplicate code
* fix regression with varseqlen
* move varseqlen init in constexpr
* fix test script
* more constexpr on varseqlen and add max offset
* add back test cases
-
- 11 Jul, 2024 1 commit
Tri Dao authored
-
- 28 Mar, 2024 1 commit
Driss Guessous authored
-
- 20 Feb, 2024 1 commit
Tri Dao authored
-
- 21 Jan, 2024 1 commit
Tri Dao authored
-
- 20 Jan, 2024 1 commit
Tri Dao authored
-
- 14 Jan, 2024 4 commits
- 22 Dec, 2023 1 commit
Tri Dao authored
-
- 26 Sep, 2023 1 commit
Tri Dao authored
Co-authored-by: Timothee Lacroix <t@mistral.ai>
-
- 12 Sep, 2023 1 commit
Tri Dao authored
-
- 25 Aug, 2023 1 commit
Tri Dao authored
-
- 24 Aug, 2023 1 commit
BoxiangW authored
Support flash attention 2 with causal masking when KV's seq length is longer than Q's seq length. (#436)
-
- 01 Aug, 2023 1 commit
Tri Dao authored
-
- 17 Jul, 2023 1 commit
Tri Dao authored
-