Commit a84d0728 authored by Tri Dao

[Docs] Mention FasterTransformer integration

parent 4a6eaa9f
@@ -50,6 +50,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
uses FlashAttention as part of their approach to speed up Transformer
inference (up to 5.3x on BERT).
- Nvidia's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) is a
state-of-the-art Transformer inference library. As of version
[5.2](https://github.com/NVIDIA/FasterTransformer/commit/b672f49e256ba7a2d4fc9691d270b60b7fc1a2ff),
FasterTransformer uses FlashAttention to speed up GPT inference (see the invocation sketch after the diff).
- [Kernl](https://github.com/ELS-RD/kernl) is a library for fast Transformer
inference. They use FlashAttention as part of their
[approach](https://twitter.com/pommedeterre33/status/1585284221014245377) to
......
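The integrations listed in this hunk all amount to calling FlashAttention's fused attention kernel in place of a standard attention implementation. Below is a minimal sketch of such a call, assuming the `flash-attn` package's v2-style `flash_attn_func` interface (import paths and signatures vary across releases, and this is not the FasterTransformer or Kernl code itself):

```python
# Hedged sketch: invoking the FlashAttention kernel directly via the
# flash-attn package (v2-style API; earlier releases expose different names).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 12, 64

# FlashAttention kernels require fp16/bf16 tensors on a CUDA device,
# laid out as (batch, seqlen, nheads, headdim).
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies the autoregressive mask used in GPT-style inference.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```

Libraries like FasterTransformer wrap an equivalent kernel inside their own attention layers, so end users get the speedup without calling FlashAttention directly.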