"...git@developer.sourcefind.cn:wangsen/paddle_dbnet.git" did not exist on "112ad00d150ea57808845f2cca775e6717271535"
Commit a84d0728 authored by Tri Dao

[Docs] Mention FasterTransformer integration

parent 4a6eaa9f
@@ -50,6 +50,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
uses FlashAttention as part of their approach to speed up Transformer
inference (up to 5.3x on BERT).
- Nvidia's [FasterTransformer](https://github.com/NVIDIA/FasterTransformer) is a
state-of-the-art Transformer inference library. As of version
[5.2](https://github.com/NVIDIA/FasterTransformer/commit/b672f49e256ba7a2d4fc9691d270b60b7fc1a2ff),
FlashAttention is used as a component of FasterTransformer to speed up GPT inference.
- [Kernl](https://github.com/ELS-RD/kernl) is a library for fast Transformer
inference. They use FlashAttention as part of their
[approach](https://twitter.com/pommedeterre33/status/1585284221014245377) to
...
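For context, the attention call that these integrations build on looks roughly like the sketch below. It is a minimal illustration against the `flash_attn_func` interface exposed by the `flash_attn` package in later releases; the shapes and values are made up for the example, not taken from this commit.

```python
import torch
from flash_attn import flash_attn_func  # top-level export in later flash-attn releases

# Illustrative shapes: (batch, seqlen, nheads, headdim). The fused kernel
# expects fp16 or bf16 tensors on a CUDA device.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# causal=True applies the autoregressive mask used for GPT-style decoding,
# the case the FasterTransformer integration targets.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```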