Commit ece8f05d authored by Tri Dao

[Docs] Mention PubMedGPT

parent 04c4c610
@@ -45,6 +45,11 @@ yields the fastest BERT training on cloud instances in MLPerf training 2.0 (June
## Language model training & inference
- [PubMedGPT 2.7B](https://crfm.stanford.edu/2022/12/15/pubmedgpt.html), a
domain-specific LLM for biomedicine, by Stanford CRFM, trained on
[MosaicML](https://www.mosaicml.com/blog/introducing-pubmed-gpt) Cloud. Just
using FlashAttention nearly halves the total training time.
- Meta's
[AITemplate](https://ai.facebook.com/blog/gpu-inference-engine-nvidia-amd-open-source/)
uses FlashAttention as part of their approach to speed up Transformer
...