FlashAttention is [integrated](https://github.com/microsoft/DeepSpeed/blob/ec13da6ba7cabc44bb4745a64a208b8580792954/deepspeed/ops/transformer/inference/triton_ops.py) into DeepSpeed's inference engine.
- NVIDIA's [Megatron-LM](https://github.com/NVIDIA/Megatron-LM/pull/267). This
library is a popular framework for training large transformer language models at scale.