"""
This is a Triton implementation of the Flash Attention algorithm
(see: Dao et al., https://arxiv.org/pdf/2205.14135v2.pdf; Rabe and Staats, https://arxiv.org/pdf/2112.05682v2.pdf; Triton, https://github.com/openai/triton)
"""
import torch
import subprocess
import os

try:
    import triton
    import triton.language as tl
except ImportError:
    raise ImportError('please install triton from https://github.com/openai/triton')