The CUDA implementations of LayerNorm and Softmax are adapted from [OneFlow](https://github.com/Oneflow-Inc/oneflow). Thanks to OneFlow for the high-performance CUDA implementation; our main change is adding support for bfloat16 precision.
## Cite us
If you use FastFold in a research publication, please cite this paper.