This CUDA extension implements fused dropout + residual + LayerNorm, building on
Apex's [FastLayerNorm](https://github.com/NVIDIA/apex/tree/master/apex/contrib/layer_norm).
Major changes:
- Add dropout and residual.
- Make it work for both pre-norm and post-norm architectures (see the reference sketch below).
- Support more hidden dimensions (all dimensions divisible by 8, up to 8192).
- Implement RMSNorm as an option.
- Support layer norm with parallel residual (e.g., GPT-J, GPT-NeoX, PaLM).
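
For reference, the fused computation corresponds to the following unfused PyTorch sketch (the helper and tensor names here are illustrative, not the extension's API); the RMSNorm option simply swaps the final LayerNorm for an RMSNorm.

```python
import torch
import torch.nn.functional as F

def dropout_add_layer_norm_ref(x0, residual, weight, bias, p, eps, prenorm=False):
    """Unfused reference: dropout(x0), add the residual, then LayerNorm."""
    hidden = F.dropout(x0, p=p, training=True) + residual
    out = F.layer_norm(hidden, (hidden.shape[-1],), weight, bias, eps)
    # Pre-norm blocks also need the un-normalized sum as the next residual stream;
    # post-norm blocks only need the normalized output.
    return (out, hidden) if prenorm else out

x0 = torch.randn(2, 512, 1024)
residual = torch.randn_like(x0)
weight, bias = torch.ones(1024), torch.zeros(1024)
out, next_residual = dropout_add_layer_norm_ref(x0, residual, weight, bias, 0.1, 1e-5, prenorm=True)
```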

If you want to use it for dimensions larger than 8k, please file an issue.

This extension has only been tested on A100s.

To install:
```sh
cd csrc/layer_norm && pip install .
```
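
After installation, calling the fused op from Python looks roughly like the sketch below. The import path and the `dropout_add_layer_norm` signature are assumptions based on the FlashAttention Python wrappers; check `flash_attn/ops/layer_norm.py` in the repo for the exact interface.

```python
# Hypothetical usage sketch -- import path and argument names are assumptions.
import torch
from flash_attn.ops.layer_norm import dropout_add_layer_norm

x0 = torch.randn(2, 512, 1024, device="cuda", dtype=torch.float16)
residual = torch.randn_like(x0)
weight = torch.ones(1024, device="cuda", dtype=torch.float16)
bias = torch.zeros(1024, device="cuda", dtype=torch.float16)

# Post-norm: returns only the normalized output.
out = dropout_add_layer_norm(x0, residual, weight, bias, 0.1, 1e-5)

# Pre-norm: also returns the un-normalized sum for the next block's residual.
out, next_residual = dropout_add_layer_norm(
    x0, residual, weight, bias, 0.1, 1e-5, prenorm=True
)
```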

As of 2024-01-05, this extension is no longer used in the FlashAttention repo.
We've instead switched to a Triton-based
[implementation](https://github.com/Dao-AILab/flash-attention/blob/main/flash_attn/ops/triton/layer_norm.py).