This CUDA extension implements fused dropout + residual + LayerNorm, building on
Apex's [FastLayerNorm](https://github.com/NVIDIA/apex/tree/master/apex/contrib/layer_norm).
We add dropout and residual, and make it work for both pre-norm and post-norm architectures.
We also make it work for more hidden dimensions (all dimensions divisible by 8, up to 6144).
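The fused kernel computes the same result as the unfused sequence dropout → residual add → LayerNorm. A minimal NumPy sketch of that reference computation (the function name and signature here are illustrative, not the extension's actual API):

```python
import numpy as np

def dropout_add_layer_norm_ref(x, residual, gamma, beta, p, eps=1e-5, rng=None):
    """Unfused reference: z = dropout(x) + residual, then LayerNorm(z)
    over the last dimension. The extension fuses these into one kernel."""
    rng = rng or np.random.default_rng(0)
    # Inverted dropout: zero with probability p, scale survivors by 1/(1-p).
    mask = rng.random(x.shape) >= p
    z = x * mask / (1.0 - p) + residual
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    out = (z - mu) / np.sqrt(var + eps) * gamma + beta
    # In a pre-norm architecture z (the updated residual stream) is also needed,
    # so the reference returns both.
    return out, z

x = np.random.default_rng(1).standard_normal((2, 8))
residual = np.zeros((2, 8))
out, z = dropout_add_layer_norm_ref(x, residual, np.ones(8), np.zeros(8), p=0.1)
```

With `gamma=1` and `beta=0`, each row of `out` is normalized to zero mean and (approximately) unit variance.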
We also implement RMSNorm as an option.
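RMSNorm differs from LayerNorm only in that it skips the mean subtraction and bias, rescaling by the root-mean-square of the last dimension. A short NumPy sketch of that variant (again illustrative, not the extension's API):

```python
import numpy as np

def rms_norm_ref(z, gamma, eps=1e-5):
    # RMSNorm: no mean subtraction, no bias; divide by the RMS of the
    # last dimension and apply the learned per-feature scale gamma.
    rms = np.sqrt((z * z).mean(axis=-1, keepdims=True) + eps)
    return z / rms * gamma

z = np.random.default_rng(0).standard_normal((2, 8))
y = rms_norm_ref(z, np.ones(8))
```

With `gamma=1`, each row of the output has root-mean-square close to 1.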

If you want to use it for dimensions larger than 6144, please file an issue.

This extension has only been tested on A100s.

```sh
cd csrc/layer_norm && pip install .
```