This CUDA extension implements fused dropout + residual + LayerNorm, building on
Apex's [FastLayerNorm](https://github.com/NVIDIA/apex/tree/master/apex/contrib/layer_norm).
We add dropout and residual, and make it work for both pre-norm and post-norm architectures.
We also support more hidden dimensions (any dimension divisible by 8, up to 6144).
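For reference, the math the fused kernel computes is dropout applied to the input, a residual add, then LayerNorm over the hidden dimension. Below is a plain NumPy sketch of that composition (the function name `dropout_add_layer_norm_ref` and its signature are illustrative, not the extension's actual API):

```python
import numpy as np

def dropout_add_layer_norm_ref(x0, residual, weight, bias, p, eps, rng):
    """Reference for LayerNorm(dropout(x0) + residual) over the last dim."""
    keep = rng.random(x0.shape) >= p                     # Bernoulli keep-mask
    x = np.where(keep, x0 / (1.0 - p), 0.0) + residual   # inverted dropout + residual
    mu = x.mean(axis=-1, keepdims=True)                  # per-row mean
    var = x.var(axis=-1, keepdims=True)                  # per-row (biased) variance
    return (x - mu) / np.sqrt(var + eps) * weight + bias # affine LayerNorm
```

The CUDA extension fuses these steps into a single kernel pass instead of materializing the intermediate dropout and residual tensors.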

If you want to use it for dimensions larger than 6k, please file an issue.

This extension has only been tested on A100s.

To install:
```sh
cd csrc/layer_norm && pip install .
```