Cherry-picked the commit from upstream for faster --fast_multihead_attn build (#76)
* Faster `--fast_multihead_attn` build (#1245)
* merge .so files
* odr
* fix build
* update import
* apply psf/black with max line length of 120
* update
* fix
* update
* build fixed again but undefined symbol again
* fix 2, but layer norm grad is still undefined
* remove unused cpp files
* without layer_norm.cuh, import works
* import fast_multihead_attn works...
but why? Was the unnecessary `#include "layer_norm.cuh"` the culprit
that prevented the shared objects (.so) from linking `HostApplyLayerNorm` and
`HostLayerNormGradient`?
* clean up layer norm
* Fix some bugs
Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>