Cherry-picked the commit from upstream for faster --fast_multihead_attn build (#76)
* Faster `--fast_multihead_attn` build (#1245)
* merge .so files
* odr
* fix build
* update import
* apply psf/black with max line length of 120
* update
* fix
* update
* build fixed again but undefined symbol again
* fix 2, still layer norm grad is undefined
* remove unused cpp files
* without layer_norm.cuh, import works
* import fast_multihead_attn works...
but why? Was the unnecessary `#include "layer_norm.cuh"` the culprit
that kept the shared objects from linking `HostApplyLayerNorm` and
`HostLayerNormGradient`?
* clean up layer norm
* Fix some bugs
Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>