    Cherry-picked the commit from upstream for faster --fast_multihead_attn build (#76) · 29b36315
    Hubert Lu authored
    
    
    * Faster `--fast_multihead_attn` build (#1245)
    
    * merge .so files
    
    * odr
    
    * fix build
    
    * update import
    
    * apply psf/black with max line length of 120
    
    * update
    
    * fix
    
    * update
    
    * build fixed again but undefined symbol again
    
    * fix 2, still layer norm grad is undefined
    
    * remove unused cpp files
    
    * without layer_norm.cuh, import works
    
    * import fast_multihead_attn works...
    
    but why? Was the unnecessary `#include "layer_norm.cuh"` the culprit,
    preventing the shared objects from linking `HostApplyLayerNorm` and
    `HostLayerNormGradient`?
    
    * clean up layer norm
    
    * Fix some bugs
    Co-authored-by: Masaki Kozuki <mkozuki@nvidia.com>
setup.py 33.6 KB