    Faster `--fast_multihead_attn` build (#1245) · 7ec8ed67
    Masaki Kozuki authored
    * merge .so files
    
    * ODR
    
    * fix build
    
    * update import
    
    * apply psf/black with a max line length of 120
    
    * update
    
    * fix
    
    * update
    
    * build fixed again, but the undefined-symbol error is back
    
    * fix 2: the layer norm gradient symbol is still undefined
    
    * remove unused cpp files
    
    * without layer_norm.cuh, import works
    
    * import fast_multihead_attn works...
    
    But why? Was the unnecessary `#include "layer_norm.cuh"` the culprit that
    kept the shared objects from linking `HostApplyLayerNorm` and
    `HostLayerNormGradient`?
    
    * clean up layer norm
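    Because several steps above hinge on whether `import fast_multihead_attn` succeeds or fails with an undefined symbol, a small smoke test can tell a missing module apart from a dynamic-linker failure. This is a minimal sketch, not part of this commit: the helper name `check_extension_import` is hypothetical, and it assumes the loader's error text contains "undefined symbol" (glibc on Linux) or "Symbol not found" (dyld on macOS).

    ```python
    import importlib
    from typing import Optional, Tuple


    def check_extension_import(module_name: str) -> Tuple[str, Optional[str]]:
        """Import a compiled extension and classify any failure.

        Hypothetical helper. Returns ("ok", None) on success,
        ("undefined_symbol", message) when the shared object was found but the
        dynamic linker could not resolve a symbol, or ("import_error", message)
        for any other ImportError (e.g. the module is not installed).
        """
        try:
            importlib.import_module(module_name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            msg = str(exc)
            # Assumed loader wording: glibc says "undefined symbol", macOS dyld says "Symbol not found".
            if "undefined symbol" in msg or "Symbol not found" in msg:
                return "undefined_symbol", msg
            return "import_error", msg
        return "ok", None
    ```

    Calling `check_extension_import("fast_multihead_attn")` after a build would return `"undefined_symbol"` together with the mangled name of the unresolved function, which is the failure mode described in the "fix 2" step.
    
    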