    Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e
    Kevin Stephano authored
    * Add C++ Multihead Attention implementation to contrib.
    
    * Add a reference test that at least works for the forward pass.
    
    * Remove CublasLt support from multihead attention.
    
    * Add new Python version of self attention.
    
    * Update Python model of MHA with backward pass.
    
    * Fix output linear connection in MHA.
    
    * Clean up compiles and add documentation to PySelfAttention.
    
    * Add encdec Python version of multihead attention. Clean up files.
    
    * Add tests for self and encdec multihead attention.
    
    * Add reference pytorch implementation of attention with norm and add.
    
    * Add cutlass branch definition.
    
    * Add cutlass download to compile.
    
    * Add norm/add tests.
    
    * Add biases to PyTorch Python versions.
    
    * Add tests and fix issues with python version of attention masking.
    
    * Create README.md
    
    * Update README.md
    
    * Update README.md
    
    * Update perf test parameters.
    
    * Update README.md
    
    * Update README.md
    
    * Update README.md
    
    * Add f...
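
    The commits above describe adding a Python reference implementation of self attention with masking. As an illustrative sketch only (this is not the APEX contrib code; all names and shapes here are hypothetical), single-head scaled dot-product self-attention with an optional boolean mask can be written as:

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax along the given axis.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, w_qkv, w_out, mask=None):
        """Single-head self attention (reference sketch).

        x:     (seq, d_model) input sequence
        w_qkv: (d_model, 3 * d_model) fused Q/K/V projection
        w_out: (d_model, d_model) output linear projection
        mask:  optional (seq, seq) boolean array; False positions are masked out
        """
        d = x.shape[-1]
        q, k, v = np.split(x @ w_qkv, 3, axis=-1)
        scores = (q @ k.T) / np.sqrt(d)
        if mask is not None:
            # Masked positions get a large negative score -> near-zero weight.
            scores = np.where(mask, scores, -1e9)
        return softmax(scores) @ v @ w_out
    ```

    A real multihead version splits Q, K, and V into heads before the score computation; the fused QKV projection mirrors the single-GEMM layout the C++ implementation in these commits uses.
    
    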