    [`GPT-NeoX`] Add SDPA support (#31031) · b07770c5
    Anton Vlasjuk authored
    * starting support for sdpa in `gptneox` models
    
    * small comment on tests
    
    * fix dropout
    
    * documentation and style
    
    * clarify concrete paths for reference
    
    * generalise attn projections and rope application
    
    added head mask check to sdpa mask creation
    
    handle sdpa memory backend bug via own version flag
    
    * update docs and style
    
    * move dtype casting outside of general attn_projection_and_rope function
    
    fix flash_attn_2 stuff
    
    * more generic attn warning if output_attns or head_mask
    
    * simplify head mask check by moving head mask creation to a later point
    
    * remove copied llama artifact
    
    * remove padding_mask from attention function signature
    
    * removing unnecessary comments, only "save" attn implementation once
    
    * [run_slow] gpt_neox
gpt_neox.md 12.2 KB