    Unify attention output handling (#2343) · 47447ef0
    Daniël de Kok authored
    - Always return the hidden states.
    - Create the output tensor inside the `attention` and `paged_attention`
      functions.
    
This removes the difference in how the output is handled between
attention (output parameter) and paged attention (return value). It
also removes the assumption that the attention implementation can
write to a caller-provided output tensor (in preparation for FlashInfer).
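
A minimal sketch of the unified convention, assuming hypothetical tensor
shapes and a plain-PyTorch softmax computation standing in for the real
ROCm/flash-attention kernels; the actual `attention` and `paged_attention`
signatures carry additional backend-specific arguments:

    import torch

    def attention(query, key, value, softmax_scale):
        # The output tensor is created inside the function rather than
        # passed in by the caller; the hidden states are always returned.
        out = torch.empty_like(query)
        scores = torch.softmax(
            query @ key.transpose(-2, -1) * softmax_scale, dim=-1
        )
        out.copy_(scores @ value)  # stand-in for the backend kernel write
        return out

    # Callers use the same convention for both code paths:
    q = k = v = torch.randn(1, 8, 64, 32)
    hidden_states = attention(q, k, v, softmax_scale=32 ** -0.5)

Since callers no longer hand the kernel a buffer to write into, backends
such as FlashInfer that manage their own output allocation can be slotted
in behind the same interface.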