csrc/flash_attn/flash_api.cpp · e4f726fc446b80d82ed82d10ed251df7f483321c · gaoqiong / flash-attention

Support alibi, by Sanghun Cho from Kakao Brain · e4f726fc

Sanghun Cho authored Dec 20, 2023



* hard-code alibi in fwd

* use params.h as num_heads

* hard-code alibi in bwd

* add alibi on/off option

* compute alibi_start, ratio outside of kernels

* fix minor merge conflict

* add test_alibi.py

* change apply_alibi() location before masking

* add alibi in splitkv kernel

* fix number of return values of backward func

* add out-of-bound check in apply_alibi()

* update test_alibi.py

* update test_alibi.py for kvcache

* simplify alibi parameter interface

* fix performance issue by computing alibi outside of branch

* update test_flash_attn_varlen_func() for left padding

* implement alibi_slopes (b, nh) loading

* optimize apply_alibi() a bit

* update test cases for alibi_slopes loading

* reflect stylistic comments

* disable "seqlenq_ngroups_swapped" when using alibi
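
For readers unfamiliar with ALiBi, the bullets above (slopes computed as a start value and ratio outside the kernels, and apply_alibi() running before masking) can be illustrated with a minimal pure-Python sketch. This is not the code from this commit: the function bodies, the power-of-two restriction, and the bias form -slope * |i + seqlen_k - seqlen_q - j| are assumptions made for illustration, and the real kernels operate on tiles in CUDA.

```python
import math


def alibi_slopes(num_heads):
    """Geometric slope sequence (sketch: power-of-two head counts only)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    start = 2.0 ** (-8.0 / num_heads)  # slope of the first head
    ratio = start                      # common ratio between consecutive heads
    return [start * ratio ** h for h in range(num_heads)]


def apply_alibi(scores, slope, causal=False):
    """Add the linear bias to one head's score matrix, then apply masking.

    scores is a list of rows (seqlen_q x seqlen_k). The bias is assumed to be
    -slope * |i + (seqlen_k - seqlen_q) - j|, aligned to the bottom-right.
    """
    seqlen_q = len(scores)
    seqlen_k = len(scores[0])
    offset = seqlen_k - seqlen_q
    out = []
    for i, row in enumerate(scores):
        new_row = []
        for j, s in enumerate(row):
            s = s - slope * abs(i + offset - j)  # bias is added first
            if causal and j > i + offset:        # masking happens afterwards
                s = -math.inf
            new_row.append(s)
        out.append(new_row)
    return out


if __name__ == "__main__":
    slopes = alibi_slopes(8)
    biased = apply_alibi([[0.0, 0.0], [0.0, 0.0]], slopes[0], causal=True)
    print(slopes)
    print(biased)
```

Computing the slopes once on the host and passing them in keeps the per-element work in the inner loop to a single multiply and add, which is the motivation the "outside of kernels" and "outside of branch" bullets suggest.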

---------
Co-authored-by: monk.detective <monk.detective@kakaobrain.com>


flash_api.cpp 65.2 KB
