    Support alibi, by Sanghun Cho from Kakao Brain · e4f726fc
    Sanghun Cho authored
    
    
    * hard-code alibi in fwd
    
    * use params.h as num_heads
    
    * hard-code alibi in bwd
    
    * add alibi on/off option
    
    * compute alibi_start, ratio outside of kernels
    
    * fix minor merge conflict
    
    * add test_alibi.py
    
    * move apply_alibi() call before masking
    
    * add alibi in splitkv kernel
    
    * fix number of return values in backward func
    
    * add out-of-bound check in apply_alibi()
    
    * update test_alibi.py
    
    * update test_alibi.py for kvcache
    
    * simplify alibi parameter interface
    
    * fix performance issue by computing alibi outside of branch
    
    * update test_flash_attn_varlen_func() for left padding
    
    * implement alibi_slopes (b, nh) loading
    
    * optimize apply_alibi() a bit
    
    * update test cases for alibi_slopes loading
    
    * address stylistic review comments
    
    * disable "seqlenq_ngroups_swapped" when using alibi
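
    A minimal pure-Python reference for what the changes above compute, of the kind a test such as test_alibi.py could compare kernel output against. It is a sketch under stated assumptions, not the code in this commit: slopes are assumed to follow the ALiBi paper's geometric sequence (alibi_start = ratio = 2**(-8/n) for n heads, with the paper's interleaving fallback when n is not a power of two), and the bias is assumed to be -slope * |i + seqlen_k - seqlen_q - j| for query i and key j. The helper names alibi_slopes and add_alibi are hypothetical.

```python
import math


def alibi_slopes(num_heads: int) -> list[float]:
    """Hypothetical helper: per-head ALiBi slopes (assumed ALiBi-paper scheme)."""
    def power_of_2(n: int) -> list[float]:
        start = 2.0 ** (-8.0 / n)  # alibi_start
        ratio = start              # each head's slope is the previous one times ratio
        return [start * ratio ** h for h in range(n)]

    if num_heads & (num_heads - 1) == 0:  # num_heads is a power of two
        return power_of_2(num_heads)
    # Otherwise take the closest smaller power of two, then fill the remaining
    # heads with every other slope from the next power of two.
    closest = 1 << (num_heads.bit_length() - 1)
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_2(closest) + extra


def add_alibi(scores, slope, seqlen_q, seqlen_k, causal=False):
    """Hypothetical helper: add the ALiBi bias for one (batch, head), then mask.

    With (b, nh) slopes, the caller passes alibi_slopes_table[b][h] as `slope`.
    Query i is aligned to the bottom-right of the key sequence, so its distance
    to key j is i + seqlen_k - seqlen_q - j.
    """
    out = []
    for i in range(seqlen_q):
        row = []
        for j in range(seqlen_k):  # j < seqlen_k: no out-of-bounds columns
            dist = i + seqlen_k - seqlen_q - j
            biased = scores[i][j] - slope * abs(dist)  # bias applied before masking
            if causal and dist < 0:
                biased = -math.inf  # key j lies in the future of query i
            row.append(biased)
        out.append(row)
    return out
```

    For example, with 8 heads the slopes run from 0.5 down to 2**-8, and add_alibi on an all-zero 2x3 score matrix with slope 0.5 and causal=True gives [[-0.5, 0.0, -inf], [-1.0, -0.5, 0.0]]. Adding the bias before masking mirrors the ordering described in the changelog; the masked entries come out as -inf either way.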
    
    ---------
    Co-authored-by: monk.detective <monk.detective@kakaobrain.com>