gaoqiong / flash-attention (repository at commit ba2fe7f378c938263e8b5eeeac0fb2766c754551)

File: flash_attn/utils/generation.py (13.6 KB)

Commit ba2fe7f3: "[Gen] Move allocate_inference_cache to within the model"
Authored by Tri Dao, Apr 20, 2023