gaoqiong / flash-attention · Commits at ccbb14f38ee1e51a6f65bccbef9e4765acba6d79
History for csrc/ft_attention/ft_attention.cpp
16 Sep, 2023 (1 commit)
ccbb14f3 · Implement rotary embedding in flash_attn_with_kvcache (Tri Dao, authored Sep 16, 2023)
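This commit moves rotary embedding into the decoding kernel itself, so flash_attn_with_kvcache rotates the incoming q/k at their cache positions instead of requiring a separate PyTorch rotary pass. A minimal usage sketch, assuming the flash-attn 2.x Python interface (the shapes, keyword names, and the (seqlen, rotary_dim/2) cos/sin layout are my reading of that interface, not taken from this commit):

    import torch
    from flash_attn import flash_attn_with_kvcache

    batch, nheads, headdim, max_seqlen = 2, 16, 64, 512
    # One new token per sequence during decoding.
    q = torch.randn(batch, 1, nheads, headdim, dtype=torch.float16, device="cuda")
    k_new = torch.randn_like(q)
    v_new = torch.randn_like(q)
    k_cache = torch.zeros(batch, max_seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
    v_cache = torch.zeros_like(k_cache)
    cache_seqlens = torch.full((batch,), 128, dtype=torch.int32, device="cuda")

    # Precomputed rotary tables; assumed layout (max_seqlen, headdim // 2).
    pos = torch.arange(max_seqlen, dtype=torch.float32, device="cuda")
    inv_freq = 1.0 / (10000.0 ** (torch.arange(0, headdim, 2, device="cuda").float() / headdim))
    angles = torch.outer(pos, inv_freq)
    rotary_cos, rotary_sin = angles.cos().half(), angles.sin().half()

    # The kernel rotates q and k_new at position cache_seqlens[b], appends
    # k_new/v_new to the caches in place, then attends over the updated cache.
    out = flash_attn_with_kvcache(
        q, k_cache, v_cache, k=k_new, v=v_new,
        rotary_cos=rotary_cos, rotary_sin=rotary_sin,
        cache_seqlens=cache_seqlens, causal=True,
    )

Passing cache_seqlens per batch element is what lets the kernel pick the correct rotary position for each sequence in the batch.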
23 Jul, 2023 (1 commit)
a157cc8c · [FT] Implement MQA/GQA (Tri Dao, authored Jul 22, 2023)
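MQA/GQA support means the FT attention kernel can take fewer KV heads than query heads: a single shared KV head (MQA) or one KV head per group of query heads (GQA). The kernel indexes the shared heads in place; a minimal PyTorch sketch of the equivalent semantics, using a hypothetical repeat_kv helper rather than the kernel's actual indexing:

    import torch

    def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
        # (batch, seqlen, n_kv_heads, headdim) -> (batch, seqlen, n_kv_heads * n_rep, headdim):
        # every group of n_rep query heads attends to the same expanded K/V head.
        if n_rep == 1:
            return x
        b, s, h_kv, d = x.shape
        return x[:, :, :, None, :].expand(b, s, h_kv, n_rep, d).reshape(b, s, h_kv * n_rep, d)

    # GQA: 16 query heads sharing 4 KV heads (n_rep = 4); MQA is the n_kv_heads = 1 case.
    k = torch.randn(2, 128, 4, 64)
    assert repeat_kv(k, n_rep=4).shape == (2, 128, 16, 64)

A fused kernel avoids materializing the expanded tensor by mapping each query head to its KV head index directly; the sketch only shows the semantics.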
06 Jul, 2023 (1 commit)
2800efc7 · [FT] rotary_cos/sin should have batch_size dimension (Tri Dao, authored Jul 06, 2023)
03 Jul, 2023 (1 commit)
3a9bfd07 · [FT] rotary_cos/sin should have shape (dim) instead of (seqlen, dim) (Tri Dao, authored Jul 03, 2023)
02 Jul, 2023 (1 commit)
62e98144 · [Rotary] Make sure frequency calculation is in fp32 (Tri Dao, authored Jul 02, 2023)
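This fix targets a standard numerical pitfall: computing rotary inverse frequencies in fp16/bf16 loses precision and drifts the rotation angles, especially at long positions. A sketch of the usual fp32-safe computation (the standard rotary formula; the base parameter here anticipates the commit below that exposes it as a kernel argument):

    import torch

    def rotary_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
        # Keep the exponent and the power in fp32: evaluating base ** (2i / dim)
        # in fp16/bf16 loses precision and skews the angles for large i.
        exponents = torch.arange(0, dim, 2, dtype=torch.float32) / dim
        return 1.0 / (base ** exponents)  # shape (dim // 2,)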
30 May, 2023 (1 commit)
48bc6eac · [Gen] Add rotary base as an argument to FT attention kernel (Tri Dao, authored May 30, 2023)
15 Jan, 2023 (2 commits)
f1e01c27 · [Gen] Pass qkv_stride to ft_attention kernel for batched generation (Tri Dao, authored Jan 15, 2023; layout sketch below)
7c219154 · [Gen] Make generation work with Tensor Parallel (Tri Dao, authored Jan 15, 2023)
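On the qkv_stride commit above: during batched generation, q, k, and v for the new token typically come out of one packed projection tensor, so a kernel handed only q's data pointer needs the packed stride to step between batch elements. A sketch of that layout, with shapes assumed for illustration (this is my reading of why the stride is passed, not the commit's own text):

    import torch

    batch, nheads, headdim = 4, 16, 64
    # Packed in-projection output for one decoding step: (batch, 3, nheads, headdim).
    qkv = torch.randn(batch, 3, nheads, headdim)
    q, k, v = qkv.unbind(dim=1)  # views into the same storage, no copies

    # q is non-contiguous: stepping one batch element forward moves by the full
    # packed row (3 * nheads * headdim), not by nheads * headdim, so the kernel
    # must be told this stride explicitly.
    assert q.stride(0) == qkv.stride(0) == 3 * nheads * headdim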
04 Jan, 2023 (1 commit)
a01d1213 · [Gen] Add kernel from FasterTransformer for benchmarking (Tri Dao, authored Jan 03, 2023)