- 10 Jul, 2024 2 commits
- 09 Jul, 2024 1 commit
Phil Wang authored
* missing commas
* another fix
- 08 Jul, 2024 2 commits
Nicolas Patry authored
* Softcap v2 (fwd only).
* Some missing interface + remove overrides in tests.
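A minimal usage sketch of the forward-only softcapping, assuming the Python interface exposes it as a `softcap` keyword on `flash_attn_func`; shapes and the cap value below are illustrative, not taken from the commit:

```
# Hedged sketch, not the library's documented example: assumes flash_attn_func
# accepts a `softcap` keyword for forward-only logit softcapping.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Cap the pre-softmax attention scores (tanh-style softcapping at 30.0).
out = flash_attn_func(q, k, v, causal=True, softcap=30.0)
```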
Jianwei Dong authored
Add the return_softmax_lse parameter to the flash_attn_with_kvcache function to allow returning the logsumexp of the attention scores. (#989)
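A hedged sketch of how the new flag might be used; it assumes `flash_attn_with_kvcache` returns an `(out, softmax_lse)` pair when `return_softmax_lse=True`, and all shapes below are illustrative:

```
# Illustrative only; assumes (out, lse) is returned when return_softmax_lse=True.
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 8, 64
cache_len = 512

q = torch.randn(batch, 1, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, cache_len, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)
cache_seqlens = torch.full((batch,), 500, dtype=torch.int32, device="cuda")

out, lse = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    causal=True,
    return_softmax_lse=True,  # flag added in this commit (#989)
)
# lse holds the logsumexp of the attention scores per query position,
# useful e.g. when merging attention results over sharded KV caches.
```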
- 01 Jul, 2024 1 commit
JDKWangGuan authored
Update handling for KeyError in state_dict.pop() for non-existing keys: changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) to prevent KeyError exceptions. The following code reproduces the issue:

```
from transformers import AutoTokenizer, GPT2Model, GPT2Config
from flash_attn.models.gpt import GPTLMHeadModel, GPTModel

# >>> transformers.__version__
# '4.38.2'
model_path = 'gpt2'
output_model_path = 'gpt2_model'

config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)

''' model fine-tuning here '''

# dump the fine-tuned model
model.save_pretrained(output_model_path)

# load the fine-tuned model
config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # failed due to KeyError: 'h.0.attn.bias'
```
- 27 Jun, 2024 1 commit
Grigory Sizov authored
* Support unpadded LSE layout
* Cleanup
* Fix unpadded LSE on split-kv path
* Fix formatting and comments
* Fix inline vs forceinline

Co-authored-by: Xinfeng Xie <xfxie.ceca@gmail.com>
Co-authored-by: Jianyu Huang <hjyahead@gmail.com>
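For intuition only: in an unpadded (packed) layout, the LSE for a variable-length batch is stored contiguously over the total number of tokens rather than padded to the longest sequence, and per-sequence values are recovered with the cumulative sequence lengths. The pure-PyTorch sketch below assumes a `(nheads, total_tokens)` buffer; the exact shapes used by the kernels are not implied here.

```
# Plain-PyTorch illustration of an unpadded (packed) LSE layout; shapes assumed.
import torch

nheads = 8
seqlens = [5, 3, 7]                               # per-sequence query lengths
cu_seqlens = torch.tensor([0, 5, 8, 15])          # cumulative offsets, len = batch + 1
lse_unpadded = torch.randn(nheads, sum(seqlens))  # one contiguous buffer, no padding

# Slice out the LSE of sequence i using the cumulative offsets.
for i in range(len(seqlens)):
    lse_i = lse_unpadded[:, cu_seqlens[i]:cu_seqlens[i + 1]]  # (nheads, seqlens[i])
```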
- 26 May, 2024 3 commits
- 23 May, 2024 1 commit
lancerts authored
- 26 Apr, 2024 2 commits
- 08 Apr, 2024 1 commit
Tri Dao authored
- 05 Apr, 2024 1 commit
Ivan Komarov authored
All integer parameters are specialized by default, so the two parameters removed in this commit could lead to kernel re-compilation, even if they were completely unused.
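Hedged background sketch (the kernel and argument names below are hypothetical, not the ones touched in this commit): in Triton, scalar integer arguments are specialized on by default (for example on whether they equal 1 or are divisible by 16), so an argument the kernel body never reads can still force a new compilation when its value class changes.

```
# Hypothetical kernel illustrating the cost of an unused, specialized int argument.
import torch
import triton
import triton.language as tl

@triton.jit
def copy_kernel(x_ptr, out_ptr, n_elements,
                unused_stride,              # never read, yet still specialized on
                BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    tl.store(out_ptr + offs, tl.load(x_ptr + offs, mask=mask), mask=mask)

x = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
copy_kernel[grid](x, out, x.numel(), 1, BLOCK=256)   # one compiled variant
copy_kernel[grid](x, out, x.numel(), 7, BLOCK=256)   # different value class -> recompile
# Dropping the unused argument (as this commit does) avoids the extra variants;
# opting out via triton.jit's do_not_specialize would be another option.
```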
- 19 Mar, 2024 1 commit
Tri Dao authored
- 15 Mar, 2024 2 commits
Markus Krimmel authored
Grigory Sizov authored
* Enable paged attention in varlen forward
* Format + fix padding
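A heavily hedged sketch of what the paged varlen forward could look like from Python; the `block_table` argument name, the paged KV layout `(num_blocks, page_size, nheads, headdim)`, and the page size of 256 are all assumptions for illustration, not confirmed by the commit message:

```
# Assumption-laden sketch: paged KV cache used with the varlen forward path.
import torch
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
page_size, num_pages = 256, 4
seqlens_q, seqlens_k = [5, 3], [300, 200]

q = torch.randn(sum(seqlens_q), nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(num_pages, page_size, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)

cu_seqlens_q = torch.tensor([0, 5, 8], dtype=torch.int32, device="cuda")
cu_seqlens_k = torch.tensor([0, 300, 500], dtype=torch.int32, device="cuda")
# One row of page indices per sequence (sequence 0 spans pages 0-1, sequence 1 pages 2-3).
block_table = torch.tensor([[0, 1], [2, 3]], dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k_cache, v_cache,
    cu_seqlens_q=cu_seqlens_q, cu_seqlens_k=cu_seqlens_k,
    max_seqlen_q=max(seqlens_q), max_seqlen_k=max(seqlens_k),
    causal=True,
    block_table=block_table,
)
```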
- 02 Mar, 2024 1 commit
Tri Dao authored
- 21 Feb, 2024 2 commits
- 10 Feb, 2024 2 commits
- 31 Jan, 2024 2 commits
- 30 Jan, 2024 2 commits
- 27 Jan, 2024 1 commit
Avelina9X authored
* Updated docstrings of bert_padding.py: added docstrings for missing arguments in the unpad and pad methods.
* Update bert_padding.py: fixed spelling mistakes.
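A small round-trip sketch of the two helpers whose docstrings this touches; the exact number of values returned by `unpad_input` varies across versions, so the slice below is a hedge rather than the documented signature:

```
# Hedged sketch of the unpad/pad round trip documented in bert_padding.py.
import torch
from flash_attn.bert_padding import unpad_input, pad_input

batch, seqlen, hidden = 2, 8, 16
x = torch.randn(batch, seqlen, hidden)
# attention_mask: 1 for real tokens, 0 for padding
attention_mask = torch.tensor([[1] * 5 + [0] * 3,
                               [1] * 8]).bool()

# Drop padded positions -> a packed (total_tokens, hidden) tensor plus metadata.
out = unpad_input(x, attention_mask)
x_unpad, indices, cu_seqlens, max_seqlen = out[:4]

# Scatter the packed tokens back into the padded (batch, seqlen, hidden) layout.
x_repad = pad_input(x_unpad, indices, batch, seqlen)
```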
- 23 Jan, 2024 4 commits
Tao He authored
Signed-off-by: Tao He <sighingnow@gmail.com>
Tri Dao authored
Tri Dao authored
Co-authored-by: ljss <450993438@qq.com>
Tri Dao authored
- 22 Jan, 2024 2 commits
- 21 Jan, 2024 1 commit
Curtis "Fjord" Hawthorne authored
- 13 Jan, 2024 1 commit
Tri Dao authored
- 10 Jan, 2024 1 commit
Tri Dao authored
- 05 Jan, 2024 3 commits