1. 03 Jul, 2024 1 commit
  2. 01 Jul, 2024 5 commits
    • Fix typos in comments about shape. (#837) · 9486635c
      66RING authored
    • Fix KeyError handling for non-existing keys in state_dict.pop() (#898) · 0d810cfb
      JDKWangGuan authored
      Update the handling of KeyError in state_dict.pop() for non-existing keys:
      changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) so that a missing key no longer raises a KeyError.
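      A minimal sketch of the change, assuming a remapping loop like the one in flash_attn.models.gpt (the helper name below is hypothetical):
      ```python
      # Hypothetical remapping helper; only the pop() default is the point here.
      def strip_attn_bias(state_dict, n_layer):
          for d in range(n_layer):
              # With a default of None, pop() is a no-op when the checkpoint was
              # saved without the attention bias buffer, instead of raising KeyError.
              state_dict.pop(f"h.{d}.attn.bias", None)
          return state_dict
      ```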
      
      
      The following code reproduces the issue:
      ```python
      from transformers import GPT2Model, GPT2Config
      from flash_attn.models.gpt import GPTLMHeadModel, GPTModel

      # >>> transformers.__version__
      # '4.38.2'

      model_path = 'gpt2'
      output_model_path = 'gpt2_model'
      config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
      model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)

      # ... model fine-tuning here ...

      # dump the fine-tuned model
      model.save_pretrained(output_model_path)

      # load the fine-tuned model
      config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
      model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # fails with KeyError: 'h.0.attn.bias'
      model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # fails with KeyError: 'h.0.attn.bias'
      ```
    • Fix typo (#974) · 6a2a16e9
      cao lei authored
    • Fixing argument checking when using `seqlenq_ngroups_swapped`. (#976) · 5bf20196
      Nicolas Patry authored
      When the user passes `out` as a parameter and the other arguments trigger
      `seqlenq_ngroups_swapped`, the CHECK_SHAPE on `out` is incorrect, because the
      shape of q has already been modified at that point. A sketch of the ordering
      issue follows.
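      A hypothetical Python rendering of the check (the real code is the C++ mha_fwd in flash-attn's csrc; all names and shapes here are illustrative assumptions):
      ```python
      # Illustrative only: shows why `out` must be validated before q is reshaped.
      # q, k, v, out are tensors of shape (batch, seqlen, num_heads, head_dim).
      def attention_forward(q, k, v, out=None):
          batch, seqlen_q, num_heads, head_dim = q.shape
          num_heads_k = k.shape[2]
          # Decode-style case: one query timestep with MQA/GQA; the kernel folds
          # the query-head groups into the sequence dimension for efficiency.
          seqlenq_ngroups_swapped = seqlen_q == 1 and num_heads > num_heads_k
          if out is not None:
              # Correct ordering: validate `out` against the shape the caller
              # passed in. Doing this after the reshape below (the bug) compares
              # against q's modified shape and rejects valid buffers.
              assert out.shape == (batch, seqlen_q, num_heads, head_dim)
          if seqlenq_ngroups_swapped:
              q = q.reshape(batch, num_heads // num_heads_k, num_heads_k, head_dim)
          ...
      ```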
    • Liang authored
  3. 27 Jun, 2024 1 commit
  4. 26 May, 2024 7 commits
  5. 23 May, 2024 1 commit
  6. 06 May, 2024 1 commit
  7. 26 Apr, 2024 3 commits
  8. 08 Apr, 2024 4 commits
  9. 05 Apr, 2024 1 commit
  10. 28 Mar, 2024 2 commits
  11. 19 Mar, 2024 1 commit
  12. 15 Mar, 2024 3 commits
  13. 14 Mar, 2024 2 commits
  14. 02 Mar, 2024 2 commits
  15. 21 Feb, 2024 4 commits
  16. 20 Feb, 2024 1 commit
  17. 18 Feb, 2024 1 commit
    • Optimize compile: avoid OOM, minimize swap usage, avoid thread starvation · f45bbb4c
      Qubitium authored
      Optimize compilation to 1) avoid OOM, 2) minimize swap usage, and 3) avoid thread starvation when letting ninja decide how many workers to spawn or when relying on a manual MAX_JOBS guess. The logic takes the minimum of the MAX_JOBS values auto-calculated from two metrics: 1) CPU cores and 2) free memory. This should allow flash-attn to compile in close to the most efficient manner under any consumer or server environment. (#832) A sketch of the heuristic follows.
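      A hedged sketch of that heuristic in Python (the real logic lives in flash-attn's setup.py; the psutil usage and the 9 GiB-per-job figure are assumptions):
      ```python
      # Hypothetical sketch of the MAX_JOBS heuristic described above; names and
      # the per-job memory figure are illustrative, not the exact setup.py code.
      import os
      import psutil  # assumption: free memory is measured via psutil

      def estimate_max_jobs(mem_gib_per_job: float = 9.0) -> int:
          # Metric 1: CPU cores. Using half the cores leaves headroom so ninja
          # workers do not starve other threads on the machine.
          by_cpu = max(1, (os.cpu_count() or 1) // 2)
          # Metric 2: free memory. Each nvcc job can peak at several GiB, so cap
          # the job count by what fits in RAM to avoid OOM and heavy swapping.
          free_gib = psutil.virtual_memory().available / 2**30
          by_mem = max(1, int(free_gib / mem_gib_per_job))
          # Take the min of the two estimates: the scarcer resource wins.
          return min(by_cpu, by_mem)

      # Respect an explicit user setting; otherwise use the auto-calculated value.
      os.environ.setdefault("MAX_JOBS", str(estimate_max_jobs()))
      ```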