1. 01 Jul, 2024 1 commit
    • JDKWangGuan's avatar
      Fix KeyError handling for non-existing key in state_dict.pop() (#898) · 0d810cfb
      JDKWangGuan authored
      Handle missing keys in state_dict.pop() when loading checkpoints.
      Changed state_dict.pop(f"h.{d}.attn.bias") to state_dict.pop(f"h.{d}.attn.bias", None) so that checkpoints without these keys no longer raise a KeyError.
      
      
      The following code reproduces the issue:
      ```python
      from transformers import GPT2Config, GPT2Model
      from flash_attn.models.gpt import GPTLMHeadModel, GPTModel
      
      # >>> transformers.__version__
      # '4.38.2'
      
      model_path = 'gpt2'
      output_model_path = 'gpt2_model'
      config = GPT2Config.from_pretrained(model_path, output_hidden_states=True)
      model = GPT2Model.from_pretrained(model_path, from_tf=False, config=config)
      
      # ... fine-tune the model here ...
      
      # dump the fine-tuned model
      model.save_pretrained(output_model_path)
      
      # load the fine-tuned model with flash-attn's GPT classes
      config = GPT2Config.from_pretrained(output_model_path, output_hidden_states=True)
      model = GPTModel.from_pretrained(output_model_path, config=config, strict=True)  # fails with KeyError: 'h.0.attn.bias'
      model = GPTLMHeadModel.from_pretrained(output_model_path, config=config, strict=True)  # fails with KeyError: 'h.0.attn.bias'
      ```
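The fix relies on dict.pop's optional default argument. A minimal standalone sketch (plain dict, illustrative key names, not the repo's actual loading code):

```python
# A state dict that, like the fine-tuned GPT2 dump above,
# has no "h.0.attn.bias" entry.
state_dict = {"h.0.attn.weight": 1.0}

# state_dict.pop("h.0.attn.bias") would raise KeyError here.
# With a default, the missing key is skipped silently:
value = state_dict.pop("h.0.attn.bias", None)
assert value is None

# Existing keys are still popped and returned as usual.
assert state_dict.pop("h.0.attn.weight", None) == 1.0
```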
      0d810cfb
  2. 31 Jan, 2024 1 commit
  3. 05 Jan, 2024 1 commit
  4. 25 Dec, 2023 4 commits
  5. 23 Dec, 2023 1 commit
  6. 22 Dec, 2023 1 commit
  7. 20 Dec, 2023 1 commit
  8. 21 Sep, 2023 2 commits
  9. 20 Sep, 2023 1 commit
  10. 13 Sep, 2023 1 commit
  11. 11 Sep, 2023 1 commit
  12. 09 Sep, 2023 1 commit
  13. 04 Sep, 2023 2 commits
  14. 03 Sep, 2023 1 commit
  15. 30 Aug, 2023 2 commits
  16. 27 Aug, 2023 1 commit
  17. 24 Aug, 2023 1 commit
  18. 21 Aug, 2023 1 commit
  19. 20 Aug, 2023 1 commit
  20. 19 Aug, 2023 1 commit
  21. 18 Aug, 2023 4 commits
  22. 17 Aug, 2023 1 commit
  23. 15 Aug, 2023 1 commit
    • Xuechen Li's avatar
      enable loading hf llama checkpoints for training (#446) · 0f7853c6
      Xuechen Li authored
      * prelim.
      
      * add hf conversion fn.
      
      * mlp.
      
      * change name.
      
      * fix bug.
      
      * inverse permute.
      
      * change comment.
      
      * revert style changes.
      
      * fix.
      
      * add doc.
      
      * revert.
      
      * enable load safe.
      
      * fix safe load.
      
      * fix import.
      
      * fix typing-related lints.
      
      * fix ckpt loading logic.
      
      * make single gpu work.
      
      * test with parallel.
      
      * ckpt format.
      
      * enable pretrained state dict.
      
      * remove unused imports.
      
      * remove unused.
      
      * mark idea related.
      0f7853c6
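The "inverse permute" step above refers to undoing the head-wise permutation that the HF llama conversion applies to the q/k projection weights (to match HF's rotary-embedding layout). A minimal numpy sketch, with illustrative shapes and function names that are assumptions, not the repo's actual API:

```python
import numpy as np

# Illustrative sizes (not taken from the commit): 4 heads of dim 8.
n_heads, head_dim = 4, 8
dim = n_heads * head_dim

def permute(w):
    # Reorder each head's rows into HF llama's rotary layout:
    # split head rows into (half, pair) blocks and swap those axes.
    return (w.reshape(n_heads, head_dim // 2, 2, dim)
             .transpose(0, 2, 1, 3)
             .reshape(dim, dim))

def inverse_permute(w):
    # Undo the permutation when loading an HF checkpoint back
    # into the non-HF weight layout.
    return (w.reshape(n_heads, 2, head_dim // 2, dim)
             .transpose(0, 2, 1, 3)
             .reshape(dim, dim))

w = np.arange(dim * dim, dtype=np.float32).reshape(dim, dim)
# Round trip recovers the original weight matrix exactly.
assert np.array_equal(inverse_permute(permute(w)), w)
```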
  24. 29 Jul, 2023 1 commit
  25. 26 Jul, 2023 1 commit
  26. 23 Jul, 2023 5 commits
  27. 30 May, 2023 1 commit