1. 18 May, 2023 1 commit
  2. 13 Feb, 2022 1 commit
  3. 05 Dec, 2021 1 commit
  4. 24 Nov, 2021 2 commits
  5. 05 Nov, 2021 1 commit
  6. 11 Oct, 2021 1 commit
  7. 10 Jun, 2021 1 commit
  8. 22 May, 2021 1 commit
  9. 11 May, 2021 1 commit
    • Overhaul command flags a bit · 5f42f976
      Leo Gao authored
      model_args should only contain settings that affect the model's output; therefore, runtime options like batch size, device, etc. shouldn't be in there.
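      A minimal sketch of the split this commit describes, assuming a hypothetical argparse-based CLI (the flag names and helper below are illustrative, not taken from the repository): model_args carries only output-affecting settings, while batch size and device stay as separate runtime flags.

      ```python
      # Illustrative only: separate output-affecting model_args from runtime-only flags.
      import argparse

      def parse_model_args(s):
          # "pretrained=gpt2,revision=main" -> {"pretrained": "gpt2", "revision": "main"}
          return dict(kv.split("=", 1) for kv in s.split(",")) if s else {}

      parser = argparse.ArgumentParser()
      parser.add_argument("--model", required=True)
      parser.add_argument("--model_args", default="")            # only settings that change model output
      parser.add_argument("--batch_size", type=int, default=1)   # runtime concern, kept out of model_args
      parser.add_argument("--device", default="cpu")             # runtime concern, kept out of model_args

      args = parser.parse_args()
      model_kwargs = parse_model_args(args.model_args)
      ```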
  10. 06 May, 2021 1 commit
  11. 05 May, 2021 1 commit
  12. 03 May, 2021 2 commits
  13. 15 Apr, 2021 1 commit
  14. 11 Apr, 2021 2 commits
  15. 05 Apr, 2021 1 commit
  16. 27 Mar, 2021 1 commit
  17. 26 Mar, 2021 1 commit
  18. 21 Feb, 2021 2 commits
  19. 19 Feb, 2021 1 commit
  20. 11 Feb, 2021 1 commit
  21. 08 Feb, 2021 1 commit
  22. 05 Feb, 2021 2 commits
  23. 04 Feb, 2021 4 commits
  24. 28 Jan, 2021 1 commit
  25. 05 Jan, 2021 1 commit
  26. 30 Nov, 2020 2 commits
    • Remove num_tokens · e3031e84
      Leo Gao authored
    • Refactor to remove generate and fix some bad tokenization · 90e50b4c
      Leo Gao authored
      In particular, the following assumptions are FALSE in general:
      tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
      len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
      tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)
      
      So we need to work around the problem by being careful about how and where the tokenization is done.
      
      In particular, using the Fast tokenizer is not just for performance; the behaviour of GPT2Tokenizer differs between Transformers 2 and 3, while GPT2TokenizerFast's does not.
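      A short sketch of why the three assumptions above fail, and one possible way to be careful about the boundary. The strings and the offset-mapping workaround are illustrative assumptions, not necessarily what this commit implements; whether a given pair of strings triggers the failure depends on the BPE merges.

      ```python
      # Check the three assumptions for a given context/continuation pair.
      from transformers import GPT2TokenizerFast

      tok = GPT2TokenizerFast.from_pretrained("gpt2")

      context, continuation = "hello", "world"
      whole = tok.encode(context + continuation)
      parts = tok.encode(context) + tok.encode(continuation)

      # Each assumption can fail because BPE merges may cross the boundary.
      print(whole == parts)
      print(len(whole) == len(parts))
      print(whole[: len(tok.encode(context))] == tok.encode(context))

      # One possible workaround (an assumption, not the commit's stated method):
      # tokenize the concatenation once, then split token-wise at the character
      # boundary using the fast tokenizer's offset mapping. Tokens that straddle
      # the boundary are assigned to the continuation here.
      enc = tok(context + continuation, return_offsets_mapping=True)
      boundary = len(context)
      ctx_ids = [i for i, (s, e) in zip(enc["input_ids"], enc["offset_mapping"]) if e <= boundary]
      cont_ids = [i for i, (s, e) in zip(enc["input_ids"], enc["offset_mapping"]) if e > boundary]
      ```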
  27. 04 Oct, 2020 1 commit
  28. 14 Sep, 2020 1 commit
  29. 07 Sep, 2020 3 commits