1. 13 Jul, 2023 2 commits
  2. 03 Jul, 2023 1 commit
  3. 30 Jun, 2023 2 commits
  4. 28 Jun, 2023 2 commits
  5. 23 Jun, 2023 1 commit
    • Improved keras imports (#24448) · 8e164c54
      Matt authored
      * An end to accursed version-specific imports
      
      * No more K.is_keras_tensor() either
      
      * Update dependency tables
      
      * Use a cleaner call context function getter
      
      * Add a cap to <2.14
      
      * Add cap to examples requirements too
  6. 14 Jun, 2023 1 commit
  7. 08 Jun, 2023 1 commit
  8. 07 Jun, 2023 2 commits
  9. 01 Jun, 2023 1 commit
  10. 31 May, 2023 2 commits
    • Upgrade safetensors version (#23911) · 55451c66
      Zachary Mueller authored
      * Upgrade safetensors
      
      * Second table
    • Unpin numba (#23162) · 8f915c45
      Sanchit Gandhi authored
      * fix for ragged list
      
      * unpin numba
      
      * make style
      
      * np.object -> object
      
      * propagate changes to tokenizer as well
      
      * np.long -> "long"
      
      * revert tokenization changes
      
      * check with tokenization changes
      
      * list/tuple logic
      
      * catch numpy
      
      * catch else case
      
      * clean up
      
      * up
      
      * better check
      
      * trigger ci
      
      * Empty commit to trigger CI
  11. 23 May, 2023 1 commit
  12. 16 May, 2023 1 commit
  13. 12 May, 2023 1 commit
  14. 11 May, 2023 2 commits
  15. 10 May, 2023 1 commit
  16. 09 May, 2023 1 commit
  17. 08 May, 2023 1 commit
  18. 04 May, 2023 1 commit
  19. 03 May, 2023 1 commit
  20. 20 Apr, 2023 1 commit
  21. 18 Apr, 2023 1 commit
  22. 17 Apr, 2023 1 commit
  23. 13 Apr, 2023 1 commit
  24. 07 Apr, 2023 1 commit
  25. 06 Apr, 2023 1 commit
    • Adding Llama FastTokenizer support. (#22264) · 1670be4b
      Nicolas Patry authored
      * Adding Llama FastTokenizer support.
      
      - Requires a tokenizers version that includes https://github.com/huggingface/tokenizers/pull/1183
      - Only supports byte_fallback for Llama; raises otherwise (safety net).
      - Lots of open questions around special tokens.
      
      How to test:
      
      ```python
      from tokenizers import Tokenizer
      from transformers import AutoTokenizer
      from transformers.convert_slow_tokenizer import convert_slow_tokenizer

      # Reference tokenizer that the converted one is compared against.
      tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

      # Set to True to reuse a previously saved conversion instead of reconverting.
      LOAD_FROM_FILE = False

      if LOAD_FROM_FILE:
          new_tokenizer = Tokenizer.from_file("tok.json")
      else:
          new_tokenizer = convert_slow_tokenizer(tokenizer)
          new_tokenizer.save("tok.json")

      strings = [
          "This is a test",
          "生活的真谛是",
          "生活的真谛是[MASK]。",
          # XXX: This one is problematic because of special tokens
          # "<s> Something something",
      ]

      for string in strings:
          # Both tokenizers must produce identical token ids.
          encoded = tokenizer(string)["input_ids"]
          encoded2 = new_tokenizer.encode(string).ids

          assert encoded == encoded2, f"{encoded} != {encoded2}"

          # Decoding must round-trip to the same text (modulo surrounding whitespace).
          decoded = tokenizer.decode(encoded)
          decoded2 = new_tokenizer.decode(encoded2)

          assert decoded.strip() == decoded2, f"{decoded!r} != {decoded2!r}"
      ```
      
      The converter + some test script.
      
      The test script.
      
      Tmp save.
      
      Adding Fast tokenizer + tests.
      
      Adding the tokenization tests.
      
      Correct combination.
      
      Small fix.
      
      Fixing tests.
      
      Fixing with latest update.
      
      Rebased.
      
      fix copies + normalized added tokens  + copies.
      
      Adding doc.
      
      TMP.
      
      Doc + split files.
      
      Doc.
      
      Versions + try import.
      
      Fix Camembert + warnings -> Error.
      
      Fix by ArthurZucker.
      
      Not a decorator.
      
      * Fixing comments.
      
      * Adding more to docstring.
      
      * Doc rewriting.
  26. 03 Apr, 2023 2 commits
  27. 29 Mar, 2023 2 commits
  28. 24 Mar, 2023 2 commits
  29. 22 Mar, 2023 1 commit
  30. 21 Mar, 2023 2 commits