1. 03 Jul, 2023 1 commit
  2. 30 Jun, 2023 2 commits
  3. 28 Jun, 2023 2 commits
  4. 23 Jun, 2023 1 commit
    • Improved keras imports (#24448) · 8e164c54
      Matt authored
      * An end to accursed version-specific imports
      
      * No more K.is_keras_tensor() either
      
      * Update dependency tables
      
      * Use a cleaner call context function getter
      
      * Add a cap to <2.14
      
      * Add cap to examples requirements too
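      The version-specific imports this PR removed can be replaced by probing for what is importable instead of comparing version strings. A minimal sketch of that pattern (illustrative only — the helper name and candidate list are not from the PR):

      ```python
      import importlib


      def import_first_available(candidates):
          """Return the first importable module from `candidates`,
          so callers need no version-string comparisons."""
          for name in candidates:
              try:
                  return importlib.import_module(name)
              except ImportError:
                  continue
          raise ImportError(f"None of {candidates} could be imported")


      # e.g. prefer a standalone Keras over the TensorFlow-bundled copy:
      # keras = import_first_available(["tf_keras", "tensorflow.keras", "keras"])
      ```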
  5. 14 Jun, 2023 1 commit
  6. 08 Jun, 2023 1 commit
  7. 07 Jun, 2023 2 commits
  8. 01 Jun, 2023 1 commit
  9. 31 May, 2023 2 commits
    • Upgrade safetensors version (#23911) · 55451c66
      Zachary Mueller authored
      * Upgrade safetensors
      
      * Second table
    • Unpin numba (#23162) · 8f915c45
      Sanchit Gandhi authored
      * fix for ragged list
      
      * unpin numba
      
      * make style
      
      * np.object -> object
      
      * propagate changes to tokenizer as well
      
      * np.long -> "long"
      
      * revert tokenization changes
      
      * check with tokenization changes
      
      * list/tuple logic
      
      * catch numpy
      
      * catch else case
      
      * clean up
      
      * up
      
      * better check
      
      * trigger ci
      
      * Empty commit to trigger CI
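      The `np.object` / `np.long` bullets above track NumPy's removal of its deprecated builtin aliases, which also changed how ragged (uneven-length) lists must be built. A minimal sketch of the replacements (illustrative, not the PR's code):

      ```python
      import numpy as np

      # np.object was removed in NumPy 1.24; the builtin `object` is the
      # replacement, and it is what ragged lists require:
      ragged = [[1, 2, 3], [4, 5]]
      arr = np.asarray(ragged, dtype=object)  # dtype=np.object now raises AttributeError
      print(arr.dtype)  # object

      # np.long is likewise gone; the dtype string "long" still names C long:
      print(np.dtype("long").kind)  # i (signed integer)
      ```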
  10. 23 May, 2023 1 commit
  11. 16 May, 2023 1 commit
  12. 12 May, 2023 1 commit
  13. 11 May, 2023 2 commits
  14. 10 May, 2023 1 commit
  15. 09 May, 2023 1 commit
  16. 08 May, 2023 1 commit
  17. 04 May, 2023 1 commit
  18. 03 May, 2023 1 commit
  19. 20 Apr, 2023 1 commit
  20. 18 Apr, 2023 1 commit
  21. 17 Apr, 2023 1 commit
  22. 13 Apr, 2023 1 commit
  23. 07 Apr, 2023 1 commit
  24. 06 Apr, 2023 1 commit
    • Adding Llama FastTokenizer support. (#22264) · 1670be4b
      Nicolas Patry authored
      * Adding Llama FastTokenizer support.
      
      - Requires the https://github.com/huggingface/tokenizers/pull/1183 version of tokenizers.
      - Only supports byte_fallback for llama; raises otherwise (safety net).
      - Lots of open questions around special tokens.
      
      How to test:
      
      ```python
      # Convert the slow (sentencepiece) tokenizer and check that the fast
      # version round-trips every string identically.
      from transformers.convert_slow_tokenizer import convert_slow_tokenizer
      from transformers import AutoTokenizer
      from tokenizers import Tokenizer

      tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b")

      # Flip to True to reuse a previously saved conversion instead of
      # converting from scratch each run.
      if False:
          new_tokenizer = Tokenizer.from_file("tok.json")
      else:
          new_tokenizer = convert_slow_tokenizer(tokenizer)
          new_tokenizer.save("tok.json")

      strings = [
          "This is a test",
          "生活的真谛是",
          "生活的真谛是[MASK]。",
          # XXX: This one is problematic because of special tokens
          # "<s> Something something",
      ]

      for string in strings:
          encoded = tokenizer(string)["input_ids"]
          encoded2 = new_tokenizer.encode(string).ids

          assert encoded == encoded2, f"{encoded} != {encoded2}"

          decoded = tokenizer.decode(encoded)
          decoded2 = new_tokenizer.decode(encoded2)

          assert decoded.strip() == decoded2, f"{repr(decoded)} != {repr(decoded2)}"
      ```
      
      The converter + some test script.
      
      The test script.
      
      Tmp save.
      
      Adding Fast tokenizer + tests.
      
      Adding the tokenization tests.
      
      Correct combination.
      
      Small fix.
      
      Fixing tests.
      
      Fixing with latest update.
      
      Rebased.
      
      fix copies + normalized added tokens  + copies.
      
      Adding doc.
      
      TMP.
      
      Doc + split files.
      
      Doc.
      
      Versions + try import.
      
      Fix Camembert + warnings -> Error.
      
      Fix by ArthurZucker.
      
      Not a decorator.
      
      * Fixing comments.
      
      * Adding more to docstring.
      
      * Doc rewriting.
  25. 03 Apr, 2023 2 commits
  26. 29 Mar, 2023 2 commits
  27. 24 Mar, 2023 2 commits
  28. 22 Mar, 2023 1 commit
  29. 21 Mar, 2023 2 commits
  30. 17 Mar, 2023 1 commit
    • Fix natten (#22229) · 3028b20a
      Ali Hassani authored
      * Add kernel size to NATTEN's QK arguments.
      
      The new NATTEN 0.14.5 supports PyTorch 2.0, but also adds an argument
      to the QK operation to allow optional RPBs.

      This ended up failing the NATTEN tests.

      This commit adds NATTEN back to CircleCI and passes the new arguments
      to get it working again.
      
      * Force NATTEN >= 0.14.5
  31. 14 Mar, 2023 1 commit