1. 06 Apr, 2023 7 commits
    • update_pip_test_mapping (#22606) · fa01127a
      Yih-Dar authored
      
      * Add TFBlipForConditionalGeneration
      
      * update pipeline_model_mapping
      
      * Add import
      
      * Revert changes in GPTSanJapaneseTest
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • docs: Fix broken link to generation strategies (#22623) · 321b0908
      Connor Henderson authored
      fix broken link
    • Make tiny model creation + pipeline testing more robust (#22500) · 2c22bc79
      Yih-Dar authored
      
      * Final Tiny things
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Backbone add mixin tests (#22542) · 12d51db2
      amyeroberts authored
      * Add out_indices to backbones, deprecate out_features
      
      * Update - can specify either out_features or out_indices, but not both
      
      * Add backbone mixin tests
      
      * Test tidy up
      
      * Add test_backbone for convnext
      
      * Remove redefinition of method
      
      * Update for Dinat and Nat backbones
      
      * Update tests
      
      * Smarter indexing
      
      * Add checks on config creation for backbone
      
      * PR comments
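The `out_indices` / `out_features` change in #22542 can be illustrated with a small standalone sketch. It does not reproduce the library's code: the helper name `resolve_backbone_outputs`, the error messages, and the default of returning the last stage when neither argument is given are all assumptions made for illustration. It also treats passing both arguments as an error, following the "but not both" note in the commit message.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_backbone_outputs(
    stage_names: Sequence[str],
    out_features: Optional[Sequence[str]] = None,
    out_indices: Optional[Sequence[int]] = None,
) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: return matching (out_features, out_indices)."""
    if out_features is not None and out_indices is not None:
        # Assumption taken from the "but not both" note in the commit message.
        raise ValueError("Specify out_features or out_indices, not both.")

    if out_features is not None:
        unknown = [name for name in out_features if name not in stage_names]
        if unknown:
            raise ValueError(f"Unknown stage names: {unknown}")
        # Derive the indices from the requested stage names.
        indices = [list(stage_names).index(name) for name in out_features]
        return list(out_features), indices

    if out_indices is not None:
        num_stages = len(stage_names)
        for idx in out_indices:
            if not -num_stages <= idx < num_stages:
                raise ValueError(f"Index {idx} is out of range for {num_stages} stages")
        # Normalise negative indices so that -1 means the last stage.
        indices = [idx % num_stages for idx in out_indices]
        return [stage_names[i] for i in indices], indices

    # Assumed default: neither argument given, so expose only the last stage.
    return [stage_names[-1]], [len(stage_names) - 1]


stages = ["stem", "stage1", "stage2", "stage3"]
print(resolve_backbone_outputs(stages, out_indices=[-1, 0]))
```

Deriving one list from the other in a single place keeps the two views of the selected stages from drifting apart, which is presumably what the "checks on config creation" item is about.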
    • Joao Gante's avatar
    • Nicolas Patry's avatar
    • Adding Llama FastTokenizer support. (#22264) · 1670be4b
      Nicolas Patry authored
      * Adding Llama FastTokenizer support.
      
      - Requires a tokenizers version that includes https://github.com/huggingface/tokenizers/pull/1183
      - Only supports byte_fallback for Llama; raises otherwise (safety net).
      - Lots of open questions remain around special tokens
      
      How to test:
      
      ```python
      from tokenizers import Tokenizer
      from transformers import AutoTokenizer
      from transformers.convert_slow_tokenizer import convert_slow_tokenizer

      # Load the slow (SentencePiece-based) tokenizer as the reference;
      # convert_slow_tokenizer expects a slow tokenizer as input.
      tokenizer = AutoTokenizer.from_pretrained("huggingface/llama-7b", use_fast=False)

      # Set to True to reuse a previously converted tokenizer instead of reconverting.
      LOAD_FROM_FILE = False

      if LOAD_FROM_FILE:
          new_tokenizer = Tokenizer.from_file("tok.json")
      else:
          new_tokenizer = convert_slow_tokenizer(tokenizer)
          new_tokenizer.save("tok.json")

      strings = [
          "This is a test",
          "生活的真谛是",
          "生活的真谛是[MASK]。",
          # XXX: This one is problematic because of special tokens
          # "<s> Something something",
      ]

      for string in strings:
          # Both tokenizers must produce identical ids.
          encoded = tokenizer(string)["input_ids"]
          encoded2 = new_tokenizer.encode(string).ids

          assert encoded == encoded2, f"{encoded} != {encoded2}"

          # Decoding must round-trip to the same text (the slow tokenizer may pad with whitespace).
          decoded = tokenizer.decode(encoded)
          decoded2 = new_tokenizer.decode(encoded2)

          assert decoded.strip() == decoded2, f"{decoded!r} != {decoded2!r}"
      ```
      
      The converter + some test script.
      
      The test script.
      
      Tmp save.
      
      Adding Fast tokenizer + tests.
      
      Adding the tokenization tests.
      
      Correct combination.
      
      Small fix.
      
      Fixing tests.
      
      Fixing with latest update.
      
      Rebased.
      
      fix copies + normalized added tokens + copies.
      
      Adding doc.
      
      TMP.
      
      Doc + split files.
      
      Doc.
      
      Versions + try import.
      
      Fix Camembert + warnings -> Error.
      
      Fix by ArthurZucker.
      
      Not a decorator.
      
      * Fixing comments.
      
      * Adding more to docstring.
      
      * Doc rewriting.
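For readers unfamiliar with the `byte_fallback` behaviour mentioned in #22264, here is a minimal pure-Python sketch of the idea. It is not the `tokenizers` implementation: the helper `encode_with_byte_fallback` and the toy vocabulary are hypothetical, and the `<0xNN>` naming of byte tokens is assumed to follow the SentencePiece convention. A piece that is missing from the vocabulary is emitted as one token per UTF-8 byte instead of a single unknown token.

```python
from typing import Dict, List


def encode_with_byte_fallback(pieces: List[str], vocab: Dict[str, int]) -> List[int]:
    """Hypothetical helper: map pieces to ids, falling back to byte tokens."""
    ids: List[int] = []
    for piece in pieces:
        if piece in vocab:
            ids.append(vocab[piece])
            continue
        # Out-of-vocabulary piece: emit one byte token per UTF-8 byte.
        for byte in piece.encode("utf-8"):
            ids.append(vocab[f"<0x{byte:02X}>"])
    return ids


# Toy vocabulary (assumption): ids 0-255 are the byte tokens, followed by whole pieces.
toy_vocab = {f"<0x{b:02X}>": b for b in range(256)}
toy_vocab.update({"▁This": 256, "▁is": 257})

# "生" is not in the toy vocabulary, so it is encoded as its three UTF-8 bytes.
print(encode_with_byte_fallback(["▁This", "▁is", "生"], toy_vocab))
```

Because every byte has a token, no input maps to an unknown id, which is why strings such as "生活的真谛是" in the test script are useful round-trip checks.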
  2. 05 Apr, 2023 13 commits
  3. 04 Apr, 2023 14 commits
  4. 03 Apr, 2023 6 commits