1. 15 Nov, 2022 9 commits
  2. 14 Nov, 2022 6 commits
  3. 11 Nov, 2022 2 commits
  4. 10 Nov, 2022 4 commits
  5. 09 Nov, 2022 4 commits
  6. 08 Nov, 2022 3 commits
    • amyeroberts's avatar
      AutoImageProcessor (#20111) · 4eb918e6
      amyeroberts authored
      * AutoImageProcessor skeleton
      
      * Update references
      
      * Add mapping in init
      
      * Add model image processors to __init__ for importing
      
      * Add AutoImageProcessor tests
      
      * Fix up
      
      * Image Processor documentation
      
      * Remove pdb
      
      * Update docs/source/en/model_doc/mobilevit.mdx
      
      * Update docs
      
      * Don't add whitespace on json files
      
      * Remove fixtures
      
      * Move checking model config down
      
      * Fix up
      
      * Add check for image processor
      
      * Remove FeatureExtractorMixin in docstrings
      
      * Rename model_tmpfile to config_tmpfile
      
      * Don't make None if not in image processor map
      4eb918e6
    • Weiwe Shi's avatar
      Add RocBert (#20013) · efa889d2
      Weiwe Shi authored
      
      
      * add roc_bert
      
      * update roc_bert readme
      
      * code style
      
      * change name and delete unuse file
      
      * udpate model file
      
      * delete unuse log file
      
      * delete tokenizer fast
      
      * reformat code and change model file path
      
      * add RocBertForPreTraining
      
      * update docs
      
      * delete wrong notes
      
      * fix copies
      
      * fix make repo-consistency error
      
      * fix files are not present in the table of contents error
      
      * change RocBert -> RoCBert
      
      * add doc, add detail test
      Co-authored-by: default avatarweiweishi <weiweishi@tencent.com>
      efa889d2
    • NielsRogge's avatar
      Add CLIPSeg (#20066) · 25896306
      NielsRogge authored
      
      
      * Add first draft
      
      * Update conversion script
      
      * Improve conversion script
      
      * Improve conversion script some more
      
      * Add conditional embeddings
      
      * Add initial decoder
      
      * Fix activation function of decoder
      
      * Make decoder outputs match original implementation
      
      * Make decoder outputs match original implementation
      
      * Add more copied from statements
      
      * Improve model outputs
      
      * Fix auto tokenizer file
      
      * Fix more tests
      
      * Add test
      
      * Improve README and docs, improve conditional embeddings
      
      * Fix more tests
      
      * Remove print statements
      
      * Remove initial embeddings
      
      * Improve conversion script
      
      * Add interpolation of position embeddings
      
      * Finish addition of interpolation of position embeddings
      
      * Add support for refined checkpoint
      
      * Fix refined checkpoint
      
      * Remove unused parameter
      
      * Improve conversion script
      
      * Add support for training
      
      * Fix conversion script
      
      * Add CLIPSegFeatureExtractor
      
      * Fix processor
      
      * Fix CLIPSegProcessor
      
      * Fix conversion script
      
      * Fix most tests
      
      * Fix equivalence test
      
      * Fix README
      
      * Add model to doc tests
      
      * Use better variable name
      
      * Convert other checkpoint as well
      
      * Update config, add link to paper
      
      * Add docs
      
      * Update organization
      
      * Replace base_model_prefix with clip
      
      * Fix base_model_prefix
      
      * Fix checkpoint of config
      
      * Fix config checkpoint
      
      * Remove file
      
      * Use logits for output
      
      * Fix tests
      Co-authored-by: default avatarNiels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
      25896306
  7. 07 Nov, 2022 2 commits
  8. 04 Nov, 2022 3 commits
  9. 03 Nov, 2022 3 commits
  10. 02 Nov, 2022 4 commits
    • Ben Eyal's avatar
      馃毃 馃毃 馃毃 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in... · 9f9ddcc2
      Ben Eyal authored
      馃毃 馃毃 馃毃 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)
      
      * Add test for SentencePiece not adding special tokens to strings
      
      * Add SentencePieceStringConversionMixin to fix issue 15003
      
      * Fix conversion from tokens to string for most SentencePiece tokenizers
      
      Tokenizers fixed:
      - AlbertTokenizer
      - BarthezTokenizer
      - CamembertTokenizer
      - FNetTokenizer
      - M2M100Tokenizer
      - MBart50Tokenizer
      - PegasusTokenizer
      - Speech2TextTokenizer
      
      * Fix MarianTokenizer, adjust SentencePiece test to accomodate vocab
      
      * Fix DebertaV2Tokenizer
      
      * Ignore LayoutXLMTokenizer in SentencePiece string conversion test
      
      * Run 'make style' and 'make quality'
      
      * Clean convert_tokens_to_string test
      
      Instead of explicitly ignoring LayoutXLMTokenizer in the test,
      override the test in LayoutLMTokenizationTest and do nothing in it.
      
      * Remove commented out code
      
      * Improve robustness of convert_tokens_to_string test
      
      Instead of comparing lengths of re-tokenized text and input_ids,
      check that converting all special tokens to string yields a string
      with all special tokens.
      
      * Inline and remove SentencePieceStringConversionMixin
      
      The convert_tokens_to_string method is now implemented
      in each relevant SentencePiece tokenizer.
      
      * Run 'make style' and 'make quality'
      
      * Revert removal of space in convert_tokens_to_string
      
      * Remove redundant import
      
      * Revert test text to original
      
      * Uncomment the lowercasing of the reverse_text variable
      
      * Mimic Rust tokenizer behavior for tokenizers
      
      - Albert
      - Barthez
      - Camembert
      - MBart50
      - T5
      
      * Fix accidentally skipping test in wrong tokenizer
      
      * Add test for equivalent Rust and slow tokenizer behavior
      
      * Override _decode in BigBirdTokenizer to mimic Rust behavior
      
      * Override _decode in FNetTokenizer to mimic Rust behavior
      
      * Override _decode in XLNetTokenizer to mimic Rust behavior
      
      * Remove unused 're' import
      
      * Update DebertaV2Tokenizer to mimic Rust tokenizer
      
      * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
      
      * Ignore problematic tests in Deberta V2
      
      * Add comment on why the Deberta V2 tests are skipped
      9f9ddcc2
    • Yih-Dar's avatar
      Improve model tester (#19984) · f69eb24b
      Yih-Dar authored
      
      
      * part 1
      
      * part 2
      
      * part 3
      
      * fix
      
      * For CANINE
      
      * For ESMFold
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      f69eb24b
    • amyeroberts's avatar
    • Sylvain Gugger's avatar
      Quality (#20002) · 49b77b89
      Sylvain Gugger authored
      49b77b89