  1. 08 Nov, 2022 3 commits
    • Add RocBert (#20013) · efa889d2
      Weiwe Shi authored
      
      
      * add roc_bert
      
      * update roc_bert readme
      
      * code style
      
      * change name and delete unused file
      
      * update model file
      
      * delete unused log file
      
      * delete tokenizer fast
      
      * reformat code and change model file path
      
      * add RocBertForPreTraining
      
      * update docs
      
      * delete wrong notes
      
      * fix copies
      
      * fix make repo-consistency error
      
      * fix files are not present in the table of contents error
      
      * change RocBert -> RoCBert
      
      * add doc, add detail test
      Co-authored-by: weiweishi <weiweishi@tencent.com>
    • Add CLIPSeg (#20066) · 25896306
      NielsRogge authored
      
      
      * Add first draft
      
      * Update conversion script
      
      * Improve conversion script
      
      * Improve conversion script some more
      
      * Add conditional embeddings
      
      * Add initial decoder
      
      * Fix activation function of decoder
      
      * Make decoder outputs match original implementation
      
      * Make decoder outputs match original implementation
      
      * Add more copied from statements
      
      * Improve model outputs
      
      * Fix auto tokenizer file
      
      * Fix more tests
      
      * Add test
      
      * Improve README and docs, improve conditional embeddings
      
      * Fix more tests
      
      * Remove print statements
      
      * Remove initial embeddings
      
      * Improve conversion script
      
      * Add interpolation of position embeddings
      
      * Finish addition of interpolation of position embeddings
      
      * Add support for refined checkpoint
      
      * Fix refined checkpoint
      
      * Remove unused parameter
      
      * Improve conversion script
      
      * Add support for training
      
      * Fix conversion script
      
      * Add CLIPSegFeatureExtractor
      
      * Fix processor
      
      * Fix CLIPSegProcessor
      
      * Fix conversion script
      
      * Fix most tests
      
      * Fix equivalence test
      
      * Fix README
      
      * Add model to doc tests
      
      * Use better variable name
      
      * Convert other checkpoint as well
      
      * Update config, add link to paper
      
      * Add docs
      
      * Update organization
      
      * Replace base_model_prefix with clip
      
      * Fix base_model_prefix
      
      * Fix checkpoint of config
      
      * Fix config checkpoint
      
      * Remove file
      
      * Use logits for output
      
      * Fix tests
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
    • [Audio Processor] Only pass sr to feat extractor (#20022) · 3e39fd09
      Sanchit Gandhi authored
      * [Audio Processor] Only pass sr to feat extractor
      
      * move out of if/else
      
      * copy to other processors
  2. 07 Nov, 2022 7 commits
  3. 04 Nov, 2022 8 commits
  4. 03 Nov, 2022 10 commits
  5. 02 Nov, 2022 7 commits
    • 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in... · 9f9ddcc2
      Ben Eyal authored
      🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)
      
      * Add test for SentencePiece not adding special tokens to strings
      
      * Add SentencePieceStringConversionMixin to fix issue 15003
      
      * Fix conversion from tokens to string for most SentencePiece tokenizers
      
      Tokenizers fixed:
      - AlbertTokenizer
      - BarthezTokenizer
      - CamembertTokenizer
      - FNetTokenizer
      - M2M100Tokenizer
      - MBart50Tokenizer
      - PegasusTokenizer
      - Speech2TextTokenizer
      
      * Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
      
      * Fix DebertaV2Tokenizer
      
      * Ignore LayoutXLMTokenizer in SentencePiece string conversion test
      
      * Run 'make style' and 'make quality'
      
      * Clean convert_tokens_to_string test
      
      Instead of explicitly ignoring LayoutXLMTokenizer in the test,
      override the test in LayoutLMTokenizationTest and do nothing in it.
      
      * Remove commented out code
      
      * Improve robustness of convert_tokens_to_string test
      
      Instead of comparing lengths of re-tokenized text and input_ids,
      check that converting all special tokens to string yields a string
      with all special tokens.
      
      * Inline and remove SentencePieceStringConversionMixin
      
      The convert_tokens_to_string method is now implemented
      in each relevant SentencePiece tokenizer.
      
      * Run 'make style' and 'make quality'
      
      * Revert removal of space in convert_tokens_to_string
      
      * Remove redundant import
      
      * Revert test text to original
      
      * Uncomment the lowercasing of the reverse_text variable
      
      * Mimic Rust tokenizer behavior for tokenizers
      
      - Albert
      - Barthez
      - Camembert
      - MBart50
      - T5
      
      * Fix accidentally skipping test in wrong tokenizer
      
      * Add test for equivalent Rust and slow tokenizer behavior
      
      * Override _decode in BigBirdTokenizer to mimic Rust behavior
      
      * Override _decode in FNetTokenizer to mimic Rust behavior
      
      * Override _decode in XLNetTokenizer to mimic Rust behavior
      
      * Remove unused 're' import
      
      * Update DebertaV2Tokenizer to mimic Rust tokenizer
      
      * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
      
      * Ignore problematic tests in Deberta V2
      
      * Add comment on why the Deberta V2 tests are skipped
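The fix described in this commit can be illustrated with a minimal sketch. It is not the code merged in the PR: `toy_decode` is a hypothetical stand-in for a SentencePiece model's decode step, and the spacing placed around special tokens is an assumption of the sketch. The idea is to buffer ordinary pieces, decode each run as a unit, and emit special tokens verbatim so that they are not dropped.

```python
from typing import Callable, Iterable, List

SPIECE_UNDERLINE = "\u2581"  # SentencePiece word-boundary marker


def toy_decode(pieces: List[str]) -> str:
    """Hypothetical stand-in for a SentencePiece model's decode step."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").strip()


def convert_tokens_to_string(
    tokens: Iterable[str],
    special_tokens: Iterable[str],
    decode: Callable[[List[str]], str] = toy_decode,
) -> str:
    """Join tokens into text while keeping special tokens in the output."""
    special = set(special_tokens)
    parts: List[str] = []
    buffered: List[str] = []  # ordinary pieces waiting to be decoded together
    for token in tokens:
        if token in special:
            # Decode the pieces seen so far, then emit the special token as-is.
            if buffered:
                parts.append(decode(buffered))
                buffered = []
            parts.append(token)
        else:
            buffered.append(token)
    if buffered:
        parts.append(decode(buffered))
    return " ".join(parts)


text = convert_tokens_to_string(
    ["<s>", "\u2581Hello", "\u2581wor", "ld", "</s>"],
    special_tokens=["<s>", "</s>"],
)
```

Passing the whole token list to the decoder in one call is what can lose special tokens that are not part of the SentencePiece vocabulary; decoding run by run avoids that.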
    • [Doctest] Add configuration_deberta_v2.py (#19995) · 74877437
      Saad Mahmud authored
      * Add example docstring for DebertaV2Config
      
      * Add DebertaV2Config to documentation_tests
      
      * Fix mistake with directory name
    • Quality (#20002) · 49b77b89
      Sylvain Gugger authored
    • Add Image Processors (#19796) · a6b77598
      amyeroberts authored
      
      
      * Add CLIP image processor
      
      * Crop size as dict too
      
      * Update warning
      
      * Actually use logger this time
      
      * Normalize doesn't change dtype of input
      
      * Add perceiver image processor
      
      * Tidy up
      
      * Add DPT image processor
      
      * Add Vilt image processor
      
      * Tidy up
      
      * Add poolformer image processor
      
      * Tidy up
      
      * Add LayoutLM v2 and v3 image processors
      
      * Tidy up
      
      * Add Flava image processor
      
      * Tidy up
      
      * Add deit image processor
      
      * Tidy up
      
      * Add ConvNext image processor
      
      * Tidy up
      
      * Add levit image processor
      
      * Add segformer image processor
      
      * Add in post processing
      
      * Fix up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Add VideoMAE image processor
      
      * Tidy up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add ViT image processor
      
      * Tidy up
      
      * Add beit image processor
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Fix up
      
      * Fix flava and remove tree module
      
      * Fix image classification pipeline failing tests
      
      * Update feature extractor in trainer scripts
      
      * Update pad_if_smaller to accept tuple and int size
      
      * Update for image segmentation pipeline
      
      * Update src/transformers/models/perceiver/image_processing_perceiver.py
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/beit/image_processing_beit.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * PR comments - docstrings; remove accidentally added resize; var names
      
      * Update docstrings
      
      * Add exception if size is not in the right format
      
      * Fix exception check
      
      * Fix up
      
      * Use shortest_edge in tuple in script
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
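Several items in this commit concern how a size argument is accepted ("Crop size as dict too", "Update pad_if_smaller to accept tuple and int size", "Add exception if size is not in the right format", "Use shortest_edge in tuple in script"). The sketch below shows one way such normalization could look. `normalize_size` and its `default_to_square` flag are hypothetical names chosen for illustration, and the (height, width) tuple order is an assumption; none of this is taken from the library's code.

```python
from typing import Dict, Sequence, Union

SizeLike = Union[int, Sequence[int], Dict[str, int]]

# Key sets this sketch accepts for a size dictionary (an assumption).
VALID_KEY_SETS = ({"height", "width"}, {"shortest_edge"})


def normalize_size(size: SizeLike, default_to_square: bool = True) -> Dict[str, int]:
    """Convert an int, a 2-item sequence, or a dict into a size dict."""
    if isinstance(size, dict):
        if set(size) not in VALID_KEY_SETS:
            raise ValueError(f"Unsupported size keys: {sorted(size)}")
        return dict(size)
    if isinstance(size, int):
        # A bare int means either a square or a shortest-edge target.
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        # Assumed order: (height, width).
        return {"height": size[0], "width": size[1]}
    raise ValueError(f"Size must be an int, a 2-item sequence, or a dict, got {size!r}")


square = normalize_size(224)
edge = normalize_size(256, default_to_square=False)
rect = normalize_size((224, 320))
```

Normalizing once at the boundary means later resize and crop steps only have to handle the dictionary form.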
    • clean up vision/text config dict arguments (#19954) · 8827e1b2
      Yih-Dar authored
      
      
      * clean up
      
      * For backward compatibility
      
      * clean up
      
      * Same changes for more models
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  6. 01 Nov, 2022 5 commits