1. 02 Nov, 2022 6 commits
    • 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775) · 9f9ddcc2
      Ben Eyal authored
      
      * Add test for SentencePiece not adding special tokens to strings
      
      * Add SentencePieceStringConversionMixin to fix issue 15003
      
      * Fix conversion from tokens to string for most SentencePiece tokenizers
      
      Tokenizers fixed:
      - AlbertTokenizer
      - BarthezTokenizer
      - CamembertTokenizer
      - FNetTokenizer
      - M2M100Tokenizer
      - MBart50Tokenizer
      - PegasusTokenizer
      - Speech2TextTokenizer
      
      * Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
      
      * Fix DebertaV2Tokenizer
      
      * Ignore LayoutXLMTokenizer in SentencePiece string conversion test
      
      * Run 'make style' and 'make quality'
      
      * Clean convert_tokens_to_string test
      
      Instead of explicitly ignoring LayoutXLMTokenizer in the test,
      override the test in LayoutLMTokenizationTest and do nothing in it.
      
      * Remove commented out code
      
      * Improve robustness of convert_tokens_to_string test
      
      Instead of comparing lengths of re-tokenized text and input_ids,
      check that converting all special tokens to string yields a string
      with all special tokens.
      
      * Inline and remove SentencePieceStringConversionMixin
      
      The convert_tokens_to_string method is now implemented
      in each relevant SentencePiece tokenizer.
      
      * Run 'make style' and 'make quality'
      
      * Revert removal of space in convert_tokens_to_string
      
      * Remove redundant import
      
      * Revert test text to original
      
      * Uncomment the lowercasing of the reverse_text variable
      
      * Mimic Rust tokenizer behavior for tokenizers
      
      - Albert
      - Barthez
      - Camembert
      - MBart50
      - T5
      
      * Fix accidentally skipping test in wrong tokenizer
      
      * Add test for equivalent Rust and slow tokenizer behavior
      
      * Override _decode in BigBirdTokenizer to mimic Rust behavior
      
      * Override _decode in FNetTokenizer to mimic Rust behavior
      
      * Override _decode in XLNetTokenizer to mimic Rust behavior
      
      * Remove unused 're' import
      
      * Update DebertaV2Tokenizer to mimic Rust tokenizer
      
      * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
      
      * Ignore problematic tests in Deberta V2
      
      * Add comment on why the Deberta V2 tests are skipped
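
The behavior this commit targets, keeping special tokens when SentencePiece pieces are joined back into text, can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `decode_pieces` is a hypothetical stand-in for a real SentencePiece decoder, and passing `special_tokens` as an explicit argument is an assumption made to keep the example self-contained.

```python
from typing import Iterable, List

# SentencePiece marks word boundaries with this "lower one eighth block" character.
SPIECE_UNDERLINE = "\u2581"


def decode_pieces(pieces: List[str]) -> str:
    """Hypothetical stand-in for a SentencePiece decoder: join pieces, restore spaces."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").strip()


def convert_tokens_to_string(tokens: Iterable[str], special_tokens: Iterable[str]) -> str:
    """Decode ordinary pieces in runs, but emit special tokens verbatim."""
    special = set(special_tokens)
    parts: List[str] = []
    buffer: List[str] = []  # ordinary pieces seen since the last special token

    for token in tokens:
        if token in special:
            # Flush the pending pieces through the decoder first,
            # then keep the special token itself instead of dropping it.
            if buffer:
                text = decode_pieces(buffer)
                if text:
                    parts.append(text)
                buffer = []
            parts.append(token)
        else:
            buffer.append(token)

    if buffer:
        text = decode_pieces(buffer)
        if text:
            parts.append(text)
    return " ".join(parts)


tokens = ["[CLS]", SPIECE_UNDERLINE + "hello", SPIECE_UNDERLINE + "wor", "ld", "[SEP]"]
print(convert_tokens_to_string(tokens, ["[CLS]", "[SEP]"]))  # [CLS] hello world [SEP]
```

The spacing around special tokens here is a simplification; the commit message notes that several tokenizers were adjusted to mimic the Rust tokenizer behavior, which this sketch does not attempt to reproduce.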
    • Improve model tester (#19984) · f69eb24b
      Yih-Dar authored
      
      
      * part 1
      
      * part 2
      
      * part 3
      
      * fix
      
      * For CANINE
      
      * For ESMFold
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Quality (#20002) · 49b77b89
      Sylvain Gugger authored
    • Add Image Processors (#19796) · a6b77598
      amyeroberts authored
      
      
      * Add CLIP image processor
      
      * Crop size as dict too
      
      * Update warning
      
      * Actually use logger this time
      
      * Normalize doesn't change dtype of input
      
      * Add perceiver image processor
      
      * Tidy up
      
      * Add DPT image processor
      
      * Add Vilt image processor
      
      * Tidy up
      
      * Add poolformer image processor
      
      * Tidy up
      
      * Add LayoutLM v2 and v3 image processors
      
      * Tidy up
      
      * Add Flava image processor
      
      * Tidy up
      
      * Add deit image processor
      
      * Tidy up
      
      * Add ConvNext image processor
      
      * Tidy up
      
      * Add levit image processor
      
      * Add segformer image processor
      
      * Add in post processing
      
      * Fix up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Add VideoMAE image processor
      
      * Tidy up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add ViT image processor
      
      * Tidy up
      
      * Add beit image processor
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Fix up
      
      * Fix flava and remove tree module
      
      * Fix image classification pipeline failing tests
      
      * Update feature extractor in trainer scripts
      
      * Update pad_if_smaller to accept tuple and int size
      
      * Update for image segmentation pipeline
      
      * Update src/transformers/models/perceiver/image_processing_perceiver.py
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/beit/image_processing_beit.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * PR comments - docstrings; remove accidentally added resize; var names
      
      * Update docstrings
      
      * Add exception if size is not in the right format
      
      * Fix exception check
      
      * Fix up
      
      * Use shortest_edge in tuple in script
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
  2. 01 Nov, 2022 3 commits
    • Generate: contrastive search with full optional outputs (#19963) · 831590f6
      Joao Gante authored
      * Use beam search functionality; Add extra outputs and test
      
      * Add full tests for contrastive search
      
      * Add error message on unconventional cache format
    • Added onnx config whisper (#19525) · c796b6de
      Mohit Sharma authored
      * Added onnx config whisper
      
      * added whisper support onnx
      
      * add audio input data
      
      * added whisper support onnx
      
      * fixed the seqlength value
      
      * Updated the whisper onnx config
      
      * restore files to old version
      
      * removed attention mask from inputs
      
      * Updated get_dummy_input_onnxruntime docstring
      
      * Updated relative imports and token generation
      
      * update docstring
    • Add ESMFold (#19977) · 7f9b7b3f
      Matt authored
      
      
      * initial commit
      
      * First draft that gets outputs without crashing!
      
      * Add all the ported openfold dependencies
      
      * testing
      
      * Restructure config files for ESMFold
      
      * Debugging to find output discrepancies
      
      * Mainly style
      
      * Make model runnable without extra deps
      
      * Remove utils and merge them to the modeling file
      
      * Use correct gelu and remove some debug prints
      
      * More cleanup
      
      * Update esm docs
      
      * Update conversion script to support ESMFold properly
      
      * Port some top-level changes from ESMFold repo
      
      * Expand EsmFold docstrings
      
      * Make attention_mask optional (default to all 1s)
      
      * Add inference test for ESMFold
      
      * Use config and not n kwargs
      
      * Add modeling output class
      
      * Remove einops
      
      * Remove chunking in ESM FFN
      
      * Update tests for ESMFold
      
      * Quality
      
      * Repo consistency
      
      * Remove tree dependency from ESMFold
      
      * make fixup
      
      * Add an error in case my structure map function breaks later
      
      * Remove needless code
      
      * Stop auto-casting the LM to float16 so CPU tests pass
      
      * Stop auto-casting the LM to float16 so CPU tests pass
      
      * Final test updates
      
      * Split test file
      
      * Copyright and quality
      
      * Unpin PyTorch to see built doc
      
      * Fix config file to_dict() method
      
      * Add some docstrings to the output
      
      * Skip TF checkpoint tests for ESM until we reupload those
      
      * make fixup
      
      * More docstrings
      
      * Unpin to get even with main
      
      * Flag example to write
      Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
  3. 31 Oct, 2022 2 commits
  4. 28 Oct, 2022 2 commits
  5. 27 Oct, 2022 3 commits
  6. 26 Oct, 2022 4 commits
  7. 25 Oct, 2022 3 commits
  8. 24 Oct, 2022 3 commits
  9. 21 Oct, 2022 7 commits
  10. 20 Oct, 2022 1 commit
  11. 19 Oct, 2022 3 commits
  12. 18 Oct, 2022 3 commits
    • Repo utils test (#19696) · a929f81e
      Sylvain Gugger authored
      * Create repo utils test job
      
      * Last occurrence
      
      * Add tests for tests_fetcher
      
      * Better filtering
      
      * Let's learn more
      
      * Should fix
      
      * Should fix
      
      * Remove debug
      
      * Style
      
      * WiP
      
      WiP
      
      WiP
      
      WiP
      
      WiP
      
      WiP
      
      WiP
      
      WiP
      
      WiP
      
      * Quality
      
      * address review comments
      
      * Fix link
    • Clean up deprecation warnings (#19654) · a23819ed
      David Yang authored
      * Clean up deprecation warnings
      
      Notes:
      Changed some strings in tests to raw strings, which will change the literal content of the strings as they are fed into whatever machine handles them.
      Test cases for past in the past/past_key_values switch changed/removed due to warning of impending removal
      
      * Add PILImageResampling abstraction for PIL.Image.Resampling
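
The note in this commit about raw strings can be illustrated with a short sketch. It is an assumption about which kind of change the commit made, since the affected test strings are not shown here: converting a literal to a raw string keeps its content identical when the backslashes were already escaped by hand, but changes it when the original relied on a recognized escape such as `\n`.

```python
import re

# Same content: doubled backslashes vs. a raw string.
escaped_pattern = "\\d+\\.\\d+"
raw_pattern = r"\d+\.\d+"
same_content = escaped_pattern == raw_pattern

# Different content: "\n" is one newline character,
# while r"\n" is a backslash followed by the letter n (two characters).
newline = "\n"
raw_newline = r"\n"
lengths = (len(newline), len(raw_newline))

# The raw pattern works as a regular expression without triggering
# the invalid-escape deprecation warning that unescaped "\d" would.
matched = re.fullmatch(raw_pattern, "3.14") is not None

print(same_content, lengths, matched)  # True (1, 2) True
```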
    • Sylvain Gugger's avatar
      fb0bd7b7