1. 03 Nov, 2022 1 commit
  2. 02 Nov, 2022 5 commits
    • 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in... · 9f9ddcc2
      Ben Eyal authored
      🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)
      
      * Add test for SentencePiece not adding special tokens to strings
      
      * Add SentencePieceStringConversionMixin to fix issue 15003
      
      * Fix conversion from tokens to string for most SentencePiece tokenizers
      
      Tokenizers fixed:
      - AlbertTokenizer
      - BarthezTokenizer
      - CamembertTokenizer
      - FNetTokenizer
      - M2M100Tokenizer
      - MBart50Tokenizer
      - PegasusTokenizer
      - Speech2TextTokenizer
      
      * Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
      
      * Fix DebertaV2Tokenizer
      
      * Ignore LayoutXLMTokenizer in SentencePiece string conversion test
      
      * Run 'make style' and 'make quality'
      
      * Clean convert_tokens_to_string test
      
      Instead of explicitly ignoring LayoutXLMTokenizer in the test,
      override the test in LayoutLMTokenizationTest and do nothing in it.
      
      * Remove commented out code
      
      * Improve robustness of convert_tokens_to_string test
      
      Instead of comparing lengths of re-tokenized text and input_ids,
      check that converting all special tokens to string yields a string
      with all special tokens.
      
      * Inline and remove SentencePieceStringConversionMixin
      
      The convert_tokens_to_string method is now implemented
      in each relevant SentencePiece tokenizer.
      
      * Run 'make style' and 'make quality'
      
      * Revert removal of space in convert_tokens_to_string
      
      * Remove redundant import
      
      * Revert test text to original
      
      * Uncomment the lowercasing of the reverse_text variable
      
      * Mimic Rust tokenizer behavior for tokenizers
      
      - Albert
      - Barthez
      - Camembert
      - MBart50
      - T5
      
      * Fix accidentally skipping test in wrong tokenizer
      
      * Add test for equivalent Rust and slow tokenizer behavior
      
      * Override _decode in BigBirdTokenizer to mimic Rust behavior
      
      * Override _decode in FNetTokenizer to mimic Rust behavior
      
      * Override _decode in XLNetTokenizer to mimic Rust behavior
      
      * Remove unused 're' import
      
      * Update DebertaV2Tokenizer to mimic Rust tokenizer
      
      * Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.
      
      * Ignore problematic tests in Deberta V2
      
      * Add comment on why the Deberta V2 tests are skipped
    • Improve model tester (#19984) · f69eb24b
      Yih-Dar authored
      
      
      * part 1
      
      * part 2
      
      * part 3
      
      * fix
      
      * For CANINE
      
      * For ESMFold
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
    • Add Image Processors (#19796) · a6b77598
      amyeroberts authored
      
      
      * Add CLIP image processor
      
      * Crop size as dict too
      
      * Update warning
      
      * Actually use logger this time
      
      * Normalize doesn't change dtype of input
      
      * Add perceiver image processor
      
      * Tidy up
      
      * Add DPT image processor
      
      * Add Vilt image processor
      
      * Tidy up
      
      * Add poolformer image processor
      
      * Tidy up
      
      * Add LayoutLM v2 and v3 image processors
      
      * Tidy up
      
      * Add Flava image processor
      
      * Tidy up
      
      * Add deit image processor
      
      * Tidy up
      
      * Add ConvNext image processor
      
      * Tidy up
      
      * Add levit image processor
      
      * Add segformer image processor
      
      * Add in post processing
      
      * Fix up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Add VideoMAE image processor
      
      * Tidy up
      
      * Add ImageGPT image processor
      
      * Fixup
      
      * Add ViT image processor
      
      * Tidy up
      
      * Add beit image processor
      
      * Add mobilevit image processor
      
      * Tidy up
      
      * Add postprocessing
      
      * Fixup
      
      * Fix up
      
      * Fix flava and remove tree module
      
      * Fix image classification pipeline failing tests
      
      * Update feature extractor in trainer scripts
      
      * Update pad_if_smaller to accept tuple and int size
      
      * Update for image segmentation pipeline
      
      * Update src/transformers/models/perceiver/image_processing_perceiver.py
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * Update src/transformers/models/beit/image_processing_beit.py
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
      
      * PR comments - docstrings; remove accidentally added resize; var names
      
      * Update docstrings
      
      * Add exception if size is not in the right format
      
      * Fix exception check
      
      * Fix up
      
      * Use shortest_edge in tuple in script
      Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
      Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
  3. 01 Nov, 2022 2 commits
    • Generate: contrastive search with full optional outputs (#19963) · 831590f6
      Joao Gante authored
      * Use beam search functionality; Add extra outputs and test
      
      * Add full tests for contrastive search
      
      * Add error message on unconventional cache format
    • Add ESMFold (#19977) · 7f9b7b3f
      Matt authored
      
      
      * initial commit
      
      * First draft that gets outputs without crashing!
      
      * Add all the ported openfold dependencies
      
      * testing
      
      * Restructure config files for ESMFold
      
      * Debugging to find output discrepancies
      
      * Mainly style
      
      * Make model runnable without extra deps
      
      * Remove utils and merge them to the modeling file
      
      * Use correct gelu and remove some debug prints
      
      * More cleanup
      
      * Update esm docs
      
      * Update conversion script to support ESMFold properly
      
      * Port some top-level changes from ESMFold repo
      
      * Expand EsmFold docstrings
      
      * Make attention_mask optional (default to all 1s)
      
      * Add inference test for ESMFold
      
      * Use config and not n kwargs
      
      * Add modeling output class
      
      * Remove einops
      
      * Remove chunking in ESM FFN
      
      * Update tests for ESMFold
      
      * Quality
      
      * Repo consistency
      
      * Remove tree dependency from ESMFold
      
      * make fixup
      
      * Add an error in case my structure map function breaks later
      
      * Remove needless code
      
      * Stop auto-casting the LM to float16 so CPU tests pass
      
      * Stop auto-casting the LM to float16 so CPU tests pass
      
      * Final test updates
      
      * Split test file
      
      * Copyright and quality
      
      * Unpin PyTorch to see built doc
      
      * Fix config file to_dict() method
      
      * Add some docstrings to the output
      
      * Skip TF checkpoint tests for ESM until we reupload those
      
      * make fixup
      
      * More docstrings
      
      * Unpin to get even with main
      
      * Flag example to write
      Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
  4. 31 Oct, 2022 2 commits
  5. 28 Oct, 2022 1 commit
    • Support segformer fx (#19924) · 347ba38c
      donguk.lim authored
      
      
      * Support segformer fx
      
      * Add fx_compatible attribute to test_modeling_segformer.py
      
      * Update glpn model (fx support)
      
      glpn model was copied from segformer.
      
      * Update utils/fx.py | add semantic-segmentation
      
      for SegformerForSemanticSegmentation model
      
      * Fix minor import order (isort)
      
      * Add random input generation for segformer fx
      Co-authored-by: noelbird <lduldu00228@gmail.com>
  6. 27 Oct, 2022 2 commits
  7. 25 Oct, 2022 2 commits
  8. 24 Oct, 2022 1 commit
  9. 21 Oct, 2022 4 commits
  10. 18 Oct, 2022 5 commits
  11. 17 Oct, 2022 1 commit
    • TF port of ESM (#19587) · 3b3024da
      Matt authored
      
      
      * Partial TF port for ESM model
      
      * Add ESM-TF tests
      
      * Add the various imports for TF-ESM
      
      * TF weight conversion almost ready
      
      * Stop ignoring the decoder weights in PT
      
      * Add tests and lots of fixes
      
      * fix-copies
      
      * Fix imports, add model docs
      
      * Add get_vocab() to tokenizer
      
      * Fix vocab links for pretrained files
      
      * Allow multiple inputs with a sep
      
      * Use EOS as SEP token because ESM vocab lacks SEP
      
      * Correctly return special tokens mask from ESM tokenizer
      
      * make fixup
      
      * Stop testing unsupported embedding resizing
      
      * Handle TF bias correctly
      
      * Skip all models with slow tokenizers in the token classification test
      
      * Fixing the batch/unbatcher of pipelines to accommodate the `None` being passed around.
      
      * Fixing pipeline bug caused by slow tokenizer being different.
      
      * Update src/transformers/models/esm/modeling_tf_esm.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/models/esm/modeling_tf_esm.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/models/esm/modeling_tf_esm.py
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update set_input_embeddings and the copyright notices
      Co-authored-by: Your Name <you@example.com>
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
  12. 14 Oct, 2022 3 commits
  13. 13 Oct, 2022 1 commit
  14. 12 Oct, 2022 4 commits
  15. 11 Oct, 2022 5 commits
  16. 10 Oct, 2022 1 commit