1. 02 Feb, 2022 4 commits
    • Save code of registered custom models (#15379) · 44b21f11
      Sylvain Gugger authored
      
      
      * Allow dynamic modules to use relative imports
      
      * Work for configs
      
      * Fix last merge conflict
      
      * Save code of registered custom objects
      
      * Map strings to strings
      
      * Fix test
      
      * Add tokenizer
      
      * Rework tests
      
      * Tests
      
      * Ignore fixtures py files for tests
      
      * Tokenizer test + fix collection
      
      * With full path
      
      * Rework integration
      
      * Fix typo
      
      * Remove changes in conftest
      
      * Test for tokenizers
      
      * Add documentation
      
      * Update docs/source/custom_models.mdx
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Add file structure and file content
      
      * Add more doc
      
      * Style
      
      * Update docs/source/custom_models.mdx
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Address review comments
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
    • Adding support for `microphone` streaming within pipeline. (#15046) · 623d8cb4
      Nicolas Patry authored
      
      
      * Adding support for `microphone` streaming within pipeline.
      
      - Uses `ffmpeg` to get microphone data.
      - Makes sure data is aligned to `size_of_sample`.
      - Works by sending `{"raw": ..data.., "stride": (n, left, right), "partial": bool}`
        directly to the pipeline, so partial results can be streamed while
        still getting inference.
      - Lets `partial` information flow through the pipeline so the caller
        can get it back and choose whether to display the text.
      
      - Reconstructing text from strided chunks is bound to produce errors, since
        CTC does not keep previous state. Currently, most errors come from not
        knowing whether there is a space between two chunks.
        Since we have some left-stride info, we could use it during decoding
        to decide what to do with those spaces, and maybe even extra letters (if
        the stride is long enough, it is bound to cover at least a few symbols).
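A minimal sketch of the chunking idea described above, assuming a plain Python list of samples instead of `ffmpeg` microphone bytes. The helper name `chunk_with_stride` is hypothetical; only the dict keys `raw`, `stride` and `partial` come from the message, and marking every chunk except the last as `partial` is an assumption for illustration. Each `stride` is `(n, left, right)`: the chunk length plus how many samples at each edge overlap a neighbouring chunk and can be discarded when stitching.

```python
from typing import Dict, Iterator, List


def chunk_with_stride(
    samples: List[float], chunk_len: int, stride_left: int, stride_right: int
) -> Iterator[Dict]:
    """Yield overlapping chunks tagged with an (n, left, right) stride (hypothetical helper)."""
    if stride_left + stride_right >= chunk_len:
        raise ValueError("stride_left + stride_right must be smaller than chunk_len")
    # Consecutive chunks overlap by stride_left + stride_right samples.
    step = chunk_len - stride_left - stride_right
    total = len(samples)
    start = 0
    while start < total:
        end = min(start + chunk_len, total)
        chunk = samples[start:end]
        # The first chunk has no left neighbour and the last has no right one,
        # so nothing needs to be discarded on those edges.
        left = 0 if start == 0 else stride_left
        right = 0 if end == total else stride_right
        yield {
            "raw": chunk,
            "stride": (len(chunk), left, right),
            # Assumption: every chunk except the final one is a partial result.
            "partial": end < total,
        }
        if end == total:
            break
        start += step
```

Dropping `left` samples from the start and `right` samples from the end of each chunk recovers the original signal exactly, which is why the overlapping context can be thrown away after inference.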
      
      Fixing tests.
      
      Protecting with `require_torch`.
      
      `raw_ctc` support for nicer demo.
      
      Post rebase fixes.
      
      Revamp to split raw_mic_data from its live chunking.
      
      - Requires a refactor to make everything a bit cleaner.
      
      Automatic resampling.
      
      Small fix.
      
      Small fix.
      
      * Post rebase fix (need to let super handle more logic, reorder args.)
      
      * Update docstrings
      
      * Docstring format.
      
      * Remove print.
      
      * Prevent flow of `input_values`.
      
      * Fixing `stride` too.
      
      * Fixing the PR by removing `raw_ctc`.
      
      * Better docstrings.
      
      * Fixing init.
      
      * Update src/transformers/pipelines/audio_utils.py
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      
      * Update tests/test_pipelines_automatic_speech_recognition.py
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      
      * Quality.
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
    • Patrick von Platen
    • Add option to resize like torchvision's Resize (#15419) · 1d94d575
      NielsRogge authored
      * Add torchvision's resize
      
      * Rename torch_resize to default_to_square
      
      * Apply suggestions from code review
      
      * Add support for default_to_square and tuple of length 1
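A rough sketch of the output-size logic behind `default_to_square`, assuming torchvision's `Resize` conventions: an int (or a 1-tuple) either produces a square image or matches the shorter edge while preserving the aspect ratio, and a 2-tuple is an explicit `(height, width)`. The function `resize_target` is hypothetical and returns `(width, height)`, the order PIL's `Image.resize` expects; it is not the library's implementation.

```python
from typing import Sequence, Tuple, Union


def resize_target(
    width: int,
    height: int,
    size: Union[int, Sequence[int]],
    default_to_square: bool = True,
) -> Tuple[int, int]:
    """Return the (width, height) an image should be resized to (hypothetical helper)."""
    if isinstance(size, (list, tuple)):
        if len(size) == 2:
            # torchvision convention: an explicit (height, width) pair.
            return size[1], size[0]
        if len(size) != 1:
            raise ValueError(f"size must be an int or a sequence of length 1 or 2, got {size!r}")
        # A sequence of length 1 behaves exactly like a bare int.
        size = size[0]
    if default_to_square:
        return size, size
    # torchvision-style: match the shorter edge to `size`, keep the aspect ratio.
    short, long = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long / short)
    return (new_short, new_long) if width <= height else (new_long, new_short)
```

For a 640×480 image, `resize_target(640, 480, 224, default_to_square=False)` returns `(298, 224)`: the 480-pixel edge becomes 224 and the other edge scales to int(224 * 640 / 480) = 298.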
  2. 01 Feb, 2022 4 commits
    • fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available (#15319) · 7b8bdd86
      SaulLu authored
      
      * add new test
      
      * update test
      
      * remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
      
      * add `tokenizer_file` for the fast only tokenizer
      
      * change global variables of LayoutXLM
      
      * remove `"tokenizer_file"` from DPR tokenizer's Global variables
      
      * remove `tokenizer_file` from herbert slow tokenizer init
      
      * `"tokenizer_file"` from LED tokenizer's Global variables
      
      * remove `tokenizer_file` from mbart slow tokenizer init
      
      * remove `tokenizer_file` from slow tokenizer template
      
      * adapt to versioning
      
      * adapt the `test_tokenizer_mismatch_warning` test
      
      * clean test
      
      * clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
      
      * Revert "remove `tokenizer_file` from mbart slow tokenizer init"
      
      This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1.
      
      * Revert "`"tokenizer_file"` from LED tokenizer's Global variables"
      
      This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2.
      
      * Revert "remove `tokenizer_file` from herbert slow tokenizer init"
      
      This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd.
      
      * Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"
      
      This reverts commit da0895330bedfafc81ae3073470a9348c669f032.
      
      * set `tokenizer_file` in super `__init__` of mbart
    • replace assert with exception for padding_side arg in `PreTrainedTokenizerBase` `__init__` (#15454) · 6d585fe0
      SaulLu authored
      * replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`
      
      * add test
      
      * fix kwargs
      
      * reformat test
      
      * format
      
      * format
      
      * fix typo to render the documentation
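Why replace the `assert`: Python strips `assert` statements when run with `-O`, so the check silently disappears, while an explicit exception always fires. A minimal sketch of the pattern follows; `TokenizerSketch` is a hypothetical stand-in for `PreTrainedTokenizerBase`, and the error message is illustrative rather than copied from the library.

```python
class TokenizerSketch:
    """Hypothetical stand-in for a tokenizer base class; only the check matters."""

    padding_side: str = "right"

    def __init__(self, **kwargs) -> None:
        padding_side = kwargs.pop("padding_side", self.padding_side)
        # An explicit exception survives `python -O`, unlike an `assert`.
        if padding_side not in ("right", "left"):
            raise ValueError(
                f"padding_side should be 'right' or 'left', got {padding_side!r}"
            )
        self.padding_side = padding_side
```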
    • Fix TF Causal LM models' returned logits (#15256) · dc05dd53
      Yih-Dar authored
      
      
      * Fix TF Causal LM models' returned logits
      
      * Fix expected shape in the tests
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
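A sketch of what returning unshifted logits means for a causal LM, written with NumPy rather than TensorFlow. It assumes (the message above does not spell this out) that the bug was returning the shifted `logits[:, :-1]` used for the loss, so the output shape depended on whether `labels` were passed. The function `causal_lm_outputs` is hypothetical: it computes the next-token loss on shifted tensors and returns the full logits.

```python
import numpy as np


def causal_lm_outputs(logits: np.ndarray, labels: np.ndarray):
    """Return (loss, logits) for a causal LM, keeping logits unshifted.

    logits: (batch, seq_len, vocab) scores; labels: (batch, seq_len) token ids.
    """
    # Position t predicts token t + 1, so drop the last logit and the first label.
    shifted_logits = logits[:, :-1, :]
    shifted_labels = labels[:, 1:]
    # Numerically stable log-softmax over the vocabulary axis.
    z = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of each target token, averaged over all positions.
    nll = -np.take_along_axis(log_probs, shifted_labels[..., None], axis=-1)[..., 0]
    loss = float(nll.mean())
    # Return the full logits so the output shape is the same with or without labels.
    return loss, logits
```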
    • Yih-Dar
  3. 31 Jan, 2022 6 commits
  4. 29 Jan, 2022 2 commits
  5. 28 Jan, 2022 2 commits
    • Add XGLM models (#14876) · d25e25ee
      Suraj Patil authored
      
      
      * add xglm
      
      * update vocab size
      
      * fix model name
      
      * style and tokenizer
      
      * typo
      
      * no mask token
      
      * fix pos embed compute
      
      * fix args
      
      * fix tokenizer
      
      * fix positions
      
      * fix tokenization
      
      * style and doc fixes
      
      * fix imports
      
      * add fast tokenizer
      
      * update names
      
      * add pt tests
      
      * fix tokenizer
      
      * fix typo
      
      * fix tokenizer import
      
      * fix fast tokenizer
      
      * fix tokenizer
      
      * fix converter
      
      * add tokenizer test
      
      * update checkpoint names
      
      * fix tokenizer tests
      
      * fix slow tests
      
      * add copied from comments
      
      * rst -> mdx
      
      * flax model
      
      * update flax tests
      
      * quality
      
      * style
      
      * doc
      
      * update index and readme
      
      * fix copies
      
      * fix doc
      
      * update toctree
      
      * fix indent
      
      * minor fixes
      
      * fix config doc
      
      * don't save embed_pos weights
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * address Sylvain's comments, a few doc fixes
      
      * fix check_repo
      
      * align order of arguments
      
      * fix copies
      
      * fix labels
      
      * remove unnecessary mapping
      
      * fix saving tokenizer
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    • Fixing support for `batch_size` and `num_return_sequences` in `text-generation` pipeline (#15318) · 06107541
      Nicolas Patry authored
      * Fixing support for `batch_size` and `num_return_sequences` in
      `text-generation` pipeline
      
      And `text2text-generation` too.
      
      The bug was caused by the batch dimension containing both the incoming
      batch **and** the generated sequences (`num_return_sequences` per input).

      The fix simply consists of splitting these back into two separate
      dimensions.
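A minimal illustration of the fix, assuming generation returns one flat batch of size `batch_size * num_return_sequences` in which each input's outputs are contiguous. The helper `split_generated` is hypothetical; it regroups that flat batch into one list per input, which is the extra dimension the message describes.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_generated(flat: Sequence[T], batch_size: int, num_return_sequences: int) -> List[List[T]]:
    """Regroup a flat batch of generations into one list per input (hypothetical helper).

    Assumes the outputs for each input are contiguous in `flat`.
    """
    if len(flat) != batch_size * num_return_sequences:
        raise ValueError(
            f"expected {batch_size * num_return_sequences} outputs, got {len(flat)}"
        )
    return [
        list(flat[i * num_return_sequences : (i + 1) * num_return_sequences])
        for i in range(batch_size)
    ]
```

With 3 prompts and `num_return_sequences=2`, the 6 flat outputs become 3 lists of 2, so each caller gets back exactly the generations for its own prompt.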
      
      * TF support.
      
      * Odd backward compatibility script in the way.
  6. 27 Jan, 2022 2 commits
  7. 26 Jan, 2022 1 commit
  8. 25 Jan, 2022 2 commits
    • [Tests] Fix test (#15324) · 637e8175
      NielsRogge authored
      * Fix Swin device
      
      * Remove print statement
    • Avoid using get_list_of_files (#15287) · e6954707
      Sylvain Gugger authored
      * Avoid using get_list_of_files in config
      
      * Wip, change tokenizer file getter
      
      * Remove call in tokenizer files
      
      * Remove last call to get_list_model_files
      
      * Better tests
      
      * Unit tests for new function
      
      * Document bad API
  9. 24 Jan, 2022 3 commits
    • Add model like (#14992) · 81156d20
      Sylvain Gugger authored
      
      
      * Add new model like command
      
      * Bad doc-styler
      
      * black and doc-styler, stop fighting!
      
      * black and doc-styler, stop fighting!
      
      * At last
      
      * Clean up
      
      * Typo
      
      * Bad doc-styler
      
      * Bad doc-styler
      
      * All good maybe?
      
      * Use constants
      
      * Add doc and type hints
      
      * More cleaning
      
      * Add doc
      
      * Fix Copied from
      
      * Doc template
      
      * Use typing.Pattern instead
      
      * Framework-specific files
      
      * Fixes
      
      * Select frameworks clean model init
      
      * Deal with frameworks in main init
      
      * fixes
      
      * Last fix
      
      * Prompt user for info
      
      * Delete example config
      
      * Last fixes
      
      * Add test config
      
      * Fix bug with model_type included in each other
      
      * Fixes
      
      * More fixes
      
      * More fixes
      
      * Adapt config
      
      * Remove print statements
      
      * Will fix tokenization later, leave it broken for now
      
      * Add test
      
      * Quality
      
      * Try this way
      
      * Debug
      
      * Maybe by setting the path?
      
      * Let's try another way
      
      * It should go better when actually passing the arg...
      
      * Remove debug statements and style
      
      * Fix config
      
      * Add tests
      
      * Tests require the three backends
      
      * intermediate commit
      
      * Revamp pattern replacements and start work on feature extractors
      
      * Adapt model info
      
      * Finalize code for processors
      
      * Fix in main init additions
      
      * Finish questionnaire for processing classes
      
      * Fix file name
      
      * Fix for real
      
      * Fix patterns
      
      * Style
      
      * Remove needless warnings
      
      * Copied from should work now.
      
      * Include Copied from in blocks
      
      * Add test
      
      * More fixes and tests
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Address review comment
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    • [Beam Search] Correct returned beam scores (#14654) · 8d6acc6c
      Patrick von Platen authored
      * better
      
      * save intermediate
      
      * finish code
      
      * up
      
      * docs
      
      * Apply suggestions from code review
      
      * up
      
      * add compute transition beam scores function to model and make sure scores are correct with eos
      
      * apply Nico's comments
      
      * Apply suggestions from code review
      
      * another fix
    • [LayoutLMV2 Tests] Make sure input is on GPU (#15314) · dcaa5100
      Patrick von Platen authored
      * [LayoutLMV2 Tests] Make sure input is on GPU
      
      * correct empty line
  10. 21 Jan, 2022 3 commits
  11. 20 Jan, 2022 1 commit
  12. 19 Jan, 2022 5 commits
    • Fix usage of additional kwargs in `from_encoder_decoder_pretrained` in encoder-decoder models (#15056) · baf1ebe9
      jsnfly authored
      
      * [EncoderDecoder] Add test for usage of extra kwargs
      
      * [EncoderDecoder] Fix usage of extra kwargs in from pretrained
      
      * [EncoderDecoder] apply suggested changes (passing **kwargs_encoder)
      
      * [EncoderDecoder] create new test function and make sure it passes
      Co-authored-by: jonas <jsnfly@gmx.de>
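`from_encoder_decoder_pretrained` routes keyword arguments by prefix: those starting with `encoder_` go to the encoder and those starting with `decoder_` go to the decoder, with the prefix stripped. The hypothetical helper below sketches only that routing step, not the library code; where the unprefixed remainder ends up is left out.

```python
from typing import Any, Dict, Tuple


def split_encoder_decoder_kwargs(
    kwargs: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any], Dict[str, Any]]:
    """Route `encoder_*` / `decoder_*` kwargs to each sub-model (hypothetical helper)."""
    # Strip the prefix so the sub-model receives its own parameter names.
    encoder_kwargs = {k[len("encoder_"):]: v for k, v in kwargs.items() if k.startswith("encoder_")}
    decoder_kwargs = {k[len("decoder_"):]: v for k, v in kwargs.items() if k.startswith("decoder_")}
    # Anything without a recognised prefix is returned separately.
    other_kwargs = {
        k: v for k, v in kwargs.items() if not k.startswith(("encoder_", "decoder_"))
    }
    return encoder_kwargs, decoder_kwargs, other_kwargs
```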
    • Make smart chunking (long files) work on ASR ctc_with_lm. (#15219) · 3fefee99
      Nicolas Patry authored
      
      
      * [WIP] Make smart chunking (long files) work on ASR ctc_with_lm.
      
      * Slow test with functionality.
      
      * Fixing regular test.
      
      * fix for batch size 1
      
      * Handling batch outside `rescale_Stride`.
      
      - Renamed to `rescale_stride`.
      
      * Disable equality in the test.
      
      * Remove print.
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
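A sketch of what rescaling a stride involves: strides are measured in audio samples, but CTC decoding works on logit frames, so `(n, left, right)` has to be multiplied by the ratio of logit frames to input samples. The signature is hypothetical; following the message above, batching is handled by the caller, so the function takes a single stride tuple.

```python
from typing import Tuple


def rescale_stride(stride: Tuple[int, int, int], input_len: int, output_len: int) -> Tuple[int, int, int]:
    """Convert an (n, left, right) stride from input samples to output frames.

    Hypothetical signature; output_len / input_len is the downsampling ratio
    between audio samples and CTC logit frames.
    """
    n, left, right = stride
    # Multiply before dividing to limit floating-point error; round() is
    # round-half-to-even, acceptable for an approximate frame count.
    return (
        round(n * output_len / input_len),
        round(left * output_len / input_len),
        round(right * output_len / input_len),
    )
```

A batch is then handled outside the function, e.g. `[rescale_stride(s, input_len, output_len) for s in strides]`.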
    • Add ViLT (#14895) · ac227093
      NielsRogge authored
      
      
      * First commit
      
      * Add conversion script
      
      * Make conversion script work for base model
      
      * More improvements
      
      * Update conversion script, works for vqa
      
      * Add indexing argument to meshgrid
      
      * Make conversion script work for ViltForPreTraining
      
      * Add ViltForPreTraining to docs
      
      * Fix device issue
      
      * Add processor
      
      * Add MinMaxResize to feature extractor
      
      * Implement call method of ViltProcessor
      
      * Fix tests
      
      * Add integration test
      
      * Add loss calculation for VQA
      
      * Improve tests
      
      * Improve some more tests
      
      * Debug tests
      
      * Small improvements
      
      * Add support for attention_mask
      
      * Remove mask_it
      
      * Add pixel_mask
      
      * Add tests for ViltFeatureExtractor
      
      * Improve tests
      
      * Add ViltForNaturalLanguageVisualReasoning
      
      * Add ViltForNaturalLanguageVisualReasoning to conversion script
      
      * Minor fixes
      
      * Add support for image_embeds, update docstrings to markdown
      
      * Update docs to markdown
      
      * Improve conversion script
      
      * Rename ViltForPreTraining to ViltForMaskedLM
      
      * Improve conversion script
      
      * Convert docstrings to markdown
      
      * Fix code example of retrieval model
      
      * Properly convert masked language model
      
      * Add integration test for nlvr
      
      * Fix code quality
      
      * Apply suggestions from code review
      
      * Add copied from statements
      
      * Fix pretrained_config_archive_map
      
      * Fix docs
      
      * Add model to README
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Apply more suggestions from code review
      
      * Make code more readable
      
      * Add ViltForNaturalLanguageVisualReasoning to the tests
      
      * Rename ViltForVisualQuestionAnswering to ViltForQuestionAnswering
      
      * Replace pixel_values_2 by single tensor
      
      * Add hidden_states and attentions
      
      * Fix one more test
      
      * Fix all tests
      
      * Update year
      
      * Fix rebase issues
      
      * Fix another rebase issue
      
      * Remove ViltForPreTraining from auto mapping
      
      * Rename ViltForImageRetrievalTextRetrieval to ViltForImageAndTextRetrieval
      
      * Make it possible to use BertTokenizerFast in the processor
      
      * Use BertTokenizerFast by default
      
      * Rename ViltForNaturalLanguageVisualReasoning, define custom model output
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    • Add FastTokenizer to REALM (#15211) · 841d9791
      Li-Huai (Allan) Lin authored
      * Remove BertTokenizer abstraction
      
      * Add FastTokenizer to REALM
      
      * Fix config archive map
      
      * Fix copies
      
      * Update realm.mdx
      
      * Apply suggestions from code review
    • Rename compute_loss in TF models (#15207) · 2708bfa1
      Matt authored
      * Rename compute_loss to hf_compute_loss to avoid conflicts with the new Keras method
      
      * make style
      
      * Adding deprecation warning to `compute_loss`
      
      * Fix sneaky reference to compute_loss
      
      * Replace logger.warning with warnings.warn
      
      * Clarifying warning and deprecation timeline
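A sketch of the rename-with-deprecation pattern the message describes: the new name holds the logic and the old name forwards to it after emitting a warning. Using `warnings.warn` rather than `logger.warning` lets callers filter the warning or turn it into an error. The class name, the toy loss and the choice of `FutureWarning` are assumptions for illustration.

```python
import warnings
from typing import Sequence


class TFLossSketch:
    """Hypothetical model mixin illustrating a deprecated method alias."""

    def hf_compute_loss(self, labels: Sequence[float], logits: Sequence[float]) -> float:
        # Toy loss (mean absolute error) standing in for the real loss computation.
        return sum(abs(y - p) for y, p in zip(labels, logits)) / len(labels)

    def compute_loss(self, *args, **kwargs) -> float:
        # Old name kept for backward compatibility: warn, then forward to the new name.
        warnings.warn(
            "compute_loss is deprecated; use hf_compute_loss instead.",
            FutureWarning,
            stacklevel=2,
        )
        return self.hf_compute_loss(*args, **kwargs)
```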
  13. 18 Jan, 2022 5 commits
    • Enable tqdm toggling (#15167) · fe78fe98
      Jake Tae authored
      
      
      * feature: enable tqdm toggle
      
      * test: add tqdm unit test
      
      * style: run linter
      
      * Update tests/test_tqdm_utils.py
      Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      
      * refactor: use tiny model, run linter
      
      * docs: add tqdm to logging
      
      * docs: add tqdm reference to `http_get`
      
      * style: run linter
      
      * Update docs/source/main_classes/logging.mdx
      Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
      
      * fix: use `AutoConfig` for framework agnostic testing
      
      * chore: mv tqdm test to `test_logging.py`
      
      * feature: implement enable/disable functions
      
      * docs: mv docstring to comment
      
      * chore: mv tqdm functions to `logging.py`
      
      * docs: update docs to reference `enable/disable` funcs
      
      * test: update test to use `enable/disable` func
      
      * chore: update function reference in comment
      Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
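A minimal sketch of a global progress-bar toggle like the one described above: a module-level flag, two functions to flip it, and a wrapper that only uses `tqdm` when the flag is on. The function names here are hypothetical and may not match the ones added to `logging.py`; the sketch also falls back to a plain iterator if `tqdm` is not installed.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Hypothetical module-level switch; True means progress bars are shown.
_progress_bars_enabled = True


def enable_progress_bar() -> None:
    """Turn progress bars on globally."""
    global _progress_bars_enabled
    _progress_bars_enabled = True


def disable_progress_bar() -> None:
    """Turn progress bars off globally."""
    global _progress_bars_enabled
    _progress_bars_enabled = False


def progress_bars_enabled() -> bool:
    return _progress_bars_enabled


def progress(iterable: Iterable[T], **tqdm_kwargs) -> Iterator[T]:
    """Wrap `iterable` in tqdm only when progress bars are enabled."""
    if not _progress_bars_enabled:
        return iter(iterable)
    try:
        from tqdm.auto import tqdm
    except ImportError:
        # tqdm is optional in this sketch: degrade to a plain iterator.
        return iter(iterable)
    return iter(tqdm(iterable, **tqdm_kwargs))
```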
    • 1a354d53
      matt authored
    • Fix a sneaky reference to compute_loss in the tests · 2085f209
      matt authored
    • Add MAE (#15120) · 74bec986
      NielsRogge authored
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Fix embeddings
      
      * Add conversion script
      
      * Finish conversion script
      
      * More improvements
      
      * Fix forward pass
      
      * Remove print statements
      
      * Add weights initialization
      
      * Add initialization of decoder weights
      
      * Add support for other models in the conversion script
      
      * Fix patch_size for huge model
      
      * Fix most of the tests
      
      * Fix integration test
      
      * Fix docs
      
      * Fix archive_list
      
      * Apply suggestions from code review
      
      * Improve documentation
      
      * Apply more suggestions
      
      * Skip some tests due to non-deterministic behaviour
      
      * Fix test_initialization
      
      * Remove unnecessary initialization of nn.Embedding
      
      * Improve docs
      
      * Fix dummies
      
      * Remove ViTMAEFeatureExtractor from docs
      
      * Add model to README and table of contents
      
      * Delete inference file
    • [ASR pipeline] correct with lm pipeline (#15200) · 497346d0
      Patrick von Platen authored
      * [ASR pipeline] correct with lm pipeline
      
      * improve error