1. 09 Feb, 2022 5 commits
  2. 08 Feb, 2022 2 commits
  3. 07 Feb, 2022 6 commits
  4. 04 Feb, 2022 2 commits
  5. 03 Feb, 2022 3 commits
  6. 02 Feb, 2022 7 commits
    • Correct eos_token_id settings in generate (#15403) · 5ec368d7
      CHI LIU authored
      * Correct eos_token_id set in generate
      
      * Set eos_token_id in test
      5ec368d7
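      A minimal usage sketch for the commit above, assuming nothing beyond the public `generate()` API; the checkpoint and prompt are illustrative only:
      ```
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
      model = AutoModelForCausalLM.from_pretrained("gpt2")

      inputs = tokenizer("Hello, my name is", return_tensors="pt")
      # Generation stops once eos_token_id is produced; the fix ensures the
      # value passed here is actually picked up by generate().
      outputs = model.generate(**inputs, max_length=20, eos_token_id=tokenizer.eos_token_id)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```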
    • fix set truncation attribute in `__init__` of `PreTrainedTokenizerBase` (#15456) · 39b5d1a6
      SaulLu authored
      
      
      * change truncation_side in init of `PreTrainedTokenizerBase`
      Co-authored-by: LSinev <LSinev@users.noreply.github.com>
      
      * add test
      
      * Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`"
      
      This reverts commit 7a98b87962d2635c7e4d4f00db3948b694624843.
      
      * fix kwargs
      
      * Revert "fix kwargs"
      
      This reverts commit 67b0a5270e8cf1dbf70e6b0232e94c0452b6946f.
      
      * Update tests/test_tokenization_common.py
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      
      * delete truncation_side variable
      
      * reorganize test
      
      * format
      
      * complete doc
      
      * Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`""
      
      This reverts commit d5a10a7e2680539e5d9e98ae5d896c893d224b80.
      
      * fix typo
      
      * fix typos to render documentation
      
      * Revert "Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`"""
      
      This reverts commit 16cf58811943a08f43409a7c83eaa330686591d0.
      
      * format
      Co-authored-by: LSinev <LSinev@users.noreply.github.com>
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
      39b5d1a6
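      A hedged sketch tied to the commit above: `truncation_side` can now be passed when instantiating the tokenizer, mirroring `padding_side`; the checkpoint name is illustrative.
      ```
      from transformers import AutoTokenizer

      # "left" keeps the end of over-long sequences instead of the beginning.
      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", truncation_side="left")
      print(tokenizer.truncation_side)  # left

      encoded = tokenizer("a very long sentence " * 50, truncation=True, max_length=8)
      print(len(encoded["input_ids"]))  # 8
      ```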
    • Add W&B backend for hyperparameter sweep (#14582) · c74f3d4c
      Ayush Chaurasia authored
      # Add support for W&B hyperparameter sweep
      This PR:
      * allows using wandb for running hyperparameter search.
      * visualizes the runs on the W&B sweeps dashboard.
      * supports running sweeps on parallel devices, all reporting to the same central dashboard.
      
      ### Usage
      **To run a new hyperparameter search:**
      ```
      trainer.hyperparameter_search(
          backend="wandb", 
          project="transformers_sweep", # name of the project
          n_trials=5,
          metric="eval/loss", # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found
      )
      ```
      This outputs a sweep id, e.g. `my_project/sweep_id`.
      
      **To run sweeps on parallel devices:**
      Just pass the sweep id you want to run in parallel:
      ```
      trainer.hyperparameter_search(
          backend="wandb", 
          sweep_id="my_project/sweep_id"
      )
      ```
      c74f3d4c
    • Save code of registered custom models (#15379) · 44b21f11
      Sylvain Gugger authored
      
      
      * Allow dynamic modules to use relative imports
      
      * Work for configs
      
      * Fix last merge conflict
      
      * Save code of registered custom objects
      
      * Map strings to strings
      
      * Fix test
      
      * Add tokenizer
      
      * Rework tests
      
      * Tests
      
      * Ignore fixtures py files for tests
      
      * Tokenizer test + fix collection
      
      * With full path
      
      * Rework integration
      
      * Fix typo
      
      * Remove changes in conftest
      
      * Test for tokenizers
      
      * Add documentation
      
      * Update docs/source/custom_models.mdx
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Add file structure and file content
      
      * Add more doc
      
      * Style
      
      * Update docs/source/custom_models.mdx
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      
      * Address review comments
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
      44b21f11
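      A hedged sketch of the workflow this commit targets: registering custom classes for the Auto API so that `save_pretrained` also stores their defining code. The class names and output directory are hypothetical, and in practice the classes should live in their own .py file.
      ```
      import torch.nn as nn
      from transformers import PretrainedConfig, PreTrainedModel

      class MyConfig(PretrainedConfig):  # hypothetical custom config
          model_type = "my-model"

      class MyModel(PreTrainedModel):  # hypothetical custom model
          config_class = MyConfig

          def __init__(self, config):
              super().__init__(config)
              self.proj = nn.Linear(4, 4)

          def forward(self, x):
              return self.proj(x)

      # Register the custom classes for the Auto API; with this commit the code of
      # registered custom objects is saved alongside the config and weights.
      MyConfig.register_for_auto_class()
      MyModel.register_for_auto_class("AutoModel")

      MyModel(MyConfig()).save_pretrained("my-model-dir")
      ```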
    • Adding support for `microphone` streaming within pipeline. (#15046) · 623d8cb4
      Nicolas Patry authored
      
      
      * Adding support for `microphone` streaming within pipeline.
      
      - Uses `ffmpeg` to get microphone data.
      - Makes sure alignment is made to `size_of_sample`.
      - Works by sending `{"raw": ..data.., "stride": (n, left, right),
      "partial": bool}` directly to the pipeline, enabling partial results to be
      streamed while still running inference.
      - Lets the `partial` information flow through the pipeline so the caller
        can get it back and choose whether to display the text.
      
      - The striding reconstitution is bound to have errors since CTC does not
      keep previous state. Currently most of the errors come from not knowing
      whether there is a space between two chunks.
      Since we have some left striding info, we could use it during decoding to
      decide what to do with those spaces, and maybe even with extra letters (if
      the stride is long enough, it is bound to cover at least a few symbols).
      
      Fixing tests.
      
      Protecting with `require_torch`.
      
      `raw_ctc` support for nicer demo.
      
      Post rebase fixes.
      
      Revamp to split raw_mic_data from its live chunking.
      
      - Requires a refactor to make everything a bit cleaner.
      
      Automatic resampling.
      
      Small fix.
      
      Small fix.
      
      * Post rebase fix (need to let super handle more logic, reorder args.)
      
      * Update docstrings
      
      * Docstring format.
      
      * Remove print.
      
      * Prevent flow of `input_values`.
      
      * Fixing `stride` too.
      
      * Fixing the PR by removing `raw_ctc`.
      
      * Better docstrings.
      
      * Fixing init.
      
      * Update src/transformers/pipelines/audio_utils.py
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      
      * Update tests/test_pipelines_automatic_speech_recognition.py
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      
      * Quality.
      Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
      623d8cb4
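      A hedged sketch of the resulting usage, assuming the `ffmpeg_microphone_live` helper in `transformers.pipelines.audio_utils` and a working `ffmpeg` install; chunk lengths and the checkpoint are illustrative.
      ```
      from transformers import pipeline
      from transformers.pipelines.audio_utils import ffmpeg_microphone_live

      asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

      # Streams microphone chunks ({"raw": ..., "stride": ..., "partial": ...})
      # straight into the pipeline; partial chunks yield intermediate transcripts.
      mic = ffmpeg_microphone_live(
          sampling_rate=asr.feature_extractor.sampling_rate,
          chunk_length_s=5.0,
          stream_chunk_s=1.0,
      )
      for result in asr(mic):
          print(result["text"])
      ```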
    • Add option to resize like torchvision's Resize (#15419) · 1d94d575
      NielsRogge authored
      * Add torchvision's resize
      
      * Rename torch_resize to default_to_square
      
      * Apply suggestions from code review
      
      * Add support for default_to_square and tuple of length 1
      1d94d575
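      A hedged illustration of the new behavior, assuming the flag lives on `ImageFeatureExtractionMixin.resize`; the image size and numbers are illustrative.
      ```
      from PIL import Image
      from transformers.image_utils import ImageFeatureExtractionMixin

      mixin = ImageFeatureExtractionMixin()
      image = Image.new("RGB", (640, 480))

      # With an int size and default_to_square=False, the smaller edge is matched
      # to `size` (as torchvision's Resize does); with the default True, the image
      # is resized to a size x size square.
      resized = mixin.resize(image, size=256, default_to_square=False)
      print(resized.size)  # (341, 256), aspect ratio preserved
      ```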
  7. 01 Feb, 2022 4 commits
    • fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available (#15319) · 7b8bdd86
      SaulLu authored
      
      * add new test
      
      * update test
      
      * remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py`
      
      * add `tokenizer_file` for the fast only tokenizer
      
      * change global variables layoutxlm
      
      * remove `"tokenizer_file"` from DPR tokenizer's Global variables
      
      * remove `tokenizer_file` from herbert slow tokenizer init
      
      * `"tokenizer_file"` from LED tokenizer's Global variables
      
      * remove `tokenizer_file` from mbart slow tokenizer init
      
      * remove `tokenizer_file` from slow tokenizer template
      
      * adapt to versioning
      
      * adapt the `test_tokenizer_mismatch_warning` test
      
      * clean test
      
      * clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py
      
      * Revert "remove `tokenizer_file` from mbart slow tokenizer init"
      
      This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1.
      
      * Revert "`"tokenizer_file"` from LED tokenizer's Global variables"
      
      This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2.
      
      * Revert "remove `tokenizer_file` from herbert slow tokenizer init"
      
      This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd.
      
      * Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables"
      
      This reverts commit da0895330bedfafc81ae3073470a9348c669f032.
      
      * set `tokenizer_file` in super `__init__` of mbart
      7b8bdd86
    • replace assert with exception for padding_side arg in `PreTrainedTokenizerBase` `__init__` (#15454) · 6d585fe0
      SaulLu authored
      * replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`
      
      * add test
      
      * fix kwargs
      
      * reformat test
      
      * format
      
      * format
      
      * fix typo to render the documentation
      6d585fe0
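      A hedged illustration of the change above: an invalid `padding_side` now raises a `ValueError` instead of tripping an assert. The checkpoint and value are illustrative.
      ```
      from transformers import AutoTokenizer

      try:
          # Only "right" and "left" are valid padding sides.
          AutoTokenizer.from_pretrained("bert-base-uncased", padding_side="top")
      except ValueError as err:
          print(err)
      ```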
    • Fix TF Causal LM models' returned logits (#15256) · dc05dd53
      Yih-Dar authored
      
      
      * Fix TF Causal LM models' returned logits
      
      * Fix expected shape in the tests
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      dc05dd53
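      A hedged sanity check matching the intent of the fix: TF causal LM models should return logits covering the full input sequence, i.e. shape `(batch_size, sequence_length, vocab_size)`, consistent with the PyTorch models. The checkpoint is illustrative.
      ```
      from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
      model = TFGPT2LMHeadModel.from_pretrained("gpt2")

      inputs = tokenizer("Hello there", return_tensors="tf")
      outputs = model(**inputs, labels=inputs["input_ids"])

      # One logits row per input token: (batch_size, sequence_length, vocab_size).
      print(outputs.logits.shape)
      ```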
  8. 31 Jan, 2022 6 commits
  9. 29 Jan, 2022 2 commits
  10. 28 Jan, 2022 2 commits
    • Add XGLM models (#14876) · d25e25ee
      Suraj Patil authored
      
      
      * add xglm
      
      * update vocab size
      
      * fix model name
      
      * style and tokenizer
      
      * typo
      
      * no mask token
      
      * fix pos embed compute
      
      * fix args
      
      * fix tokenizer
      
      * fix positions
      
      * fix tokenization
      
      * style and dic fixes
      
      * fix imports
      
      * add fast tokenizer
      
      * update names
      
      * add pt tests
      
      * fix tokenizer
      
      * fix typo
      
      * fix tokenizer import
      
      * fix fast tokenizer
      
      * fix tokenizer
      
      * fix converter
      
      * add tokenizer test
      
      * update checkpoint names
      
      * fix tokenizer tests
      
      * fix slow tests
      
      * add copied from comments
      
      * rst -> mdx
      
      * flax model
      
      * update flax tests
      
      * quality
      
      * style
      
      * doc
      
      * update index and readme
      
      * fix copies
      
      * fix doc
      
      * update toctree
      
      * fix indent
      
      * minor fixes
      
      * fix config doc
      
      * don't save embed_pos weights
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * address Sylvain's comments, a few doc fixes
      
      * fix check_repo
      
      * align order of arguments
      
      * fix copies
      
      * fix labels
      
      * remove unnecessary mapping
      
      * fix saving tokenizer
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      d25e25ee
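      A hedged sketch of using the newly added XGLM model through the Auto API; the checkpoint name is an assumption based on the public `facebook/xglm-*` releases.
      ```
      from transformers import AutoModelForCausalLM, AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("facebook/xglm-564M")
      model = AutoModelForCausalLM.from_pretrained("facebook/xglm-564M")

      # XGLM is a multilingual causal LM, so generate() works as for GPT-style models.
      inputs = tokenizer("Bonjour, je m'appelle", return_tensors="pt")
      outputs = model.generate(**inputs, max_length=20)
      print(tokenizer.decode(outputs[0], skip_special_tokens=True))
      ```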
    • Fixing support for `batch_size` and `num_return_sequences` in `text-generation` pipeline (#15318) · 06107541
      Nicolas Patry authored
      * Fixing support for `batch_size` and `num_return_sequences` in
      `text-generation` pipeline
      
      And `text2text-generation` too.
      
      The bug was caused by the batch_size containing both the incoming batch
      **and** the generated `num_sequences`.
      
      The fix simply consists of splitting these back into
      separate dimensions.
      
      * TF support.
      
      * Odd backward compatibility script in the way.
      06107541
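      A hedged example of the combination this commit fixes: batching several prompts while also requesting several return sequences per prompt. The model and prompts are illustrative, using the `text2text-generation` task the commit also covers.
      ```
      from transformers import pipeline

      generator = pipeline("text2text-generation", model="t5-small")

      prompts = [
          "translate English to German: Hello, how are you?",
          "translate English to German: The book is on the table.",
      ]

      # batch_size batches the incoming prompts; num_return_sequences asks for several
      # generations per prompt -- the fix keeps these two dimensions separate.
      results = generator(prompts, batch_size=2, num_return_sequences=3, do_sample=True)
      for prompt_results in results:
          print([r["generated_text"] for r in prompt_results])
      ```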
  11. 27 Jan, 2022 1 commit