- 04 Feb, 2022 5 commits
-
-
Sylvain Gugger authored
* Standardize instance segmentation models outputs * Rename output * Update src/transformers/modeling_outputs.py Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Add legacy argument to the config and model forward * Update src/transformers/models/beit/modeling_beit.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Copy fix in Segformer Co-authored-by:
NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
Stas Bekman authored
-
Yih-Dar authored
* fix * fix test * remove expected_num_hidden_layers Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Sanchit Gandhi authored
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 03 Feb, 2022 8 commits
-
-
Yih-Dar authored
* Remove return_loss from Flax models * fix more * fix Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Stas Bekman authored
-
davidleonfdez authored
* Add preprocess_logits_for_metrics Trainer param * Compute accuracy in LM examples * Improve comments
-
Stas Bekman authored
* [deepspeed] fix a bug in a test * consistency
-
NielsRogge authored
* Add general docstrings * Remove legacy docstrings * Add BEiT * Add DEiT * Add SegFormer * Fix beit output class * Fix missing return_dict
-
Patrick von Platen authored
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 02 Feb, 2022 12 commits
-
-
CHI LIU authored
* Correct eos_token_id set in generate * Set eos_token_id in test * Correct eos_token_id set in generate * Set eos_token_id in test
-
SaulLu authored
* change truncation_side in init of `PreTrainedTokenizerBase` Co-authored-by:
LSinev <LSinev@users.noreply.github.com> * add test * Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`" This reverts commit 7a98b87962d2635c7e4d4f00db3948b694624843. * fix kwargs * Revert "fix kwargs" This reverts commit 67b0a5270e8cf1dbf70e6b0232e94c0452b6946f. * Update tests/test_tokenization_common.py Co-authored-by:
Nicolas Patry <patry.nicolas@protonmail.com> * delete truncation_side variable * reorganize test * format * complete doc * Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`"" This reverts commit d5a10a7e2680539e5d9e98ae5d896c893d224b80. * fix typo * fix typos to render documentation * Revert "Revert "Revert "replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__`""" This reverts commit 16cf58811943a08f43409a7c83eaa330686591d0. * format Co-authored-by:
LSinev <LSinev@users.noreply.github.com> Co-authored-by:
Nicolas Patry <patry.nicolas@protonmail.com>
-
Sylvain Gugger authored
* Playing * Properly set labels in model config for token classification example * Port to run_ner_no_trainer * Quality
-
Ayush Chaurasia authored
# Add support for W&B hyperparameter sweep

This PR:
* allows using wandb for running hyperparameter search.
* The runs are visualized on the W&B sweeps dashboard.
* This supports running sweeps on parallel devices, all reporting to the same central dashboard.

### Usage

**To run a new hyperparameter search:**
```
trainer.hyperparameter_search(
    backend="wandb",
    project="transformers_sweep",  # name of the project
    n_trials=5,
    metric="eval/loss",  # metric to be optimized, default 'eval/loss'. A warning is raised if the passed metric is not found
)
```
This outputs a sweep ID, e.g. `my_project/sweep_id`.

**To run sweeps on parallel devices:**
Pass the ID of the sweep you want to run in parallel:
```
trainer.hyperparameter_search(
    backend="wandb",
    sweep_id="my_project/sweep_id"
)
```
-
Sylvain Gugger authored
-
bugface authored
* fix error posted in issue #15448 Signed-off-by:
bugface <alexgre@ufl.edu> * clean up - remove commented line Signed-off-by:
bugface <alexgre@ufl.edu>
-
Sylvain Gugger authored
* Allow dynamic modules to use relative imports * Work for configs * Fix last merge conflict * Save code of registered custom objects * Map strings to strings * Fix test * Add tokenizer * Rework tests * Tests * Ignore fixtures py files for tests * Tokenizer test + fix collection * With full path * Rework integration * Fix typo * Remove changes in conftest * Test for tokenizers * Add documentation * Update docs/source/custom_models.mdx Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Add file structure and file content * Add more doc * Style * Update docs/source/custom_models.mdx Co-authored-by:
Suraj Patil <surajp815@gmail.com> * Address review comments Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Suraj Patil <surajp815@gmail.com>
-
Nicolas Patry authored
* Adding support for `microphone` streaming within pipeline. - Uses `ffmpeg` to get microphone data. - Makes sure alignment is made to `size_of_sample`. - Works by sending `{"raw": ..data.., "stride": (n, left, right), "partial": bool}` directly to the pipeline, which makes it possible to stream partial results and still get inference. - Lets `partial` information flow through the pipeline so the caller can get it back and choose whether to display text. - The striding reconstitution is bound to have errors since CTC does not keep previous state. Currently most of the errors arise because we don't know whether there is a space between two chunks. Since we have some left striding info, we could use that during decoding to choose what to do with those spaces and maybe even extra letters (if the stride is long enough, it's bound to cover at least a few symbols). Fixing tests. Protecting with `require_torch`. `raw_ctc` support for nicer demo. Post rebase fixes. Revamp to split raw_mic_data from its live chunking. - Requires a refactor to make everything a bit cleaner. Automatic resampling. Small fix. Small fix. * Post rebase fix (need to let super handle more logic, reorder args.) * Update docstrings * Docstring format. * Remove print. * Prevent flow of `input_values`. * Fixing `stride` too. * Fixing the PR by removing `raw_ctc`. * Better docstrings. * Fixing init. * Update src/transformers/pipelines/audio_utils.py Co-authored-by:Anton Lozhkov <aglozhkov@gmail.com> * Update tests/test_pipelines_automatic_speech_recognition.py Co-authored-by:
Anton Lozhkov <aglozhkov@gmail.com> * Quality. Co-authored-by:
Anton Lozhkov <aglozhkov@gmail.com>
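The chunk-and-stride idea in the commit message above can be illustrated with a minimal sketch. It is not the code from `audio_utils.py`: the helper names `align_down`, `chunk_with_stride` and `keep_unstrided` are hypothetical, the stride is assumed to be measured in bytes, and the `partial` flag is left out. Only the `{"raw": ..., "stride": (n, left, right)}` layout and the alignment to `size_of_sample` come from the message.

```python
from typing import Dict, Iterable, Iterator


def align_down(num_bytes: int, size_of_sample: int) -> int:
    """Round a byte count down to a whole number of samples."""
    return (num_bytes // size_of_sample) * size_of_sample


def chunk_with_stride(
    data: bytes,
    chunk_len: int,
    stride_left: int,
    stride_right: int,
    size_of_sample: int = 4,
) -> Iterator[Dict]:
    """Yield overlapping chunks of `data`, each with its (n, left, right) stride.

    Hypothetical helper: all lengths are in bytes and are aligned to
    `size_of_sample` so that no sample is ever split across two chunks.
    """
    chunk_len = align_down(chunk_len, size_of_sample)
    stride_left = align_down(stride_left, size_of_sample)
    stride_right = align_down(stride_right, size_of_sample)
    data = data[: align_down(len(data), size_of_sample)]

    # Each chunk contributes `step` new bytes; the rest is overlap.
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must be larger than stride_left + stride_right")

    start = 0
    while start < len(data):
        chunk = data[start : start + chunk_len]
        is_last = start + chunk_len >= len(data)
        left = 0 if start == 0 else stride_left   # the first chunk has no left context
        right = 0 if is_last else stride_right    # the last chunk has no right context
        yield {"raw": chunk, "stride": (len(chunk), left, right)}
        if is_last:
            break
        start += step


def keep_unstrided(chunks: Iterable[Dict]) -> bytes:
    """Drop the overlapping context of every chunk and join what is left."""
    out = b""
    for chunk in chunks:
        n, left, right = chunk["stride"]
        out += chunk["raw"][left : n - right]
    return out
```

Dropping the strided context from every chunk and joining the rest gives back the aligned input. The reconstruction errors mentioned in the message concern the decoded text at chunk boundaries, not the raw bytes.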
-
Patrick von Platen authored
-
NielsRogge authored
* Add torchvision's resize * Rename torch_resize to default_to_square * Apply suggestions from code review * Add support for default_to_square and tuple of length 1
-
Steven Liu authored
* first draft of pipeline, autoclass, preprocess tutorials * apply review feedback * 🖍 apply feedback from patrick/niels * 📝 add output image to preprocessed image * 🖍 apply feedback from patrick
-
Steven Liu authored
* add fine-tune tutorial * make edits, fix style * 📝 make edits * 🖍 fix code format links to external libraries * 🔄 revert code formatting * 🖍 use DefaultDataCollator instead of DataCollatorWithPadding
-
- 01 Feb, 2022 11 commits
-
-
Sylvain Gugger authored
* Harder check for IndexErrors in QA scripts * Make test stronger
-
Sylvain Gugger authored
-
Suraj Patil authored
* refactor bart tokenizers * doc * replace assert with ValueError
-
Yih-Dar authored
* use mean instead of elementwise_mean * make style Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
SaulLu authored
fix the `tokenizer_config.json` file for the slow tokenizer when a fast version is available (#15319) * add new test * update test * remove `tokenizer_file` from `additional_files_names` in `tokenization_utils_base.py` * add `tokenizer_file` for the fast only tokenizer * change global variables layoutxml * remove `"tokenizer_file"` from DPR tokenizer's Global variables * remove `tokenizer_file` from herbert slow tokenizer init * `"tokenizer_file"` from LED tokenizer's Global variables * remove `tokenizer_file` from mbart slow tokenizer init * remove `tokenizer_file` from slow tokenizer template * adapt to versioning * adapt the `test_tokenizer_mismatch_warning` test * clean test * clarify `VOCAB_FILES_NAMES` in tokenization_utils_fast.py * Revert "remove `tokenizer_file` from mbart slow tokenizer init" This reverts commit 0dbb723fa9c7599d4640fe30b3647a74eb4a64e1. * Revert "`"tokenizer_file"` from LED tokenizer's Global variables" This reverts commit 5a3f879bdd651233f3d74a3d1146c34cde82b0c2. * Revert "remove `tokenizer_file` from herbert slow tokenizer init" This reverts commit f5e10007b7b0ec5345e015b9de7ffec72c5407fd. * Revert "remove `"tokenizer_file"` from DPR tokenizer's Global variables" This reverts commit da0895330bedfafc81ae3073470a9348c669f032. * set `tokenizer_file` in super `__init__` of mbart
-
SaulLu authored
* replace assert with exception for `padding_side` arg in `PreTrainedTokenizerBase` `__init__` * add test * fix kwargs * reformat test * format * format * fix typo to render the documentation
-
Kamal Raj authored
fix typo
-
Suraj Patil authored
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* Fix TF Causal LM models' returned logits * Fix expected shape in the tests Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 31 Jan, 2022 4 commits
-
-
Stas Bekman authored
-
Suraj Patil authored
-
Sylvain Gugger authored
-
peregilk authored
* Update modeling_wav2vec2.py: with very short sound files (less than 0.1 seconds), num_masked_span can be too large. The problem is described in issue #15366 and was discussed with @patrickvonplaten. * correct errors with mask time indices * remove bogus file * make fix-copies Co-authored-by:Patrick von Platen <patrick.v.platen@gmail.com>
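The failure mode behind this fix can be sketched as follows. This is a minimal illustration, not the code from `modeling_wav2vec2.py`: the function signature and the parameter names `mask_prob`, `mask_length` and `min_masks` are assumptions. The point is that the number of masked spans has to be clamped so the spans still fit inside a very short input.

```python
def compute_num_masked_span(
    input_length: int,
    mask_prob: float,
    mask_length: int,
    min_masks: int = 0,
) -> int:
    """Return how many spans of `mask_length` frames to mask in one input.

    Hypothetical sketch: the result is clamped so that the spans never
    cover more frames than the input actually has.
    """
    # Expected number of spans for the requested masking probability.
    num_masked_span = int(mask_prob * input_length / mask_length)
    num_masked_span = max(num_masked_span, min_masks)

    # The spans together must not be longer than the input.
    if num_masked_span * mask_length > input_length:
        num_masked_span = input_length // mask_length

    # There must be at least as many valid start positions as spans.
    max_start_positions = input_length - (mask_length - 1)
    if max_start_positions < num_masked_span:
        num_masked_span = max(max_start_positions, 0)

    return num_masked_span
```

With a 5-frame input, `mask_length=10` and `min_masks=2`, the unclamped value would be 2 spans of 10 frames each, which cannot fit. The clamped function returns 0.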
-