- 29 Mar, 2021 7 commits
-
-
pcuenca authored
A new argument `length_column_name` has been added to `TrainingArguments`, with default value `"length"`. If this column exists and `group_by_length` is `True`, the train sampler reads lengths from it for grouping instead of computing them before training starts. This optimization lets users precompute lengths during data preparation and avoids the sequential pass over the dataset described in issue #10909.
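The mechanism described in this commit can be illustrated with a plain-Python sketch. It assumes a columnar dataset represented as a dict of lists (a hypothetical stand-in for a `datasets.Dataset`) and uses hypothetical helper names `get_lengths` and `length_grouped_indices`. The grouping is loosely modeled on the "megabatch" idea behind length-grouped sampling and is not the actual `transformers` sampler. Reading an existing `length` column is a single column lookup, whereas the fallback walks every example's `input_ids`.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical columnar dataset: column name -> list of values (one per example).
Columns = Dict[str, Sequence]


def get_lengths(columns: Columns, length_column_name: str = "length") -> List[int]:
    """Return one length per example.

    Fast path: if the precomputed column exists, read it directly.
    Slow path: measure every example's `input_ids`, which needs a full pass.
    """
    if length_column_name in columns:
        return [int(n) for n in columns[length_column_name]]
    return [len(ids) for ids in columns["input_ids"]]


def length_grouped_indices(
    lengths: Sequence[int], batch_size: int, mega_batch_mult: int = 50, seed: int = 0
) -> List[int]:
    """Shuffle indices, then sort each megabatch by length (longest first).

    Consecutive batches drawn from the result hold examples of similar length,
    which reduces padding while keeping randomness across megabatches.
    """
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    megabatch_size = mega_batch_mult * batch_size
    ordered: List[int] = []
    for start in range(0, len(indices), megabatch_size):
        megabatch = indices[start : start + megabatch_size]
        ordered.extend(sorted(megabatch, key=lambda i: lengths[i], reverse=True))
    return ordered


if __name__ == "__main__":
    data = {"input_ids": [[1] * 3, [1] * 7, [1] * 5, [1] * 1], "length": [3, 7, 5, 1]}
    lengths = get_lengths(data)
    order = length_grouped_indices(lengths, batch_size=2, mega_batch_mult=2)
    print([lengths[i] for i in order])  # [7, 5, 3, 1]
```

In the library itself, the column would typically be added ahead of time (for example with `Dataset.map`) and picked up by passing `group_by_length=True` and `length_column_name="length"` to `TrainingArguments`.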
-
Sylvain Gugger authored
-
Daniel Stancl authored
* Add NER example with accelerate library
* This commit contains the first (still largely unfinished) version of a script showing how to train a HuggingFace model with their new accelerate library.
* Fix metric calculation
* make style quality
* mv ner_no_trainer to token-classification dir
* Delete --debug flag from running script
* hf_datasets -> raw_datasets
* Make a few slight adjustments
* Add an informative comment + rewrite a help comment
* Change header
* Fix a few things
* Enforce the use of fast tokenizers only
* DataCollatorWithPadding -> DataCollatorForTokenClassification
* Change bash script: python3 -> accelerate launch
* make style
* Add a few missing things (see below)
* Add max-length padding to predictions and labels to enable accelerate gather functionality
* Add PyTorch no trainer example to the example README.md
* Remove --do-train from args as being redundant for now
* DataCollatorWithPadding -> DataCollatorForTokenClassification
* Remove some obsolete args.do_train conditions from the script
* Delete --do_train from bash running script
* Delete use_slow_tokenizer from args
* Add unintentionally removed flag --label_all_tokens
* Delete --debug flag from running script
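Gathering predictions across processes (for example with Accelerate's `gather`) concatenates per-process tensors, so they must share a shape; padding everything to one maximum length is one way to guarantee that. The sketch below shows the idea with plain Python lists standing in for tensors and a simulated two-process gather. `pad_to_length`, `strip_padding`, and the `-100` pad value are assumptions for illustration, not the script's actual code.

```python
from typing import List, Sequence, Tuple

# Assumption: -100 marks positions to ignore, following the common PyTorch
# cross-entropy convention; the actual script may use a different value.
IGNORE_INDEX = -100


def pad_to_length(
    rows: Sequence[Sequence[int]], length: int, pad_value: int = IGNORE_INDEX
) -> List[List[int]]:
    """Right-pad every row to exactly `length` so all processes share one shape."""
    padded = []
    for row in rows:
        if len(row) > length:
            raise ValueError(f"row of length {len(row)} exceeds max length {length}")
        padded.append(list(row) + [pad_value] * (length - len(row)))
    return padded


def strip_padding(
    preds: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    pad_value: int = IGNORE_INDEX,
) -> Tuple[List[List[int]], List[List[int]]]:
    """After gathering, drop positions whose label is padding before scoring."""
    out_preds, out_labels = [], []
    for pred_row, label_row in zip(preds, labels):
        keep = [i for i, label in enumerate(label_row) if label != pad_value]
        out_preds.append([pred_row[i] for i in keep])
        out_labels.append([label_row[i] for i in keep])
    return out_preds, out_labels


if __name__ == "__main__":
    # Simulated processes with different sequence lengths, padded to a shared max.
    rank0 = pad_to_length([[1, 2, 3]], 5)
    rank1 = pad_to_length([[4, 5, 6, 7, 8]], 5)
    gathered = rank0 + rank1  # stands in for the cross-process concatenation
    print(strip_padding(gathered, gathered)[1])
```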
-
Sylvain Gugger authored
* Instantiate model only once in pipeline
* Remove documentation of deprecated method
* Add FutureWarning
* Update src/transformers/pipelines/base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
-
Masatoshi Suzuki authored
-
WybeKoper authored
Co-authored-by: WybeKoper <WybeKoper@users.noreply.github.com>
-
Guillaume Filion authored
-
- 28 Mar, 2021 1 commit
-
-
Bhadresh Savani authored
-
- 26 Mar, 2021 3 commits
-
-
Sylvain Gugger authored
* Add ImageFeatureExtractionMixin
* Add dummy vision objects
* Add require_vision
* Add tests
* Fix test
-
Tomy Hsieh authored
* Rename NLP library to Datasets library
* Update github template
* Fix styling
-
- 25 Mar, 2021 7 commits
-
-
lexhuismans authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Amir Tahmasbi authored
* Added embeddings layer
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Added model to doc README
* Added tests
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Fixed a typo in embeddings layer
* Removed imports
* Fixed formatting issues, imports, tests
* Added layoutlm layers, main model, maskedlm and token classification classes
* Added model classes to tf auto models
* Added model to PT to TF conversion script
* Removed unused imports
* Added layoutlm model, test, and doc for sequence classification, and fix imports in __init__.py
* Made tests pass!
* Fixed typos in imports and docs
* Removed imports
* Fixed small formatting issues
* Removed duplicate import from main __init__.py
* Changed default arg to True for adding pooling layer to tf layoutlm
* Fixed formatting issues
* Style
* Added "Copied from" to classes copied from bert
* Fixed docstring examples to work with layoutlm inputs
* Removed PyTorch reference in docstring example
* Added integration tests
* Cleaned up initialization file
* Updated model checkpoint identifiers
* Fixed imports
Co-authored-by: Amir Tahmasbi <amir@ehsai.ca>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
-
Philipp Schmid authored
-
Jethro Kuan authored
Use the correct variable (raw_datasets) instead of the module (datasets) where appropriate.
-
- 24 Mar, 2021 6 commits
-
-
Sidd Karamcheti authored
-
Sylvain Gugger authored
* Remove version warning in pretrained BART models
* Put it at the base model
-
Lysandre Debut authored
* Removes overflowing bad word IDs
* Raise warning
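Assuming "overflowing" refers to bad word IDs at or beyond the vocabulary size (IDs that cannot index the logits), the fix can be sketched as a filter that drops such entries and warns instead of failing. The function name `drop_overflowing_bad_word_ids` is hypothetical, and discarding the whole multi-token sequence rather than only the offending ID is a design choice of this sketch.

```python
import warnings
from typing import List, Sequence


def drop_overflowing_bad_word_ids(
    bad_words_ids: Sequence[Sequence[int]], vocab_size: int
) -> List[List[int]]:
    """Keep only bad-word sequences whose IDs all fall inside [0, vocab_size)."""
    kept: List[List[int]] = []
    for seq in bad_words_ids:
        invalid = [token for token in seq if not 0 <= token < vocab_size]
        if invalid:
            # Warn instead of failing later when the ID is used to index the logits.
            warnings.warn(
                f"Ignoring bad word sequence {list(seq)}: IDs {invalid} are outside "
                f"the vocabulary of size {vocab_size}."
            )
            continue
        kept.append(list(seq))
    return kept
```

Dropping the whole sequence loses nothing: a sequence containing an ID the model can never emit can never be generated in full, so it would never trigger a ban anyway.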
-
Eliza Szczechla authored
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
-
imzhengzx authored
The original code in line 246 is
```
tokenizer: Optional["PreTrainedTokenizerBase"] = None,
```
but it should be
```
tokenizer: Optional[PreTrainedTokenizerBase] = None,
```
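The difference between the two annotations is visible at runtime: a quoted name is stored as an unresolved forward reference, while the bare name stores the class itself. The sketch below uses a local stand-in class (hypothetical, not the real `transformers` class). The commit does not say why the change was needed, so this only illustrates what changes.

```python
from typing import Optional, get_type_hints


class PreTrainedTokenizerBase:
    """Local stand-in for the real class (hypothetical, for illustration only)."""


def quoted(tokenizer: Optional["PreTrainedTokenizerBase"] = None):
    return tokenizer


def unquoted(tokenizer: Optional[PreTrainedTokenizerBase] = None):
    return tokenizer


# The quoted form keeps an unresolved ForwardRef inside __annotations__;
# the unquoted form refers to the class object directly.
quoted_raw = quoted.__annotations__["tokenizer"]
unquoted_raw = unquoted.__annotations__["tokenizer"]

# get_type_hints() resolves the string against the function's module globals,
# so both agree once resolved; code reading __annotations__ directly does not.
quoted_resolved = get_type_hints(quoted)["tokenizer"]
```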
-
Sylvain Gugger authored
-
- 23 Mar, 2021 12 commits
-
-
Sylvain Gugger authored
-
Philipp Schmid authored
* rewrote is_sagemaker_model_parallel_available
* added is_sagemaker_model_parallel_available to SageMakerTrainer
* removed unnecessary mp_parameters as TrainingArguments
* make style happy
* added mp_parameters again to parse mp-specific args.
-
RafaelWO authored
-
Bhadresh Savani authored
* added predict stage
* added test keyword in exception message
* removed example specific saving predictions
* fixed f-string error
* removed extra line
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
-
Stas Bekman authored
* import refactor
* fix the fallback
-
Lysandre authored
-
Philipp Schmid authored
* added finished documentation
* changed version from 1.6 to 1.6.0 for distributed
* updated versions
* updated urls
-
Sylvain Gugger authored
-
Marta Maślankowska authored
-
Bhadresh Savani authored
-
Stas Bekman authored
-
Sylvain Gugger authored
-
- 22 Mar, 2021 4 commits
-
-
Patrick von Platen authored
* push
* finish
* finish
* make fix copies
* change name
-
Eliza Szczechla authored
Co-authored-by: Eliza <eliza@habanero.tiger.com.pl>
-
Ruan Chaves authored
* Modify the _hp_search_setup method on the Trainer class to handle the wandb argument passed by Ray Tune to model config.
* Reformat single quotes as double quotes.
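The fix can be sketched as stripping the W&B entry from a trial's parameters before copying them onto the training arguments. This assumes, based on the commit message, that Ray Tune's trial config carries a `wandb` key alongside the sampled hyperparameters; `HypotheticalTrainingArgs` and `apply_trial_params` are hypothetical names, not the `Trainer` internals. Without the `pop`, the `wandb` entry would hit the unknown-field check and abort the trial.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class HypotheticalTrainingArgs:
    """Stand-in for `TrainingArguments` with two tunable fields (illustration only)."""

    learning_rate: float = 5e-5
    num_train_epochs: int = 3


def apply_trial_params(
    args: HypotheticalTrainingArgs, trial_params: Dict[str, Any]
) -> HypotheticalTrainingArgs:
    """Copy sampled hyperparameters onto `args`, ignoring Ray Tune's "wandb" entry."""
    params = dict(trial_params)  # copy so the caller's trial config is left untouched
    # Assumption: "wandb" holds logging settings for Ray Tune's W&B integration,
    # not a hyperparameter, so it must not be treated as a field of `args`.
    params.pop("wandb", None)
    for key, value in params.items():
        if not hasattr(args, key):
            raise AttributeError(f"No field named {key!r} to set from the hyperparameter search")
        old_value = getattr(args, key)
        # Cast to the field's existing type (e.g. a sampled float epoch count -> int).
        setattr(args, key, type(old_value)(value) if old_value is not None else value)
    return args
```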
-
Boris Dayma authored
* feat: ensure unique artifact id
* feat: allow manual init
* fix: simplify reinit logic
* fix: no dropped value + immediate commits
* fix: wandb use in sagemaker
* docs: improve documentation and formatting
* fix: typos
* docs: improve formatting
-