- 30 Mar, 2020 10 commits
-
-
Sam Shleifer authored
-
dougian authored
Co-authored-by: Ioannis Douratsos <ioannisd@amazon.com>
-
Julien Chaumond authored
-
Julien Plu authored
* Update the NER TF script to remove the softmax and set the pad token label id to -1
* Reformat the quality and style
Co-authored-by: Julien Plu <julien.plu@adevinta.com>
-
LysandreJik authored
-
LysandreJik authored
-
LysandreJik authored
-
Patrick von Platen authored
-
Patrick von Platen authored
* make decoder input ids optional for t5 training
* lm_labels should not be shifted in t5
* add tests
* finish shift right functionality for PT T5
* move shift right to correct class
* cleaner code
* replace -100 values with pad token id
* add assert statement
* remove unnecessary for loop
* make style
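The "shift right" step and the "-100 to pad token id" replacement named in this commit can be illustrated with a minimal sketch. This is not the repository's code: the function name `shift_right`, its parameters, and the use of plain Python lists instead of tensors are assumptions made for illustration only.

```python
from typing import List


def shift_right(labels: List[int], decoder_start_token_id: int, pad_token_id: int) -> List[int]:
    """Hypothetical helper: build decoder input ids from labels by shifting right."""
    if not labels:
        return []
    # Prepend the start token and drop the last label so the length is unchanged.
    shifted = [decoder_start_token_id] + labels[:-1]
    # -100 marks positions ignored by the loss; it is not a valid token id,
    # so it is replaced with the pad token id before being fed to the decoder.
    return [pad_token_id if token == -100 else token for token in shifted]
```

With such a helper, a training loop could pass only the labels and derive the decoder inputs from them, which is one way to read "make decoder input ids optional".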
-
Patrick von Platen authored
* Add clear description of how to train T5
* correct docstring in T5
* correct typo
* correct docstring format
* update t5 model docs
* implement collins feedback
* fix typo and add more explanation for sentinel tokens
* delete unnecessary todos
-
- 29 Mar, 2020 2 commits
-
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 27 Mar, 2020 10 commits
-
-
Stefan Schweter authored
-
Patrick von Platen authored
* force bleu
* fix wrong file name
* rename file
* different filenames for each example test
* test files should clean up after themselves
* test files should clean up after themselves
* do not force bleu
* correct typo
* fix isort
-
Patrick von Platen authored
-
Funtowicz Morgan authored
* Use tokenizer.num_added_tokens to count the number of added special_tokens instead of hardcoded numbers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty. This can happen when using bert-base-multilingual-cased with an input containing a unique space. In this case, the tokenizer outputs an empty word_tokens, which leads to inconsistent behavior: labels_ids ends up with one more entry than the tokens vector.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Patrick von Platen authored
-
Patrick von Platen authored
* add t5 docs basis
* improve docs
* add t5 docs
* improve t5 docstring
* add t5 tokenizer docstring
* finish docstring
* make style
* add pretrained models
* correct typo
* make examples work
* finalize docs
-
Lysandre Debut authored
T5-small in test isort
-
LysandreJik authored
For some reason Sphinx extremely dislikes this and crashes.
-
Sam Shleifer authored
-
Manuel Romero authored
-
- 26 Mar, 2020 15 commits
-
-
Sam Shleifer authored
* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
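The trimming idea in this commit (dropping columns that contain only `pad_token_id`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's `trim_batch`: the signature is hypothetical, and nested Python lists stand in for the tensors the real code presumably uses.

```python
from typing import List


def trim_batch(input_ids: List[List[int]], pad_token_id: int) -> List[List[int]]:
    """Hypothetical sketch: drop every column that contains only pad_token_id."""
    if not input_ids:
        return []
    seq_len = len(input_ids[0])
    # Keep a column if at least one row has a non-pad token at that position.
    keep = [
        col
        for col in range(seq_len)
        if any(row[col] != pad_token_id for row in input_ids)
    ]
    return [[row[col] for col in keep] for row in input_ids]
```

When every example in a batch is shorter than the maximum length, the trailing all-pad columns disappear and the model processes a shorter sequence.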
-
Sam Shleifer authored
-
Sam Shleifer authored
* Dummy inputs to model.device
* Move self.device to ModuleUtilsMixin
-
Sam Shleifer authored
-
Sam Shleifer authored
* delete lm_head, skips weight tying
* Fixed s3
-
Patrick von Platen authored
* add translation example
* make style
* adapt docstring
* add gpu device as input for example
* small renaming
* better README
-
Patrick von Platen authored
-
Patrick von Platen authored
* rebase to master
* change tf to pytorch
* change to pytorch
* small fix
* renaming
* add gpu training possibility
* renaming
* improve README
* incorporate collins feedback
* better Readme
* better README.md
-
sakares saengkaew authored
* Add the missing token classification for XLM
* fix styling
* Add XLMForTokenClassification to AutoModelForTokenClassification class
* Fix docstring typo for non-existing class
* Add the missing token classification for XLM
* fix styling
* fix styling
* Add XLMForTokenClassification to AutoModelForTokenClassification class
* Fix docstring typo for non-existing class
* Add missing description for AlbertForTokenClassification
* fix styling
* Add missing docstring for AlBert
* Slow tests should be slow
Co-authored-by: Sakares Saengkaew <s.sakares@gmail.com>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
-
Patrick von Platen authored
-
Manuel Romero authored
-
Patrick von Platen authored
* fix merge conflicts
* add t5 summarization example
* change parameters for t5 summarization
* make style
* add first code snippet for translation
* only add prefixes
* add prefix patterns
* make style
* renaming
* fix conflicts
* remove unused patterns
* solve conflicts
* fix merge conflicts
* remove translation example
* remove summarization example
* make sure tensors are in numpy for float comparison
* re-add t5 config
* fix t5 import config typo
* make style
* remove unused numpy statements
* update docstring
* import translation pipeline
-
HUSEIN ZOLKEPLI authored
* add bert bahasa readme
* update readme
* update readme
* added xlnet
-
Patrick von Platen authored
* solve conflicts
* move warnings below
* incorporate changes
* add pad_to_max_length to pipelines
* add bug fix for T5 beam search
* add prefix patterns
* make style
* fix conflicts
* adapt pipelines for task specific parameters
* improve docstring
* remove unused patterns
-
Lysandre Debut authored
-
- 25 Mar, 2020 3 commits
-
-
Travis McGuire authored
-
Patrick von Platen authored
* add new default configs
* change prefix default to None
-
Julien Chaumond authored
* [ci] Also run test_examples in py37 (will revert at the end of the experiment)
* InputExample: use immutable dataclass
* [deps] Install dataclasses for Py<3.7
* [skip ci] Revert "[ci] Also run test_examples in py37"
This reverts commit d29afd9959786b77759b0b8fa4e6b4335b952015.
-