- 16 Jun, 2020 2 commits
-
Sam Shleifer authored
-
Sylvain Gugger authored
* Convert hans to Trainer
* Tick box
-
- 15 Jun, 2020 3 commits
-
Anthony MOI authored
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various clean-ups - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
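The padding/truncation refactor is easier to follow with the strategies spelled out. The sketch below is a hypothetical plain-Python helper (`pad_and_truncate` is not a transformers function) that works on lists of token ids: "longest" pads to the longest sequence in the batch, "max_length" pads to a fixed length, and "do_not_pad" leaves lengths alone. Truncation is applied first and only when requested.

```python
from typing import List, Optional


def pad_and_truncate(
    batch: List[List[int]],
    padding: str = "longest",  # "longest", "max_length" or "do_not_pad"
    truncation: bool = False,
    max_length: Optional[int] = None,
    pad_id: int = 0,
) -> List[List[int]]:
    """Hypothetical illustration of padding/truncation strategies on token ids."""
    if truncation:
        if max_length is None:
            raise ValueError("truncation=True needs max_length in this sketch")
        batch = [ids[:max_length] for ids in batch]
    if padding == "do_not_pad":
        return [list(ids) for ids in batch]
    if padding == "longest":
        target = max((len(ids) for ids in batch), default=0)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("padding='max_length' needs max_length")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding!r}")
    # Right-pad each sequence to the target length; sequences already longer
    # than the target (no truncation requested) are left unchanged.
    return [list(ids) + [pad_id] * (target - len(ids)) for ids in batch]
```

For example, `pad_and_truncate([[5, 6, 7], [8]])` gives `[[5, 6, 7], [8, 0, 0]]`. Keeping padding and truncation as two independent switches, as the commit message's "switched padding/truncation API" suggests, is what lets a caller truncate without padding or pad without truncating.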
-
Sylvain Gugger authored
* Make DataCollator a callable
* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
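Making the data collator a callable means any object or function that maps a list of examples to one batch can be used. Below is a minimal sketch under that assumption. `SimpleCollator` is a hypothetical name, and it returns Python lists where a real collator would build tensors.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class SimpleCollator:
    """Hypothetical collator: a plain callable that turns a list of feature
    dicts into one batch dict of lists, right-padding input_ids."""

    pad_id: int = 0

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
        longest = max(len(f["input_ids"]) for f in features)
        batch: Dict[str, List[Any]] = {}
        for f in features:
            for key, value in f.items():
                if key == "input_ids":
                    value = list(value) + [self.pad_id] * (longest - len(value))
                batch.setdefault(key, []).append(value)
        return batch
```

Because only the call signature matters, a bare function with the same signature would work equally well. This is presumably the flexibility the change was after.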
-
Stefan Schweter authored
* utils_ner: do not add extra sep token for RoBERTa model
* run_pl_ner: do not add extra sep token for RoBERTa model
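RoBERTa uses a doubled separator only between the two segments of a sentence pair. A single sentence, which is what NER feeds the model, gets just `<s> ... </s>`. The hypothetical helper below shows both layouts, which is why an unconditional extra separator is wrong for single-sentence token classification.

```python
from typing import List, Optional


def add_roberta_special_tokens(
    tokens_a: List[str], tokens_b: Optional[List[str]] = None
) -> List[str]:
    """Sketch of RoBERTa's special-token layout (hypothetical helper):
    single sequence: <s> A </s>
    pair:            <s> A </s></s> B </s>
    """
    if tokens_b is None:
        return ["<s>"] + tokens_a + ["</s>"]
    return ["<s>"] + tokens_a + ["</s>", "</s>"] + tokens_b + ["</s>"]
```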
-
- 13 Jun, 2020 1 commit
-
Sylvain Gugger authored
* Update hans data to be able to use Trainer
* Fixes
* Deal with tokenizers that don't have token_ids
* Clean up things
* Simplify data use
* Fix the input dict
* Formatting + proper path in README
-
- 11 Jun, 2020 1 commit
-
VictorSanh authored
-
- 10 Jun, 2020 1 commit
-
Sylvain Gugger authored
* Remove unused arguments
* Formatting
* Remove second todo comment
-
- 09 Jun, 2020 3 commits
-
songyouwei authored
`is_leaf` may become `False` after a `.to(device=device)` call.
-
Sam Shleifer authored
-
Amil Khare authored
-
- 08 Jun, 2020 1 commit
-
daniel-shan authored
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
-
- 06 Jun, 2020 1 commit
- 05 Jun, 2020 2 commits
-
Sam Shleifer authored
-
Julien Chaumond authored
-
- 04 Jun, 2020 3 commits
-
Stefan Schweter authored
* ner: add preprocessing script for examples that splits longer sentences
* ner: example shell scripts use local preprocessing now
* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results
* ner: satisfy black and isort
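Splitting long sentences matters because a sentence can fit within a word budget and still exceed the model's maximum length once each word is broken into subwords. The sketch below shows one greedy way to do it and is not the commit's script. `count_subwords` is a hypothetical stand-in for a real tokenizer's per-word subword count.

```python
from typing import Callable, List


def split_long_sentence(
    words: List[str],
    max_len: int,
    count_subwords: Callable[[str], int],
) -> List[List[str]]:
    """Greedily split one sentence into chunks whose total subword count
    stays within max_len (a single oversized word gets its own chunk)."""
    chunks: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for word in words:
        n = count_subwords(word)
        # Start a new chunk when adding this word would exceed the budget.
        if current and current_len + n > max_len:
            chunks.append(current)
            current, current_len = [], 0
        current.append(word)
        current_len += n
    if current:
        chunks.append(current)
    return chunks
```

Splitting at word boundaries keeps each word's label intact. The alternative of truncating silently would drop the tail of the sentence and its labels.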
-
prajjwal1 authored
-
Jason Phang authored
-
- 02 Jun, 2020 4 commits
-
Jin Young Sohn authored
* GLUE task cleanup
* Enable writing cache to cache_dir in case the dataset lives in a read-only filesystem.
* Differentiate match vs mismatch for MNLI metrics.
* Style
* Fix pytype
* Fix type
* Use cache_dir in mnli mismatch eval dataset
* Small Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
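Telling matched and mismatched MNLI apart comes down to reporting each evaluation set's accuracy under its own key, so that one result does not overwrite the other. A minimal sketch, assuming task names `"mnli"` and `"mnli-mm"`. The function names and metric keys here are illustrative, not the library's confirmed API.

```python
from typing import Dict, Sequence


def simple_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


def mnli_metrics(task_name: str, preds: Sequence[int], labels: Sequence[int]) -> Dict[str, float]:
    """Hypothetical: report matched and mismatched MNLI under distinct keys."""
    if task_name == "mnli":
        return {"mnli/acc": simple_accuracy(preds, labels)}
    if task_name == "mnli-mm":
        return {"mnli-mm/acc": simple_accuracy(preds, labels)}
    raise KeyError(task_name)
```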
-
Julien Chaumond authored
* 🐛 Fix model ids for BART and Flaubert
-
Julien Chaumond authored
* Kill model archive maps
* Fixup
* Also kill model_archive_map for MaskedBertPreTrainedModel
* Unhook config_archive_map
* Tokenizers: align with model id changes
* make style && make quality
* Fix CI
-
Lysandre Debut authored
-
- 01 Jun, 2020 15 commits
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Victor SANH authored
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
Victor SANH authored
-
- 27 May, 2020 3 commits
-
Patrick von Platen authored
* improve memory benchmarking
* correct typo
* fix current memory
* check torch memory allocated
* better pytorch function
* add total cached gpu memory
* add total gpu required
* improve torch gpu usage
* update memory usage
* finalize memory tracing
* save intermediate benchmark class
* fix conflict
* improve benchmark
* improve benchmark
* finalize
* make style
* improve benchmarking
* correct typo
* make train function more flexible
* fix csv save
* better repr of bytes
* better print
* fix __repr__ bug
* finish plot script
* rename plot file
* delete csv and small improvements
* fix in plot
* fix in plot
* correct usage of timeit
* remove redundant line
* remove redundant line
* fix bug
* add hf parser tests
* add versioning and platform info
* make style
* add gpu information
* ensure backward compatibility
* finish adding all tests
* Update src/transformers/benchmark/benchmark_args.py
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update src/transformers/benchmark/benchmark_args_utils.py
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* delete csv files
* fix isort ordering
* add out of memory handling
* add better train memory handling
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
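The memory-benchmarking work involves two recurring pieces: measuring the peak memory of a call and printing byte counts readably. The sketch below uses the standard library's `tracemalloc` as a swapped-in technique, so it only sees Python-heap allocations. It does not track GPU memory the way the benchmark's torch-based tracing does. `to_mib` and `peak_python_memory` are hypothetical names.

```python
import tracemalloc
from typing import Callable, Tuple


def to_mib(num_bytes: int) -> int:
    """Whole mebibytes (2**20 bytes), rounded down."""
    return num_bytes >> 20


def peak_python_memory(func: Callable[[], object]) -> Tuple[object, int]:
    """Run func and return (result, peak bytes traced by tracemalloc).
    Only Python-heap allocations are visible here, not GPU memory."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

For instance, `peak_python_memory(lambda: bytearray(10**6))` reports a peak of at least one million bytes, and `to_mib(3 * 2**20 + 5)` is `3`.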
-
Lysandre Debut authored
* per_device instead of per_gpu/error thrown when argument unknown
* [docs] Restore examples.md symlink
* Correct absolute links so that symlink to the doc works correctly
* Update src/transformers/hf_argparser.py
  Co-authored-by: Julien Chaumond <chaumond@gmail.com>
* Warning + reorder
* Docs
* Style
* not for squad
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
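Two ideas in this entry can be sketched together. The first is that a per-device batch size is multiplied by the number of devices, counting at least one so that CPU-only runs still work. The second is that parsing should fail on unknown arguments instead of silently ignoring them, such as a leftover `--per_gpu_*` flag. `HypotheticalTrainingArgs` and `parse_strict` are illustrative names built on plain `argparse`, not the library's argument parser.

```python
import argparse
from dataclasses import dataclass, fields
from typing import List


@dataclass
class HypotheticalTrainingArgs:
    per_device_train_batch_size: int = 8
    n_gpu: int = 0

    @property
    def train_batch_size(self) -> int:
        # Effective batch size: per-device size times device count (min 1).
        return self.per_device_train_batch_size * max(1, self.n_gpu)


def parse_strict(argv: List[str]) -> HypotheticalTrainingArgs:
    parser = argparse.ArgumentParser()
    for f in fields(HypotheticalTrainingArgs):
        parser.add_argument(f"--{f.name}", type=int, default=f.default)
    namespace, remaining = parser.parse_known_args(argv)
    if remaining:
        # Fail loudly rather than ignore, e.g., a stale --per_gpu_* flag.
        raise ValueError(f"Some specified arguments are not used: {remaining}")
    return HypotheticalTrainingArgs(**vars(namespace))
```

With `--per_device_train_batch_size 16 --n_gpu 4`, the effective batch size is 16 × 4 = 64.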
-
Hao Tan authored
The option `--do_lower_case` is currently required by the uncased models (e.g., bert-base-uncased, bert-large-uncased). Results:
BERT-BASE without --do_lower_case: 'exact': 73.83, 'f1': 82.22
BERT-BASE with --do_lower_case: 'exact': 81.02, 'f1': 88.34
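On the numbers above, lowercasing adds 81.02 − 73.83 = 7.19 exact-match points and 88.34 − 82.22 = 6.12 F1 points. The reason is that uncased checkpoints were trained on text that was lowercased and accent-stripped before WordPiece tokenization. Cased input therefore maps poorly onto their vocabulary. A minimal sketch of that normalization follows. `uncased_normalize` is a hypothetical name, and it mirrors the lowercase-then-strip-accents step rather than reproducing the full tokenizer.

```python
import unicodedata


def uncased_normalize(text: str) -> str:
    """Lowercase, then strip accents by decomposing (NFD) and dropping
    combining marks (Unicode category "Mn")."""
    text = text.lower()
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```

For example, `uncased_normalize("Héllo WORLD")` returns `"hello world"`.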
-