- 31 Aug, 2020 10 commits
-
-
Sylvain Gugger authored
* Split the run_hp_search by backend
* Unused import
-
krfricke authored
* Introduce HPO checkpointing for PBT
* Moved checkpoint saving
* Fixed checkpoint subdir pass
* Fixed style
* Enable/disable checkpointing, check conditions for various tune schedulers incl. PBT
* Adjust number of GPUs to number of jobs
* Avoid model pickling in ray
* Move hp search to integrations
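For context, a minimal sketch of how this search might be driven from the `Trainer` with the Ray backend; the model, datasets, and trial count below are illustrative assumptions, not taken from these commits.

```python
# Sketch of Trainer.hyperparameter_search with the Ray Tune backend.
# Model/dataset names and the trial count are illustrative assumptions.
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

def model_init():
    # A fresh model per trial keeps runs independent and avoids
    # pickling a live model when Ray ships work to its workers.
    return AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

trainer = Trainer(
    args=TrainingArguments(output_dir="./hpo"),
    model_init=model_init,
    train_dataset=train_dataset,  # assumed to be prepared elsewhere
    eval_dataset=eval_dataset,    # assumed to be prepared elsewhere
)

best_run = trainer.hyperparameter_search(
    backend="ray",        # the backend path these commits split out
    n_trials=8,
    direction="minimize",
)
print(best_run.hyperparameters)
```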
-
Sam Shleifer authored
-
Jin Young (Daniel) Sohn authored
* Only access loss tensor every logging_steps
* tensor.item() was being called every step. This must not be done for XLA:TPU tensors as it's terrible for performance, causing TPU<>CPU communication at each step. On RoBERTa MLM, for example, it reduces step time by 30%; the gain should be larger for models/tasks with smaller step times.
* Train batch size was not correct in case a user uses the `per_gpu_train_batch_size` flag
* Avg reduce loss across eval shards
* Fix style (#6803)
* t5 model should make decoder_attention_mask (#6800)
* [s2s] Test hub configs in self-scheduled CI (#6809)
* [s2s] round runtime in run_eval (#6798)
* Pegasus finetune script: add --adafactor (#6811)
* [bart] rename self-attention -> attention (#6708)
* [tests] fix typos in inputs (#6818)
* Fixed open in colab link (#6825)
* Add model card for singbert lite. Update widget for singbert and singbert-large. (#6827)
* BR_BERTo model card (#6793)
* clearly indicate shuffle=False (#6312)
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
* [s2s README] Add more dataset download instructions (#6737)
* Style
* Patch logging issue
* Set default logging level to `WARNING` instead of `INFO`
* TF Flaubert w/ pre-norm (#6841)
* Dataset and DataCollator for BERT Next Sentence Prediction (NSP) task (#6644)
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
* Fix in Adafactor docstrings (#6845)
* Fix resuming training for Windows (#6847)
* comments
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Thomas Ashish Cherian <6967017+PandaWhoCodes@users.noreply.github.com>
Co-authored-by: Zane Lim <zyuanlim@gmail.com>
Co-authored-by: Rodolfo De Nadai <rdenadai@gmail.com>
Co-authored-by: xujiaze13 <37360975+xujiaze13@users.noreply.github.com>
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Huang Lianzhe <hlz@pku.edu.cn>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
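The first two bullets describe a general XLA-friendly pattern. Here is a small, self-contained sketch (plain PyTorch with a stand-in model; the numbers are hypothetical) of accumulating the loss as a device tensor and paying the device-to-host sync only once every `logging_steps`:

```python
# Calling .item() forces a device -> host sync, so accumulate the loss
# as a tensor and only materialize it every `logging_steps` steps.
import torch
from torch import nn

model = nn.Linear(10, 1)                      # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(200)]
logging_steps = 50
tr_loss = torch.zeros(())                     # running loss kept as a tensor

for step, (x, y) in enumerate(dataloader, start=1):
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Accumulate on-device: no .item() here, so no TPU<>CPU round trip.
    tr_loss += loss.detach()

    if step % logging_steps == 0:
        # One host sync every logging_steps steps instead of one per step.
        print(f"step {step}: avg loss {(tr_loss / logging_steps).item():.4f}")
        tr_loss = torch.zeros(())
```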
-
Sylvain Gugger authored
-
Sylvain Gugger authored
-
Huang Lianzhe authored
* add datacollator and dataset for next sentence prediction task
* bug fix (numbers of special tokens & truncate sequences)
* bug fix (+ dict inputs support for data collator)
* add padding for nsp data collator; renamed cached files to avoid conflict.
* add test for nsp data collator
* Style
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
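A hedged sketch of how the new NSP pieces could fit together with a `Trainer`; the constructor arguments and the input file below are assumptions based on the commit messages, not verified signatures.

```python
# Sketch of wiring the NSP dataset and collator from #6644 into a Trainer.
# Constructor arguments and "corpus.txt" are assumptions, not verified.
from transformers import (
    BertForPreTraining,
    BertTokenizer,
    DataCollatorForNextSentencePrediction,
    TextDatasetForNextSentencePrediction,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path="corpus.txt",  # hypothetical file: one sentence per line,
    block_size=128,          # documents separated by blank lines
)
collator = DataCollatorForNextSentencePrediction(tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./nsp"),
    data_collator=collator,
    train_dataset=dataset,
)
```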
-
Lysandre Debut authored
-
Lysandre authored
-
Lysandre authored
-
- 30 Aug, 2020 6 commits
-
-
Sam Shleifer authored
-
xujiaze13 authored
* Clarify shuffle
* clarify shuffle
Co-authored-by: Kevin Canwen Xu <canwenxu@126.com>
-
Rodolfo De Nadai authored
-
Zane Lim authored
-
Thomas Ashish Cherian authored
-
Stas Bekman authored
-
- 29 Aug, 2020 3 commits
-
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
-
- 28 Aug, 2020 9 commits
-
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
-
Sam Shleifer authored
* broken test
* batch parity
* tests pass
* boom boom
* boom boom
* split out bart tokenizer tests
* fix tests
* boom boom
* Fixed dataset bug
* Fix marian
* Undo extra
* Get marian working
* Fix t5 tok tests
* Test passing
* Cleanup
* better assert msg
* require torch
* Fix mbart tests
* undo extra decoder_attn_mask change
* Fix import
* pegasus tokenizer can ignore src_lang kwargs
* unused kwarg test cov
* boom boom
* add todo for pegasus issue
* cover one word translation edge case
* Cleanup
* doc
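These bullets revolve around batch preparation for seq2seq tokenizers (Marian, mBART, Pegasus, T5). A minimal sketch, assuming the `prepare_seq2seq_batch` helper of this era (since superseded by calling the tokenizer directly); the model name and texts are illustrative.

```python
# Assumes the era's prepare_seq2seq_batch API; model name is illustrative.
from transformers import MarianTokenizer

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
batch = tokenizer.prepare_seq2seq_batch(
    src_texts=["I am a small frog."],
    tgt_texts=["Ich bin ein kleiner Frosch."],
    return_tensors="pt",
)
print(list(batch.keys()))  # encoder and decoder ids/masks, ready for the model
```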
-
RafaelWO authored
* Improved tokenization with sacremoses
* The TransfoXLTokenizer is now using sacremoses for tokenization
* Added tokenization of comma-separated and floating-point numbers.
* Removed prepare_for_tokenization() from tokenization_transfo_xl.py because punctuation is handled by sacremoses
* Added corresponding tests
* Removed test comparing TransfoXLTokenizer and TransfoXLTokenizerFast
* Added deprecation warning to TransfoXLTokenizerFast
* isort change
Co-authored-by: Teven <teven.lescao@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
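For reference, a minimal sketch of the sacremoses tokenization now underlying TransfoXLTokenizer (the number handling mentioned above is layered on top of this); the sample text is illustrative.

```python
# Minimal sacremoses sketch; TransfoXLTokenizer layers its own handling
# of comma-separated and floating-point numbers on top of this.
from sacremoses import MosesTokenizer

moses = MosesTokenizer(lang="en")
tokens = moses.tokenize("Hello there, my friend!")
print(tokens)  # punctuation is split into separate tokens
```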
-
Ahmed Elnaggar authored
-
Stas Bekman authored
`make style` with `black` < 20.8b1 is a no-go (in case some other package forced a lower version), so make the required version explicit to avoid confusion.
-
Sam Shleifer authored
-
Stas Bekman authored
-
- 27 Aug, 2020 12 commits
-
-
Lysandre authored
-
Stas Bekman authored
* [doc] multiple corrections to "Summary of the tasks"
* add a new "docs" target to validate docs and document it
* fix mixup
-
Stas Bekman authored
* [test schedulers] small improvement
* cleanup
-
Stas Bekman authored
* [testing] replace hardcoded paths to allow running tests from anywhere
* fix the merge conflict
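A sketch of the path-independence pattern this commit describes: resolve test assets relative to the test file rather than the current working directory, so tests pass regardless of where pytest is launched. The fixture file name is hypothetical.

```python
# Resolve fixtures relative to this test file, not the CWD, so the
# test works no matter which directory pytest is launched from.
from pathlib import Path

FIXTURES_DIR = Path(__file__).resolve().parent / "fixtures"

def test_loads_fixture():
    sample = FIXTURES_DIR / "sample.json"  # hypothetical test asset
    assert sample.exists()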
-
Sam Shleifer authored
-
Tom Grek authored
-
Lysandre authored
-
Julien Plu authored
* Align the TF NER example with the PT one
* Fix Dataset call
* Fix gradient accumulation training
* Apply style
* Address Sylvain's comments
* Address Sylvain's comments
* Apply style
-
Lysandre Debut authored
-
Nikolai Yakovenko authored
* AdaFactor optimizer ported from fairseq. Tested for T5 finetuning and MLM; reduced memory consumption compared to Adam.
* update PR fixes, add basic test
* bug -- incorrect params in test
* bugfix -- import Adafactor into test
* bugfix -- removed accidental T5 include
* resetting T5 to master
* bugfix -- include Adafactor in __init__
* longer loop for adafactor test
* remove double error class declare
* lint
* black
* isort
* Update src/transformers/optimization.py
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* single docstring
* Cleanup docstring
Co-authored-by: Nikolai Y <nikolai.yakovenko@point72.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
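A minimal usage sketch of the ported optimizer; the stand-in model is illustrative, and the settings shown (relative step sizes with an internally computed learning rate) follow the fairseq-style defaults rather than anything prescribed by this commit.

```python
# Minimal Adafactor sketch; the Linear layer stands in for a real model.
import torch
from transformers.optimization import Adafactor

model = torch.nn.Linear(8, 2)
optimizer = Adafactor(
    model.parameters(),
    lr=None,              # with relative_step=True the LR is computed internally
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)

loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
```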
-
Sam Shleifer authored
-
Ahmed Elnaggar authored
-