1. 22 Oct, 2020 1 commit
  2. 21 Oct, 2020 1 commit
  3. 16 Oct, 2020 1 commit
  4. 04 Oct, 2020 1 commit
  5. 01 Oct, 2020 2 commits
  6. 30 Sep, 2020 1 commit
  7. 27 Sep, 2020 1 commit
  8. 24 Sep, 2020 1 commit
  9. 21 Sep, 2020 1 commit
  10. 17 Sep, 2020 1 commit
  11. 16 Sep, 2020 2 commits
  12. 14 Sep, 2020 2 commits
  13. 13 Sep, 2020 1 commit
  14. 10 Sep, 2020 1 commit
  15. 04 Sep, 2020 1 commit
  16. 28 Aug, 2020 1 commit
    • prepare_seq2seq_batch makes labels/ decoder_input_ids made later. (#6654) · 9336086a
      Sam Shleifer authored
      * broken test
      
      * batch parity
      
      * tests pass
      
      * boom boom
      
      * boom boom
      
      * split out bart tokenizer tests
      
      * fix tests
      
      * boom boom
      
      * Fixed dataset bug
      
      * Fix marian
      
      * Undo extra
      
      * Get marian working
      
      * Fix t5 tok tests
      
      * Test passing
      
      * Cleanup
      
      * better assert msg
      
      * require torch
      
      * Fix mbart tests
      
      * undo extra decoder_attn_mask change
      
      * Fix import
      
      * pegasus tokenizer can ignore src_lang kwargs
      
      * unused kwarg test cov
      
      * boom boom
      
      * add todo for pegasus issue
      
      * cover one word translation edge case
      
      * Cleanup
      
      * doc
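      A minimal sketch of the workflow this commit (#6654) touches, assuming a transformers release from this era: prepare_seq2seq_batch tokenizes source and target text together, while decoder_input_ids are built later (e.g. by shifting the target ids inside the model or training loop) rather than by the tokenizer. The checkpoint name below is illustrative, the exact output keys are version-dependent, and prepare_seq2seq_batch was deprecated in later releases in favor of calling the tokenizer directly.

      from transformers import MarianTokenizer, MarianMTModel

      tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
      model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

      src_texts = ["I am a small frog."]
      tgt_texts = ["Ich bin ein kleiner Frosch."]

      # Tokenize source and target in one call; after this change the tokenizer
      # no longer builds decoder_input_ids itself (output keys are
      # version-dependent, e.g. input_ids / attention_mask / labels).
      batch = tokenizer.prepare_seq2seq_batch(
          src_texts, tgt_texts=tgt_texts, return_tensors="pt"
      )
      print(sorted(batch.keys()))

      # Inference only needs the source side of the batch.
      generated = model.generate(
          input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
      )
      print(tokenizer.batch_decode(generated, skip_special_tokens=True))

      For training, the tokenized target ids can be shifted right to produce decoder_input_ids, which is what "made later" refers to in the commit title.
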
  17. 26 Aug, 2020 1 commit
  18. 25 Aug, 2020 1 commit
  19. 13 Aug, 2020 1 commit
  20. 11 Aug, 2020 1 commit
  21. 08 Aug, 2020 1 commit
  22. 06 Aug, 2020 1 commit
  23. 28 Jul, 2020 2 commits
  24. 21 Jul, 2020 1 commit
  25. 18 Jul, 2020 1 commit
  26. 17 Jul, 2020 1 commit
  27. 15 Jul, 2020 2 commits
  28. 07 Jul, 2020 1 commit
  29. 26 Jun, 2020 2 commits
  30. 25 Jun, 2020 1 commit
  31. 23 Jun, 2020 1 commit
  32. 19 Jun, 2020 1 commit
  33. 17 Jun, 2020 1 commit
  34. 15 Jun, 2020 1 commit
    • [HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510) · 36434220
      Anthony MOI authored
      * Use tokenizers pre-tokenized pipeline
      
      * failing pretokenized test
      
      * Fix is_pretokenized in python
      
      * add pretokenized tests
      
      * style and quality
      
      * better tests for batched pretokenized inputs
      
      * tokenizers clean up - new padding_strategy - split the files
      
      * [HUGE] refactoring tokenizers - padding - truncation - tests
      
      * style and quality
      
      * bump up required tokenizers version to 0.8.0-rc1
      
      * switched padding/truncation API - simpler better backward compat
      
      * updating tests for custom tokenizers
      
      * style and quality - tests on pad
      
      * fix QA pipeline
      
      * fix backward compatibility for max_length only
      
      * style and quality
      
      * Various cleans up - add verbose
      
      * fix tests
      
      * update docstrings
      
      * Fix tests
      
      * Docs reformatted
      
      * __call__ method documented
      Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
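      A hedged sketch of the API surface this refactor (#4510) introduced: padding and truncation strategies passed directly to the tokenizer's __call__, plus support for pre-tokenized (already word-split) inputs. The checkpoint name and flag values are illustrative, not taken from the commit.

      from transformers import AutoTokenizer

      tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

      # Unified padding/truncation API on __call__.
      batch = tokenizer(
          ["A short sentence.", "A noticeably longer sentence that may get truncated."],
          padding="longest",   # or True / "max_length" / False
          truncation=True,     # or "only_first" / "only_second" / "longest_first"
          max_length=16,
          return_tensors="pt",
      )
      print(batch["input_ids"].shape)

      # Pre-tokenized pipeline: inputs already split into words. The flag was
      # named is_pretokenized when this PR landed; later releases renamed it
      # to is_split_into_words.
      words = [["Hugging", "Face", "tokenizers"], ["padding", "and", "truncation"]]
      pretok = tokenizer(words, is_split_into_words=True, padding=True, return_tensors="pt")
      print(pretok["input_ids"].shape)

      The same arguments are accepted by both the slow (Python) and fast (Rust-backed) tokenizers, which is the backward-compatible switch the "switched padding/truncation API" bullet refers to.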