Commits · 5ff0d6d7d091921bda00a84448d14c87c0c10379 · chenpangpang / transformers

25 Sep, 2020 7 commits
- Update README.md · 5ff0d6d7
  Patrick von Platen authored Sep 25, 2020
  
  5ff0d6d7
- [RAG] Fix retrieval offset in RAG's HfIndex and better integration tests (#7372) · cf1c88e0
  Quentin Lhoest authored Sep 25, 2020
```
* Fix retrieval offset in RAG's HfIndex

* update slow tests

* style

* fix new test

* style

* add better tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
```
  cf1c88e0
- [Rag] Fix wrong usage of `num_beams` and `bos_token_id` in Rag Sequence generation (#7386) · 571c7a11
  Patrick von Platen authored Sep 25, 2020
```
* fix_rag_sequence

* add second bug fix
```
  571c7a11
- doc changes (#7385) · 415071b4
  Suraj Patil authored Sep 25, 2020
  
  415071b4
- [RAG] Add missing doc and attention_mask to rag (#7382) · 2dd652d7
  Patrick von Platen authored Sep 25, 2020
```
* add docs

* add missing docs and attention_mask in fine-tune
```
  2dd652d7
- Check config type using `type` instead of `isinstance` (#7363) · 7cdd9da5
  Lysandre Debut authored Sep 25, 2020
```
* Check config type instead of instance


Bad merge

* Remove for loops

* Style
```
  7cdd9da5
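
The distinction in this commit's title can be illustrated with a small sketch. This is not the code from the PR: the class names and the mapping below are hypothetical, chosen only to show that `isinstance` also matches subclasses, whereas an exact `type` lookup does not and needs no loop.

```python
# Hypothetical config classes; DerivedConfig subclasses BaseConfig.
class BaseConfig:
    pass


class DerivedConfig(BaseConfig):
    pass


# Hypothetical mapping from config class to a model name.
MODEL_FOR_CONFIG = {BaseConfig: "base-model", DerivedConfig: "derived-model"}


def lookup_with_isinstance(config):
    # isinstance() is also true for subclasses, so the first matching
    # entry wins, even when it is only a parent class of the config.
    for config_class, model_name in MODEL_FOR_CONFIG.items():
        if isinstance(config, config_class):
            return model_name
    raise ValueError(f"Unrecognized config: {type(config).__name__}")


def lookup_with_type(config):
    # type() returns the exact class, so a plain dict lookup is enough.
    try:
        return MODEL_FOR_CONFIG[type(config)]
    except KeyError:
        raise ValueError(f"Unrecognized config: {type(config).__name__}") from None


print(lookup_with_isinstance(DerivedConfig()))  # base-model
print(lookup_with_type(DerivedConfig()))        # derived-model
```

In this toy setup the `isinstance` version returns the parent's entry for a `DerivedConfig`, which is the kind of mismatch an exact type check avoids.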
- modeling_bart: 3 small cleanups that dont change outputs (#7381) · 3c6bf899
  Sam Shleifer authored Sep 25, 2020
```
* Mbart passing

* boom boom

* cleaner assert

* add assert

* Fix tests
```
  3c6bf899
24 Sep, 2020 15 commits
- Seq2SeqTrainer (#6769) · 9e68d075
  Suraj Patil authored Sep 25, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```
  9e68d075
- [s2s] distributed eval allows num_return_sequences > 1 (#7254) · d9d0f114
  Sam Shleifer authored Sep 24, 2020
  
  d9d0f114
- correct attention mask (#7373) · 0804d077
  Patrick von Platen authored Sep 24, 2020
  
  0804d077
- [fsmt] build/test scripts (#7257) · a8cbc426
  Stas Bekman authored Sep 24, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```
  a8cbc426
- Remove mentions of RAG from the docs (#7376) · a8e7982f
  Sylvain Gugger authored Sep 24, 2020
```
* Remove mentions of  RAG from the docs

* Deactivate check
```
  a8e7982f
- [seq2seq] make it easier to run the scripts (#7274) · eadd870b
  Stas Bekman authored Sep 24, 2020
  
  eadd870b
- Formatter (#7368) · 8d3bb781
  Lysandre Debut authored Sep 24, 2020
```
* Formatter

* Docs
```
  8d3bb781
- Fixing case in which `Trainer` hung while saving model in distributed training (#7365) · 7dfdf793
  Teven authored Sep 24, 2020
```
* remote debugging

* remote debugging

* moved _store_flos call

* moved _store_flos call

* moved _store_flos call

* removed debugging artefacts
```
  7dfdf793
- Clean RAG docs and template docs (#7348) · 0ccb6f5c
  Sylvain Gugger authored Sep 24, 2020
```
* Clean RAG docs and template docs

* Fix typo

* Better doc
```
  0ccb6f5c
- Make PyTorch model files independent from each other (#7352) · 27174bd4
  Sylvain Gugger authored Sep 24, 2020
  
  27174bd4
- Update the TF models to remove their interdependencies (#7238) · d161ed16
  Julien Plu authored Sep 24, 2020
```
* Refacto the models to remove their interdependencies

* Fix Flaubert model

* Fix Flaubert

* Fix XLM

* Fix Albert

* Fix Roberta

* Fix Albert

* Fix Flaubert

* Apply style + remove unused imports

* Fix Distilbert

* remove unused import

* fix Distilbert

* Fix Flaubert

* Apply style

* Fix Flaubert

* Add the copy comments for the check_copies script

* Fix MobileBert model name

* Address Morgan's comments

* Fix typo

* Oops typo
```
  d161ed16
- Updata tokenization_auto.py (#6870) · 0cffa424
  Jabin Huang authored Sep 24, 2020
```
Updata tokenization_auto.py to handle Inherited tokenizer
```
  0cffa424
- Update modeling_tf_longformer.py (#7359) · 03fb8e79
  Daquan Lin authored Sep 24, 2020
```
correct a very small mistake
```
  03fb8e79
- Check decorator order (#7326) · 1ff5bd38
  Sylvain Gugger authored Sep 24, 2020
```
* Check decorator order

* Adapt for parametrized decorators

* Fix typos
```
  1ff5bd38
- Expand a bit the documentation doc (#7350) · 0be5f4a0
  Sylvain Gugger authored Sep 24, 2020
  
  0be5f4a0
23 Sep, 2020 8 commits

wip: Code to add lang tags to marian model cards (#6586) · 38f17037
Sam Shleifer authored Sep 23, 2020

38f17037

Remove reference to args in XLA check (#7344) · 129fdae0

Theo Linnemann authored Sep 23, 2020

Previously, the TFTrainingArguments object did a check to see if XLA was enabled, but did this by referencing `self.args.xla`, when it should be `self.xla`, because it is the args object. This can be verified a few lines above, where the XLA field is set.

129fdae0
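
The bug pattern described in this commit message can be reproduced with a toy class. This is a sketch under stated assumptions, not the actual `TFTrainingArguments` code: `ToyTrainingArguments` and its two method names are invented for illustration, and only the `xla` field and the `self.args.xla` versus `self.xla` contrast come from the message.

```python
from dataclasses import dataclass


@dataclass
class ToyTrainingArguments:
    # Stand-in for the XLA field mentioned in the commit message.
    xla: bool = False

    def xla_enabled_buggy(self):
        # Bug pattern: this object *is* the args object, so it has no
        # `.args` attribute and the lookup raises AttributeError.
        return self.args.xla

    def xla_enabled_fixed(self):
        # Fix: read the field directly from the object itself.
        return self.xla


args = ToyTrainingArguments(xla=True)
print(args.xla_enabled_fixed())

try:
    args.xla_enabled_buggy()
except AttributeError as err:
    print("AttributeError:", err)
```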

[Benchmarks] Change all args to from `no_...` to their positive form (#7075) · d2666136

Felipe Curti authored Sep 23, 2020



* Changed name to all no_... arguments and all references to them, inverting the boolean condition

* Change benchmark tests to use new Benchmark Args

* Update src/transformers/benchmark/benchmark_args_utils.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/benchmark/benchmark.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix Style. Add --no options in help

* fix some part of tests

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* Update src/transformers/benchmark/benchmark_args_utils.py

* fix all tests

* make style

* add backwards compatibility

* make backwards compatible
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: fmcurti <fcurti@DESKTOP-RRQURBM.localdomain>

d2666136
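
One way a rename from `no_...` flags to positive flags can stay backward compatible is to translate each old keyword into its positive counterpart with the value inverted. The sketch below assumes that approach and is not the benchmark code itself: `translate_deprecated_kwargs` and the argument names `no_cache` and `no_logging` are hypothetical.

```python
import warnings


def translate_deprecated_kwargs(kwargs, deprecated_names):
    """Map old negative flags (no_x=True) onto positive ones (x=False)."""
    translated = dict(kwargs)
    for old_name in deprecated_names:
        if old_name in translated:
            positive_name = old_name[len("no_"):]
            # Inverting the boolean keeps the caller's original intent.
            translated[positive_name] = not translated.pop(old_name)
            warnings.warn(
                f"`{old_name}` is deprecated; use `{positive_name}` instead.",
                DeprecationWarning,
            )
    return translated


# Hypothetical argument names, used only for illustration.
old_style = {"no_cache": True, "batch_size": 8}
print(translate_deprecated_kwargs(old_style, ["no_cache", "no_logging"]))
# {'batch_size': 8, 'cache': False}
```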

Ensure that integrations are imported before transformers or ml libs (#7330) · 8c697d58

Doug Blank authored Sep 23, 2020

* Ensure that integrations are imported before transformers or ml libs

* Black reformatter wanted a newline

* isort requests

* black requests

* flake8 requests

8c697d58

Models doc (#7345) · 3323146e

Sylvain Gugger authored Sep 23, 2020



* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FlauBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

3323146e

Fixed evaluation_strategy on epoch end bug (#7340) · 58405a52

Wissam Antoun authored Sep 23, 2020

* Fixed evaluation_strategy on epoch end bug

move the evaluation script outside the iteration loop

* black formatting

58405a52
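
The fix described in this commit message, moving evaluation out of the iteration loop, can be pictured with a toy loop. This is a sketch and not the `Trainer` code: `run_training` and its parameters are invented for illustration.

```python
def run_training(num_epochs, steps_per_epoch, evaluate):
    for epoch in range(num_epochs):
        for step in range(steps_per_epoch):
            pass  # one training step would run here
        # Placed after the step loop, so it runs exactly once per epoch,
        # at the point where the epoch has actually ended.
        evaluate(epoch)


evaluated_epochs = []
run_training(num_epochs=3, steps_per_epoch=5, evaluate=evaluated_epochs.append)
print(evaluated_epochs)  # [0, 1, 2]
```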

[testing] skip decorators: docs, tests, bugs (#7334) · 28cf8730

Stas Bekman authored Sep 23, 2020

* skip decorators: docs, tests, bugs

* another important note

* style

* bloody style

* add @pytest.mark.parametrize

* add note

* no idea what it wants :(

28cf8730

[code quality] fix confused flake8 (#7309) · df536438

Stas Bekman authored Sep 22, 2020

* fix confused flake

We run `black --target-version py35 ...`, but flake8 doesn't know that, so currently with py38 flake8 fails, suggesting that black should have reformatted 63 files. Indeed, if I run:

```
black --line-length 119 --target-version py38 examples templates tests src utils
```
it indeed reformats 63 files.

The only solution I found is to create a black config file as explained at https://github.com/psf/black#configuration-format, which is what this PR adds.

Now flake8 knows that py35 is the standard and no longer gets confused regardless of the user's python version.

* adjust the other files that will now rely on black's config file

df536438

22 Sep, 2020 10 commits

[s2s] only save metrics.json from rank zero (#7331) · 78387cc6
Sam Shleifer authored Sep 22, 2020

78387cc6
[s2s] add src_lang kwarg for distributed eval (#7300) · e53138a1
Sam Shleifer authored Sep 22, 2020

e53138a1
[model_cards] blinoff/roberta-base-russian-v0 (#7317) · a9c7849c
blinovpd authored Sep 23, 2020

a9c7849c
Formatting · f5518e56
Sylvain Gugger authored Sep 22, 2020

f5518e56

Add num workers cli arg (#7322) · 17099ebd

Chady Kamar authored Sep 22, 2020

* Add dataloader_num_workers to TrainingArguments

This argument is meant to be used to set the
number of workers for the PyTorch DataLoader.

* Pass num_workers argument on DataLoader init

17099ebd

[s2s] add supported architecures to MD (#7252) · 25b0463d
Sam Shleifer authored Sep 22, 2020

25b0463d
Fixed results of SQuAD-FR evaluation (#7313) · d6bc72c4
Pavel Soriano authored Sep 22, 2020
```
The score for the F1 metric was reported as the Exact Match and vice-versa.
```
d6bc72c4

[Bug Fix] The actual batch_size is inconsistent with the settings. (#7235) · 6303b5a7

Huang Lianzhe authored Sep 23, 2020



* [bug fix] fixed the bug that the actual batch_size is inconsistent with the parameter settings

* reformat

* reformat

* reformat

* add support for dict and BatchEncoding

* add support for dict and BatchEncoding

* add documentation for DataCollatorForNextSentencePrediction

* Some more nits for the docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Some more nits for the docstring
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* rename variables
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

6303b5a7

RAG (#6813) · c754c41c

Ola Piktus authored Sep 22, 2020

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* added rag WIP

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867



* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriever api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement thoms suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fix finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate sylvains suggestions

* sams comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

c754c41c

Mark big downloads slow (#7325) · 1ee2194f

Sylvain Gugger authored Sep 22, 2020

* Mark big downloads as slow

* Add import

* Right order for slow decorator

* More slow tests

1ee2194f