Commits · 4eb61f8e88fafc07b9fa55069616a5fb38e49012 · chenpangpang / transformers

19 Oct, 2020 1 commit
- remove USE_CUDA (#7861) · 4eb61f8e
  Stas Bekman authored Oct 19, 2020
  
  4eb61f8e
18 Oct, 2020 1 commit

[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a

Thomas Wolf authored Oct 18, 2020

* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* splitting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉



* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests, lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split herbert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix herbert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ba8c4d0a
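
The optional-dependency idea behind this commit ("remove hard dependency on sentencepiece", "update dummy objects") can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `is_available`, `DummySentencePieceTokenizer`, and `pick_tokenizer_class` are hypothetical names, and the assumed goal is that a missing backend fails when the class is used rather than when the package is imported.

```python
import importlib.util


def is_available(package_name: str) -> bool:
    """Return True if a top-level package can be found, without importing it."""
    return importlib.util.find_spec(package_name) is not None


class DummySentencePieceTokenizer:
    """Hypothetical placeholder exported when the sentencepiece backend is missing."""

    def __init__(self, *args, **kwargs):
        # Fail only when someone actually tries to build the tokenizer.
        raise ImportError(
            "This tokenizer requires the sentencepiece package, which was not found."
        )


def pick_tokenizer_class(real_class, dummy_class, package_name: str):
    """Choose the real class when its backend is installed, else the dummy."""
    return real_class if is_available(package_name) else dummy_class
```

With this shape, importing the library stays cheap and safe on a machine without the backend, and the error message appears at the point where the missing package is actually needed.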

15 Oct, 2020 1 commit
- Typo and fix the input of labels to `cross_entropy` (#7841) · dfa4c26b
  Katarina Slama authored Oct 15, 2020
```
The current version caused some errors. The changes fixed it for me. Hope this is helpful!
```
  dfa4c26b
14 Oct, 2020 1 commit

Add predict step accumulation (#7767) · a1d1b332

Sylvain Gugger authored Oct 14, 2020



* Add eval_accumulation_step and clean distributed eval

* Add TPU test

* Add TPU stuff

* Fix arg name

* Fix Seq2SeqTrainer

* Fix total_size

* Update src/transformers/trainer_pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Doc and add test to TPU

* Add unit test

* Adapt name
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

a1d1b332

13 Oct, 2020 2 commits

fixed lots of typos. (#7758) · 7e73c128
Tiger authored Oct 13, 2020

7e73c128

Gpt1 for sequence classification (#7683) · dcba9ee0

Felipe Curti authored Oct 13, 2020

* Add Documentation for GPT-1 Classification

* Add GPT-1 with Classification head

* Add tests for GPT-1 Classification

* Add GPT-1 For Classification to auto models

* Remove authorized missing keys, change checkpoint to openai-gpt

dcba9ee0

09 Oct, 2020 5 commits
- Fix title level in Blenderbot doc (#7687) · 2c9e83f7
  Sylvain Gugger authored Oct 09, 2020
  
  2c9e83f7
- Better links for models in README and doc index (#7680) · a3cea6a8
  Sylvain Gugger authored Oct 09, 2020
  
  a3cea6a8
- Revert "Better model links in the README and index" · bc00b37a
  sgugger authored Oct 09, 2020
```
This reverts commit 76e05518.
```
  bc00b37a
- Better model links in the README and index · 76e05518
  sgugger authored Oct 09, 2020
  
  76e05518
- Update XLM-RoBERTa details (#7669) · 5668fdb0
  Noah Trenaman authored Oct 09, 2020
  
  5668fdb0
08 Oct, 2020 1 commit

Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove... · 9aeacb58

Thomas Wolf authored Oct 08, 2020


Adding Fast tokenizers for SentencePiece based tokenizers - Breaking: remove Transfo-XL fast tokenizer (#7141)

* [WIP] SP tokenizers

* fixing tests for T5

* WIP tokenizers

* serialization

* update T5

* WIP T5 tokenization

* slow to fast conversion script

* Refactoring to move tokenizer implementations inside transformers

* Adding gpt - refactoring - quality

* WIP adding several tokenizers to the fast world

* WIP Roberta - moving implementations

* update to dev4 switch file loading to in-memory loading

* Updating and fixing

* advancing on the tokenizers - updating do_lower_case

* style and quality

* moving forward with tokenizers conversion and tests

* MBart, T5

* dumping the fast version of transformer XL

* Adding to autotokenizers + style/quality

* update init and space_between_special_tokens

* style and quality

* bump up tokenizers version

* add protobuf

* fix pickle Bert JP with Mecab

* fix newly added tokenizers

* style and quality

* fix bert japanese

* fix funnel

* limit tokenizer warning to one occurrence

* clean up file

* fix new tokenizers

* fast tokenizers deep tests

* WIP adding all the special fast tests on the new fast tokenizers

* quick fix

* adding more fast tokenizers in the fast tests

* all tokenizers in fast version tested

* Adding BertGenerationFast

* bump up setup.py for CI

* remove BertGenerationFast (too early)

* bump up tokenizers version

* Clean old docstrings

* Typo

* Update following Lysandre's comments
Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>

9aeacb58

07 Oct, 2020 2 commits

Blenderbot (#7418) · 960faaaf

Sam Shleifer authored Oct 07, 2020


Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

960faaaf

Trainer callbacks (#7596) · 08ba4b49

Sylvain Gugger authored Oct 07, 2020



* Initial callback proposal

* Finish various callbacks

* Post-rebase conflicts

* Fix tests

* Don't use something that's not set

* Documentation

* Remove unwanted print.

* Document all models can work

* Add tests + small fixes

* Update docs/source/internal/trainer_utils.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* Fix TF tests

* Real fix this time

* This one should work

* Fix typo

* Really fix typo
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

08ba4b49

06 Oct, 2020 2 commits

Add GPT2ForSequenceClassification based on DialogRPT (#7501) · 59824318
Lysandre Debut authored Oct 06, 2020
```
* Add GPT2ForSequenceClassification based on DialogRPT

* Better documentation

* Code quality
```
59824318

Fix squeezebert docs (#7587) · 0257992e

Lysandre Debut authored Oct 06, 2020

* Configuration

* Modeling

* Tokenization

* Obliterate the trailing spaces

* From underlines to long underlines

0257992e

05 Oct, 2020 5 commits

The toggle actually sticks (#7586) · 818c294f
Lysandre Debut authored Oct 05, 2020

818c294f

Check and update model list in index.rst automatically (#7527) · b2b7fc78

Sylvain Gugger authored Oct 05, 2020

* Check and update model list in index.rst automatically

* Adapt template

b2b7fc78

docs(pretrained_models): fix num parameters (#7575) · 0d79de73

Amine Abdaoui authored Oct 05, 2020



* docs(pretrained_models): fix num parameters

* fix(pretrained_models): correct typo
Co-authored-by: Amin <amin.geotrend@gmail.com>

0d79de73

SqueezeBERT architecture (#7083) · 02ef825b

Forrest Iandola authored Oct 05, 2020

* configuration_squeezebert.py

thin wrapper around bert tokenizer

fix typos

wip sb model code

wip modeling_squeezebert.py. Next step is to get the multi-layer-output interface working

set up squeezebert to use BertModelOutput when returning results.

squeezebert documentation

formatting

allow head mask that is an array of [None, ..., None]

docs

docs cont'd

path to vocab

docs and pointers to cloud files (WIP)

line length and indentation

squeezebert model cards

formatting of model cards

untrack modeling_squeezebert_scratchpad.py

update aws paths to vocab and config files

get rid of stub of NSP code, and advise users to pretrain with mlm only

fix rebase issues

redo rebase of modeling_auto.py

fix issues with code formatting

more code format auto-fixes

move squeezebert before bert in tokenization_auto.py and modeling_auto.py because squeezebert inherits from bert

tests for squeezebert modeling and tokenization

fix typo

move squeezebert before bert in modeling_auto.py to fix inheritance problem

disable test_head_masking, since squeezebert doesn't yet implement head masking

fix issues exposed by the test_modeling_squeezebert.py

fix an issue exposed by test_tokenization_squeezebert.py

fix issue exposed by test_modeling_squeezebert.py

auto generated code style improvement

issue that we inherited from modeling_xxx.py: SqueezeBertForMaskedLM.forward() calls self.cls(), but there is no self.cls, and I think the goal was actually to call self.lm_head()

update copyright

resolve failing 'test_hidden_states_output' and remove unused encoder_hidden_states and encoder_attention_mask

docs

add integration test. rename squeezebert-mnli --> squeezebert/squeezebert-mnli

autogenerated formatting tweaks

integrate feedback from patrickvonplaten and sgugger to programming style and documentation strings

* tiny change to order of imports

02ef825b
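
The commit message above notes that squeezebert had to be moved before bert in the auto files "because squeezebert inherits from bert". A minimal sketch of why the order matters, assuming the lookup walks an ordered list and returns the first entry that matches via `isinstance`. The classes and the `resolve` helper are hypothetical stand-ins, not the library's actual code:

```python
class Bert:
    """Hypothetical base class standing in for the BERT entry."""


class SqueezeBert(Bert):
    """Hypothetical subclass standing in for the SqueezeBERT entry."""


def resolve(obj, ordered_mapping):
    """Return the name of the first mapping entry whose class matches obj."""
    for cls, name in ordered_mapping:
        if isinstance(obj, cls):
            return name
    raise ValueError(f"No entry matches {type(obj).__name__}")


# Base class listed first: the subclass instance matches Bert and stops there.
base_first = [(Bert, "bert"), (SqueezeBert, "squeezebert")]

# Subclass listed first: the more specific entry gets a chance to match.
subclass_first = [(SqueezeBert, "squeezebert"), (Bert, "bert")]

instance = SqueezeBert()
print(resolve(instance, base_first))      # bert
print(resolve(instance, subclass_first))  # squeezebert
```

Because an instance of a subclass also passes `isinstance` for its base class, a first-match lookup only resolves correctly when more specific entries come earlier in the list.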

Cleanup documentation for BART, Marian, MBART and Pegasus (#7523) · e2c935f5

Sylvain Gugger authored Oct 05, 2020

* Cleanup documentation for BART, Marian, MBART and Pegasus

e2c935f5

01 Oct, 2020 2 commits
- Update LayoutLM doc (#7388) · 9a92afb6
  Alexandr authored Oct 01, 2020
```
Co-authored-by: Alexandr Maslov <avmaslov3@gmail.com>
```
  9a92afb6
- Add forgotten return_dict argument in the docs (#7483) · be51c103
  Sylvain Gugger authored Oct 01, 2020
  
  be51c103
30 Sep, 2020 3 commits

Alphabetize model lists (#7478) · dc7d2daa
Sylvain Gugger authored Sep 30, 2020

dc7d2daa
Make transformers install check positive (#7473) · cc4eff80
François REMY authored Sep 30, 2020
```
When transformers is correctly installed, you should get a positive message ^_^
```
cc4eff80

Add DeBERTa model (#5929) · 7a0cf0ec

Pengcheng He authored Sep 30, 2020



* Add DeBERTa model

* Remove dependency of deberta

* Address comments

* Patch DeBERTa
Documentation
Style

* Add final tests

* Style

* Enable tests + nitpicks

* position IDs

* BERT -> DeBERTa

* Quality

* Style

* Tokenization

* Last updates.

* @patrickvonplaten's comments

* Not everything can be a copy

* Apply most of @sgugger's review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Last reviews

* DeBERTa -> Deberta
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

7a0cf0ec

29 Sep, 2020 2 commits
- Add documentation for v3.3.1 · a1c2ef7b
  Sylvain Gugger authored Sep 29, 2020
  
  a1c2ef7b
- Release: v3.3.1 · 1ba08dc2
  Sylvain Gugger authored Sep 29, 2020
  
  1ba08dc2
28 Sep, 2020 5 commits

Update docs to version v3.3.0 · 16c21382
Lysandre authored Sep 28, 2020

16c21382
Release: v3.3.0 · 0613f052
Lysandre authored Sep 28, 2020

0613f052
Reorganize documentation navbar (#7423) · ca3fc36d
Sylvain Gugger authored Sep 28, 2020
```
* Reorganize documentation navbar

* Update css to have clear sections
```
ca3fc36d
Document RAG again (#7377) · 0611eab5
Sylvain Gugger authored Sep 28, 2020
```
Do not merge before Monday
```
0611eab5

docs: fix model sharing file names (#5855) · 1749ca31

Boris Dayma authored Sep 28, 2020



* docs: fix model sharing file names

* Update docs/source/model_sharing.rst
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* docs(model_sharing.rst): fix new line
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

1749ca31

24 Sep, 2020 4 commits
- Remove mentions of RAG from the docs (#7376) · a8e7982f
  Sylvain Gugger authored Sep 24, 2020
```
* Remove mentions of RAG from the docs

* Deactivate check
```
  a8e7982f
- Formatter (#7368) · 8d3bb781
  Lysandre Debut authored Sep 24, 2020
```
* Formatter

* Docs
```
  8d3bb781
- Clean RAG docs and template docs (#7348) · 0ccb6f5c
  Sylvain Gugger authored Sep 24, 2020
```
* Clean RAG docs and template docs

* Fix typo

* Better doc
```
  0ccb6f5c
- Expand the documentation doc a bit (#7350) · 0be5f4a0
  Sylvain Gugger authored Sep 24, 2020
  
  0be5f4a0
23 Sep, 2020 2 commits

Models doc (#7345) · 3323146e

Sylvain Gugger authored Sep 23, 2020



* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FlauBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

3323146e

[testing] skip decorators: docs, tests, bugs (#7334) · 28cf8730

Stas Bekman authored Sep 23, 2020

* skip decorators: docs, tests, bugs

* another important note

* style

* bloody style

* add @pytest.mark.parametrize

* add note

* no idea what it wants :(

28cf8730

22 Sep, 2020 1 commit

RAG (#6813) · c754c41c

Ola Piktus authored Sep 22, 2020

* added rag WIP

* path fix

* Formatting / renaming prior to actual work

* First commit

* improve comments

* Retrieval evaluation scripts

* refactor to include modeling outputs + MPI retriever

* Fix rag-token model + refactor

* Various fixes + finetuning logic

* use_bos fix

* Retrieval refactor

* Finetuning refactoring and cleanup

* Add documentation and cleanup

* Remove set_up_rag_env.sh file

* Fix retrieval with HF index

* Fix import errors

* Fix quality errors

* Refactor as per suggestions in https://github.com/huggingface/transformers/pull/6813#issuecomment-687208867



* fix quality

* Fix RAG Sequence generation

* minor cleanup plus initial tests

* fix test

* fix tests 2

* Comments fix

* post-merge fixes

* Improve readme + post-rebase refactor

* Extra dependencies for tests

* Fix tests

* Fix tests 2

* Refactor test requirements

* Fix tests 3

* Post-rebase refactor

* rename nlp->datasets

* RAG integration tests

* add tokenizer to slow integration test and allow retriever to run on cpu

* add tests; fix position ids warning

* change structure

* change structure

* add from encoder generator

* save working solution

* make all integration tests pass

* add RagTokenizer.save/from_pretrained and RagRetriever.save/from_pretrained

* don't save paths

* delete unnecessary imports

* pass config to AutoTokenizer.from_pretrained for Rag tokenizers

* init wiki_dpr only once

* hardcode legacy index and passages paths (todo: add the right urls)

* finalize config

* finalize retriever api and config api

* LegacyIndex index download refactor

* add dpr to autotokenizer

* make from pretrained more flexible

* fix ragfortokengeneration

* small name changes in tokenizer

* add labels to models

* change default index name

* add retrieval tests

* finish token generate

* align test with previous version and make all tests pass

* add tests

* finalize tests

* implement Thom's suggestions

* add first version of test

* make first tests work

* make retriever platform agnostic

* naming

* style

* add legacy index URL

* docstrings + simple retrieval test for distributed

* clean model api

* add doc_ids to retriever's outputs

* fix retrieval tests

* finish model outputs

* finalize model api

* fix generate problem for rag

* fix generate for other models

* fix some tests

* save intermediate

* set generate to default

* big refactor generate

* delete rag_api

* correct pip faiss install

* fix auto tokenization test

* fix faiss install

* fix test

* move the distributed logic to examples

* model page

* docs

* finish tests

* fix dependencies

* fix import in __init__

* Refactor eval_rag and finetune scripts

* start docstring

* add psutil to test

* fix tf test

* move require torch to top

* fix retrieval test

* align naming

* finish automodel

* fix repo consistency

* test ragtokenizer save/load

* add rag model output docs

* fix ragtokenizer save/load from pretrained

* fix tokenizer dir

* remove torch in retrieval

* fix docs

* fix finetune scripts

* finish model docs

* finish docs

* remove auto model for now

* add require torch

* remove solved todos

* integrate Sylvain's suggestions

* Sam's comments

* correct mistake on purpose

* improve README

* Add generation test cases

* fix rag token

* clean token generate

* fix test

* add note to test

* fix attention mask

* add t5 test for rag

* Fix handling prefix in finetune.py

* don't overwrite index_name
Co-authored-by: Patrick Lewis <plewis@fb.com>
Co-authored-by: Aleksandra Piktus <piktus@devfair0141.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5102.h2.fair>
Co-authored-by: Aleksandra Piktus <piktus@learnfair5067.h2.fair>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

c754c41c