Commits · 827d6d6ef071029cfe82838a18dab046b5813976 · chenpangpang / transformers

18 Apr, 2020 1 commit

Cleanup fast tokenizers integration (#3706) · 827d6d6e

Thomas Wolf authored Apr 18, 2020

* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately TensorFlow doesn't like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>

16 Apr, 2020 1 commit

[Docs] Add DialoGPT (#3755) · d22894df

Patrick von Platen authored Apr 16, 2020

* add dialoGPT

* update README.md

* fix conflict

* update readme

* add code links to docs

* Update README.md

* Update dialo_gpt2.rst

* Update pretrained_models.rst

* Update docs/source/model_doc/dialo_gpt2.rst
Co-Authored-By: Julien Chaumond <chaumond@gmail.com>

* change filename of dialogpt
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

03 Apr, 2020 1 commit

ELECTRA (#3257) · d5d7d886

Lysandre Debut authored Apr 03, 2020

* Electra wip

* helpers

* Electra wip

* Electra v1

* ELECTRA may be saved/loaded

* Generator & Discriminator

* Embedding size instead of halving the hidden size

* ELECTRA Tokenizer

* Revert BERT helpers

* ELECTRA Conversion script

* Archive maps

* PyTorch tests

* Start fixing tests

* Tests pass

* Same configuration for both models

* Compatible with base + large

* Simplification + weight tying

* Archives

* Auto + Renaming to standard names

* ELECTRA is uncased

* Tests

* Slight API changes

* Update tests

* wip

* ElectraForTokenClassification

* temp

* Simpler arch + tests

Removed ElectraForPreTraining which will be in a script

* Conversion script

* Auto model

* Update links to S3

* Split ElectraForPreTraining and ElectraForTokenClassification

* Actually test PreTraining model

* Remove num_labels from configuration

* wip

* wip

* From discriminator and generator to electra

* Slight API changes

* Better naming

* TensorFlow ELECTRA tests

* Accurate conversion script

* Added to conversion script

* Fast ELECTRA tokenizer

* Style

* Add ELECTRA to README

* Modeling Pytorch Doc + Real style

* TF Docs

* Docs

* Correct links

* Correct model initialized

* random fixes

* style

* Addressing Patrick's and Sam's comments

* Correct links in docs

30 Mar, 2020 1 commit

[T5] Add training documentation (#3507) · 5b44e0a3

Patrick von Platen authored Mar 30, 2020

* Add clear description of how to train T5

* correct docstring in T5

* correct typo

* correct docstring format

* update t5 model docs

* implement collins feedback

* fix typo and add more explanation for sentinel tokens

* delete unnecessary todos

27 Mar, 2020 1 commit

Add T5 to docs (#3461) · fa9af246

Patrick von Platen authored Mar 27, 2020

* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs

05 Mar, 2020 1 commit

Rename BartForMaskedLM -> BartForConditionalGeneration (#3114) · 857e0a0d

Sam Shleifer authored Mar 05, 2020

* improved documentation

02 Mar, 2020 1 commit

Bart-CNN (#3059) · b54ef78d

Sam Shleifer authored Mar 02, 2020

`generate` code that produces 99% identical summarizations to fairseq on CNN test data, with caching.

25 Feb, 2020 1 commit

Documentation (#2989) · bb7c4685

Lysandre Debut authored Feb 25, 2020

* All Tokenizers

BertTokenizer + few fixes
RobertaTokenizer
OpenAIGPTTokenizer + Fixes
GPT2Tokenizer + fixes
TransfoXLTokenizer
Correct rst for TransformerXL
XLMTokenizer + fixes
XLNet Tokenizer + Style
DistilBERT + Fix XLNet RST
CTRLTokenizer
CamemBERT Tokenizer
FlaubertTokenizer
XLMRobertaTokenizer
cleanup

* cleanup

20 Feb, 2020 1 commit

New BartModel (#2745) · 53ce3854

Sam Shleifer authored Feb 20, 2020

* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs

07 Feb, 2020 2 commits
- Update RoBERTa tips · dd288303
  Lysandre authored Feb 07, 2020
- Update XLM-R tips · db979301
  Lysandre authored Feb 07, 2020
30 Jan, 2020 1 commit
- FlauBERT documentation · 73306d02
  Lysandre authored Jan 29, 2020
29 Jan, 2020 2 commits
- Update documentation · c69b0826
  Lysandre authored Jan 29, 2020
- Update documentation · 44a5b4bb
  Lysandre authored Jan 29, 2020
27 Jan, 2020 1 commit
- adding in the doc · e0849a66
  thomwolf authored Jan 24, 2020
24 Jan, 2020 1 commit
- AutoModels doc · 983fef46
  Lysandre authored Jan 24, 2020
23 Jan, 2020 18 commits
- Run the examples in slow · 24d5ad1d
  Lysandre authored Jan 22, 2020
- Tips + whitespaces · 9ddf60b6
  Lysandre authored Jan 21, 2020
- Fixes · 0e9899f4
  Lysandre authored Jan 20, 2020
- PyTorch CTRL + Style · 7511f3dd
  Lysandre authored Jan 20, 2020
- XLM-RoBERTa · 980211a6
  Lysandre authored Jan 20, 2020
- PyTorch DistilBERT · db1a7f27
  Lysandre authored Jan 20, 2020
- TF RoBERTa · b28020f5
  Lysandre authored Jan 20, 2020
- Pytorch RoBERTa · 3e1bc27e
  Lysandre authored Jan 20, 2020
- Camembert · f44ff574
  Lysandre authored Jan 17, 2020
- PyTorch XLM · ccebcae7
  Lysandre authored Jan 17, 2020
- PyTorch XLNet · cd656fb2
  Lysandre authored Jan 17, 2020
- PyTorch Transformer-XL · 98edad41
  Lysandre authored Jan 17, 2020
- Pytorch GPT · 850795c4
  Lysandre authored Jan 17, 2020
- TF GPT2 · 1487b840
  Lysandre authored Jan 17, 2020
- GPT-2 PyTorch models + better tips for BERT · bd0d3fd7
  Lysandre authored Jan 16, 2020
- BERT PyTorch models · cd77c750
  Lysandre authored Jan 16, 2020
- TF ALBERT + TF Utilities + Fix warnings · 3922a249
  Lysandre authored Jan 15, 2020
- ALBERT Modeling + required changes to utilities · 00df3d4d
  Lysandre authored Jan 15, 2020
14 Jan, 2020 3 commits
- Added example usage · 387217bd
  Lysandre authored Jan 13, 2020
- Add missing XLNet and XLM models · 7d1bb7f2
  Lysandre authored Jan 13, 2020
- Updated Configurations · 63268272
  Lysandre Debut authored Jan 12, 2020
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
27 Nov, 2019 1 commit
- Remove TFBertForPreTraining from ALBERT doc · 36162095
  Lysandre authored Nov 27, 2019