Commits · f250beb8aac83009c70ff01ae8568384683d0f3c · chenpangpang / transformers

31 Jul, 2020 1 commit

Enable ONNX/ONNXRuntime optimizations through converter script (#6131) · 7231f7b5

Funtowicz Morgan authored Jul 31, 2020



* Add onnxruntime transformers optimization support
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added Optimization section in ONNX/ONNXRuntime documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve note reference
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Add warning about different levels of optimization between torch and tf export.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address @LysandreJik wording suggestion
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Always optimize model before quantization for maximum performance.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Address comments on the documentation.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve TensorFlow optimization message as suggested by @yufenglee
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Removed --optimize parameter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Warn the user about current quantization limitation when model is larger than 2GB.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Trigger CI for last check

* Small change in print for the optimization section.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

7231f7b5
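
The 2GB quantization warning mentioned above could be approximated with a file-size check like the one below. This is a minimal sketch, not the converter script's code: the helper name `exceeds_quantization_limit` is hypothetical, and reading "2GB" as 2 * 1024**3 bytes is an assumption.

```python
import warnings
from pathlib import Path

# Assumption: "2GB" from the commit message is read as 2 * 1024**3 bytes.
TWO_GB = 2 * 1024 ** 3


def exceeds_quantization_limit(model_path: Path, limit_bytes: int = TWO_GB) -> bool:
    """Return True, and emit a warning, when the file is larger than limit_bytes.

    Hypothetical helper for illustration; it is not part of transformers.
    """
    size = model_path.stat().st_size
    if size > limit_bytes:
        warnings.warn(
            f"{model_path.name} is {size} bytes, above the {limit_bytes}-byte limit; "
            "quantization may not be supported for this model."
        )
        return True
    return False
```

A caller would run this on the exported `.onnx` file before attempting quantization and skip that step when it returns `True`.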

30 Jul, 2020 4 commits

Doc tokenizer (#6110) · f3065abd

Sylvain Gugger authored Jul 30, 2020



* Start doc tokenizers

* Tokenizer documentation

* Start doc tokenizers

* Tokenizer documentation

* Formatting after rebase

* Formatting after merge

* Update docs/source/main_classes/tokenizer.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address comment

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address Thom's comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

f3065abd

Addition of a DialoguePipeline (#5516) · e642c789

guillaume-be authored Jul 30, 2020



* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docstring following review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e642c789
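
The segment-level truncation described in this commit (dropping entire user/bot inputs rather than cutting one in the middle) might look roughly like the sketch below. The function name `truncate_history` and the list-of-token-id-lists representation are assumptions made for illustration, not the pipeline's actual internals.

```python
from typing import List


def truncate_history(segments: List[List[int]], max_length: int) -> List[List[int]]:
    """Drop whole segments, oldest first, until the total token count fits max_length.

    Each inner list stands for the token ids of one user or bot turn.
    Hypothetical helper; the real pipeline may differ.
    """
    kept = list(segments)  # copy so the caller's list is not mutated
    total = sum(len(segment) for segment in kept)
    while kept and total > max_length:
        total -= len(kept[0])
        kept.pop(0)  # remove the oldest turn in one piece
    return kept
```

Because turns are removed whole, the remaining history always starts at a turn boundary, which is the property the commit message asks for.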

Switch from return_tuple to return_dict (#6138) · 91cb9546

Sylvain Gugger authored Jul 30, 2020



* Switch from return_tuple to return_dict

* Fix test

* [WIP] Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleC… (#5614)

* Test TF Flaubert + Add {XLM, Flaubert}{TokenClassification, MultipleChoice} models and tests

* AutoModels


Tiny tweaks

* Style

* Final changes before merge

* Re-order for simpler review

* Final fixes

* Addressing @sgugger's comments

* Test MultipleChoice

* Rework TF trainer (#6038)

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre's comments

* Trigger CI

* Remove unused import

* Switch from return_tuple to return_dict

* Fix test

* Add recent model
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Plu <plu.julien@gmail.com>

91cb9546
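
As the flag names suggest, the switch means callers opt in to a structured output instead of opting in to a plain tuple. The toy sketch below illustrates that calling convention only. `ToyOutput` and `toy_forward` are hypothetical names, and the `False` default is an assumption; none of this is the library's model code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class ToyOutput:
    """Stand-in for a structured model output with named fields."""
    logits: Tuple[float, ...]
    loss: Optional[float] = None


def toy_forward(
    inputs: List[float], return_dict: bool = False
) -> Union[ToyOutput, Tuple[Tuple[float, ...]]]:
    """Return a ToyOutput when return_dict is True, otherwise a plain tuple."""
    logits = tuple(2.0 * x for x in inputs)  # placeholder computation
    if return_dict:
        return ToyOutput(logits=logits)
    return (logits,)
```

With `return_dict=True` the caller reads `output.logits` by name; with the tuple form the same value is `output[0]`.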

Actually the extra_id are from 0-99 and not from 1-100 (#5967) · d24ea708

Oren Amsalem authored Jul 30, 2020

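# Assumption: the commit does not name a checkpoint; t5-small is used here for illustration.
from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")
# <extra_id_99> is a sentinel token, so it encodes to a single id.
a = tokenizer.encode("we got a <extra_id_99>", return_tensors='pt', add_special_tokens=True)
print(a)
# output reported in the commit: tensor([[   62,   530,     3,     9, 32000]])
# <extra_id_100> is not in the vocabulary, so it is split into sub-word pieces.
a = tokenizer.encode("we got a <extra_id_100>", return_tensors='pt', add_special_tokens=True)
print(a)
# output reported in the commit: tensor([[   62,   530,     3,     9,     3,     2, 25666,   834,    23,    26,
#            834,  2915,  3155]])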

d24ea708
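
The output above shows `<extra_id_99>` encoding to id 32000, which is consistent with sentinel tokens occupying the top of the vocabulary in reverse order. The sketch below spells out that mapping. The default `vocab_size` of 32100 and the function name `sentinel_token_id` are assumptions inferred from the single data point in the commit message.

```python
def sentinel_token_id(index: int, vocab_size: int = 32100, num_sentinels: int = 100) -> int:
    """Map <extra_id_{index}> to a token id, assuming sentinels fill the top of the vocab.

    Valid indices run from 0 to num_sentinels - 1, i.e. 0-99 by default.
    The default vocab_size is an assumption chosen so that index 99 maps to 32000.
    """
    if not 0 <= index < num_sentinels:
        raise ValueError(f"sentinel index must be in 0..{num_sentinels - 1}, got {index}")
    return vocab_size - 1 - index
```

Under these assumptions index 99 maps to 32000, and index 100 is rejected, matching the commit's point that the range is 0-99 rather than 1-100.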

29 Jul, 2020 2 commits

Added capability to quantize a model while exporting through ONNX. (#6089) · 6c002853

Funtowicz Morgan authored Jul 29, 2020



* Added capability to quantize a model while exporting through ONNX.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

We do not support multiple extensions
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Reformat files
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* More quality
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Ensure test_generate_identified_name compares the same object types
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added documentation everywhere on ONNX exporter
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use pathlib.Path instead of plain-old string
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use f-string everywhere
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use the correct parameters for black formatting
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use Python 3 super() style.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Use packaging.version to ensure installed onnxruntime version match requirements
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fixing imports sorting order.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Missing raise(s)
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Added quantization documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix some spelling.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix bad list header format
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

6c002853
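
Several items above touch on file naming for exported models (`pathlib.Path` instead of strings, f-strings, and a test around generating an identified name). A helper of that kind could be sketched as follows. The name `identified_filename` and the `-quantized` suffix are illustrative assumptions, not necessarily what the converter uses.

```python
from pathlib import Path


def identified_filename(filename: Path, identifier: str) -> Path:
    """Insert identifier between the file stem and its final extension.

    Only the last extension is preserved as the suffix; this sketch makes
    no attempt to handle names with multiple extensions specially.
    """
    return filename.with_name(f"{filename.stem}{identifier}{filename.suffix}")
```

Returning a `Path` rather than a string means a test can compare the result directly against another `Path`, so both sides of the comparison have the same type.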

ONNX documentation (#5992) · 640550fc

Funtowicz Morgan authored Jul 29, 2020



* Move torchscript and add ONNX documentation under model_export
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc
Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

640550fc

27 Jul, 2020 1 commit
- Update model_summary.rst (#5737) · b9b11795
  Xin Wen authored Jul 27, 2020
```
Add '-' to make the reference of Transformer-XL more accurate and formal.
```
  b9b11795
24 Jul, 2020 1 commit
- Model utils doc (#6005) · 3b44aa93
  Sylvain Gugger authored Jul 24, 2020
```
* Document TF modeling utils

* Document all model utils
```
  3b44aa93
22 Jul, 2020 1 commit
- Update doc of the model page (#5985) · 33d7506e
  Sylvain Gugger authored Jul 22, 2020
  
  33d7506e
21 Jul, 2020 1 commit
- Update doc to new model outputs (#5946) · e714412f
  Sylvain Gugger authored Jul 21, 2020
```
* Update doc to new model outputs

* Fix outputs in quicktour
```
  e714412f
20 Jul, 2020 1 commit
- Add AlbertForPretraining to doc (#5914) · a2096917
  Sylvain Gugger authored Jul 20, 2020
  
  a2096917
14 Jul, 2020 1 commit
- tiny ppl doc typo fix (#5751) · 5d178954
  Joe Davison authored Jul 14, 2020
  
  5d178954
13 Jul, 2020 2 commits
- FlaubertForTokenClassification (#5644) · 45addfe9
  Stas Bekman authored Jul 13, 2020
```
* implement FlaubertForTokenClassification as a subclass of XLMForTokenClassification

* fix mapping order

* add the doc

* add common tests
```
  45addfe9
- doc improvements (#5688) · 0a19a49d
  Stas Bekman authored Jul 13, 2020
  
  0a19a49d
10 Jul, 2020 2 commits

Document model outputs (#5673) · 7fad617d

Sylvain Gugger authored Jul 10, 2020



* Document model outputs

* Update docs/source/main_classes/output.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

7fad617d

Improvements to PretrainedConfig documentation (#5642) · b2747af5
Sylvain Gugger authored Jul 10, 2020
```
* Update PretrainedConfig doc

* Formatting

* Small fixes

* Forgotten args and more cleanup
```
b2747af5

09 Jul, 2020 2 commits
- Add forum link in the docs (#5637) · 760f726e
  Sylvain Gugger authored Jul 09, 2020
  
  760f726e
- Correct extension (#5631) · 1158e565
  Lysandre Debut authored Jul 09, 2020
  
  1158e565
08 Jul, 2020 1 commit
- doc fixes (#5613) · fa5423b1
  Stas Bekman authored Jul 08, 2020
  
  fa5423b1
07 Jul, 2020 4 commits

Guide to fixed-length model perplexity evaluation (#5449) · b4b33fdf

Joe Davison authored Jul 07, 2020

* add first draft ppl guide

* upload imgs

* expand on strides

* ref typo

* rm superfluous past var

* add tokenization disclaimer

b4b33fdf
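
The commit only names the topic (fixed-length perplexity evaluation with strides), so the following is a sketch of one common sliding-window scheme, not necessarily the one the guide uses. The function name `strided_windows` and its return shape are assumptions.

```python
from typing import Iterator, Tuple


def strided_windows(seq_len: int, max_length: int, stride: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (begin, end, n_new) for each evaluation window.

    Each window covers tokens [begin, end) and is at most max_length long.
    n_new is the number of tokens not covered by any earlier window; only
    those would be scored, the rest serve as context.
    """
    if not 0 < stride <= max_length:
        raise ValueError("stride must be between 1 and max_length")
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break  # the sequence is fully covered
```

A smaller stride gives each scored token more preceding context at the cost of more forward passes; with `stride == max_length` the windows do not overlap at all.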

Add mbart-large-cc25, support translation finetuning (#5129) · 353b8f1e

Sam Shleifer authored Jul 07, 2020

improve unittests for finetuning, especially w.r.t testing frozen parameters
fix freeze_embeds for T5
add streamlit setup.cfg

353b8f1e

[docs] fix model_doc links in model summary (#5566) · 33e43edd
Suraj Patil authored Jul 07, 2020
```
* fix model_doc links

* update model links
```
33e43edd

Add DPR model (#5279) · fbd87921

Quentin Lhoest authored Jul 07, 2020



* beginning of dpr modeling

* wip

* implement forward

* remove biencoder + better init weights

* export dpr model to embed model for nlp lib

* add new api

* remove old code

* make style

* fix dumb typo

* don't load bert weights

* docs

* docs

* style

* move the `k` parameter

* fix init_weights

* add pretrained configs

* minor

* update config names

* style

* better config

* style

* clean code based on PR comments

* change Dpr to DPR

* fix config

* switch encoder config to a dict

* style

* inheritance -> composition

* add messages in assert statements

* add dpr reader tokenizer

* one tokenizer per model

* fix base_model_prefix

* fix imports

* typo

* add convert script

* docs

* change tokenizers conf names

* style

* change tokenizers conf names

* minor

* minor

* fix wrong names

* minor

* remove unused convert functions

* rename convert script

* use return_tensors in tokenizers

* remove n_questions dim

* move generate logic to tokenizer

* style

* add docs

* docs

* quality

* docs

* add tests

* style

* add tokenization tests

* DPR full tests

* Stay true to the attention mask building

* update docs

* missing param in bert input docs

* docs

* style
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

fbd87921

06 Jul, 2020 4 commits
- Post v3.0.2 release commit · 1d233286
  Lysandre authored Jul 06, 2020
  
  1d233286
- Release: v3.0.2 · b0892fa0
  Lysandre authored Jul 06, 2020
  
  b0892fa0
- Typo fix in `training` doc (#5495) · b2309cc6
  Arnav Sharma authored Jul 06, 2020
  
  b2309cc6
- Fix typo in training (#5510) · 7ecff0cc
  ELanning authored Jul 06, 2020
  
  7ecff0cc
02 Jul, 2020 2 commits

Tokenizer summary (#5467) · 6b735a72

Sylvain Gugger authored Jul 02, 2020



* Work on tokenizer summary

* Finish tutorial

* Link to it

* Apply suggestions from code review
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add vocab definition
Co-authored-by: Anthony MOI <xn1t0x@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

6b735a72

Fix typo in glossary (#5466) · 84e56669
George Ho authored Jul 02, 2020

84e56669

01 Jul, 2020 4 commits
- [Reformer] Add Masked LM Reformer (#5426) · d16e36c7
  Patrick von Platen authored Jul 01, 2020
```
* fix conflicts

* fix

* happy rebasing
```
  d16e36c7
- finish reformer qa head (#5433) · fe81f7d1
  Patrick von Platen authored Jul 01, 2020
  
  fe81f7d1
- Fix dropdown bug in searches (#5440) · 6c55e9fc
  Sylvain Gugger authored Jul 01, 2020
```
* Trigger CI

* Fix dropdown bug in searches
```
  6c55e9fc
- Fix examples titles and optimization doc page (#5408) · 4ade7491
  Sylvain Gugger authored Jul 01, 2020
  
  4ade7491
30 Jun, 2020 2 commits
- Documentation for the Trainer API (#5383) · 87716a6d
  Sylvain Gugger authored Jun 30, 2020
```
* Documentation for the Trainer API

* Address review comments

* Address comments
```
  87716a6d
- How to share model cards with the CLI (#5374) · 0607b889
  Sylvain Gugger authored Jun 30, 2020
```
* How to share model cards

* Switch the two options

* Fix bad copy/cut

* Julien's suggestion
```
  0607b889
29 Jun, 2020 4 commits

Doc for v3.0.0 (#5366) · b9ee87f5

Lysandre Debut authored Jun 29, 2020



* Doc for v3.0.0

* Update docs/source/_static/js/custom.js
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/_static/js/custom.js
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

b9ee87f5

Release: v3.0.0 · b62ca595
Lysandre authored Jun 29, 2020

b62ca595

[Docs] Benchmark docs (#5360) · 4bcc35cd

Patrick von Platen authored Jun 29, 2020



* first doc version

* add benchmark docs

* fix typos

* improve README

* Update docs/source/benchmarks.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* fix naming and docs
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

4bcc35cd

[docs] Small tweaks to #5323 · c950fef5
Julien Chaumond authored Jun 29, 2020

c950fef5