- 29 Jun, 2020 1 commit
-
-
Patrick von Platen authored
* first doc version
* add benchmark docs
* fix typos
* improve README
* Update docs/source/benchmarks.rst
* fix naming and docs

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
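For orientation, the benchmark utilities documented in benchmarks.rst can be driven roughly like this (a minimal sketch; the model name, batch sizes, and sequence lengths are placeholder values):

    from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

    # Measure inference speed and memory for a model at a few input shapes.
    args = PyTorchBenchmarkArguments(
        models=["bert-base-uncased"],
        batch_sizes=[8],
        sequence_lengths=[128, 512],
    )
    benchmark = PyTorchBenchmark(args)
    results = benchmark.run()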
-
- 26 Jun, 2020 2 commits
-
-
Thomas Wolf authored
* remove references to old API in docstring - update data processors
* style
* fix tests - better type checking error messages
* better type checking
* include awesome fix by @LysandreJik for #5310
* updated doc and examples
-
Patrick von Platen authored
* add notebook
* Created with Colaboratory
* move notebook to correct folder
* correct link
* correct filename
* correct filename
* better name
-
- 24 Jun, 2020 1 commit
-
-
Sylvain Gugger authored
-
- 22 Jun, 2020 1 commit
-
-
Michaël Benesty authored
* Add link to new community notebook (optimization), related to https://github.com/huggingface/transformers/issues/4842#event-3469184635

  This notebook is about benchmarking model training with and without the dynamic padding optimization: https://github.com/ELS-RD/transformers-notebook

  Using dynamic padding on MNLI provides a **4.7 times training time reduction**, with the max pad length set to 512. The effect is strong because few examples in this dataset are much longer than 400 tokens. In real life the gain will depend on the dataset, but dynamic padding always brings an improvement and, across the more than 20 experiments listed in this [article](https://towardsdatascience.com/divide-hugging-face-transformers-training-time-by-2-or-more-21bf7129db9e?source=friends_link&sk=10a45a0ace94b3255643d81b6475f409), it does not seem to hurt performance. Following advice from @patrickvonplaten, I am opening the PR myself :-)
* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
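As a rough illustration of the optimization the notebook benchmarks: dynamic padding means padding each batch only to its longest member instead of to a fixed maximum length. A minimal sketch with the current tokenizer API (model name and max length are placeholder choices):

    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    def collate_fn(batch_texts):
        # Pad only to the longest sequence in this batch rather than to a
        # fixed 512 tokens, so batches of short examples stay small.
        return tokenizer(batch_texts, padding=True, truncation=True,
                         max_length=512, return_tensors="pt")

    loader = torch.utils.data.DataLoader(["a short example", "another one"],
                                         batch_size=2, collate_fn=collate_fn)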
-
- 18 Jun, 2020 1 commit
-
-
Pri Oberoi authored
* Add missing arg when creating model
* Fix typos
* Remove from_tf flag when creating model
-
- 03 Jun, 2020 1 commit
-
-
Abhishek Kumar Mishra authored
* Added links to more community notebooks

  Added links to 3 more community notebooks from the repo https://github.com/abhimishra91/transformers-tutorials, where different Transformers models are fine-tuned on a dataset using PyTorch.
* Update README.md
* Update README.md
* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
- 02 Jun, 2020 1 commit
-
-
Lorenzo Ampil authored
-
- 29 May, 2020 2 commits
-
-
Patrick von Platen authored
-
Iz Beltagy authored
* fix longformer model names in examples
* a better name for the notebook
-
- 28 May, 2020 3 commits
-
-
Iz Beltagy authored
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
Suraj Patil authored
-
Lavanya Shukla authored
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
- 26 May, 2020 1 commit
-
-
ohmeow authored
* adding BART summarization how-to community notebook
* Update notebooks/README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
-
- 22 May, 2020 2 commits
-
-
Patrick von Platen authored
-
Patrick von Platen authored
-
- 20 May, 2020 1 commit
-
-
Nathan Cooper authored
-
- 19 May, 2020 1 commit
-
-
Suraj Patil authored
* add T5 fine-tuning notebook [Community notebooks]
* Update README.md

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
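As a taste of what the notebook covers: T5 frames every task as text-to-text, so a fine-tuning step is ordinary seq2seq training with a task prefix. A minimal sketch with a recent version of the library (checkpoint, prefix, and example texts are placeholders):

    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    inputs = tokenizer("summarize: The quick brown fox jumped over the lazy dog.",
                       return_tensors="pt")
    targets = tokenizer("A fox jumped.", return_tensors="pt")

    # The model computes the cross-entropy loss itself when labels are passed.
    loss = model(input_ids=inputs["input_ids"],
                 attention_mask=inputs["attention_mask"],
                 labels=targets["input_ids"]).loss
    loss.backward()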
-
- 18 May, 2020 2 commits
-
-
Funtowicz Morgan authored
* Adding optimizations block from ONNXRuntime.
* Turn off external data format by default for PyTorch export.
* Correct the way use_external_format is passed through the cmdline args.
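For context, the optimizations block boils down to letting ONNX Runtime rewrite the exported graph (node fusion and similar passes) before inference. A minimal sketch of enabling them, assuming an already-exported model at a placeholder path:

    import onnxruntime as ort

    options = ort.SessionOptions()
    # Apply all graph-level optimizations ONNX Runtime supports.
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    session = ort.InferenceSession("onnx/bert-base-cased.onnx", options)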
-
Patrick von Platen authored
* Update README.md
* Update README.md
* Update README.md
* Update README.md
-
- 15 May, 2020 1 commit
-
-
Nikita authored
-
- 14 May, 2020 2 commits
-
-
Morgan Funtowicz authored
-
Funtowicz Morgan authored
* Added generic ONNX conversion script for PyTorch model.
* WIP initial TF support.
* TensorFlow/Keras ONNX export working.
* Print framework version info
* Add possibility to check the model is correctly loading on ONNX runtime.
* Remove quantization option.
* Specify ONNX opset version when exporting.
* Formatting.
* Remove unused imports.
* Make functions more generally reusable from other parts of the code.
* isort happy.
* flake happy
* Export only feature-extraction for now
* Correctly check inputs order / filter before export.
* Removed task variable
* Fix invalid args call in load_graph_from_args.
* Fix invalid args call in convert.
* Fix invalid args call in infer_shapes.
* Raise exception and catch in caller function instead of exit.
* Add 04-onnx-export.ipynb notebook
* More WIP on the notebook
* Remove unused imports
* Simplify & remove unused constants.
* Export with constant_folding in PyTorch
* Let's try to put function args in the right order this time ...
* Disable external_data_format temporarily
* ONNX notebook draft ready.
* Updated notebook charts + wording
* Correct error while exporting last chart in notebook.
* Addressing @LysandreJik's comment.
* Set ONNX opset to 11 as default value.
* Make opset param mandatory
* Added ONNX export unittests
* Quality.
* flake8 happy
* Add keras2onnx dependency on extras["tf"]
* Pin keras2onnx on github master to v1.6.5
* Second attempt.
* Third attempt.
* Use the right repo URL this time ...
* Do the same for onnxconverter-common
* Pinned keras2onnx and onnxconverter-common to v1.7.0 to support TF 2.2
* Correct commit hash.
* Addressing PR review: optimizations are enabled by default.
* Addressing PR review: small changes in the notebook
* setup.py comment about keras2onnx versioning.
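Putting the pieces together, the conversion script can also be called as a function. A minimal sketch, assuming the convert_graph_to_onnx signature from this era (model name and output path are placeholders):

    from pathlib import Path
    from transformers.convert_graph_to_onnx import convert

    # Export a PyTorch checkpoint to ONNX; opset 11 is the default mentioned
    # above, and only the feature-extraction pipeline is exported for now.
    convert(framework="pt", model="bert-base-cased",
            output=Path("onnx/bert-base-cased.onnx"), opset=11)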
-
- 13 May, 2020 1 commit
-
-
Patrick von Platen authored
* add first text for generation
* add generation pipeline to usage
* Created using Colaboratory
* correct docstring
* finish
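The generation pipeline added to the usage docs can be exercised in a few lines; a minimal sketch (model choice, prompt, and generation settings are placeholders):

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    print(generator("Once upon a time,", max_length=30, do_sample=True))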
-
- 28 Apr, 2020 1 commit
-
-
Stefan Schweter authored
-
- 16 Apr, 2020 1 commit
-
-
Jonathan Sum authored
Changing from "fine-grained token-leven" to "fine-grained token-level"
-
- 10 Apr, 2020 1 commit
-
-
Anthony MOI authored
-
- 06 Apr, 2020 1 commit
-
-
Lysandre Debut authored
* Update notebooks
* From local to global link
* from local links to *actual* global links
-
- 27 Mar, 2020 1 commit
-
-
Patrick von Platen authored
-
- 19 Mar, 2020 2 commits
-
-
Kyeongpil Kang authored
I found two grammar/typo issues in the explanation of the encoding properties. The original sentences:

  "If your was made of multiple 'parts' such as (question, context), then this would be a vector with for each token the segment it belongs to"

  "If your has been truncated into multiple subparts because of a length limit (for BERT for example the sequence length is limited to 512), this will contain all the remaining overflowing parts."

I think "input" should be inserted after the phrase "If your" in both sentences.
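For the record, the two properties described there can be inspected directly; a small sketch with a recent version of the tokenizer API (model and texts are placeholder choices):

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    enc = tokenizer.encode_plus(
        "What is the capital of France?",   # part 1: question
        "Paris is the capital of France.",  # part 2: context
        max_length=512, truncation="only_second",
        return_overflowing_tokens=True,
    )
    # 0 for tokens of the question, 1 for tokens of the context.
    print(enc["token_type_ids"])
    # "overflowing_tokens" only appears when the input exceeds max_length.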
-
Kyeongpil Kang authored
For the "How to generate text" tutorial, the URL was wrong (it linked to the "How to train a language model" tutorial). I fixed the URL.
-
- 18 Mar, 2020 2 commits
-
-
Morgan Funtowicz authored
Remove hardcoded mask_token and use the value provided by the tokenizer.
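The point of the fix, illustrated (model choice and text are placeholders):

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-base")
    # Use the tokenizer's own mask token ("<mask>" for RoBERTa, "[MASK]" for
    # BERT) instead of hardcoding one, so the same code works for any model.
    text = f"The goal of life is {tokenizer.mask_token}."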
-
Patrick von Platen authored
-
- 08 Mar, 2020 1 commit
-
-
Param bhavsar authored
-
- 05 Mar, 2020 4 commits
-
-
Morgan Funtowicz authored
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Morgan Funtowicz authored
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Morgan Funtowicz authored
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Morgan Funtowicz authored
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
- 04 Mar, 2020 2 commits
-
-
Morgan Funtowicz authored
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
-
Julien Chaumond authored
-