Commits · 7291ea0bff57a017e71b1ea8ec01ff19da298bf0 · chenpangpang / transformers

17 Jun, 2020 1 commit
- Reorganize documentation (#5064) · 7291ea0b
  Sylvain Gugger authored Jun 17, 2020
```
* Reorganize topics and add all models
```
16 Jun, 2020 13 commits

Update pipeline examples to doctest syntax (#5030) · e4aaa458
Sylvain Gugger authored Jun 16, 2020

Fix all Sphinx warnings (#5068) · 011cc0be
Sylvain Gugger authored Jun 16, 2020

Typo (#5069) · af497b56
flozi00 authored Jun 16, 2020


Eli5 examples (#4968) · 49c52025

Yacine Jernite authored Jun 16, 2020



* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever training script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* adding RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>


[cleanup] examples test_run_squad uses tiny model (#5059) · c3e60749
Sam Shleifer authored Jun 16, 2020

Remove old section + caching in install (#5027) · 439aa1d6
Sylvain Gugger authored Jun 16, 2020

Fix marian tokenizer save pretrained (#5043) · 3d495c61
Sam Shleifer authored Jun 16, 2020

Convert hans to Trainer (#5025) · d5477baf
Sylvain Gugger authored Jun 16, 2020
```
* Convert hans to Trainer

* Tick box
```
[cleanup] Hoist ModelTester objects to top level (#4939) · c852036b
Amil Khare authored Jun 16, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```

Add reference to NLP dataset (#5028) · 0c55a384

Manuel Romero authored Jun 16, 2020



* Add reference to NLP dataset

* Update README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>


Add reference to NLP (package) dataset (#5029) · 0946d120

Manuel Romero authored Jun 16, 2020



* Add reference to NLP (package) dataset

* Update README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>


refactor(wandb): consolidate import (#5044) · edcb3ac5
Boris Dayma authored Jun 16, 2020


Ability to pickle/unpickle BatchEncoding pickle (reimport) (#5039) · 9e033649

Funtowicz Morgan authored Jun 16, 2020

* Added is_fast property on BatchEncoding to indicate if the object comes from a Fast Tokenizer.

* Added __getstate__() & __setstate__() to be picklable.

* Correct tokens() return type from List[int] to List[str]

* Added unittest for BatchEncoding pickle/unpickle

* Added unittest for BatchEncoding is_fast

* More careful checking on BatchEncoding unpickle tests.

* Formatting.

* is_fast should assertTrue on Rust tokenizers.

* Ensure tensorflow has correct way of checking array_equal

* More formatting.
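
The pickling hooks referred to in this commit correspond to Python's standard `__getstate__` and `__setstate__` methods. As a rough illustration only, the sketch below uses a hypothetical `TinyEncoding` class (an assumption for this example, not the actual `BatchEncoding` code) that holds a non-picklable backend handle and drops it when serialized:

```python
import pickle


class TinyEncoding:
    """Hypothetical stand-in for an encoding container; not the real BatchEncoding."""

    def __init__(self, data, backend=None):
        self.data = data          # plain dict of lists, picklable as-is
        self._backend = backend   # assumed native handle that cannot be pickled

    def __getstate__(self):
        # Serialize only the picklable part of the object's state.
        return {"data": self.data}

    def __setstate__(self, state):
        # Restore the data; the backend handle is not recovered.
        self.data = state["data"]
        self._backend = None


# A lambda cannot be pickled, so it plays the role of the native handle here.
original = TinyEncoding({"input_ids": [101, 2023, 102]}, backend=lambda: None)
restored = pickle.loads(pickle.dumps(original))
```

Whether the real class drops or preserves its backend objects on pickling is not stated in the commit message; the sketch only shows the general mechanism.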


15 Jun, 2020 14 commits

Add DistilBertForMultipleChoice (#5032) · f9f8a531
Sylvain Gugger authored Jun 15, 2020
```
* Add `DistilBertForMultipleChoice`
```

[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220

Anthony MOI authored Jun 15, 2020


[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)

* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleanups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>


[Bart] Question Answering Model is added to tests (#5024) · ebba39e4
Patrick von Platen authored Jun 15, 2020
```
* fix test

* Update tests/test_modeling_common.py

* Update tests/test_modeling_common.py
```
Add position_ids (#5021) · bbad4c69
Sylvain Gugger authored Jun 15, 2020


feat(TFTrainer): improve logging (#4946) · 1bf4098e

Boris Dayma authored Jun 15, 2020

* feat(tftrainer): improve logging

* fix(trainer): consider case with evaluation only

* refactor(tftrainer): address comments

* refactor(tftrainer): move self.epoch_logging to __init__


Fix importing transformers on Windows (#4997) · 7b5a1e7d
Funtowicz Morgan authored Jun 15, 2020

Add bart-base (#5014) · a9f1fc6c
Sam Shleifer authored Jun 15, 2020

Increase pipeline support for ONNX export. (#5005) · 7b685f52
Funtowicz Morgan authored Jun 15, 2020
```
* Increase pipeline support for ONNX export.

* Style.
```

Make DataCollator a callable (#5015) · 1affde2f

Sylvain Gugger authored Jun 15, 2020



* Make DataCollator a callable

* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
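
For readers unfamiliar with the idea of a collator being "a callable", here is a minimal sketch. The `ListCollator` class and its batching behavior are assumptions made for illustration; this is not the code in `src/transformers/data/data_collator.py`. It shows an object that is invoked like a function to merge per-example feature dicts into one batch:

```python
from typing import Any, Dict, List


class ListCollator:
    """Hypothetical collator: turns a list of feature dicts into a dict of lists."""

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
        batch: Dict[str, List[Any]] = {}
        for example in features:
            for key, value in example.items():
                # Group the values of each field across all examples.
                batch.setdefault(key, []).append(value)
        return batch


collate = ListCollator()
# The instance is called directly, exactly like a plain function would be.
batch = collate([
    {"input_ids": [1, 2], "label": 0},
    {"input_ids": [3, 4], "label": 1},
])
```

Because the only requirement is that the object can be called, a plain function with the same signature could be passed wherever such a collator is expected.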


Possible fix to make AMP work with DDP in the trainer (#4728) · f7c93b3c

Bram Vanroy authored Jun 15, 2020

* manually set device in trainer args

* check if current device is cuda before set_device

* Explicitly set GPU ID when using single GPU

This addresses https://github.com/huggingface/transformers/issues/4657#issuecomment-642228099


Create README.md (#4975) · 66bcfbb1

ipuneetrathore authored Jun 15, 2020



* Create README.md

* Update model_cards/ipuneetrathore/bert-base-cased-finetuned-finBERT/README.md
Co-authored-by: Julien Chaumond <chaumond@gmail.com>


NER: fix construction of input examples for RoBERTa (#4943) · d812e6d7

Stefan Schweter authored Jun 15, 2020

* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model


[model card] model card for bart-large-finetuned-squadv1 (#4977) · ebab096e
Suraj Patil authored Jun 15, 2020
```
* [model card] model card for bart-large-finetuned-squadv1

* add metadata link to the dataset
```

Improve ONNX logging (#4999) · 9ad36ad5

Funtowicz Morgan authored Jun 15, 2020

* Improve ONNX export logging to give more information about the generated graph.

* Correctly handle input and output in the logging.


14 Jun, 2020 2 commits
- fix (#4976) · 9931f817
  ZhuBaohe authored Jun 15, 2020
  
- BartTokenizerFast (#4878) · 9208f57b
  Suraj Patil authored Jun 14, 2020
  
13 Jun, 2020 1 commit

Hans data (#4854) · 403d3098

Sylvain Gugger authored Jun 13, 2020

* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizers that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README


12 Jun, 2020 7 commits
- model_cards: we can now tag datasets · ca5e1cdf
  Julien Chaumond authored Jun 12, 2020
```
see corresponding model pages to see how it's rendered
```
- BartForQuestionAnswering (#4908) · e93ccb32
  Suraj Patil authored Jun 13, 2020
  
- Add AlbertForMultipleChoice (#4959) · 538531cd
  Sylvain Gugger authored Jun 12, 2020
```
* Add AlbertForMultipleChoice

* Make up to date and add all models to common tests
```
- Create README.md (#4865) · fe241397
  Manuel Romero authored Jun 12, 2020
  
- Create README.md (#4872) · 9aa219a1
  Yannis Papanikolaou authored Jun 12, 2020
  
- [AutoModel] Split AutoModelWithLMHead into clm, mlm, encoder-decoder (#4933) · 86578bb0
  Patrick von Platen authored Jun 12, 2020
```
* first commit

* add new auto models

* better naming

* fix bert automodel

* fix automodel for pretraining

* add models to init

* fix name typo

* fix typo

* better naming

* future warning instead of deprecation warning
```
- [mbart] Fix fp16 testing logic (#4949) · 56200331
  Sam Shleifer authored Jun 11, 2020
  
11 Jun, 2020 2 commits
- update `mvmt-pruning/saving_prunebert` (updating torch to 1.5) · 473808da
  VictorSanh authored Jun 11, 2020
  
- fix indentation issue (#4941) · caf37466
  Patrick von Platen authored Jun 11, 2020
  