Commits · 49c5202522bdaf66e45df505b3a3c566e56134c3 · chenpangpang / transformers

16 Jun, 2020 3 commits

Eli5 examples (#4968) · 49c52025

Yacine Jernite authored Jun 16, 2020



* add eli5 examples

* add dense query script

* query_di

* merging

* merging

* add_utils

* adds nearest neighbor wikipedia

* batch queries

* training_retriever

* new notebooks

* moved retriever training script

* finished wiki40b

* max_len_fix

* train_s2s

* retriever_batch_checkpointing

* cleanup

* merge

* dim_fix

* fix_indexer

* fix_wiki40b_snippets

* fix_embed_for_r

* fp32 index

* fix_sparse_q

* joint_training

* remove obsolete datasets

* add_passage_nn_results

* add_passage_nn_results

* add_batch_nn

* add_batch_nn

* add_data_scripts

* notebook

* notebook

* notebook

* fix_multi_gpu

* add_app

* full_caching

* full_caching

* notebook

* sparse_done

* images

* notebook

* add_image_gif

* with_Gif

* add_contr_image

* notebook

* notebook

* notebook

* train_functions

* notebook

* min_retrieval_length

* pandas_option

* notebook

* min_retrieval_length

* notebook

* notebook

* eval_Retriever

* notebook

* images

* notebook

* add_example

* add_example

* notebook

* fireworks

* notebook

* notebook

* joe's notebook comments

* app_update

* notebook

* notebook_link

* captions

* notebook

* adding RetriBert model

* add RetriBert to Auto

* change AutoLMHead to AutoSeq2Seq

* notebook downloads from hf models

* style_black

* style_black

* app_update

* app_update

* fix_app_update

* style

* style

* isort

* Delete WikiELI5training.ipynb

* Delete evaluate_eli5.py

* Delete WikiELI5explore.ipynb

* Delete ExploreWikiELI5Support.html

* Delete explainlikeimfive.py

* Delete wiki_snippets.py

* children before parent

* children before parent

* style_black

* style_black_only

* isort

* isort_new

* Update src/transformers/modeling_retribert.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* typo fixes

* app_without_asset

* cleanup

* Delete ELI5animation.gif

* Delete ELI5contrastive.svg

* Delete ELI5wiki_index.svg

* Delete choco_bis.svg

* Delete fireworks.gif

* Delete huggingface_logo.jpg

* Delete huggingface_logo.svg

* Delete Long_Form_Question_Answering_with_ELI5_and_Wikipedia.ipynb

* Delete eli5_app.py

* Delete eli5_utils.py

* readme

* Update README.md

* unused imports

* moved_info

* default_beam

* ftuned model

* disclaimer

* Update src/transformers/modeling_retribert.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* black

* add_doc

* names

* isort_Examples

* isort_Examples

* Add doc to index
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

49c52025

[cleanup] examples test_run_squad uses tiny model (#5059) · c3e60749
Sam Shleifer authored Jun 16, 2020

c3e60749
Convert hans to Trainer (#5025) · d5477baf
Sylvain Gugger authored Jun 16, 2020
```
* Convert hans to Trainer

* Tick box
```
d5477baf

15 Jun, 2020 3 commits

[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220

Anthony MOI authored Jun 15, 2020


[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)

* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleanups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

36434220

Make DataCollator a callable (#5015) · 1affde2f

Sylvain Gugger authored Jun 15, 2020



* Make DataCollator a callable

* Update src/transformers/data/data_collator.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

1affde2f
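
As a rough illustration of what "making a collator callable" can look like, here is a minimal sketch in plain Python. `SimpleCollator`, its `pad_id` field, and the padding behaviour are hypothetical names chosen for this example; they are not the transformers API and are not taken from the commit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimpleCollator:
    """Hypothetical collator: pads lists of token ids to a common length."""

    pad_id: int = 0

    def __call__(self, features: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
        # Pad every example to the longest sequence in the batch.
        max_len = max(len(f["input_ids"]) for f in features)
        return {
            "input_ids": [
                f["input_ids"] + [self.pad_id] * (max_len - len(f["input_ids"]))
                for f in features
            ]
        }


# Because the object defines __call__, it can be used like a plain function.
collate = SimpleCollator(pad_id=0)
batch = collate([{"input_ids": [5, 6, 7]}, {"input_ids": [8]}])
print(batch["input_ids"])  # [[5, 6, 7], [8, 0, 0]]
```

A callable object like this can be passed anywhere a collate function is expected, while still carrying configuration such as the padding id.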

NER: fix construction of input examples for RoBERTa (#4943) · d812e6d7

Stefan Schweter authored Jun 15, 2020

* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model

d812e6d7

13 Jun, 2020 1 commit

Hans data (#4854) · 403d3098

Sylvain Gugger authored Jun 13, 2020

* Update hans data to be able to use Trainer

* Fixes

* Deal with tokenizers that don't have token_ids

* Clean up things

* Simplify data use

* Fix the input dict

* Formatting + proper path in README

403d3098

11 Jun, 2020 1 commit
- update `mvmt-pruning/saving_prunebert` (updating torch to 1.5) · 473808da
  VictorSanh authored Jun 11, 2020
  
  473808da
10 Jun, 2020 1 commit
- Remove unused arguments in Multiple Choice example (#4853) · e8db8b84
  Sylvain Gugger authored Jun 09, 2020
```
* Remove unused arguments

* Formatting

* Remove second todo comment
```
  e8db8b84
09 Jun, 2020 3 commits
- run_pplm.py bug fix (#4867) · 29c36e9f
  songyouwei authored Jun 10, 2020
```
`is_leaf` may become `False` after `.to(device=device)` function call.
```
  29c36e9f
- [examples] Cleanup summarization docs (#4876) · f90bc44d
  Sam Shleifer authored Jun 09, 2020
  
  f90bc44d
- [examples] consolidate summarization examples (#4837) · 02e5f796
  Amil Khare authored Jun 09, 2020
  
  02e5f796
08 Jun, 2020 1 commit
- Updates args in tf squad example. (#4820) · b6f365a8
  daniel-shan authored Jun 08, 2020
```
Co-authored-by: Daniel Shan <daniel.shan@workday.com>
```
  b6f365a8
06 Jun, 2020 1 commit
- Updated path "cd examples/text-generation/pplm" (#4778) · ddf9a3df
  Mr Ruben authored Jun 06, 2020
```
https://github.com/huggingface/transformers/issues/4776
```
  ddf9a3df
05 Jun, 2020 2 commits
- [isort] add matplotlib to known 3rd party dependencies (#4800) · 875288b3
  Sam Shleifer authored Jun 05, 2020
  
  875288b3
- [doc] Make it clearer that `text-generation` does not involve training · b9109f2d
  Julien Chaumond authored Jun 05, 2020
  
  b9109f2d
04 Jun, 2020 3 commits
- NER: Add new WNUT’17 example (#4681) · 2a4b9e09
  Stefan Schweter authored Jun 05, 2020
```
* ner: add preprocessing script for examples that splits longer sentences

* ner: example shell scripts use local preprocessing now

* ner: add new example section for WNUT’17 NER task. Remove old English CoNLL-03 results

* ner: satisfy black and isort
```
  2a4b9e09
- removed deprecated use of Variable api from pplm example · 48a05026
  prajjwal1 authored May 28, 2020
  
  48a05026
- Remove unnecessary model_type arg in example (#4771) · 492b352a
  Jason Phang authored Jun 04, 2020
  
  492b352a
02 Jun, 2020 4 commits

Add cache_dir to save features in GLUE + Differentiate match/mismatch for MNLI metrics (#4621) · b231a413

Jin Young Sohn authored Jun 02, 2020



* Glue task cleanup

* Enable writing cache to cache_dir in case dataset lives in readOnly filesystem.
* Differentiate match vs mismatch for MNLI metrics.

* Style

* Fix pytype

* Fix type

* Use cache_dir in mnli mismatch eval dataset

* Small Tweaks
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

b231a413
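
The `cache_dir` change above can be pictured with a small sketch: when the dataset directory is read-only, cached features are written to a separate directory instead. The helper name `cached_features_path` and the file-name pattern are assumptions made for illustration, not the code from the commit.

```python
import os
from typing import Optional


def cached_features_path(data_dir: str, cache_dir: Optional[str], split: str, task: str) -> str:
    """Hypothetical helper: pick where cached features should be written."""
    # Prefer the explicit cache directory; fall back to the dataset directory.
    base = cache_dir if cache_dir is not None else data_dir
    return os.path.join(base, f"cached_{split}_{task}")


# Without cache_dir, the cache lands next to the data (fails on a read-only mount).
print(cached_features_path("/data/mnli", None, "dev", "mnli"))
# With cache_dir, the cache is redirected to a writable location.
print(cached_features_path("/data/mnli", "/tmp/cache", "dev", "mnli"))
```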

Fix CI after killing archive maps (#4724) · b42586ea
Julien Chaumond authored Jun 02, 2020
```
* 🐛 Fix model ids for BART and Flaubert
```
b42586ea

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI

d4c2cb40

Specify PyTorch versions for examples (#4710) · 88762a2f
Lysandre Debut authored Jun 02, 2020

88762a2f

01 Jun, 2020 15 commits
- finish README · bf760c80
  Victor SANH authored May 29, 2020
  
  bf760c80
- weird import · 9d7d9b3a
  Victor SANH authored May 29, 2020
  
  9d7d9b3a
- Update examples/movement-pruning/README.md · 2a3c88a6
  Victor SANH authored May 28, 2020
```
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  2a3c88a6
- Update examples/movement-pruning/README.md · 4ac462bf
  Victor SANH authored May 28, 2020
```
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  4ac462bf
- clarify README · 35fa0bbc
  Victor SANH authored May 28, 2020
  
  35fa0bbc
- flake8 compliance · cc746a50
  Victor SANH authored May 28, 2020
  
  cc746a50
- less prints in saving prunebert · b11386e1
  Victor SANH authored May 28, 2020
  
  b11386e1
- complete README · 8b5d4003
  Victor SANH authored May 28, 2020
  
  8b5d4003
- complying with isort · 5c8e5b37
  Victor SANH authored May 28, 2020
  
  5c8e5b37
- space · db2a3b2e
  Victor SANH authored May 27, 2020
  
  db2a3b2e
- add floppy bert model notebook · 5f8f2d84
  Victor SANH authored May 27, 2020
  
  5f8f2d84
- add requirements · b41948f5
  Victor SANH authored May 27, 2020
  
  b41948f5
- add scripts · fb8f4277
  Victor SANH authored May 27, 2020
  
  fb8f4277
- add masked_run_* · d489a6d3
  Victor SANH authored May 27, 2020
  
  d489a6d3
- add sparsity modules · e4c07faf
  Victor SANH authored May 27, 2020
  
  e4c07faf
27 May, 2020 2 commits

[Benchmark] Memory benchmark utils (#4198) · 96f57c9c

Patrick von Platen authored May 27, 2020



* improve memory benchmarking

* correct typo

* fix current memory

* check torch memory allocated

* better pytorch function

* add total cached gpu memory

* add total gpu required

* improve torch gpu usage

* update memory usage

* finalize memory tracing

* save intermediate benchmark class

* fix conflict

* improve benchmark

* improve benchmark

* finalize

* make style

* improve benchmarking

* correct typo

* make train function more flexible

* fix csv save

* better repr of bytes

* better print

* fix __repr__ bug

* finish plot script

* rename plot file

* delete csv and small improvements

* fix in plot

* fix in plot

* correct usage of timeit

* remove redundant line

* remove redundant line

* fix bug

* add hf parser tests

* add versioning and platform info

* make style

* add gpu information

* ensure backward compatibility

* finish adding all tests

* Update src/transformers/benchmark/benchmark_args.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/benchmark/benchmark_args_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* delete csv files

* fix isort ordering

* add out of memory handling

* add better train memory handling
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

96f57c9c

per_device instead of per_gpu/error thrown when argument unknown (#4618) · 6a176880

Lysandre Debut authored May 27, 2020



* per_device instead of per_gpu/error thrown when argument unknown

* [docs] Restore examples.md symlink

* Correct absolute links so that symlink to the doc works correctly

* Update src/transformers/hf_argparser.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

* Warning + reorder

* Docs

* Style

* not for squad
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

6a176880
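
One way to picture "error thrown when argument unknown" is the sketch below, which uses only the standard `argparse` module. The function name `parse_strict` and the choice of `ValueError` are assumptions for this example; the actual `hf_argparser.py` change may differ.

```python
import argparse
from typing import List


def parse_strict(argv: List[str]) -> argparse.Namespace:
    """Hypothetical strict parser: reject any argument it does not know."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--per_device_train_batch_size", type=int, default=8)
    # parse_known_args returns leftover arguments instead of exiting on them.
    args, unknown = parser.parse_known_args(argv)
    if unknown:
        raise ValueError(f"Unknown arguments: {unknown}")
    return args


print(parse_strict(["--per_device_train_batch_size", "4"]).per_device_train_batch_size)

try:
    # The old per_gpu spelling is not defined here, so it is reported as unknown.
    parse_strict(["--per_gpu_train_batch_size", "4"])
except ValueError as err:
    print(err)
```

Failing loudly on leftovers means a renamed or misspelled flag is noticed right away instead of being silently ignored.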