Commits · 042a6aa77772a4483020dcd45928302cef22af76 · chenpangpang / transformers

17 Nov, 2020 1 commit

Tokenizers: ability to load from model subfolder (#8586) · 042a6aa7

Julien Chaumond authored Nov 17, 2020



* <small>tiny typo</small>

* Tokenizers: ability to load from model subfolder

* use subfolder for local files as well

* Uniformize model shortcut name => model id

* from s3 => from huggingface.co
Co-authored-by: Quentin Lhoest <lhoest.q@gmail.com>

042a6aa7

15 Nov, 2020 1 commit

[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests... · f4e04cd2

Thomas Wolf authored Nov 15, 2020


[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073)

* Fixing roberta for slow-fast tests

* WIP getting equivalence on pipelines

* slow-to-fast equivalence - working on question-answering pipeline

* optional FAISS tests

* Pipeline Q&A

* Move pipeline tests to their own test job again

* update tokenizer to add sequence id methods

* update to tokenizers 0.9.4

* set sentencepiece as optional

* clean up squad

* clean up pipelines to use sequence_ids

* style/quality

* wording

* Switch to use_fast = True by default

* update tests for use_fast at True by default

* fix rag tokenizer test

* removing protobuf from required dependencies

* fix NER test for use_fast = True by default

* fixing example tests (Q&A examples use slow tokenizers for now)

* protobuf in main deps extras["sentencepiece"] and example deps

* fix protobuf install test

* try to fix seq2seq by switching to slow tokenizers for now

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

f4e04cd2

12 Nov, 2020 1 commit
- Try to understand and apply Sylvain's comments (#8458) · 27b3ff31
  Julien Plu authored Nov 12, 2020
  
  27b3ff31
26 Aug, 2020 1 commit
- Black 20 release · a75c64d8
  Lysandre authored Aug 26, 2020
  
  a75c64d8
07 Jul, 2020 1 commit

[examples] Add trainer support for question-answering (#4829) · e49393c3

Suraj Patil authored Jul 07, 2020



* add SquadDataset

* add DataCollatorForQuestionAnswering

* update __init__

* add run_squad with trainer

* add DataCollatorForQuestionAnswering in __init__

* pass data_collator to trainer

* doc tweak

* Update run_squad_trainer.py

* Update __init__.py

* Update __init__.py
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e49393c3

19 May, 2020 1 commit

Distributed eval: SequentialDistributedSampler + gather all results (#4243) · 5e7fe8b5

Julien Chaumond authored May 18, 2020

* Distributed eval: SequentialDistributedSampler + gather all results

* For consistency only write to disk from world_master

Close https://github.com/huggingface/transformers/issues/4272

* Working distributed eval

* Hook into scripts

* Fix #3721 again

* TPU.mesh_reduce: stay in tensor space

Thanks @jysohn23

* Just a small comment

* whitespace

* torch.hub: pip install packaging

* Add test scenarii

5e7fe8b5

14 May, 2020 1 commit
- Use Filelock to ensure distributed barriers · c547f15a
  Julien Chaumond authored May 14, 2020
```
see context in https://github.com/huggingface/transformers/pull/4223
```
  c547f15a
08 May, 2020 1 commit

[TPU] Doc, fix xla_spawn.py, only preprocess dataset once (#4223) · 7b75aa9f

Julien Chaumond authored May 08, 2020

* [TPU] Doc, fix xla_spawn.py, only preprocess dataset once

* Update examples/README.md

* [xla_spawn] Add `_mp_fn` to other Trainer scripts

* [TPU] Fix: eval dataloader was None

7b75aa9f

07 May, 2020 1 commit

BIG Reorganize examples (#4213) · 0ae96ff8

Julien Chaumond authored May 07, 2020

* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around

0ae96ff8

24 Apr, 2020 1 commit
- [examples] For convenience, also save the tokenizer · c8115260
  Julien Chaumond authored Apr 24, 2020
```
Close #3921
```
  c8115260
22 Apr, 2020 1 commit

Trainer (#3800) · dd9d483d

Julien Chaumond authored Apr 21, 2020

* doc

* [tests] Add sample files for a regression task

* [HUGE] Trainer

* Feedback from @sshleifer

* Feedback from @thomwolf + logging tweak

* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes

* [glue] Use default max_seq_length of 128 like before

* [glue] move DataTrainingArguments around

* [ner] Change interface of InputExample, and align run_{tf,pl}

* Re-align the pl scripts a little bit

* ner

* [ner] Add integration test

* Fix language_modeling with API tweak

* [ci] Tweak loss target

* Don't break console output

* amp.initialize: model must be on right device before

* [multiple-choice] update for Trainer

* Re-align to 827d6d6e

dd9d483d

20 Apr, 2020 1 commit
- Fix bug in examples: double wrap into DataParallel during eval · b1ff0b2a
  Andrey Kulagin authored Apr 17, 2020
  
  b1ff0b2a
06 Apr, 2020 1 commit

Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py (#3631) · e52d1258

Ethan Perez authored Apr 06, 2020

* Fix RoBERTa/XLNet Pad Token in run_multiple_choice.py

`convert_examples_to_features` sets `pad_token=0` by default, which is correct for BERT but incorrect for RoBERTa (`pad_token=1`) and XLNet (`pad_token=5`). I think the other arguments to `convert_examples_to_features` are correct, but it might be helpful if someone who is more familiar with this part of the codebase checked.

* Simplifying change to match recent commits

e52d1258

01 Apr, 2020 1 commit
- Tokenizers: Start cleaning examples a little (#3455) · 50e15c82
  Julien Chaumond authored Apr 01, 2020
```
* Start cleaning examples

* Fixup
```
  50e15c82
02 Mar, 2020 1 commit
- fix n_gpu count when no_cuda flag is activated (#3077) · 6b1ff250
  Victor SANH authored Mar 02, 2020
```
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
```
  6b1ff250
28 Jan, 2020 1 commit
- Default save steps 50 to 500 in all scripts · 335dd5e6
  Lysandre authored Jan 28, 2020
  
  335dd5e6
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
23 Dec, 2019 1 commit
- Remove unused variables in examples. · 81422c4e
  Aymeric Augustin authored Dec 23, 2019
  
  81422c4e
22 Dec, 2019 6 commits
- Update comments mentioning Python 2. · d6eaf4e6
  Aymeric Augustin authored Dec 22, 2019
  
  d6eaf4e6
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Fix E266 flake8 warning (x90). · fa2ccbc0
  Aymeric Augustin authored Dec 21, 2019
  
  fa2ccbc0
- Fix E722 flake8 warnings (x26). · 631be270
  Aymeric Augustin authored Dec 21, 2019
  
  631be270
- Fix E712 flake8 warning (x1). · 357db709
  Aymeric Augustin authored Dec 21, 2019
  
  357db709
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
21 Dec, 2019 1 commit

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There's a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

03 Dec, 2019 1 commit
- Use full dataset for eval (SequentialSampler in Distributed setting) · 48cbf267
  VictorSanh authored Dec 03, 2019
  
  48cbf267
14 Nov, 2019 1 commit
- update the examples, docs and template · 2276bf69
  Rémi Louf authored Nov 14, 2019
  
  2276bf69
12 Nov, 2019 1 commit
- fix multi-gpu eval · 2e311765
  ronakice authored Nov 12, 2019
  
  2e311765
04 Nov, 2019 1 commit
- Fix #1623 · 89d62728
  thomwolf authored Nov 04, 2019
  
  89d62728
08 Oct, 2019 1 commit
- Change tensorboard imports to use built-in tensorboard if available · 5ce8d29a
  Bilal Khan authored Oct 08, 2019
  
  5ce8d29a
04 Oct, 2019 1 commit
- Honor args.overwrite_cache (h/t @erenup) · 9e136ff5
  Julien Chaumond authored Oct 04, 2019
  
  9e136ff5
03 Oct, 2019 1 commit
- Evaluation result.txt path changing #1286 · 2195c0d5
  Brian Ma authored Oct 03, 2019
  
  2195c0d5
30 Sep, 2019 1 commit
- [multiple-choice] Simplify and use tokenizer.encode_plus · f5bcde0b
  Julien Chaumond authored Sep 30, 2019
  
  f5bcde0b
26 Sep, 2019 1 commit
- [BIG] pytorch-transformers => transformers · 31c23bd5
  thomwolf authored Sep 26, 2019
  
  31c23bd5
18 Sep, 2019 2 commits
- fixed to find best dev acc · 8960988f
  erenup authored Sep 19, 2019
  
  8960988f
- move run_multiple_choice.py and utils_multiple_choice.py to examples · 15143fba
  erenup authored Sep 18, 2019
  
  15143fba