Commits · eda07efaa58f6c70ac92e7947d8e134cf99d6ec8 · chenpangpang / transformers

13 Aug, 2020 1 commit

Add POS tagging and Phrase chunking token classification examples (#6457) · eda07efa

vblagoje authored Aug 13, 2020

* Add more token classification examples

* POS tagging example

* Phrase chunking example

* PR review fixes

* Add conllu to third party list (used in token classification examples)

eda07efa

29 Jul, 2020 1 commit

Rework TF trainer (#6038) · 54f9fbef

Julien Plu authored Jul 29, 2020

* Fully rework training/prediction loops

* fix method name

* Fix variable name

* Fix property name

* Fix scope

* Fix method name

* Fix tuple index

* Fix tuple index

* Fix indentation

* Fix variable name

* fix eval before log

* Add drop remainder for test dataset

* Fix step number + fix logging datetime

* fix eval loss value

* use global step instead of step + fix logging at step 0

* Fix logging datetime

* Fix global_step usage

* Fix breaking loop + logging datetime

* Fix step in prediction loop

* Fix step breaking

* Fix train/test loops

* Force TF at least 2.2 for the trainer

* Use assert_cardinality to facilitate the dataset size computation

* Log steps per epoch

* Make tfds compliant with TPU

* Make tfds compliant with TPU

* Use TF dataset enumerate instead of the Python one

* revert previous commit

* Fix data_dir

* Apply style

* rebase on master

* Address Sylvain's comments

* Address Sylvain's and Lysandre comments

* Trigger CI

* Remove unused import

54f9fbef

15 Jun, 2020 1 commit

NER: fix construction of input examples for RoBERTa (#4943) · d812e6d7

Stefan Schweter authored Jun 15, 2020

* utils_ner: do not add extra sep token for RoBERTa model

* run_pl_ner: do not add extra sep token for RoBERTa model

d812e6d7

14 May, 2020 1 commit
- Use Filelock to ensure distributed barriers · c547f15a
  Julien Chaumond authored May 14, 2020
```
see context in https://github.com/huggingface/transformers/pull/4223
```
  c547f15a
07 May, 2020 1 commit

BIG Reorganize examples (#4213) · 0ae96ff8

Julien Chaumond authored May 07, 2020

* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around

0ae96ff8

06 May, 2020 1 commit

TF version of the trainer (#4017) · aad50151

Julien Plu authored May 06, 2020

* First commit to add a TF version of the trainer.

* Make the TF trainer closer to what looks the PT trainer

* Refactoring common code between the PT and TF trainer into an util file.

* Some bugfix + better similarity with the PT trainer

* Add missing class in transformers init

* Bugfix over prediction + use classification report instead of simple metrics

* Fix name error

* Fix optimization tests + style

* Apply style

* Several bugfix for multi-gpu training

* Apply style

* Apply style

* Add glue example for the TF trainer

* Several bugix + address the reviews

* Fix on the TF training args file

* Add a debug mode

* Bugfix in utils_ner.py when segment_ids is None

* Apply style

* Apply style

* Add TPU strategy

* Fix selection strategy

aad50151

22 Apr, 2020 1 commit

Trainer (#3800) · dd9d483d

Julien Chaumond authored Apr 21, 2020

* doc

* [tests] Add sample files for a regression task

* [HUGE] Trainer

* Feedback from @sshleifer

* Feedback from @thomwolf + logging tweak

* [file_utils] when downloading concurrently, get_from_cache will use the cached file for subsequent processes

* [glue] Use default max_seq_length of 128 like before

* [glue] move DataTrainingArguments around

* [ner] Change interface of InputExample, and align run_{tf,pl}

* Re-align the pl scripts a little bit

* ner

* [ner] Add integration test

* Fix language_modeling with API tweak

* [ci] Tweak loss target

* Don't break console output

* amp.initialize: model must be on right device before

* [multiple-choice] update for Trainer

* Re-align to 827d6d6e

dd9d483d

27 Mar, 2020 1 commit

run_ner.py / bert-base-multilingual-cased can output empty tokens (#2991) · b08259a1

Funtowicz Morgan authored Mar 27, 2020



* Use tokenizer.num_added_tokens to count number of added special_tokens instead of hardcoded numbers.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing an unique space.
In this case, the tokenizer will output just an empty word_tokens thus leading to an non-consistent behavior
over the labels_ids tokens adding one more tokens than tokens vector.
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

b08259a1

20 Feb, 2020 1 commit

Support for torch-lightning in NER examples (#2890) · b662f0e6

srush authored Feb 20, 2020



* initial pytorch lightning commit

* tested multigpu

* Fix learning rate schedule

* black formatting

* fix flake8

* isort

* isort

* .
Co-authored-by: Check your git settings! <chris@chris-laptop>

b662f0e6

01 Feb, 2020 1 commit
- Fix typo in examples/utils_ner.py · 2ba147ec
  Antonio Carlos Falcão Petri authored Feb 01, 2020
```
"%s-%d".format() -> "{}-{}".format()
```
  2ba147ec
06 Jan, 2020 2 commits
- GPU text generation: mMoved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
22 Dec, 2019 3 commits
- Use built-in open(). · 1c62e87b
  Aymeric Augustin authored Dec 22, 2019
```
On Python 3, `open is io.open`.
```
  1c62e87b
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
21 Dec, 2019 1 commit

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There's a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

12 Dec, 2019 1 commit
- Update example scripts · 3fd71c44
  LysandreJik authored Dec 12, 2019
  
  3fd71c44
15 Oct, 2019 8 commits
- Add option to predict on test set · 5ff9cd15
  Marianne Stecklina authored Sep 23, 2019
  
  5ff9cd15
- Add cli argument for configuring labels · 7f5367e0
  Marianne Stecklina authored Sep 19, 2019
  
  7f5367e0
- Make file reading more robust · e1d4179b
  Marianne Stecklina authored Sep 19, 2019
  
  e1d4179b
- Implement fine-tuning BERT on CoNLL-2003 named entity recognition task · 383ef967
  Marianne Stecklina authored Sep 17, 2019
  
  383ef967
- Add option to predict on test set · 5adb39e7
  Marianne Stecklina authored Sep 23, 2019
  
  5adb39e7
- Add cli argument for configuring labels · 99b189df
  Marianne Stecklina authored Sep 19, 2019
  
  99b189df
- Make file reading more robust · 3e9420ad
  Marianne Stecklina authored Sep 19, 2019
  
  3e9420ad
- Implement fine-tuning BERT on CoNLL-2003 named entity recognition task · cde42c43
  Marianne Stecklina authored Sep 17, 2019
  
  cde42c43