Commits · 0ae96ff8a7e2d371242452d81bee85da8df202f5 · chenpangpang / transformers

07 May, 2020 1 commit

BIG Reorganize examples (#4213) · 0ae96ff8

Julien Chaumond authored May 07, 2020

* Created using Colaboratory

* [examples] reorganize files

* remove run_tpu_glue.py as superseded by TPU support in Trainer

* Bugfix: int, not tuple

* move files around

0ae96ff8

28 Apr, 2020 1 commit
- [Generation] Generation should allow to start with empty prompt (#3993) · 18058574
  Patrick von Platen authored Apr 28, 2020
```
* fix empty prompt

* fix length in generation pipeline
```
  18058574
02 Mar, 2020 1 commit
- fix n_gpu count when no_cuda flag is activated (#3077) · 6b1ff250
  Victor SANH authored Mar 02, 2020
```
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
```
  6b1ff250
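
As a rough illustration of the fix named in this commit, the GPU count can be forced to zero whenever the `no_cuda` flag is set, so later code does not try to use GPUs the user disabled. The helper name `resolve_n_gpu` and its parameters are hypothetical, and the visible device count is passed in as a plain integer so the sketch runs without PyTorch; this is not the script's actual code.

```python
def resolve_n_gpu(no_cuda: bool, visible_gpu_count: int) -> int:
    """Return how many GPUs the script should use.

    Hypothetical helper: `visible_gpu_count` stands in for whatever the
    framework reports as the number of visible CUDA devices.
    """
    # When the user disables CUDA, report zero GPUs regardless of hardware.
    if no_cuda:
        return 0
    return visible_gpu_count


if __name__ == "__main__":
    print(resolve_n_gpu(no_cuda=True, visible_gpu_count=4))
    print(resolve_n_gpu(no_cuda=False, visible_gpu_count=4))
```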
24 Feb, 2020 1 commit

Add preprocessing step for transfo-xl tokenization to avoid tokenizing words... · 65d74c49

Patrick von Platen authored Feb 24, 2020

Add preprocessing step for transfo-xl tokenization to avoid tokenizing words followed by punctuation to <unk> (#2987)

* add preprocessing to add space before punctuation for transfo_xl

* improve warning messages

* make style

* compile regex at instantiation of tokenizer object

65d74c49
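
The idea behind this commit can be sketched as below: insert a space between a word and the punctuation that follows it, so a word-level vocabulary sees `world` and `!` rather than an unknown `world!`, and compile the regex once when the object is created. The class name `PunctuationSpacer` and the default symbol set are assumptions made for illustration, not the tokenizer's actual implementation.

```python
import re


class PunctuationSpacer:
    """Hypothetical preprocessing helper, not the library's tokenizer."""

    def __init__(self, punctuation_symbols=(",", ".", "!", "?", ";", ":")):
        # Compile the pattern once at instantiation instead of on every call.
        char_class = "".join(re.escape(symbol) for symbol in punctuation_symbols)
        # Match a run of punctuation that directly follows a word character.
        self._pattern = re.compile(r"(?<=\w)([{}]+)".format(char_class))

    def __call__(self, text: str) -> str:
        # Put a single space before the punctuation run.
        return self._pattern.sub(r" \1", text)


if __name__ == "__main__":
    spacer = PunctuationSpacer()
    print(spacer("Hello, world!"))
```

Text that is already spaced is left unchanged, because the lookbehind only fires when the punctuation touches a word character.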

21 Feb, 2020 1 commit

Improve special_token_id logic in run_generation.py and add tests (#2885) · fc38d4c8

Patrick von Platen authored Feb 21, 2020



* improving generation

* finalized special token behaviour for no_beam_search generation

* solved modeling_utils merge conflict

* solve merge conflicts in modeling_utils.py

* add run_generation improvements from PR #2749

* adapted language generation to not use hardcoded -1 if no padding token is available

* remove the -1 removal as hard-coded -1s are not necessary anymore

* add lightweight language generation testing for randomly initialized models - just checking whether no errors are thrown

* add slow language generation tests for pretrained models using hardcoded output with pytorch seed

* delete ipdb

* check that all generated tokens are valid

* renaming

* renaming Generation -> Generate

* make style

* updated so that generate_beam_search has same token behavior as generate_no_beam_search

* consistent return format for run_generation.py

* deleted pretrain lm generate tests -> will be added in another PR

* cleaning of unused if statements and renaming

* run_generate will always return an iterable

* make style

* consistent renaming

* improve naming, make sure generate function always returns the same tensor, add docstring

* add slow tests for all lmhead models

* make style and improve example comments modeling_utils

* better naming and refactoring in modeling_utils

* changed fast random lm generation testing design to more general one

* delete old testing design in gpt2

* correct old variable name

* temporary fix for encoder_decoder lm generation tests - has to be updated when t5 is fixed

* adapted all fast random generate tests to new design

* better warning description in modeling_utils

* better comment

* better comment and error message

Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

fc38d4c8

31 Jan, 2020 2 commits
- run_generation style · d18d47be
  Lysandre authored Jan 31, 2020
  
  d18d47be
- do_sample should be set to True in run_generation.py · 7365f01d
  Lysandre authored Jan 31, 2020
  
  7365f01d
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
22 Dec, 2019 3 commits
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Fix F821 flake8 warning (x47). · 2ab78325
  Aymeric Augustin authored Dec 21, 2019
```
Ignore warnings related to Python 2, because it's going away soon.
```
  2ab78325
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
21 Dec, 2019 3 commits

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

fixing run_generation example - using torch.no_grad · 300ec300
thomwolf authored Dec 21, 2019

300ec300
fixing run_generation · 1c377468
thomwolf authored Dec 21, 2019

1c377468

18 Dec, 2019 1 commit
- further cleanup · 3d2096f5
  thomwolf authored Dec 18, 2019
  
  3d2096f5
16 Dec, 2019 1 commit
- fix #2180 · 18a879f4
  Lysandre authored Dec 16, 2019
  
  18a879f4
10 Dec, 2019 1 commit
- add greedy decoding and sampling · 07bc8efb
  Rémi Louf authored Nov 15, 2019
  
  07bc8efb
31 Oct, 2019 1 commit
- [run_generation] Fix generation with batch_size>1 · f96ce1c2
  Julien Chaumond authored Oct 31, 2019
  
  f96ce1c2
22 Oct, 2019 2 commits
- [CTRL] warn if generation prompt does not start with a control code · ef1b8b2a
  Julien Chaumond authored Oct 22, 2019
```
see also https://github.com/salesforce/ctrl/pull/50
```
  ef1b8b2a
- Remove · 7d709e55
  Lysandre authored Oct 22, 2019
  
  7d709e55
17 Oct, 2019 1 commit
- fix repetition penalty · ecd15667
  leo-du authored Oct 17, 2019
  
  ecd15667
10 Oct, 2019 2 commits
- move back to simple space splitting · 177a7212
  thomwolf authored Oct 10, 2019
  
  177a7212
- better error messages · a5997dd8
  thomwolf authored Oct 10, 2019
  
  a5997dd8
06 Oct, 2019 1 commit
- Correct device assignment in run_generation · f3e0218f
  LysandreJik authored Oct 05, 2019
  
  f3e0218f
04 Oct, 2019 1 commit

Adding CTRL (squashed commit) · dbed1c5d

keskarnitish authored Sep 30, 2019

adding conversion script

adding first draft of modeling & tokenization

adding placeholder for test files

bunch of changes

registering the tokenizer/model/etc

tests

change link; something is very VERY wrong here

weird end-of-word thingy going on

i think the tokenization works now ; wrote the unit tests

overall structure works;load w next

the monster is alive!

works after some cleanup as well

adding emacs autosave to gitignore

currently only supporting the 48 layer one; seems to infer fine on my macbook

cleanup

fixing some documentation

fixing some documentation

tests passing?

now works on CUDA also

adding greedy?

adding greedy sampling

works well

dbed1c5d

03 Oct, 2019 2 commits
- XLM use_lang_embedding flag in run_generation · ecc4f1bd
  LysandreJik authored Oct 03, 2019
  
  ecc4f1bd
- Added XLM to run_generation, with prompt language selection. · c2c2ca0f
  LysandreJik authored Oct 03, 2019
  
  c2c2ca0f
26 Sep, 2019 1 commit
- [BIG] pytorch-transformers => transformers · 31c23bd5
  thomwolf authored Sep 26, 2019
  
  31c23bd5
25 Sep, 2019 1 commit
- [FIX] fix run_generation.py to work with batch_size > 1 · a9f24a16
  mataney authored Sep 25, 2019
  
  a9f24a16
22 Sep, 2019 1 commit
- Add option to use a 'stop token' which will be used to truncate the output... · 4b543c30
  Lorenzo Ampil authored Sep 22, 2019
```
Add option to use a 'stop token', which truncates the output text to everything up to, but not including, the 'stop token'
```
  4b543c30
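
The truncation described in this commit can be sketched as follows. The function name `truncate_at_stop_token` is hypothetical and the sketch operates on already-decoded text; it illustrates the behavior under those assumptions and is not the code in run_generation.py.

```python
from typing import Optional


def truncate_at_stop_token(text: str, stop_token: Optional[str] = None) -> str:
    """Return `text` up to, but not including, the first `stop_token`."""
    # With no stop token configured, the text is returned unchanged.
    if not stop_token:
        return text
    index = text.find(stop_token)
    # str.find returns -1 when the stop token never appears.
    if index == -1:
        return text
    return text[:index]


if __name__ == "__main__":
    print(truncate_at_stop_token("Once upon a time.<END>ignored", "<END>"))
```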
15 Jul, 2019 1 commit
- update QA models tests + run_generation · e691fc09
  thomwolf authored Jul 15, 2019
  
  e691fc09
14 Jul, 2019 1 commit
- updating examples and doc · 2397f958
  thomwolf authored Jul 14, 2019
  
  2397f958
13 Jul, 2019 1 commit
- good quality generation example for GPT, GPT-2, Transfo-XL, XLNet · 7d4b200e
  thomwolf authored Jul 13, 2019
  
  7d4b200e