Commits · feeb956a19ca08e3b9657ea9ec7d14adb6304c85 · chenpangpang / transformers

22 Jul, 2020 1 commit
- [docs] Add integration test example to copy pasta template (#5961) · feeb956a
  Sam Shleifer authored Jul 22, 2020
```
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  feeb956a
26 Jun, 2020 1 commit

[tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308) · 601d4d69

Thomas Wolf authored Jun 26, 2020

* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples

601d4d69

15 Jun, 2020 1 commit

[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220

Anthony MOI authored Jun 15, 2020


[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)

* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleanups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

36434220

09 Jun, 2020 1 commit

[All models] Extend config.output_attentions with output_attentions function arguments (#4538) · 6e603cb7

Bharat Raghunathan authored Jun 10, 2020



* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

6e603cb7
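
The pattern this commit describes, a per-call `output_attentions` argument that falls back to `config.output_attentions` when left unset, can be sketched roughly as follows. `ToyConfig`, `ToyModel`, and the placeholder values are hypothetical stand-ins for illustration, not classes or outputs from the library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyConfig:
    # Hypothetical config: holds the default used when the caller passes nothing.
    output_attentions: bool = False


class ToyModel:
    # Hypothetical model, not a transformers class.
    def __init__(self, config: ToyConfig):
        self.config = config

    def forward(self, hidden: float, output_attentions: Optional[bool] = None) -> Tuple:
        # An explicit argument wins; None means "use the config default".
        if output_attentions is None:
            output_attentions = self.config.output_attentions

        attention = 1.0  # placeholder for real attention weights
        outputs = (hidden * 2,)
        if output_attentions:
            outputs = outputs + (attention,)
        return outputs


model = ToyModel(ToyConfig(output_attentions=False))
default_out = model.forward(3.0)                           # falls back to the config
explicit_out = model.forward(3.0, output_attentions=True)  # per-call override
```

Using `None` as the default, rather than `False`, is what lets the function tell "not specified" apart from an explicit request to turn attentions off.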

02 Jun, 2020 1 commit

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI

d4c2cb40

29 Apr, 2020 1 commit

CDN urls (#4030) · 455c6390

Julien Chaumond authored Apr 28, 2020

* [file_utils] use_cdn + documentation

* Move to cdn. urls for weights

* [urls] Hotfix for bert-base-japanese

455c6390

18 Apr, 2020 1 commit

Cleanup fast tokenizers integration (#3706) · 827d6d6e

Thomas Wolf authored Apr 18, 2020



* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorflow doesn't like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>

827d6d6e

16 Apr, 2020 1 commit
- [cleanup] factor out get_head_mask, invert_attn_mask, get_exten… (#3806) · dbd04124
  Sam Shleifer authored Apr 16, 2020
```
* Delete some copy pasted code
```
  dbd04124
08 Apr, 2020 1 commit

More doc for model cards (#3698) · a594ee9c

Julien Chaumond authored Apr 08, 2020

see https://github.com/huggingface/transformers/pull/3679#pullrequestreview-389368270

a594ee9c

04 Apr, 2020 1 commit
- weigths*weights · 94eb68d7
  Julien Chaumond authored Apr 04, 2020
  
  94eb68d7
07 Feb, 2020 1 commit
- Fix importing unofficial TF models with extra optimizer weights · 73368963
  monologg authored Jan 27, 2020
  
  73368963
15 Jan, 2020 1 commit
- 💄 super · 83a41d39
  Julien Chaumond authored Jan 15, 2020
  
  83a41d39
13 Jan, 2020 1 commit
- Config to Model mapping · b803b067
  Julien Chaumond authored Jan 13, 2020
  
  b803b067
07 Jan, 2020 1 commit
- Fix typographical errors (#2438) · d6a677b1
  Genta Indra Winata authored Jan 08, 2020
  
  d6a677b1
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
05 Jan, 2020 1 commit
- Enforce target version for black. · 0ffc8eaf
  Aymeric Augustin authored Dec 27, 2019
```
This should stabilize formatting.
```
  0ffc8eaf
28 Dec, 2019 1 commit
- Kill __main__ · 4d6c93e9
  Julien Chaumond authored Dec 27, 2019
  
  4d6c93e9
22 Dec, 2019 18 commits
- Use built-in open(). · 1c62e87b
  Aymeric Augustin authored Dec 22, 2019
```
On Python 3, `open is io.open`.
```
  1c62e87b
- Remove six. · 8af25b16
  Aymeric Augustin authored Dec 22, 2019
  
  8af25b16
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Replace CommonTestCases for tokenizers with a mixin. · 00204f2b
  Aymeric Augustin authored Dec 22, 2019
```
This is the same change as for (TF)CommonTestCases for modeling.
```
  00204f2b
- Rename file for consistency. · a3c5883f
  Aymeric Augustin authored Dec 22, 2019
  
  a3c5883f
- Replace (TF)CommonTestCases for modeling with a mixin. · 345c23a6
  Aymeric Augustin authored Dec 22, 2019
```
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
```
  345c23a6
- Remove unittest.main() in test modules. · 7e98e211
  Aymeric Augustin authored Dec 22, 2019
```
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
```
  7e98e211
- Switch test files to the standard test_*.py scheme. · ced0a942
  Aymeric Augustin authored Dec 22, 2019
  
  ced0a942
- Fix F401 flake8 warning (x28). · 939148b0
  Aymeric Augustin authored Dec 21, 2019
```
Do manually what autoflake couldn't manage.
```
  939148b0
- Fix F401 flake8 warning (x88 / 116). · 783a6169
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```
  783a6169
- Fix F401 flake8 warning (x152 / 268). · 80327a13
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```
  80327a13
- Fix E266 flake8 warning (x90). · fa2ccbc0
  Aymeric Augustin authored Dec 21, 2019
  
  fa2ccbc0
- Fix F821 flake8 warning (x47). · 2ab78325
  Aymeric Augustin authored Dec 21, 2019
```
Ignore warnings related to Python 2, because it's going away soon.
```
  2ab78325
- Fix E741 flake8 warning (x14). · b0f7db73
  Aymeric Augustin authored Dec 21, 2019
  
  b0f7db73
- Fix E714 flake8 warning (x8). · fd2f17a7
  Aymeric Augustin authored Dec 21, 2019
  
  fd2f17a7
- Fix E302 flake8 warning (x3). · eed46f38
  Aymeric Augustin authored Dec 21, 2019
  
  eed46f38
- Remove trailing whitespace from all Python files. · 28e608a2
  Aymeric Augustin authored Dec 21, 2019
```
Fixes flake8 warning W291 (x224).
```
  28e608a2
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
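
The mixin approach described in the "Replace (TF)CommonTestCases for modeling with a mixin" entry above can be illustrated with a small sketch. The class names `ModelTesterMixin` and `ToyModelTest` are hypothetical here; the point is only that a mixin which does not inherit from `unittest.TestCase` is never collected by test discovery on its own, while a concrete class combining it with `unittest.TestCase` runs the shared tests.

```python
import unittest


class ModelTesterMixin:
    # Hypothetical shared tests. Because this class does not inherit from
    # unittest.TestCase, test discovery never collects it on its own.
    model_name = None  # concrete test classes override this

    def test_has_model_name(self):
        self.assertIsNotNone(self.model_name)


class ToyModelTest(ModelTesterMixin, unittest.TestCase):
    # Concrete test case: picks up the shared tests from the mixin.
    model_name = "toy"


# Run the concrete test case programmatically and collect the result.
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(ToyModelTest)
result = unittest.TestResult()
suite.run(result)
```

With the tests saved in a module, the invocation the log recommends (`python -m unittest tests/test_foo.py`) would pick up `ToyModelTest` but not the mixin.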
21 Dec, 2019 2 commits

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

Take advantage of the cache when running tests. · b670c266

Aymeric Augustin authored Dec 20, 2019

Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.

Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.

Fix #2222.

b670c266
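
As a rough illustration of why `gettempdir()` is more flexible than a hard-coded `/tmp`, the sketch below redirects the temporary directory through the `TMPDIR` environment variable. The `model_cache` subdirectory name is an assumption made for the example, not the path the test suite uses.

```python
import os
import tempfile

# Create a scratch directory to stand in for a custom cache location.
custom_tmp = tempfile.mkdtemp()

# Point TMPDIR at it; gettempdir() checks TMPDIR, TEMP, and TMP in that order.
os.environ["TMPDIR"] = custom_tmp
tempfile.tempdir = None  # drop the cached value so the environment is re-read

# Hypothetical cache location derived from the temporary directory.
cache_dir = os.path.join(tempfile.gettempdir(), "model_cache")
```

Note that `tempfile` caches its answer after the first call, so the variable normally has to be set before the process starts, for example `TMPDIR=/some/dir python -m unittest ...`.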

18 Dec, 2019 1 commit
- Fix outdated tokenizer doc · a0d38645
  Julien Chaumond authored Dec 17, 2019
  
  a0d38645
13 Dec, 2019 1 commit
- cleaning up configuration classes · 47f0e3cf
  thomwolf authored Dec 13, 2019
  
  47f0e3cf