Commits · feeb956a19ca08e3b9657ea9ec7d14adb6304c85 · chenpangpang / transformers

22 Jul, 2020 1 commit
- [docs] Add integration test example to copy pasta template (#5961) · feeb956a
  Sam Shleifer authored Jul 22, 2020
```
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  feeb956a
28 Jun, 2020 1 commit
- save_pretrained: mkdir(exist_ok=True) (#5258) · 45e26125
  Sam Shleifer authored Jun 28, 2020
```
* all save_pretrained methods mkdir if not os.path.exists
```
  45e26125
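
The directory-creation pattern named in this commit title can be sketched as below. This is a minimal illustration, not the library's code: `save_config` is a hypothetical stand-in for a `save_pretrained`-style method, and it uses `os.makedirs(..., exist_ok=True)` so that nested paths are also created.

```python
import json
import os
import tempfile


def save_config(config: dict, save_directory: str) -> str:
    """Hypothetical stand-in for a save_pretrained-style method."""
    if os.path.isfile(save_directory):
        raise ValueError(f"{save_directory} is a file, expected a directory")
    # Create the directory (and any missing parents); do nothing if it exists.
    os.makedirs(save_directory, exist_ok=True)
    path = os.path.join(save_directory, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "nested", "model")
    first = save_config({"hidden_size": 8}, target)    # directory is created
    second = save_config({"hidden_size": 16}, target)  # directory already exists
    with open(second, encoding="utf-8") as f:
        saved = json.load(f)
```

Calling the function twice on the same path shows why `exist_ok=True` matters: the second call overwrites the file instead of failing on the existing directory.
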
26 Jun, 2020 1 commit

[tokenizers] Updates data processors, docstring, examples and model cards to the new API (#5308) · 601d4d69

Thomas Wolf authored Jun 26, 2020

* remove references to old API in docstring - update data processors

* style

* fix tests - better type checking error messages

* better type checking

* include awesome fix by @LysandreJik for #5310

* updated doc and examples

601d4d69

15 Jun, 2020 1 commit

[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220

Anthony MOI authored Jun 15, 2020


[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)

* Use tokenizers pre-tokenized pipeline

* failing pretokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up required tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleanups - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

36434220
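
As a rough illustration of what separate padding and truncation controls look like, here is a toy sketch in plain Python. It is not the tokenizers backend or the library's implementation: `pad_batch`, its parameters, and the strategy names used here are assumptions chosen for the example.

```python
from typing import List, Optional


def pad_batch(
    batch: List[List[int]],
    padding: str = "do_not_pad",
    max_length: Optional[int] = None,
    truncation: bool = False,
    pad_id: int = 0,
) -> List[List[int]]:
    """Toy padding/truncation helper (hypothetical, for illustration only)."""
    # Truncation is decided independently of padding.
    if truncation and max_length is not None:
        batch = [ids[:max_length] for ids in batch]

    if padding == "do_not_pad":
        return [list(ids) for ids in batch]
    if padding == "longest":
        target = max(len(ids) for ids in batch)
    elif padding == "max_length":
        if max_length is None:
            raise ValueError("padding='max_length' requires max_length")
        target = max_length
    else:
        raise ValueError(f"unknown padding strategy: {padding}")

    # Sequences already longer than the target are left unchanged.
    return [ids + [pad_id] * (target - len(ids)) for ids in batch]


batch = [[5, 6, 7], [8]]
padded_longest = pad_batch(batch, padding="longest")
padded_fixed = pad_batch(batch, padding="max_length", max_length=2, truncation=True)
```

Keeping the two decisions in separate arguments means a caller can truncate without padding, pad without truncating, or do both.
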

09 Jun, 2020 1 commit

[All models] Extend config.output_attentions with output_attentions function arguments (#4538) · 6e603cb7

Bharat Raghunathan authored Jun 10, 2020



* DOC: Replace instances of ``config.output_attentions`` with function argument ``output_attentions``

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* Fix further regressions in tests relating to `output_attentions`

Ensure proper propagation of `output_attentions` as a function parameter
to all model subclasses

* Fix more regressions in `test_output_attentions`

* Fix issues with BertEncoder

* Rename related variables to `output_attentions`

* fix pytorch tests

* fix bert and gpt2 tf

* Fix most TF tests for `test_output_attentions`

* Fix linter errors and more TF tests

* fix conflicts

* DOC: Apply Black Formatting

* Fix errors where output_attentions was undefined

* Remove output_attentions in classes per review

* Fix regressions on tests having `output_attention`

* fix conflicts

* fix conflicts

* fix conflicts

* fix conflicts

* fix pytorch tests

* fix conflicts

* fix conflicts

* Fix linter errors and more TF tests

* fix tf tests

* make style

* fix isort

* improve output_attentions

* improve tensorflow
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

6e603cb7
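
The idea of passing `output_attentions` as a function argument rather than reading it only from the config can be sketched with a toy model. `ToyConfig` and `ToyModel` are hypothetical, and the fallback to the config value when the argument is `None` is an assumption of this sketch, not something the commit message states.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyConfig:
    output_attentions: bool = False


class ToyModel:
    """Hypothetical model used only to illustrate the argument pattern."""

    def __init__(self, config: ToyConfig):
        self.config = config

    def forward(
        self, inputs: List[float], output_attentions: Optional[bool] = None
    ) -> Tuple[List[float], ...]:
        # Assumption: an explicit argument wins, otherwise use the config value.
        if output_attentions is None:
            output_attentions = self.config.output_attentions

        hidden = [2.0 * x for x in inputs]
        outputs: Tuple[List[float], ...] = (hidden,)
        if output_attentions:
            # Placeholder "attention": uniform weights over the inputs.
            attentions = [1.0 / len(inputs)] * len(inputs)
            outputs = outputs + (attentions,)
        return outputs


model = ToyModel(ToyConfig())
default_outputs = model.forward([1.0, 2.0])
with_attentions = model.forward([1.0, 2.0], output_attentions=True)
```

With this shape, a single call can request attentions without mutating the shared config object.
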

02 Jun, 2020 1 commit

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI

d4c2cb40

29 Apr, 2020 1 commit

CDN urls (#4030) · 455c6390

Julien Chaumond authored Apr 28, 2020

* [file_utils] use_cdn + documentation

* Move to cdn. urls for weights

* [urls] Hotfix for bert-base-japanese

455c6390

18 Apr, 2020 1 commit

Cleanup fast tokenizers integration (#3706) · 827d6d6e

Thomas Wolf authored Apr 18, 2020



* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorflow doesn't like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py
Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy
Co-authored-by: Stefan Schweter <stefan@schweter.it>

827d6d6e

16 Apr, 2020 1 commit
- [cleanup] factor out get_head_mask, invert_attn_mask, get_exten… (#3806) · dbd04124
  Sam Shleifer authored Apr 16, 2020
```
* Delete some copy pasted code
```
  dbd04124
08 Apr, 2020 1 commit

More doc for model cards (#3698) · a594ee9c

Julien Chaumond authored Apr 08, 2020

see https://github.com/huggingface/transformers/pull/3679#pullrequestreview-389368270

a594ee9c

04 Apr, 2020 1 commit
- weigths*weights · 94eb68d7
  Julien Chaumond authored Apr 04, 2020
  
  94eb68d7
24 Mar, 2020 1 commit
- [examples] Use AutoModels in more examples · a8e3336a
  Julien Chaumond authored Mar 23, 2020
  
  a8e3336a
02 Mar, 2020 1 commit
- fix n_gpu count when no_cuda flag is activated (#3077) · 6b1ff250
  Victor SANH authored Mar 02, 2020
```
* fix n_gpu count when no_cuda flag is activated

* someone was left behind
```
  6b1ff250
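
The fix described here amounts to making the GPU count agree with the device choice. The sketch below shows that logic without depending on a deep learning framework: `resolve_device` is a hypothetical helper, and `cuda_available` and `visible_gpus` stand in for whatever the framework would report.

```python
from typing import Tuple


def resolve_device(no_cuda: bool, cuda_available: bool, visible_gpus: int) -> Tuple[str, int]:
    """Return (device name, n_gpu) so that the two never disagree."""
    if no_cuda or not cuda_available:
        # With no_cuda set, n_gpu must be 0 even if GPUs are physically present.
        return "cpu", 0
    return "cuda", visible_gpus


forced_cpu = resolve_device(no_cuda=True, cuda_available=True, visible_gpus=4)
uses_gpu = resolve_device(no_cuda=False, cuda_available=True, visible_gpus=4)
```

Code that later branches on `n_gpu > 1` (for example to enable multi-GPU wrappers) then behaves correctly on a GPU machine running in CPU mode.
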
07 Feb, 2020 1 commit
- Fix importing unofficial TF models with extra optimizer weights · 73368963
  monologg authored Jan 27, 2020
  
  73368963
29 Jan, 2020 4 commits
- Apply quality and style requirements once again · ca1d6673
  Julien Plu authored Jan 07, 2020
  
  ca1d6673
- Apply quality and style requirements · 0731fa15
  Julien Plu authored Jan 07, 2020
  
  0731fa15
- Apply style · 7fc628d9
  Julien Plu authored Jan 08, 2020
  
  7fc628d9
- Add TF2 XLM-RoBERTa model · 64ca8556
  Julien Plu authored Jan 08, 2020
  
  64ca8556
15 Jan, 2020 1 commit
- 💄 super · 83a41d39
  Julien Chaumond authored Jan 15, 2020
  
  83a41d39
13 Jan, 2020 1 commit
- Config to Model mapping · b803b067
  Julien Chaumond authored Jan 13, 2020
  
  b803b067
07 Jan, 2020 1 commit
- Fix typographical errors (#2438) · d6a677b1
  Genta Indra Winata authored Jan 08, 2020
  
  d6a677b1
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
05 Jan, 2020 1 commit
- Enforce target version for black. · 0ffc8eaf
  Aymeric Augustin authored Dec 27, 2019
```
This should stabilize formatting.
```
  0ffc8eaf
28 Dec, 2019 1 commit
- Kill __main__ · 4d6c93e9
  Julien Chaumond authored Dec 27, 2019
  
  4d6c93e9
23 Dec, 2019 1 commit
- Remove unused variables in templates. · 495580da
  Aymeric Augustin authored Dec 23, 2019
  
  495580da
22 Dec, 2019 14 commits
- Use built-in open(). · 1c62e87b
  Aymeric Augustin authored Dec 22, 2019
```
On Python 3, `open is io.open`.
```
  1c62e87b
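
The claim in this commit message can be checked directly. The short sketch below confirms that the built-in `open` and `io.open` are the same object on Python 3, and that the built-in accepts the `encoding` argument that previously motivated `io.open`; the file name is arbitrary.

```python
import io
import os
import tempfile

# On Python 3 the built-in is the very same function object as io.open.
same_object = open is io.open

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # The built-in open() takes an explicit encoding, so io.open is unnecessary.
    with open(path, "w", encoding="utf-8") as f:
        f.write("héllo")
    with open(path, encoding="utf-8") as f:
        text = f.read()
```
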
- Update comments mentioning Python 2. · d6eaf4e6
  Aymeric Augustin authored Dec 22, 2019
  
  d6eaf4e6
- Remove six. · 8af25b16
  Aymeric Augustin authored Dec 22, 2019
  
  8af25b16
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Replace CommonTestCases for tokenizers with a mixin. · 00204f2b
  Aymeric Augustin authored Dec 22, 2019
```
This is the same change as for (TF)CommonTestCases for modeling.
```
  00204f2b
- Rename file for consistency. · a3c5883f
  Aymeric Augustin authored Dec 22, 2019
  
  a3c5883f
- Replace (TF)CommonTestCases for modeling with a mixin. · 345c23a6
  Aymeric Augustin authored Dec 22, 2019
```
I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
```
  345c23a6
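
The mixin approach described in this commit message can be sketched as follows. The class names are hypothetical and the shared test is deliberately trivial; the point is that the mixin does not inherit from `unittest.TestCase`, so test loaders do not collect it on its own.

```python
import unittest


class CommonTesterMixin:
    """Shared tests. Not a TestCase, so it is never collected by itself."""

    value_factory = None  # concrete test classes override this

    def test_roundtrip(self):
        value = self.value_factory()
        # Rebuilding the container from itself should give an equal value.
        self.assertEqual(value, type(value)(value))


class ListTest(CommonTesterMixin, unittest.TestCase):
    value_factory = staticmethod(lambda: [1, 2, 3])


class TupleTest(CommonTesterMixin, unittest.TestCase):
    value_factory = staticmethod(lambda: (1, 2, 3))


# Load only the concrete classes; the mixin contributes its test to each.
loader = unittest.TestLoader()
suite = unittest.TestSuite(
    loader.loadTestsFromTestCase(case) for case in (ListTest, TupleTest)
)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the shared tests live in a plain class, there is no need to nest the base class inside a wrapper to hide it from discovery.
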
- Remove unittest.main() in test modules. · 7e98e211
  Aymeric Augustin authored Dec 22, 2019
```
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
```
  7e98e211
- Switch test files to the standard test_*.py scheme. · ced0a942
  Aymeric Augustin authored Dec 22, 2019
  
  ced0a942
- Fix F401 flake8 warning (x28). · 939148b0
  Aymeric Augustin authored Dec 21, 2019
```
Do manually what autoflake couldn't manage.
```
  939148b0
- Fix F401 flake8 warning (x88 / 116). · 783a6169
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive --remove-all-unused-imports --ignore-init-module-imports examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```
  783a6169
- Fix F401 flake8 warning (x152 / 268). · 80327a13
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```
  80327a13
- Fix E266 flake8 warning (x90). · fa2ccbc0
  Aymeric Augustin authored Dec 21, 2019
  
  fa2ccbc0
- Fix F821 flake8 warning (x47). · 2ab78325
  Aymeric Augustin authored Dec 21, 2019
```
Ignore warnings related to Python 2, because it's going away soon.
```
  2ab78325