Commits · 00aa9dbca29dcf0e3a624354ef5c80a8e5226339 · chenpangpang / transformers

07 Dec, 2020 1 commit
- Copyright (#8970) · 00aa9dbc
  Sylvain Gugger authored Dec 07, 2020
```
* Add copyright everywhere missing

* Style
```
24 Nov, 2020 1 commit

Support various BERT relative position embeddings (2nd) (#8276) · 2c83b3c3

zhiheng-huang authored Nov 24, 2020



* Support BERT relative position embeddings

* Fix typo in README.md

* Address review comment

* Fix failing tests

* [tiny] Fix style_doc.py check by adding an empty line to configuration_bert.py

* make fix copies

* fix configs of electra and albert and fix longformer

* remove copy statement from longformer

* fix albert

* fix electra

* Add bert variants forward tests for various position embeddings

* [tiny] Fix style for test_modeling_bert.py

* improve docstring

* [tiny] improve docstring and remove unnecessary dependency

* [tiny] Remove unused import

* re-add to ALBERT

* make embeddings work for ALBERT

* add test for albert
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>


17 Nov, 2020 1 commit

Reorganize repo (#8580) · c89bdfbe

Sylvain Gugger authored Nov 16, 2020

* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in convert command


16 Nov, 2020 1 commit

Switch `return_dict` to `True` by default. (#8530) · 1073a2bd

Sylvain Gugger authored Nov 16, 2020

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Use the CI to identify failing tests

* Remove from all examples and tests

* More default switch

* Fixes

* More test fixes

* More fixes

* Last fixes hopefully

* Run on the real suite

* Fix slow tests


03 Nov, 2020 1 commit

Refactoring the generate() function (#6949) · a1bbcf3f

Patrick von Platen authored Nov 03, 2020

* first draft

* show design proposition for new generate method

* up

* make more readable

* make first version

* gpt2 tests pass

* make beam search for gpt2 work

* add first encoder-decoder code

* delete typo

* make t5 work

* save intermediate

* make bart work with beam search

* finish beam search bart / t5

* add default kwargs

* make more tests pass

* fix no bad words sampler

* some fixes and tests for all distribution processors

* fix test

* fix rag slow tests

* merge to master

* add nograd to generate

* make all slow tests pass

* speed up generate

* fix edge case bug

* small fix

* correct typo

* add type hints and docstrings

* fix typos in tests

* add beam search tests

* add tests for beam scorer

* fix test rag

* finish beam search tests

* move generation tests into a separate file

* fix generation tests

* more tests

* add aggressive generation tests

* fix tests

* add gpt2 sample test

* add more docstring

* add more docs

* finish doc strings

* apply some more of Sylvain's and Sam's comments

* fix some typos

* make fix copies

* apply Lysandre's and Sylvain's comments

* final corrections on examples

* small fix for reformer


30 Oct, 2020 1 commit

Ci test tf super slow (#8007) · 10f8c636

Lysandre Debut authored Oct 30, 2020

* Test TF GPU CI

* Change cache

* Fix missing torch requirement

* Fix some model tests


* Style

* LXMERT

* MobileBERT

* Longformer skip test

* XLNet

* The rest of the tests

* RAG goes OOM in multi gpu setup

* YAML test files

* Last fixes

* Skip doctests

* Fill mask tests

* Yaml files

* Last test fix

* Style

* Update cache

* Change ONNX tests to slow + use tiny model


18 Oct, 2020 1 commit

[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a

Thomas Wolf authored Oct 18, 2020

* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* splitting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉



* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update and fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split herbert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix herbert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


26 Aug, 2020 1 commit
- Black 20 release · a75c64d8
  Lysandre authored Aug 26, 2020
  
24 Aug, 2020 1 commit
- Update repo to isort v5 (#6686) · a5737779
  Sylvain Gugger authored Aug 24, 2020
```
* Run new isort

* More changes

* Update CI, CONTRIBUTING and benchmarks
```
20 Aug, 2020 1 commit
- [Tests] fix attention masks in Tests (#6621) · 505f2d74
  Patrick von Platen authored Aug 20, 2020
```
* fix distilbert

* fix typo
```
12 Aug, 2020 1 commit
- [EncoderDecoder] Add encoder-decoder for roberta/ vanilla longformer (#6411) · 0735def8
  Patrick von Platen authored Aug 12, 2020
```
* add encoder-decoder for roberta

* fix headmask

* apply Sylvain's suggestions

* fix typo

* Apply suggestions from code review
```
04 Aug, 2020 1 commit

cleanup torch unittests (#6196) · 5deed37f

Stas Bekman authored Aug 03, 2020

* improve unit tests

this is a sample of one test according to the request in https://github.com/huggingface/transformers/issues/5973
before I apply it to the rest

* batch 1

* batch 2

* batch 3

* batch 4

* batch 5

* style

* non-tf template

* last deletion of check_loss_output


31 Jul, 2020 1 commit
- Model output test (#6155) · d951c14a
  Sylvain Gugger authored Jul 31, 2020
```
* Use return_dict=True in all tests

* Formatting
```
01 Jul, 2020 1 commit
- Move tests/utils.py -> transformers/testing_utils.py (#5350) · 13deb95a
  Sam Shleifer authored Jul 01, 2020
  
23 Jun, 2020 1 commit
- [bart] add config.extra_pos_embeddings to facilitate reuse (#5190) · 58918c76
  Sam Shleifer authored Jun 23, 2020
  
16 Jun, 2020 1 commit
- [cleanup] Hoist ModelTester objects to top level (#4939) · c852036b
  Amil Khare authored Jun 16, 2020
```
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
```
10 Jun, 2020 1 commit
- Add more models to common tests (#4910) · 4e10acb3
  Sylvain Gugger authored Jun 10, 2020
  
05 Jun, 2020 1 commit
- Use labels to remove deprecation warnings (#4807) · f1fe1846
  Sylvain Gugger authored Jun 05, 2020
  
02 Jun, 2020 1 commit

Kill model archive maps (#4636) · d4c2cb40

Julien Chaumond authored Jun 02, 2020

* Kill model archive maps

* Fixup

* Also kill model_archive_map for MaskedBertPreTrainedModel

* Unhook config_archive_map

* Tokenizers: align with model id changes

* make style && make quality

* Fix CI


01 May, 2020 1 commit

[ci] Load pretrained models into the default (long-lived) cache · f54dc3f4

Julien Chaumond authored Apr 23, 2020

There's an inconsistency right now where:
- we load some models into CACHE_DIR
- and some models in the default cache
- and often, in both for the same models

When running the RUN_SLOW tests, this takes a lot of disk space, time, and bandwidth.

I'd rather always use the default cache


03 Mar, 2020 1 commit

[ci] Re-run integration ground truth from fairseq · f631e01d

Julien Chaumond authored Mar 03, 2020

Adopted best practice set by @patrickvonplaten of commenting lines run on fairseq, for easy comparison

also see #3020


20 Feb, 2020 1 commit

New BartModel (#2745) · 53ce3854

Sam Shleifer authored Feb 20, 2020

* Results same as fairseq
* Wrote a ton of tests
* Struggled with api signatures
* added some docs


04 Feb, 2020 2 commits
- Style · 5f96ebc0
  Lysandre authored Feb 03, 2020
  
- RoBERTa Pytorch tests · d28b81dc
  Lysandre authored Feb 03, 2020
  
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
22 Dec, 2019 6 commits

Remove __future__ imports. · c824d15a
Aymeric Augustin authored Dec 22, 2019


Replace (TF)CommonTestCases for modeling with a mixin. · 345c23a6

Aymeric Augustin authored Dec 22, 2019

I suspect the wrapper classes were created in order to prevent the
abstract base class (TF)CommonModelTester from being included in test
discovery and running, because that would fail.

I solved this by replacing the abstract base class with a mixin.

Code changes are just de-indenting and automatic reformattings
performed by black to use the extra line space.
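
The mixin approach described above can be sketched as follows. This is a minimal illustration only: the class names `CommonModelTestsMixin` and `DummyModelTest` are hypothetical and are not taken from the repository.

```python
import unittest


class CommonModelTestsMixin:
    """Shared tests. Not a TestCase subclass, so test discovery does not run it on its own."""

    # Concrete test classes are expected to override this attribute.
    model_name = None

    def test_model_name_is_set(self):
        self.assertIsNotNone(self.model_name)


class DummyModelTest(CommonModelTestsMixin, unittest.TestCase):
    """Concrete test case: inherits the shared tests through the mixin."""

    model_name = "dummy"


# Only the TestCase subclass is loaded and run; the mixin itself is never collected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DummyModelTest)
result = unittest.TestResult()
suite.run(result)
```

Because the mixin does not inherit from `unittest.TestCase`, no wrapper class is needed to hide it from discovery.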


Remove unittest.main() in test modules. · 7e98e211

Aymeric Augustin authored Dec 22, 2019

This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
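
The path behaviour mentioned above can be checked with a small sketch. It assumes default interpreter settings (no safe-path option enabled) and uses a throwaway `tests/test_foo.py` created in a temporary directory purely for illustration.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    tests_dir = os.path.join(root, "tests")
    os.makedirs(tests_dir)
    script = os.path.join(tests_dir, "test_foo.py")
    with open(script, "w") as f:
        f.write("import sys\nprint(sys.path[0])\n")

    # Running the file directly: Python puts the script's directory first on sys.path.
    out = subprocess.run(
        [sys.executable, script], cwd=root, capture_output=True, text=True, check=True
    )
    first_entry = out.stdout.strip()
    script_dir_on_path = os.path.realpath(first_entry) == os.path.realpath(tests_dir)
```

With `python -m unittest` run from the repository root, the first entry is the current directory instead, which is closer to how the suite is normally run.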


Switch test files to the standard test_*.py scheme. · ced0a942
Aymeric Augustin authored Dec 22, 2019

Move tests outside of library. · 067395d5
Aymeric Augustin authored Dec 22, 2019


Sort imports with isort. · 158e82e0

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py


21 Dec, 2019 3 commits

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There are a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.


Take advantage of the cache when running tests. · b670c266

Aymeric Augustin authored Dec 20, 2019

Caching models across test cases and across runs of the test suite makes
slow tests somewhat more bearable.

Use gettempdir() instead of /tmp in tests. This makes it easier to
change the location of the cache with semi-standard TMPDIR/TEMP/TMP
environment variables.

Fix #2222.
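
A minimal sketch of the `gettempdir()` idea, assuming a hypothetical cache directory name (`transformers_test` is chosen here only for illustration):

```python
import os
import tempfile

# gettempdir() consults the TMPDIR, TEMP and TMP environment variables before
# falling back to platform defaults such as /tmp.
# The directory name below is a hypothetical example.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "transformers_test")
os.makedirs(CACHE_DIR, exist_ok=True)
```

Tests that pass a directory like this as their cache location reuse downloaded files across test cases and across runs, and the location can be moved by setting one of the environment variables before starting Python.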


[RoBERTa] Embeddings: fix dimensionality bug · 3e52915f
Julien Chaumond authored Dec 20, 2019


20 Dec, 2019 1 commit
- Bug fix: 1764 · 228f5286
  Dom Hudson authored Nov 07, 2019
  
13 Dec, 2019 1 commit
- cleaning up configuration classes · 47f0e3cf
  thomwolf authored Dec 13, 2019
  
06 Dec, 2019 1 commit

Remove dependency on pytest for running tests (#2055) · 35401fe5

Aymeric Augustin authored Dec 06, 2019

* Switch to plain unittest for skipping slow tests.

Add a RUN_SLOW environment variable for running them.

* Switch to plain unittest for PyTorch dependency.

* Switch to plain unittest for TensorFlow dependency.

* Avoid leaking open files in the test suite.

This prevents spurious warnings when running tests.

* Fix unicode warning on Python 2 when running tests.

The warning was:

    UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

* Support running PyTorch tests on a GPU.

Reverts 27e015bd.

* Tests no longer require pytest.

* Make tests pass on cuda
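
Skipping slow tests with plain unittest and a `RUN_SLOW` environment variable might look roughly like this. It is a sketch under stated assumptions: the helper `parse_flag_from_env`, the decorator name `slow`, and the accepted truthy values are illustrative and not confirmed by the commit message.

```python
import os
import unittest


def parse_flag_from_env(key, default=False):
    """Hypothetical helper: read a boolean flag from the environment."""
    value = os.environ.get(key)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "y", "on"}


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    The environment is read once, when the decorator is applied.
    """
    return unittest.skipUnless(parse_flag_from_env("RUN_SLOW"), "test is slow")(test_case)


class ExampleTest(unittest.TestCase):
    @slow
    def test_expensive(self):
        self.assertTrue(True)
```

Under these assumptions, `RUN_SLOW=1 python -m unittest` would run the decorated test, while a plain `python -m unittest` would report it as skipped.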


24 Oct, 2019 1 commit
- RoBERTa token classification · 66085a13
  Matt Maybeno authored Oct 23, 2019
```
[WIP] copy paste bert token classification for roberta
```
26 Sep, 2019 1 commit
- [BIG] pytorch-transformers => transformers · 31c23bd5
  thomwolf authored Sep 26, 2019
  