Commits · 00aa9dbca29dcf0e3a624354ef5c80a8e5226339 · chenpangpang / transformers

"src/routes/vscode:/vscode.git/clone" did not exist on "621719c6ac504e1c217fda3e445375a1d20802a3"

07 Dec, 2020 1 commit
- Copyright (#8970) · 00aa9dbc
  Sylvain Gugger authored Dec 07, 2020
```
* Add copyright everywhere missing

* Style
```
  00aa9dbc
17 Nov, 2020 1 commit

Sylvain Gugger authored Nov 16, 2020

* Put models in subfolders

* Styling

* Fix imports in tests

* More fixes in test imports

* Sneaky hidden imports

* Fix imports in doc files

* More sneaky imports

* Finish fixing tests

* Fix examples

* Fix path for copies

* More fixes for examples

* Fix dummy files

* More fixes for example

* More model import fixes

* Is this why you're unhappy GitHub?

* Fix imports in conver command

c89bdfbe

18 Oct, 2020 1 commit

[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659) · ba8c4d0a

Thomas Wolf authored Oct 18, 2020

* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉



* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update ad fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

ba8c4d0a

22 Sep, 2020 1 commit

Add LayoutLM Model (#7064) · cd9a0585

Minghao Li authored Sep 22, 2020



* first version

* finish test docs readme model/config/tokenization class

* apply make style and make quality

* fix layoutlm GitHub link

* fix conflict in index.rst and add layoutlm to pretrained_models.rst

* fix bug in test_parents_and_children_in_mappings

* reformat modeling_auto.py and tokenization_auto.py

* fix bug in test_modeling_layoutlm.py

* Update docs/source/model_doc/layoutlm.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/layoutlm.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove inh, add tokenizer fast, and update some doc

* copy and rename necessary class from modeling_bert to modeling_layoutlm

* Update src/transformers/configuration_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/configuration_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/modeling_layoutlm.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* add mish to activations.py, import ACT2FN and import logging from utils
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

cd9a0585

22 Jul, 2020 1 commit
- [docs] Add integration test example to copy pasta template (#5961) · feeb956a
  Sam Shleifer authored Jul 22, 2020
```
Co-authored-by: Julien Chaumond <chaumond@gmail.com>
```
  feeb956a
15 Jun, 2020 1 commit

[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized... · 36434220

Anthony MOI authored Jun 15, 2020


[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)

* Use tokenizers pre-tokenized pipeline

* failing pretrokenized test

* Fix is_pretokenized in python

* add pretokenized tests

* style and quality

* better tests for batched pretokenized inputs

* tokenizers clean up - new padding_strategy - split the files

* [HUGE] refactoring tokenizers - padding - truncation - tests

* style and quality

* bump up requied tokenizers version to 0.8.0-rc1

* switched padding/truncation API - simpler better backward compat

* updating tests for custom tokenizers

* style and quality - tests on pad

* fix QA pipeline

* fix backward compatibility for max_length only

* style and quality

* Various cleans up - add verbose

* fix tests

* update docstrings

* Fix tests

* Docs reformatted

* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

36434220

15 Jan, 2020 1 commit
- 💄 super · 83a41d39
  Julien Chaumond authored Jan 15, 2020
  
  83a41d39
06 Jan, 2020 2 commits
- GPU text generation: mMoved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
  81d6841b
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
  dd4df80f
22 Dec, 2019 7 commits
- Use built-in open(). · 1c62e87b
  Aymeric Augustin authored Dec 22, 2019
```
On Python 3, `open is io.open`.
```
  1c62e87b
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
  c824d15a
- Replace CommonTestCases for tokenizers with a mixin. · 00204f2b
  Aymeric Augustin authored Dec 22, 2019
```
This is the same change as for (TF)CommonTestCases for modeling.
```
  00204f2b
- Rename file for consistency. · a3c5883f
  Aymeric Augustin authored Dec 22, 2019
  
  a3c5883f
- Remove unittest.main() in test modules. · 7e98e211
  Aymeric Augustin authored Dec 22, 2019
```
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
```
  7e98e211
- Switch test files to the standard test_*.py scheme. · ced0a942
  Aymeric Augustin authored Dec 22, 2019
  
  ced0a942
- Sort imports with isort. · 158e82e0
  Aymeric Augustin authored Dec 21, 2019
```
This is the result of:

    $ isort --recursive examples templates transformers utils hubconf.py setup.py
```
  158e82e0
21 Dec, 2019 1 commit

Reformat source code with black. · fa84ae26

Aymeric Augustin authored Dec 21, 2019

This is the result of:

    $ black --line-length 119 examples templates transformers utils hubconf.py setup.py

There's a lot of fairly long lines in the project. As a consequence, I'm
picking the longest widely accepted line length, 119 characters.

This is also Thomas' preference, because it allows for explicit variable
names, to make the code easier to understand.

fa84ae26

30 Oct, 2019 1 commit
- adding templates · 7f4226f9
  thomwolf authored Oct 30, 2019
  
  7f4226f9