Commits · 5f789a687aa6d610ad64fdc39104bc196a5bfcb9 · chenpangpang / transformers

03 Nov, 2021 1 commit

Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) (#14115) · 5f789a68

NielsRogge authored Nov 03, 2021



* Add LayoutXLMTokenizer and LayoutXLMTokenizerFast

* Fix styling issues

* Fix more styling issues

* Fix more styling issues

* Fix docstring

* Fix unit tests

* Fix docs

* Fix unit tests

* Fix typos and styling issues

* Fix styling issues

* Fix docstring

* Make all tests of test_tokenization_layoutxlm pass

* Add LayoutXLMProcessor

* Make fixup

* Make all LayoutXLMProcessor tests pass

* Minor fixes

* Leave LayoutLMv2Processor tests unchanged

* Fix code quality

* Move LayoutXLM tokenizers and processor to separate folder

* Fix code quality

* Apply suggestions from code review

* Replace assertions by value errors

* Remove methods from fast tokenizer
Co-authored-by: King Yiu Suen <kingyiusuen@gmail.com>


02 Nov, 2021 3 commits

Update Transformers to huggingface_hub >= 0.1.0 (#14251) · 558f8543
Sylvain Gugger authored Nov 02, 2021
```
* Update Transformers to huggingface_hub >= 0.1.0

* Forgot to save...

* Style

* Fix test
```
[Tests] Fix DistilHubert path (#14245) · ce01122a
Anton Lozhkov authored Nov 02, 2021
```
* Add audio-classification benchmarking results

* fix distilhubert path
```

Fix test_configuration_tie in FlaxEncoderDecoderModelTest (#14076) · 4a394cf5

Yih-Dar authored Nov 02, 2021



* check test_configuration_tie

* Fix test_configuration_tie

* make test slow again

* Remove property and use model.module.bind

* revert to slow test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>


01 Nov, 2021 4 commits

Add BeitForSemanticSegmentation (#14096) · e20faa6f

NielsRogge authored Nov 01, 2021



* Add first draft

* Make forward pass work

* Improve conversion script

* Add notebook that checks if it works

* Add BeitForSemanticSegmentation to the tests

* More improvements

* Make BeitForSemanticSegmentation consistent with Segformer

* Small bug fix

* Add BeitForSemanticSegmentation to docs

* Make sure model doesn't output hidden states when the user doesn't want to

* Make it possible to convert the large model

* Fix issue

* Fix conversion script for large model

* Add auxiliary_head option to semantic segmentation model

* Apply suggestions from @sgugger's review

* Apply suggestions from code review

* Fix failing test
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>


[GPTJ] enable common tests and few fixes (#14190) · ce91bf9a
Suraj Patil authored Nov 01, 2021
```
* enable common tests, small fixes

* don't tie word embeds

* don't ignore lm_head
```
Fixing `image-segmentation` tests. (#14223) · 323f28dc
Nicolas Patry authored Nov 01, 2021


Add more missing models to models/__init__.py (#14177) · 9450bfcc

Yih-Dar authored Nov 01, 2021



* Add missing models to models/__init__.py

* Fix issues previously undetected

* Add UniSpeechSatForPreTraining to all_model_classes

* fix unispeech sat

* fix

* Add check_model_list() to check_repo.py

* Remove _ignore_models = ["bort"]
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>


29 Oct, 2021 6 commits

Torch 1.10 (#14169) · 63d91f44
Lysandre Debut authored Oct 29, 2021
```
* Torch 1.10

* torch scatter for 1.10

* style

* Skip tests
ok
```

Generalize problem_type to all sequence classification models (#14180) · c28bc80b

Sylvain Gugger authored Oct 29, 2021

* Generalize problem_type to all classification models

* Missing import

* Deberta BC and fix tests

* Fix template

* Missing imports

* Revert change to reformer test

* Fix style

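For readers unfamiliar with `problem_type`, here is a minimal sketch of how a classification head might pick a loss family from the label count and the label values. The three strings are the values I recall from the library's configuration documentation, not something stated in this commit message, and `infer_problem_type` is a hypothetical helper that works on plain Python lists rather than tensors.

```python
def infer_problem_type(num_labels, labels):
    """Guess the problem type from the label count and the label values.

    Hypothetical helper: a simplified, tensor-free version of the idea.
    """
    if num_labels == 1:
        # A single output unit is treated as a regression target.
        return "regression"
    # Integer class ids (one per example) suggest single-label classification.
    if all(isinstance(x, int) and not isinstance(x, bool) for x in labels):
        return "single_label_classification"
    # Anything else (e.g. multi-hot float vectors) is treated as multi-label.
    return "multi_label_classification"


print(infer_problem_type(3, [0, 2, 1]))
```

In this sketch an explicitly configured `problem_type` would simply bypass the inference step.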

Adding `handle_long_generation` parameters for `text-generation` pipeline. (#14118) · dc540dd3

Nicolas Patry authored Oct 29, 2021

* Adding `handle_long_generation` parameters for `text-generation` pipeline.

* More error handling

* Fixing tests by dropping tf support on this functionality, it needs
`max_new_tokens` to make it possible to understand user's intent.
Otherwise, `max_length` == `tokenizer.model_max_length` <
input_ids.shape[0].

* Fixing doc ?

* Doc ?

* Remove link from doc.

* Caught an issue on roberta.

* Damn doc.

* Non BC proposal ?

* Cleaning the fix ?

* Finally using only a test override.

* Don't need to modify this.

* Bad print.

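The note above about `max_new_tokens` and `tokenizer.model_max_length` can be illustrated with a small sketch. It assumes one plausible strategy: keep only the rightmost prompt tokens so that the prompt plus the requested new tokens fits in the model's window. `fit_prompt` is a hypothetical helper operating on a plain list of token ids. It is not the pipeline's code.

```python
def fit_prompt(input_ids, max_new_tokens, model_max_length):
    """Left-truncate a prompt so generation still has room (assumed strategy)."""
    if len(input_ids) + max_new_tokens <= model_max_length:
        return input_ids  # Already fits, nothing to do.
    keep = model_max_length - max_new_tokens
    if keep <= 0:
        # The requested generation length alone fills the whole window.
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep the most recent tokens, which matter most for continuation.
    return input_ids[-keep:]


print(fit_prompt(list(range(10)), max_new_tokens=4, model_max_length=8))
```

Without an explicit `max_new_tokens`, a bare `max_length` does not say how much of the budget the user meant for the prompt, which is the ambiguity the commit message points at.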

Add `BlenderbotTokenizerFast` (#13720) · d37f1fb8

Daniel Stancl authored Oct 29, 2021

* Add the support for the fast (rust) implementation of BlenderbotTokenizer

* Fix a converter and a typo in a doc

* Apply the patil-suraj's suggestion

* (Nitpick) Fast tokenization -> Fast Tokenization in doc

* Apply the SaulLu's suggestion

* Apply Narsil's suggestion to fix test pipelines

* Add encoder_no_repeat_ngram_size according to the Narsil's suggestion

* Revert the last (unnecessary) commit

* Override pipeline config for Blenderbot to allow for larger pos. emb.

* make fix-copies


Remove n_ctx from configs (#14165) · 5b45422b

Thomas Wang authored Oct 29, 2021

* Remove n_ctx from configs

* Fix GPTJ and OpenAIGPT, both are acceptable breaking changes as there are no configs such that it breaks

* Remove unnecessary n_positions from TFOpenAIGPT


Adding `batch_size` support for (almost) all pipelines (#13724) · be236361

Nicolas Patry authored Oct 29, 2021



* Tentative enabling of `batch_size` for pipelines.

* Add systematic test for pipeline batching.

* Enabling batch_size on almost all pipelines

- Not `zero-shot` (it's already passing stuff as batched so trickier)
- Not `QA` (preprocess uses squad features, we need to switch to real
tensors at this boundary).

* Adding `min_length_for_response` for conversational.

* Making CTC, speech mappings available regardless of framework.

* Attempt at fixing automatic tests (ffmpeg not enabled for fast tests)

* Removing ffmpeg dependency in tests.

* Small fixes.

* Slight cleanup.

* Adding docs and addressing comments.

* Quality.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/question_answering.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improving docs.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

* N -> observed_batch_size
softmax trick.

* Follow `padding_side`.

* Supporting image pipeline batching (and padding).

* Rename `unbatch` -> `loader_batch`.

* unbatch_size forgot.

* Custom padding for offset mappings.

* Attempt to remove librosa.

* Adding require_audio.

* torchaudio.

* Back to using datasets librosa.

* Adding help to set a pad_token on the tokenizer.

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>

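A rough sketch of the two ideas this commit mentions, chunking inputs by `batch_size` and padding each chunk according to `padding_side`, is given below. It uses plain lists of token ids. `batches` and `pad_batch` are hypothetical helpers written for illustration and are not taken from the pipeline code.

```python
from itertools import islice


def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def pad_batch(sequences, pad_token_id, padding_side="right"):
    """Pad variable-length id lists to a common width on the given side."""
    if pad_token_id is None:
        # Batching needs a pad token; fail with an actionable message.
        raise ValueError("set a pad token before batching")
    if padding_side not in ("left", "right"):
        raise ValueError(f"unknown padding_side: {padding_side!r}")
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [pad_token_id] * (width - len(seq))
        padded.append(seq + fill if padding_side == "right" else fill + seq)
    return padded


for chunk in batches([[1, 2, 3], [4], [5, 6]], batch_size=2):
    print(pad_batch(chunk, pad_token_id=0, padding_side="left"))
```

Padding only to the longest item in each chunk, rather than to a global maximum, keeps the wasted computation per batch small.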

28 Oct, 2021 2 commits

Fix SEW-D implementation differences (#14191) · 1251072f
Anton Lozhkov authored Oct 28, 2021
```
* Fix SEW-D

* Update tests

* isort
```

Add SegFormer (#14019) · 1dc96a76

NielsRogge authored Oct 28, 2021



* First draft

* Make style & quality

* Improve conversion script

* Add print statement to see actual slice

* Make absolute tolerance smaller

* Fix image classification models

* Add post_process_semantic method

* Disable padding

* Improve conversion script

* Rename to ForSemanticSegmentation, add integration test, remove post_process methods

* Improve docs

* Fix code quality

* Fix feature extractor tests

* Fix tests for image classification model

* Delete file

* Add is_torch_available to feature extractor

* Improve documentation of feature extractor methods

* Apply suggestions from @sgugger's code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions of code review

* Rebase with master

* Fix rebase issues

* Make sure model only outputs hidden states when the user wants to

* Apply suggestions from code review

* Add pad method

* Support padding of 2d images

* Add print statement

* Add print statement

* Move padding method to SegformerFeatureExtractor

* Fix issue

* Add casting of segmentation maps

* Add test for padding

* Add small note about padding
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


27 Oct, 2021 4 commits
- [TPU tests] Enable first TPU examples pytorch (#14121) · 01b14669
  Patrick von Platen authored Oct 28, 2021
```
* up

* up

* fix

* up

* Update examples/pytorch/test_xla_examples.py

* correct labels

* up

* up

* up

* up

* up

* up
```
- Add DistilHuBERT (#14174) · 232822f3
  Anton Lozhkov authored Oct 27, 2021
```
* Add conversion

* Rename

* Add an integration test and remove layer_norm

* Remove layer_norm from the converter

* wording

* Fix imports
```
- Add SEW CTC models (#14158) · e1dc5afd
  Anton Lozhkov authored Oct 27, 2021
```
* Add SEW CTC models

* Update paths

* Update paths
```
- Fix gelu test for torch 1.10 (#14167) · 1e53faeb
  Lysandre Debut authored Oct 26, 2021
  
26 Oct, 2021 1 commit

Add Unispeech & Unispeech-SAT (#13963) · 9f3aa46f

Patrick von Platen authored Oct 26, 2021



* unispeech

* add copy from

* remove hubert copy from

* finish for today

* add unispeech-sat

* adapt more

* up

* up

* up

* up

* add modeling

* add tests

* up

* up

* finish

* up

* Apply suggestions from code review

* up

* up

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* up

* up
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


25 Oct, 2021 1 commit

Add TF<>PT and Flax<>PT everywhere (#14047) · 0c3174c7

Patrick von Platen authored Oct 25, 2021

* up

* up

* up

* up

* up

* up

* up

* add clip

* fix clip PyTorch

* fix clip PyTorch

* up

* up

* up

* up

* up

* up

* up


22 Oct, 2021 1 commit
- up (#14116) · 70f186f6
  Patrick von Platen authored Oct 22, 2021
  
21 Oct, 2021 1 commit

Fix ignore_mismatched_sizes (#14085) · 234cfefb

Li-Huai (Allan) Lin authored Oct 22, 2021

* Fix

* Style

* Name

* Fix tests

* Style

* Remove embed sizes checking

* Disable some tests

* Fix

* Apply suggestion

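As a hedged illustration of what an `ignore_mismatched_sizes` option implies, the sketch below separates checkpoint entries whose shape matches the model from those that do not, so that the latter can be skipped instead of raising. It works on plain dictionaries of shapes. `split_by_shape` is a hypothetical helper, and the parameter names in the example are made up.

```python
def split_by_shape(checkpoint_shapes, model_shapes):
    """Return (loadable, mismatched) for keys present in both mappings."""
    loadable, mismatched = {}, []
    for name, shape in checkpoint_shapes.items():
        if name not in model_shapes:
            continue  # Unexpected keys are out of scope for this sketch.
        expected = model_shapes[name]
        if shape == expected:
            loadable[name] = shape
        else:
            # Record what was found and what the model expected.
            mismatched.append((name, shape, expected))
    return loadable, mismatched


checkpoint = {"encoder.weight": (768, 768), "classifier.weight": (2, 768)}
model = {"encoder.weight": (768, 768), "classifier.weight": (5, 768)}
loadable, mismatched = split_by_shape(checkpoint, model)
print(mismatched)
```

A typical case is a classification head with a different number of labels: the encoder weights load, and the head is left at its fresh initialization.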

20 Oct, 2021 1 commit

Context managers (#13900) · 0270d44f

Leandro von Werra authored Oct 20, 2021

* add `ContextManagers` for lists of contexts

* fix import sorting

* add `ContextManagers` tests

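A wrapper that enters a list of context managers as one can be sketched with the standard library's `contextlib.ExitStack`. This is a minimal illustration of the idea under that assumption, and may differ from the class added in this commit. `tracked` is a hypothetical helper used only to show the ordering.

```python
from contextlib import ExitStack, contextmanager


class ContextManagers:
    """Enter a list of context managers in order; exit them in reverse."""

    def __init__(self, context_managers):
        self.context_managers = context_managers
        self.stack = ExitStack()

    def __enter__(self):
        # If one manager fails to enter, the ones already entered are closed.
        with ExitStack() as stack:
            for cm in self.context_managers:
                stack.enter_context(cm)
            self.stack = stack.pop_all()
        return self

    def __exit__(self, *exc_info):
        return self.stack.__exit__(*exc_info)


events = []


@contextmanager
def tracked(name):
    # Hypothetical helper: records when the context is entered and exited.
    events.append(f"enter {name}")
    try:
        yield
    finally:
        events.append(f"exit {name}")


with ContextManagers([tracked("a"), tracked("b")]):
    events.append("body")

print(events)
```

This is handy when the set of contexts is only known at runtime, for example when some of them are optional.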

19 Oct, 2021 1 commit

TF Model train and eval step metrics for seq2seq models. (#14009) · 122c2f81

Pedro Marques authored Oct 19, 2021



* TF Model train and eval step metrics for seq2seq models.

When using a model with a seq2seq output compute metrics against logits.

* Removing vestigial code
Co-authored-by: matt <rocketknight1@gmail.com>


18 Oct, 2021 6 commits

[Speech] Refactor Examples (#14040) · d5ff69fc
Patrick von Platen authored Oct 18, 2021
```
* adapt_examples

* up

* up

* up

* up

* add auto models

* finish
```

Add an API to register objects to Auto classes (#13989) · 2c60ff2f

Sylvain Gugger authored Oct 18, 2021



* Add API to register a new object in auto classes

* Fix test

* Documentation

* Add to tokenizers and test

* Add cleanup after tests

* Be more careful

* Move import

* Move import

* Cleanup in TF test too

* Add consistency check

* Add documentation

* Style

* Update docs/source/model_doc/auto.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/auto/auto_factory.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

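The registration idea can be pictured as a mapping from a model type string to a class, with a consistency check at registration time. The sketch below is a toy stand-in and not the library's auto classes. `AutoRegistry`, its method names, and `NewModelConfig` are all hypothetical, and the exact form of the consistency check is an assumption.

```python
class AutoRegistry:
    """Toy stand-in for an auto class: maps a model type to a class."""

    def __init__(self):
        self._mapping = {}

    def register(self, model_type, config_class):
        # Consistency check (assumed): the class must declare the same type.
        declared = getattr(config_class, "model_type", None)
        if declared != model_type:
            raise ValueError(
                f"{config_class.__name__} declares model_type={declared!r}, "
                f"expected {model_type!r}"
            )
        if model_type in self._mapping:
            raise ValueError(f"{model_type!r} is already registered")
        self._mapping[model_type] = config_class

    def unregister(self, model_type):
        # Useful for cleaning up after tests that register temporary entries.
        self._mapping.pop(model_type, None)

    def for_model(self, model_type):
        try:
            return self._mapping[model_type]
        except KeyError:
            raise ValueError(f"unknown model type {model_type!r}") from None


class NewModelConfig:
    model_type = "new-model"


registry = AutoRegistry()
registry.register("new-model", NewModelConfig)
print(registry.for_model("new-model").__name__)
```

Rejecting duplicate keys keeps a third-party registration from silently shadowing a built-in entry.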

Add BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese (#13788) · 3d587c53

Dat Quoc Nguyen authored Oct 18, 2021



* Add the pre-trained BARTpho model

* Add the pre-trained BARTpho model

* Add the pre-trained BARTpho model

* Fix incorrectly sorted and/or formatted imports

* Fix incorrectly sorted and/or formatted style

* Fix check_dummies

* Fix check_dummies

* Fix check_dummies

* Update docs/source/model_doc/bartpho.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/bartpho/__init__.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/bartpho/tokenization_bartpho.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/test_tokenization_bartpho.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update src/transformers/models/bartpho/tokenization_bartpho.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update tests/test_tokenization_bartpho.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/bartpho.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/bartpho.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/bartpho/__init__.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Add the pre-trained BARTpho model

* Add Tips section in doc and details of monolingual_vocab_file

* Fix conflicts

* Add another tip related to monolingual_vocab_file

* Readd dependency_versions_table.py

* Handle failing checks

* Remove test_list.txt

* Remove md5sum.saved

* Revise Readme.md
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


up (#14046) · 7c6cd0ac
Patrick von Platen authored Oct 18, 2021

Update SEW integration test tolerance (#14048) · 82b62fa6
Anton Lozhkov authored Oct 18, 2021

[Speech] Move all examples to new audio feature (#14045) · bdf31d6e
Patrick von Platen authored Oct 18, 2021
```
* up

* up

* up

* finish
```

16 Oct, 2021 1 commit
- minor fixes (#14026) · 84ad6af4
  Suraj Patil authored Oct 16, 2021
  
15 Oct, 2021 1 commit

Add the SEW and SEW-D speech models (#13962) · cd3166a8

Anton Lozhkov authored Oct 15, 2021



* Working encoder

* SEW-D and tests

* Further conv fixes

* Automodels and conv inits

* Update integration tests, add docs

* Docs cleanup, resolve todos

* Conf fix

* Fix docs

* Fix tests, apply suggestions

* Update src/transformers/models/sew/modeling_sew.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Model conversion and updated no-mask tests

* Remove copy of feature_proj

* Style

* Update src/transformers/models/auto/feature_extraction_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/auto/feature_extraction_auto.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Move orgs
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>


14 Oct, 2021 5 commits
- Scatter dummies + skip pipeline tests (#13996) · 5b317f7e
  Lysandre Debut authored Oct 14, 2021
```
* Scatter dummies + skip pipeline tests

* Add torch scatter to build docs
```
- up (#14008) · 7fb2a8b3
  Patrick von Platen authored Oct 14, 2021
  
- Fix FNet tokenizer tests (#13995) · 7604557e
  Lysandre Debut authored Oct 14, 2021
  
- Add strong test for configuration attributes (#14000) · f2002fea
  Sylvain Gugger authored Oct 14, 2021
```
* Add strong test for configuration attributes

* Add fake modif to trigger all tests

* Add a better fake modif

* Ignore is_encoder_decoder

* Fix faulty configs

* Remove fake modif
```
- up (#13988) · cc360649
  Patrick von Platen authored Oct 14, 2021
  
13 Oct, 2021 1 commit

Add TrOCR + VisionEncoderDecoderModel (#13874) · 408b2d2b

NielsRogge authored Oct 13, 2021

* First draft

* Update self-attention of RoBERTa as proposition

* Improve conversion script

* Add TrOCR decoder-only model

* More improvements

* Make forward pass with pretrained weights work

* More improvements

* Some more improvements

* More improvements

* Make conversion work

* Clean up print statements

* Add documentation, processor

* Add test files

* Small improvements

* Some more improvements

* Make fix-copies, improve docs

* Make all vision encoder decoder model tests pass

* Make conversion script support other models

* Update URL for OCR image

* Update conversion script

* Fix style & quality

* Add support for the large-printed model

* Fix some issues

* Add print statement for debugging

* Add print statements for debugging

* Make possible fix for sinusoidal embedding

* Further debugging

* Potential fix v2

* Add more print statements for debugging

* Add more print statements for debugging

* Debug more

* Comment out print statements

* Make conversion of large printed model possible, address review comments

* Make it possible to convert the stage1 checkpoints

* Clean up code, apply suggestions from code review

* Apply suggestions from code review, use Microsoft models in tests

* Rename encoder_hidden_size to cross_attention_hidden_size

* Improve docs
