Commits · 5a06118b3922635699d72d72f9025e71cf04bfba · chenpangpang / transformers

06 Jan, 2022 1 commit
- Enabling `TF` on `image-classification` pipeline. (#15030) · 5a06118b
  Nicolas Patry authored Jan 06, 2022
  
  5a06118b
05 Jan, 2022 3 commits
- [CLIP] Fix TF test (#15042) · 2e9af294
  Suraj Patil authored Jan 05, 2022
  
  2e9af294
- [CLIP] Fix PT test (#15041) · ae929dcb
  Patrick von Platen authored Jan 05, 2022
  
  ae929dcb
- Adding QoL for `batch_size` arg (like others enabled everywhere). (#15027) · 65cb94ff
  Nicolas Patry authored Jan 05, 2022
```
* Adding QoL for `batch_size` arg (like others enabled everywhere).

* Typo.
```
  65cb94ff
04 Jan, 2022 2 commits

Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d

Nicolas Patry authored Jan 04, 2022

* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token` which should be last/previous token for CTC proper decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.

19d37c2d
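
The `chunk_length_s` rename above fixes the unit to seconds. A minimal sketch of why the unit matters follows; it is not the pipeline's actual code. `chunk_audio` is a hypothetical helper, and the only assumption is that chunk length in samples is seconds times sampling rate.

```python
from typing import List, Sequence


def chunk_audio(
    samples: Sequence[float], sampling_rate: int, chunk_length_s: float
) -> List[Sequence[float]]:
    """Split raw audio samples into consecutive chunks of `chunk_length_s` seconds."""
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    # Seconds -> samples. A value meant as milliseconds would be 1000x too large here.
    chunk_len = int(round(chunk_length_s * sampling_rate))
    if chunk_len == 0:
        raise ValueError("chunk_length_s is too small for this sampling rate")
    return [samples[i : i + chunk_len] for i in range(0, len(samples), chunk_len)]


audio = [0.0] * 40_000  # 2.5 s of silence at 16 kHz
chunks = chunk_audio(audio, sampling_rate=16_000, chunk_length_s=1.0)
print([len(c) for c in chunks])  # [16000, 16000, 8000]
```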

Add Flax RoFormer (#15005) · 21aecc09

Daniel Stancl authored Jan 04, 2022



* Add FlaxRoFormer

* Clean code + make quality

* Fix output pooling for FlaxRoFormerForMultipleChoiceModule

* Apply suggestions from code review

* add flax model to repos
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

21aecc09

03 Jan, 2022 4 commits

[Tests] Correct Wav2Vec2 & WavLM tests (#15015) · dbac8899
Patrick von Platen authored Jan 03, 2022
```
* up

* up

* up
```
dbac8899

Large audio chunking for the existing ASR pipeline (#14896) · 38f95d18

Anton Lozhkov authored Jan 03, 2022



* Naive ASR chunking

* Fixing batching for ASR.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

38f95d18

Improve truncation_side (#14947) · d33dc796

Nicolas Patry authored Jan 03, 2022



* Enabling `truncation_side` for Slow and Fast tokenizer.
Co-Authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>

* Disable failing tests.

* Layout xlm.

* assert -> assertEqual.
Co-authored-by: Niels Rogge <48327001+NielsRogge@users.noreply.github.com>

d33dc796
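
As an illustration of what a `truncation_side` option controls, here is a minimal sketch on plain token-id lists. `truncate` is a hypothetical helper written for this note, not the tokenizer implementation from the commit.

```python
from typing import List


def truncate(ids: List[int], max_length: int, truncation_side: str = "right") -> List[int]:
    """Keep at most `max_length` ids, dropping tokens from the chosen side."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    if truncation_side not in ("left", "right"):
        raise ValueError("truncation_side must be 'left' or 'right'")
    if len(ids) <= max_length:
        return list(ids)
    if truncation_side == "right":
        return ids[:max_length]  # drop the tail
    return ids[len(ids) - max_length :]  # drop the head


print(truncate([1, 2, 3, 4, 5], 3))          # [1, 2, 3]
print(truncate([1, 2, 3, 4, 5], 3, "left"))  # [3, 4, 5]
```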

Fixing t2t pipelines lists outputs. (#15008) · 8c2618e6
Nicolas Patry authored Jan 03, 2022
```
Backward compatibility broken in
https://github.com/huggingface/transformers/pull/14988
```
8c2618e6

30 Dec, 2021 4 commits

Adding `num_return_sequences` support for text2text generation. (#14988) · f8a989cf

Nicolas Patry authored Dec 30, 2021



* Adding `num_return_sequences` support for text2text generation.
Co-Authored-By: Enze <pu.miao@foxmail.com>

* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_pipelines_text2text_generation.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Enze <pu.miao@foxmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

f8a989cf

[Generate] correct encoder_outputs are passed without attention_mask (#14980) · c043ce6c
Patrick von Platen authored Dec 30, 2021
```
* [Generate] correct encoder_outputs are passed without attention_mask

* Apply suggestions from code review

* up
```
c043ce6c

[AutoProcessor] Correct AutoProcessor and automatically add processor… (#14881) · a1392883

Patrick von Platen authored Dec 30, 2021

* [AutoProcessor] Correct AutoProcessor and automatically add processor class

* up

* up

* up

* up

* up

* up

* up

* up

* continue tomorrow

* up

* up

* up

* make processor class private

* fix loop

a1392883

Fixing a pathological case for slow tokenizers (#14981) · d7d60df0
Nicolas Patry authored Dec 30, 2021
```
* Fixing a pathological case for slow tokenizers

* Update src/transformers/tokenization_utils.py
```
d7d60df0

28 Dec, 2021 4 commits

[Wav2Vec2] Rename model's feature extractor to feature encoder (#14959) · 600496fa

Patrick von Platen authored Dec 28, 2021

* rename classes

* clean up more namings

* remove bogus file

* Apply suggestions from code review

* Apply suggestions from code review

* replace more names

* more regex replace

* make style

* correct

* correct more

* make style

* finish

* correct more in wav2vec2

* make style

* improve freeze_extractor

* add aliases

* add tf aliases

600496fa

[Tests] Speed up tokenizer tests (#14964) · 1bfa3477
Patrick von Platen authored Dec 28, 2021
```
* speed up canine and mluke

* speed up mbart and mbart50 toks

* upload files
```
1bfa3477

[WavLM] give model for precision (#14958) · 1e847b40
Patrick von Platen authored Dec 28, 2021

1e847b40

[doc] :class: hunt (#14955) · 10fd4fa1

Stas Bekman authored Dec 27, 2021



* [doc] :class: hunt

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix the fix + style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

10fd4fa1

27 Dec, 2021 3 commits

[doc] :obj: hunt (#14954) · e13f72fb
Stas Bekman authored Dec 27, 2021
```
* redo sans examples

* style
```
e13f72fb

Add `ElectraForCausalLM` -> Enable Electra encoder-decoder model (#14729) · 501307b5

Daniel Stancl authored Dec 27, 2021

* Add ElectraForCausalLM and cover some basic tests & need to fix a few tests

* Fix bugs

* make style

* make fix-copies

* Update doc

* Change docstring to markdown format

* Remove redundant update_keys_to_ignore

501307b5

ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines. (#14225) · b058490c

Nicolas Patry authored Dec 27, 2021



* Pipeline chunks.

* Batching for Chunking pipelines ?

* Batching for `question-answering` and `zero-shot-cls`.

* Fixing for FNet.

* Making ASR a chunk pipeline.

* Chunking ASR API.

* doc style.

* Fixing ASR test.

* Fixing QA error (p_mask, padding is 1, not 0).

* Enable both vad and simple chunking.

* Max length for vad.

* remove inference mode, crashing on s2t.

* Revert ChunkPipeline for ASRpipeline.

Too many knobs for simple integration within the pipeline, better stick
to external convenience functions instead, more control to be had,
simpler pipeline and also easier to replace with other things later.

* Drop necessity for PT for these.

* Enabling generators.

* Add mic + cleanup.

* Typo.

* Typo2.

* Remove ASR work, it does not belong in this PR anymore.

* Update src/transformers/pipelines/pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding many comments.

* Doc quality.

* `hidden_states` handling.

* Adding doc.

* Bad rebase.

* Autofixing docs.

* Fixing CRITICAL bug in the new Zerocls pipeline.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

b058490c
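
The chunk-then-batch idea in the commit above can be sketched roughly as follows. This is a simplified illustration, not the code from `pt_utils.py`: the function names and the `is_last` flag are assumptions made for this example. One input expands into several chunks (here, one per candidate label), chunks are batched across inputs, and the results are regrouped per input afterwards.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def preprocess(text: str, labels: List[str]) -> Iterator[Dict]:
    """Yield one chunk per candidate label, flagging the last chunk of each input."""
    for i, label in enumerate(labels):
        yield {"text": text, "label": label, "is_last": i == len(labels) - 1}


def batched(items: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group a (possibly lazy) stream of chunks into lists of up to `batch_size`."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def forward(batch: List[Dict]) -> List[Dict]:
    """Stand-in for a model call: tag each chunk with the size of its batch."""
    return [{**chunk, "batch_len": len(batch)} for chunk in batch]


def pack(chunks: Iterable[Dict]) -> Iterator[List[Dict]]:
    """Regroup processed chunks so that each yielded list belongs to one input."""
    group: List[Dict] = []
    for chunk in chunks:
        group.append(chunk)
        if chunk["is_last"]:
            yield group
            group = []


def run(texts: Iterable[str], labels: List[str], batch_size: int) -> List[List[Dict]]:
    all_chunks = (c for t in texts for c in preprocess(t, labels))
    processed = (c for batch in batched(all_chunks, batch_size) for c in forward(batch))
    return list(pack(processed))


groups = run(["a", "b"], ["x", "y", "z"], batch_size=4)
print(len(groups))  # 2
```

Because every stage is a generator, `texts` can itself be a generator and batches may span input boundaries without losing track of which chunks belong together.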

23 Dec, 2021 7 commits

Better logic for getting tokenizer config in AutoTokenizer (#14906) · 676643c6

Sylvain Gugger authored Dec 23, 2021

* Better logic for getting tokenizer config in AutoTokenizer

* Remove needless import

* Remove debug statement

* Address review comments

676643c6

Fix failing GPU trainer tests (#14903) · f566c6e3
Sylvain Gugger authored Dec 23, 2021
```
* Fix failing GPU trainer tests

* Remove print statements
```
f566c6e3

[Generate] Remove attention_mask and integrate model_main_input_name (#14856) · fe4197ab
Patrick von Platen authored Dec 23, 2021
```
* up

* save

* correct

* up

* correct more

* up

* up

* up

* up

* up

* correct

* fix tf

* fix

* remove tokenizer
```
fe4197ab

Update diarization and WavLM tolerances (#14902) · ee55ea69
Anton Lozhkov authored Dec 23, 2021

ee55ea69

Add TFCLIPModel (#13967) · 8f2cc1c3

Yih-Dar authored Dec 23, 2021



* Start the work for TFCLIPModel

* Convert to TF code (TODO: loss + doc)

* Clean up

* Fix pooled_output for TFCLIPTextTransformer - using tf.gather_nd

* assert -> raise error

* Expose TFCLIPModel

* Deal with dummy_inputs

* Add tests

* Fix all tests. TODO: manual check weight loading + add more comments

* Fix pt tf equivalence test

* fixes

* update TFCLIPVisionEmbeddings's Conv2D

* Fix loss + overwrite test_pt_tf_model_equivalence from common

* Add a comment about the change about MainLayer in test_keras_save_load

* Set return_loss=True in TFCLIPModelTester + make tests pass

* overwrite test_pt_tf_model_equivalence from tf common

* fix base_model_prefix

* Fix examples

* remove unused

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply review suggestions

* change self.pre_layrnorm to self.pre_layernorm

* apply more review suggestions

* return attention probs before dropout (to align with PT)

* fix weight init

* fix

* build doc

* fix missing doc

* fix for test
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

8f2cc1c3

Add ONNX support for MarianMT models (#14586) · 6b655cc6

lewtun authored Dec 23, 2021

* First commit to add MarianMT to ONNX

* Now MarianModel.forward() automatically generates decoder_input_ids, like BartModel.forward()

* Adjusted MarianOnnxConfig.inputs and outputs to work with seq2seq-lm feature

* Style fix

* Added support for other features for already supported models

* Partial support for causal and seq2seq models

* Partial support for causal and seq2seq models

* Add default task for MarianMT ONNX

* Remove automatic creation of decoder_input_ids

* Extend inputs and outputs for MarianMT ONNX config

* Add MarianMT to ONNX unit tests

* Refactor

* OnnxSeq2SeqConfigWithPast to support seq2seq models

* Parameterized the onnx tests

* Restored run_mlm.py

* Restored run_mlm.py

* [WIP] BART update

* BART and MBART

* Add past_key_values and fix dummy decoder inputs

Using a sequence length of 1 in generate_dummy_outputs() produces large discrepancies, presumably due to some hidden optimisations.

* Refactor MarianOnnxConfig to remove custom past_key_values logic

* Fix quality

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Refactor Marian export to account for base changes

* Fix copies

* Implemented suggestions

* Extend support for causal LM

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Remove commented import

* Remove ONNX model

* Remove redundant class method

* Tidy up imports

* Fix quality

* Refactor dummy input function

* Add copied from statements to Marian config functions

* Remove false copied from comments

* Fix copy from comment
Co-authored-by: Massimiliano Bruni <massimiliano.bruni@hcl.com>
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

6b655cc6

Add 'with torch.no_grad()' to integration test forward pass (#14808) · 6a7b9da2
Henrik Holm authored Dec 23, 2021

6a7b9da2

22 Dec, 2021 3 commits

Onnx enable tasks for supported models (part 2) (#14700) · 13504dcb

Michael Benayoun authored Dec 22, 2021

* Revert "Revert "Added support for other features for already supported models (#14358)" (#14679)"

This reverts commit 0f4e39c5.

* is_torch_available test to avoid failing imports

* sorting parameterize parameters to solve ERROR gw0 gw1

* tests fix

* tests fix

* GPT2 with past fix

* Fixed stateful class attribute change that was breaking things when converting multiple models sequentially

* Removed onnx file

* Implemented suggestions

* Fixed __init__ to resolve conflict with master

* Remove commented import

13504dcb

Feature/fix slow test in mluke (#14749) · 824fd44f

Ryokan RI authored Dec 22, 2021

* make MLukeTokenizerTest fast

* make LukeTokenizerTest fast

* add entry to _toctree.yaml

824fd44f

update the arguments `add_prefix_space` and `trim_offsets` in... · c94c1b89

SaulLu authored Dec 22, 2021

update the arguments `add_prefix_space` and `trim_offsets` in `backend_tokenizer.post_processor` of `RobertaTokenizerFast` (#14752)

* add tests

* change post-processor, pre-tokenizer and decoder (can't update decoder)

* update test (remove decoder which doesn't depend on trim and add_prefix)

* just update the post_processor

* fix change

* `trim_offsets` has no influence on `pre_tokenizer`

* remove a test that need some input from the `tokenizers` lib maintainers

* format

* add new test offsets roberta

* polish comments

c94c1b89

21 Dec, 2021 2 commits

Add custom `stopping_criteria` and `logits_processor` to `generate` (#14779) · 5722d058

Leandro von Werra authored Dec 21, 2021



* add custom `stopping_criteria` and `logits_processor` to `generate`

* add tests for custom `stopping_criteria` and `logits_processor`

* fix typo in RAG

* address reviewer comments

* improve custom logits processor/stopping criteria error message

* fix types in merge function signature

* change default for custom list from `None` to empty list

* fix rag generate

* add string split suggestion
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

5722d058
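
A rough sketch of how user-supplied stopping criteria might be merged with default ones is shown below. All class and function names here are hypothetical, and the rule that a custom entry may not duplicate the type of a default one is an assumption inferred from the "error message" and "merge function" items above, not a confirmed description of the library.

```python
from typing import Callable, List


class MaxLength:
    """Stop once the sequence has `max_length` tokens."""

    def __init__(self, max_length: int):
        self.max_length = max_length

    def __call__(self, tokens: List[int]) -> bool:
        return len(tokens) >= self.max_length


class StopOnToken:
    """Stop as soon as the last generated token equals `token_id`."""

    def __init__(self, token_id: int):
        self.token_id = token_id

    def __call__(self, tokens: List[int]) -> bool:
        return bool(tokens) and tokens[-1] == self.token_id


def merge(default_list: List[Callable], custom_list: List[Callable]) -> List[Callable]:
    """Append custom entries to the defaults, rejecting duplicated types."""
    default_types = {type(d) for d in default_list}
    for custom in custom_list:
        if type(custom) in default_types:
            raise ValueError(
                f"A custom {type(custom).__name__} was passed, but one is already "
                "created from the default arguments."
            )
    return list(default_list) + list(custom_list)


criteria = merge([MaxLength(5)], [StopOnToken(0)])
tokens: List[int] = []
for next_token in [3, 7, 0, 9]:  # pretend these come from a model
    tokens.append(next_token)
    if any(criterion(tokens) for criterion in criteria):
        break
print(tokens)  # [3, 7, 0]
```

Passing an empty custom list simply returns a copy of the defaults, which matches the "empty list" default mentioned in the commit message.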

[logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS (#14669) · b6ec9569
Stas Bekman authored Dec 20, 2021
```
* [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS

* reword
```
b6ec9569
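
The idea behind `warning_advice` can be sketched with the standard library alone. This is an illustrative approximation: the logger name is made up, and treating any non-empty value of `TRANSFORMERS_NO_ADVISORY_WARNINGS` as "suppress" is an assumption about how the variable is read.

```python
import logging
import os

logger = logging.getLogger("advice_demo")  # hypothetical logger name


def warning_advice(message: str) -> None:
    """Log `message` as a warning unless advisory warnings are disabled via the environment."""
    if os.environ.get("TRANSFORMERS_NO_ADVISORY_WARNINGS"):
        return  # advisory warnings are silenced
    logger.warning(message)
```

Regular `logger.warning` calls are unaffected, so only messages routed through `warning_advice` can be silenced this way.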

20 Dec, 2021 6 commits
- Add a main_input_name attribute to all models (#14803) · 33f36c86
  Sylvain Gugger authored Dec 20, 2021
```
* Add a main_input_name attribute to all models

* Fix tests

* Wtf Vs Code?

* Update src/transformers/models/imagegpt/modeling_imagegpt.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

* Fix copies
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
```
  33f36c86
- Add 'with torch.no_grad()' to integration test forward pass (#14820) · 0940e9b2
  Henrik Holm authored Dec 20, 2021
  
  0940e9b2
- Add 'with torch.no_grad()' to integration test forward pass (#14821) · b37cf7de
  Henrik Holm authored Dec 20, 2021
  
  b37cf7de
- [Perceiver] Skip multi-gpu tests for now (#14813) · 952a77b0
  Patrick von Platen authored Dec 20, 2021
```
* [Perceiver] Skip multi-gpu tests for now

* Update tests/test_modeling_perceiver.py

* up

* up
```
  952a77b0
- Add SD and SV heads for WavLM (#14847) · 3883e3a7
  Anton Lozhkov authored Dec 20, 2021
```
* Add converted heads

* Add dummies
```
  3883e3a7
- [WavLM] Fix slow tests (#14845) · cd583bda
  Patrick von Platen authored Dec 20, 2021
  
  cd583bda
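
For the `main_input_name` commit (#14803) in the group above, the following sketch shows how a class-level attribute lets generic code find a model's primary input without knowing the modality. The classes and `get_main_input` are hypothetical stand-ins, not library code.

```python
from typing import Any


class TextModel:
    main_input_name = "input_ids"


class VisionModel:
    main_input_name = "pixel_values"


class SpeechModel:
    main_input_name = "input_values"


def get_main_input(model: Any, **inputs: Any) -> Any:
    """Pick the model's primary input out of arbitrary keyword inputs."""
    name = model.main_input_name
    if name not in inputs:
        raise ValueError(f"Expected an input named `{name}`")
    return inputs[name]


print(get_main_input(VisionModel(), pixel_values=[[0.1, 0.2]], labels=[1]))  # [[0.1, 0.2]]
```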
17 Dec, 2021 1 commit

[ImageGPT] Deprecate pixel_values input name to input_ids (#14801) · 84ea427f

Patrick von Platen authored Dec 17, 2021



* [ImageGPT] Deprecate pixel_values input name to input_ids

* up

* Apply suggestions from code review
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* correct

* finish
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

84ea427f
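
A common way to deprecate an argument name, as the commit above does for `pixel_values`, is to keep accepting the old keyword while emitting a warning. The sketch below shows that general pattern only; `forward` is a hypothetical stand-alone function and the messages are invented for illustration.

```python
import warnings
from typing import List, Optional


def forward(input_ids: Optional[List[int]] = None, **kwargs) -> List[int]:
    """Accept the deprecated `pixel_values` keyword as an alias for `input_ids`."""
    if "pixel_values" in kwargs:
        warnings.warn(
            "`pixel_values` is deprecated; pass `input_ids` instead.",
            FutureWarning,
        )
        if input_ids is not None:
            raise ValueError("Pass either `input_ids` or `pixel_values`, not both.")
        input_ids = kwargs.pop("pixel_values")
    if input_ids is None:
        raise ValueError("`input_ids` is required.")
    return input_ids
```

Existing callers that still pass `pixel_values=` keep working, while new code is nudged toward `input_ids`.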