Commits · 280db2e39c1e586389df4e46f2b895fc092911bb · chenpangpang / transformers

05 Aug, 2022 3 commits

Fix `test_dbmdz_english` by updating expected values (#18482) · 280db2e3
Yih-Dar authored Aug 05, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
Fix pipeline tests (#18487) · 70fa1a8d
Sylvain Gugger authored Aug 05, 2022
```
* Fix pipeline tests

* Make sure all pipelines tests run with init changes
```

Fixing issue where generic model types wouldn't load properly with the pipeline (#18392) · 586dcf6b

Nicolas Patry authored Aug 05, 2022

* Adding a better error message when the model is improperly configured within transformers.

* Update src/transformers/pipelines/__init__.py

* Black version.

* Overriding task aliases so that tokenizer+feature_extractor values are correct.

* Fixing task aliases by overriding their names early

* X.

* Fixing feature-extraction.

* black again.

* Normalizing `translation` too.

* Fixing last few corner cases.

`translation` needs to use its non-normalized name (`translation_XX_to_YY`) so that the `task_specific_params` are correctly overloaded.
This can be removed and cleaned up in a later PR.

`speech-encoder-decoder` actually requires passing a `tokenizer` manually,
so the error needs to be discarded when the `tokenizer` is already provided.

* doc-builder fix.

* Fixing the real issue.

* Removing dead code.

* Do not import the actual config classes.


02 Aug, 2022 1 commit

Update pipeline word heuristic to work with whitespace in token offsets (#18402) · 042f4203

David authored Aug 02, 2022

* Update pipeline word heuristic to work with whitespace in token offsets

This change checks for whitespace in the input string at either the
character preceding the token or in the first character of the token.
This works with tokenizers that return offsets excluding whitespace
between words or with offsets including whitespace.

fixes #18111

starting

* Use smaller model, ensure expected tokenization

* Re-run CI (please squash)

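
The whitespace check described in #18402 (whitespace either just before the token or as its first character) can be sketched as a small helper. `is_subword` is a hypothetical name, not necessarily what the pipeline uses, and the sketch assumes offsets are character indices into the original input string.

```python
def is_subword(sentence: str, start: int) -> bool:
    """Hypothetical helper: does the token starting at `start` continue a word?

    Works whether the tokenizer's offsets exclude the whitespace between
    words ("York" -> (4, 8)) or include it (" York" -> (3, 8)).
    """
    # The first token of the input always starts a word.
    if start == 0:
        return False
    # Inspect the character preceding the token and the token's first character.
    window = sentence[start - 1 : start + 1]
    # Whitespace in either position means the token begins a new word.
    return not any(ch.isspace() for ch in window)


sentence = "New York rocks"
print(is_subword(sentence, 4))  # False: offsets excluding whitespace ("York")
print(is_subword(sentence, 3))  # False: offsets including whitespace (" York")
print(is_subword(sentence, 6))  # True: "rk" continues "York"
```

Checking both positions is what makes the helper indifferent to whether the tokenizer attaches the separating space to the token or leaves it out.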

19 Jul, 2022 1 commit

Custom pipeline (#18079) · dc9147ff

Sylvain Gugger authored Jul 19, 2022



* Initial work

* More work

* Add tests for custom pipelines on the Hub

* Protect import

* Make the test work for TF as well

* Last PyTorch specific bit

* Add documentation

* Style

* Title in toc

* Bad names!

* Update docs/source/en/add_new_pipeline.mdx
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline"

* Address review comments

* Address more review comments

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>


15 Jul, 2022 1 commit

Adding support for `device_map` directly in `pipeline(..)` function. (#17902) · ccc08978

Nicolas Patry authored Jul 15, 2022

* Adding support for `device_map` directly in `pipeline(..)` function.

* Updating the docstring.

* Adding a better docstring

* Put back type hints.

* Blacked. (`make fixup` didn't work ??!!)


11 Jul, 2022 3 commits
- Fix image segmentation and object detection pipeline tests (#18100) · 6c8017a5
  Sylvain Gugger authored Jul 11, 2022
  
- Skip failing tests · b0520f59
  Sylvain Gugger authored Jul 11, 2022
  
- Fix some typos. (#17560) · 95113d13
  Yulv-git authored Jul 11, 2022
```
* Fix some typos.
Signed-off-by: Yulv-git <yulvchi@qq.com>

* Fix typo.
Signed-off-by: Yulv-git <yulvchi@qq.com>

* make fixup.
```
01 Jul, 2022 1 commit
- Restore original task in test_warning_logs (#17985) · 6f0723a9
  Yih-Dar authored Jul 01, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
30 Jun, 2022 2 commits

feat: add pipeline registry abstraction (#17905) · 49cd736a

Aaron Pham authored Jun 30, 2022



* feat: add pipeline registry abstraction

- added `PipelineRegistry` abstraction
- updates `add_new_pipeline.mdx` (english docs) to reflect the api addition
- migrate `check_task` and `get_supported_tasks` from
  transformers/pipelines/__init__.py to
  transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks}
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: update with upstream/main

chore: Apply suggestions from sgugger's code review
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* chore: PR updates

- revert src/transformers/dependency_versions_table.py from upstream/main
- updates pipeline registry to use global variables
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add tests for pipeline registry
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add test for output warning.
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: fmt and cleanup unused imports
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: change imports to top of the file and address comments
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

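
The registry idea from #17905 (one table of tasks, with `check_task` and `get_supported_tasks` on top) can be illustrated with a toy class. `ToyPipelineRegistry`, its `register` method, and the stored fields are hypothetical and not the transformers `PipelineRegistry` API; the `sentiment-analysis` to `text-classification` alias is used only as an example.

```python
from typing import Dict, List, Optional, Tuple


class ToyPipelineRegistry:
    """Hypothetical stand-in for the idea behind `PipelineRegistry`.

    Not the transformers class: it only shows a task table plus aliases,
    with `check_task` and `get_supported_tasks` built on top of it.
    """

    def __init__(self, aliases: Optional[Dict[str, str]] = None) -> None:
        self._tasks: Dict[str, dict] = {}
        self._aliases: Dict[str, str] = dict(aliases or {})

    def register(self, task: str, impl: type, default_model: Optional[str] = None) -> None:
        # Each task maps to its implementation class and an optional default checkpoint.
        self._tasks[task] = {"impl": impl, "default_model": default_model}

    def get_supported_tasks(self) -> List[str]:
        # Canonical names and aliases together, sorted for stable error messages.
        return sorted(list(self._tasks) + list(self._aliases))

    def check_task(self, task: str) -> Tuple[str, dict]:
        # Resolve aliases first so callers always get the canonical task name.
        task = self._aliases.get(task, task)
        if task not in self._tasks:
            raise KeyError(f"Unknown task {task!r}, available tasks are {self.get_supported_tasks()}")
        return task, self._tasks[task]


class TextClassificationStub:
    """Placeholder implementation used only for this example."""


registry = ToyPipelineRegistry(aliases={"sentiment-analysis": "text-classification"})
registry.register("text-classification", TextClassificationStub)
print(registry.get_supported_tasks())  # ['sentiment-analysis', 'text-classification']
print(registry.check_task("sentiment-analysis")[0])  # text-classification
```

Keeping aliases in the same object as the task table is what lets validation and the "supported tasks" listing stay in sync.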

[Pipelines] Add revision tag to all default pipelines (#17667) · e4d25885

Patrick von Platen authored Jun 30, 2022



* trigger test failure

* upload revision poc

* Update src/transformers/pipelines/base.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* up

* add test

* correct some stuff

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct require flag
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


29 Jun, 2022 2 commits
- Fix img seg tests (load checkpoints from `hf-internal-testing`) (#17939) · 77b76672
  Mishig Davaadorj authored Jun 29, 2022
```
* Revert "Skip failing test until they are fixed."

This reverts commit 8f400775.

* Use `tiny-detr` checkpts from `hf-internal-testing`
```
- Skip failing test until they are fixed. · 8f400775
  Sylvain Gugger authored Jun 29, 2022
  
28 Jun, 2022 1 commit

Fixing a regression with `return_all_scores` introduced in #17606 (#17906) · 776855c7

Nicolas Patry authored Jun 28, 2022

Fixing a regression with `return_all_scores` introduced in #17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.


13 Jun, 2022 2 commits

Add `LongT5` model (#16792) · a72f1c9f

Daniel Stancl authored Jun 13, 2022



* Initial commit

* Make some fixes

* Make PT model full forward pass

* Drop TF & Flax implementation, fix copies etc

* Add Flax model and update some corresponding stuff

* Drop some TF things

* Update config and flax local attn

* Add encoder_attention_type to config

* .

* Update docs

* Do some cleansing

* Fix some issues -> make style; add some docs

* Fix position_bias + mask addition + Update tests

* Fix repo consistency

* Fix model consistency by removing flax operation over attn_mask

* [WIP] Add PT TGlobal LongT5

* .

* [WIP] Add flax tglobal model

* [WIP] Update flax model to use the right attention type in the encoder

* Fix flax tglobal model forward pass

* Make the use of global_relative_attention_bias

* Add test suites for TGlobal model

* Fix minor bugs, clean code

* Fix pt-flax equivalence though not convinced with correctness

* Fix LocalAttn implementation to match the original impl. + update READMEs

* Few updates

* Update: [Flax] improve large model init and loading #16148

* Add ckpt conversion script according to #16853 + handle torch device placement

* Minor updates to conversion script.

* Typo: AutoModelForSeq2SeqLM -> FlaxAutoModelForSeq2SeqLM

* gpu support + dtype fix

* Apply some suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove (de)parallelize stuff
* Edit shape comments
* Update README.md
* make fix-copies

* Remove caching logic for local & tglobal attention

* Apply another batch of suggestions from code review

* Add missing checkpoints
* Format converting scripts
* Drop (de)parallelize links from longT5 mdx

* Fix converting script + revert config file change

* Revert "Remove caching logic for local & tglobal attention"

This reverts commit 2a619828f6ddc3e65bd9bb1725a12b77fa883a46.

* Stash caching logic in Flax model

* Make side relative bias used always

* Drop caching logic in PT model

* Return side bias as it was

* Drop all remaining model parallel logic

* Remove clamp statements

* Move test files to the proper place

* Update docs with new version of hf-doc-builder

* Fix test imports

* Make some minor improvements

* Add missing checkpoints to docs
* Make TGlobal model compatible with torch.onnx.export
* Replace some np.ndarray with jnp.ndarray

* Fix TGlobal for ONNX conversion + update docs

* fix _make_global_fixed_block_ids and masked neg value

* update flax model

* style and quality

* fix imports

* remove load_tf_weights_in_longt5 from init and fix copies

* add slow test for TGlobal model

* typo fix

* Drop obsolete is_parallelizable and one warning

* Update __init__ files to fix repo-consistency

* fix pipeline test

* Fix some device placements

* [wip]: Update tests -- need to generate summaries to update expected_summary

* Fix quality

* Update LongT5 model card

* Update (slow) summarization tests

* make style

* rename checkpoints

* finish

* fix flax tests
Co-authored-by: phungvanduy <pvduy23@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: patil-suraj <surajp815@gmail.com>


Add Visual Question Answering (VQA) pipeline (#17286) · 66336dc1

Sijun He authored Jun 13, 2022



* wip

* rebase

* all tests pass

* rebase

* ready for PR

* address comments

* fix styles

* add require_torch to pipeline test

* remove remote image to improve CI consistency

* address comments; fix tf/flax tests

* address comments; fix tf/flax tests

* fix tests; add alias

* repo consistency tests

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* address comments

* Update src/transformers/pipelines/visual_question_answering.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* merge

* Update src/transformers/models/auto/modeling_auto.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* merge
Co-authored-by: Sijun He <sijunhe@Sijuns-MacBook-Pro.local>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


09 Jun, 2022 2 commits

Running a pipeline of `float16`. (#17637) · c38f4e1f

Nicolas Patry authored Jun 09, 2022

When we're preparing the tensors for CPU for postprocessing, we need
to upgrade the `float16` to `float32` since CPUs don't have instructions
for `[b]float16`.


Adding `top_k` argument to `text-classification` pipeline. (#17606) · 2351729f

Nicolas Patry authored Jun 09, 2022

* Adding `top_k` and `sort` arguments to `text-classification` pipeline.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  `return_all_scores=True` -> `top_k=None`
  `return_all_scores=False` -> `top_k=1`

- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.

* Remove `sort`.

* Fixing the test.

* Remove bad doc.

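
The argument mapping in #17606 can be illustrated with a small, hypothetical post-processing function. It is not the pipeline's implementation (the real pipeline works on model outputs and may shape its return value differently); it only encodes the rules stated in the commit: `return_all_scores=True` behaves like `top_k=None` but unsorted, `return_all_scores=False` like `top_k=1`, an explicit `top_k` sorts, and passing nothing keeps the legacy single-label output.

```python
import math
from typing import List, Optional

_NOT_SET = object()  # sentinel: tells "top_k not passed" apart from top_k=None


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], labels: List[str], top_k=_NOT_SET,
             return_all_scores: Optional[bool] = None) -> List[dict]:
    """Hypothetical post-processing showing the top_k / return_all_scores mapping."""
    scores = [{"label": lab, "score": s} for lab, s in zip(labels, softmax(logits))]
    best = max(scores, key=lambda d: d["score"])
    if return_all_scores is not None:
        # Deprecated argument kept for BC: True -> all scores (unsorted), False -> top 1.
        return scores if return_all_scores else [best]
    if top_k is _NOT_SET:
        # No argument: legacy default, only the best label.
        return [best]
    # Explicit top_k: sorted by decreasing score; None means "all labels".
    ranked = sorted(scores, key=lambda d: d["score"], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


labels = ["NEGATIVE", "POSITIVE", "NEUTRAL"]
logits = [0.0, 2.0, 1.0]
print([d["label"] for d in classify(logits, labels, return_all_scores=True)])  # ['NEGATIVE', 'POSITIVE', 'NEUTRAL']
print([d["label"] for d in classify(logits, labels, top_k=None)])  # ['POSITIVE', 'NEUTRAL', 'NEGATIVE']
```

The sentinel is the design point here: `top_k=None` is a meaningful value ("all labels, sorted"), so "not passed" needs its own marker.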

19 May, 2022 2 commits

Adding `batch_size` test to QA pipeline. (#17330) · 2b282296
Nicolas Patry authored May 19, 2022


[BC] Fixing usage of text pairs (#17324) · a4386d7e

Nicolas Patry authored May 19, 2022



* [BC] Fixing usage of text pairs

The BC change actually prevents users from misusing the pipeline: users
may have intended to send text pairs, and the pipeline would instead
interpret the input as a batch, returning bogus results.

The correct usage of text pairs is preserved in this PR, even though that
makes the code clunky.

Adds support for `{"text": ..., "text_pair": ...}` inputs, both for dataset
iteration and for more explicit usage of pairs.

* Updating the doc.

* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/text_classification.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/pipelines/test_pipelines_text_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* quality.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

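
The input handling described in #17324 can be sketched as a per-item normalizer. `normalize_item` is hypothetical, and its rules are assumptions based on the commit message (dicts are explicit pairs, the legacy `[[text, pair]]` form is still accepted, other lists are rejected rather than silently read as a batch); the library's exact checks may differ.

```python
from typing import Union


def normalize_item(item: Union[str, dict, list]) -> dict:
    """Hypothetical helper: turn one pipeline input into {"text", "text_pair"}.

    Assumed rules (a sketch, not the library's exact checks):
    - a string is a single text;
    - a dict {"text": ..., "text_pair": ...} is an explicit pair;
    - the legacy [[text, pair]] form is still accepted as one pair;
    - any other list is rejected instead of being silently read as a batch.
    """
    if isinstance(item, str):
        return {"text": item, "text_pair": None}
    if isinstance(item, dict):
        if "text" not in item:
            raise ValueError('A dict input needs a "text" key.')
        return {"text": item["text"], "text_pair": item.get("text_pair")}
    if isinstance(item, list) and len(item) == 1 and isinstance(item[0], list) and len(item[0]) == 2:
        # Backward-compatible path for the old list-of-list pair syntax.
        return {"text": item[0][0], "text_pair": item[0][1]}
    raise ValueError(
        'Invalid input; to send a text pair use {"text": "My text", "text_pair": "My pair"}.'
    )


print(normalize_item({"text": "I like you.", "text_pair": "I love you."}))
```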

18 May, 2022 1 commit
- Accepting real pytorch device as arguments. (#17318) · 2cb2ea3f
  Nicolas Patry authored May 18, 2022
```
* Accepting real pytorch device as arguments.

* is_torch_available.
```
12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black


10 May, 2022 1 commit
- LogSumExp trick `question_answering` pipeline. (#17143) · 6d80c92c
  Nicolas Patry authored May 10, 2022
```
* LogSumExp trick `question_answering` pipeline.

* Adding a failing test.
```
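
The LogSumExp trick named in #17143 computes a softmax without overflowing on large logits. The sketch below uses plain Python lists; the pipeline presumably applies the same idea to its start/end logits, but its code is not reproduced here.

```python
import math
from typing import List


def log_sum_exp(xs: List[float]) -> float:
    # log(sum(exp(x))) = m + log(sum(exp(x - m))) with m = max(xs); every
    # exponent is <= 0, so nothing overflows.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def stable_softmax(xs: List[float]) -> List[float]:
    # p_i = exp(x_i - logsumexp(xs)), which equals exp(x_i) / sum_j exp(x_j).
    lse = log_sum_exp(xs)
    return [math.exp(x - lse) for x in xs]


logits = [1000.0, 1001.0]  # math.exp(1000.0) on its own raises OverflowError
print(stable_softmax(logits))  # approximately [0.269, 0.731]
```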
05 May, 2022 1 commit
- fix missing "models" in pipeline test module (#17090) · a59eb349
  Yih-Dar authored May 05, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
21 Apr, 2022 1 commit

Long QuestionAnsweringPipeline fix. (#16778) · 6620f60c

Nicolas Patry authored Apr 21, 2022

* Temporary commit with the long QA fix.

* Adding slow tests covering this fix.

* Removing fast test as it doesn't fail anyway.


20 Apr, 2022 1 commit
- Fixing return type tensor with `num_return_sequences>1`. (#16828) · e13a91fe
  Nicolas Patry authored Apr 20, 2022
```
* Fixing return type tensor with `num_return_sequences>1`.

* Nit.
```
14 Apr, 2022 1 commit

Enabling `Tapex` in table question answering pipeline. (#16663) · 195fbbb6

Nicolas Patry authored Apr 14, 2022

* Enabling `Tapex` in table question answering pipeline.

* Questions are independent for Tapex, making the test respect that.

* Missing extra space.


12 Apr, 2022 1 commit

Change the chunk_iter function to handle (#16730) · a192f61e

Nicolas Patry authored Apr 12, 2022

* Change the `chunk_iter` function to handle the subtle cases where the
last chunk gets ignored because all of its data is in the `left_strided` data.

We need to remove the right striding on the previous item.

* Remove commented line.

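
The striding fix in #16730 can be illustrated with a hypothetical `chunk_bounds` generator that works on sample indices only; the pipeline's own `chunk_iter` does more than this (it also prepares model inputs for each chunk). The sketch assumes each chunk keeps samples `[start + left, end - right)` and that stride samples are context discarded after inference. The key point from the commit is that the last chunk must not keep a right stride, otherwise the tail of the input is dropped.

```python
from typing import Iterator, Tuple


def chunk_bounds(n: int, chunk_len: int, stride_left: int,
                 stride_right: int) -> Iterator[Tuple[int, int, int, int]]:
    """Hypothetical sketch: yield (start, end, left, right) for overlapping chunks.

    Each chunk keeps samples [start + left, end - right); the strides are
    context that is discarded after inference.
    """
    step = chunk_len - stride_left - stride_right
    if step <= 0:
        raise ValueError("chunk_len must be larger than stride_left + stride_right")
    for start in range(0, n, step):
        end = min(start + chunk_len, n)
        left = 0 if start == 0 else stride_left
        # A chunk that reaches the end of the input is the last one: its right
        # stride is dropped so the tail of the signal is kept, and no further
        # (possibly context-only) chunk is emitted.
        is_last = start + chunk_len >= n
        right = 0 if is_last else stride_right
        yield start, end, left, right
        if is_last:
            break


for bounds in chunk_bounds(10, 6, 1, 1):
    print(bounds)  # (0, 6, 0, 1) then (4, 10, 1, 0)
```

With these bounds the kept regions tile the input exactly once: `[0, 5)` from the first chunk and `[5, 10)` from the second.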

18 Mar, 2022 1 commit

Attention mask is important in the case of batching... (#16222) · ecb4662d

Nicolas Patry authored Mar 18, 2022

* Attention mask is important in the case of batching...

* Improve the fix.

* Making the sentences different enough that they exhibit different predictions.

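
Why the attention mask matters for batching (#16222) can be shown without a model: when sequences of different lengths are batched, the shorter ones are padded, and any computation that ignores the mask treats the padding as real input. The `pad_batch` helper and the pad id `0` are assumptions for illustration, and the masked mean is only a simple stand-in for a mask-aware computation, not the pipeline's code.

```python
from typing import List, Tuple


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token ids to a common length and build the matching attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding the model must ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    # Average only over positions where the mask is 1.
    return sum(v * m for v, m in zip(values, mask)) / sum(mask)


ids, mask = pad_batch([[5, 6, 7], [8]])
print(ids)   # [[5, 6, 7], [8, 0, 0]]
print(mask)  # [[1, 1, 1], [1, 0, 0]]
# Treating padding as data changes the result for the short sequence:
print(sum(ids[1]) / len(ids[1]), masked_mean(ids[1], mask[1]))  # 2.6666666666666665 8.0
```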

09 Mar, 2022 1 commit

Add `ForInstanceSegmentation` models to `image-segmentation` pipelines (#15937) · f4e4ad34

Nicolas Patry authored Mar 09, 2022

* Adding ForInstanceSegmentation to pipelines.

* Last fix `category_id` renamed to `label_id`.

* Can't be none no more.

* No `is_thing_map` anymore.


04 Mar, 2022 2 commits
- Updating the slow tests: (#15893) · 7ade7c17
  Nicolas Patry authored Mar 04, 2022
```
Linked to https://github.com/huggingface/transformers/pull/15826
```
- Re-enabling all fast pipeline tests. (#15924) · a6e3b179
  Nicolas Patry authored Mar 04, 2022
  
03 Mar, 2022 2 commits
- Enabling MaskFormer in pipelines (#15917) · 3822e4a5
  Nicolas Patry authored Mar 03, 2022
```
* Enabling MaskFormer in pipelines

No AutoModel though :(

* Ooops local file.
```
- The tests were not updated after the addition of `torch.diag` (#15890) · b693cbf9
  Nicolas Patry authored Mar 03, 2022
```
in the scoring (which is more correct)
```
02 Mar, 2022 1 commit
- Adding timestamps for CTC with LM in ASR pipeline. (#15863) · 6e57a569
  Nicolas Patry authored Mar 02, 2022
```
* Adding timestamps for CTC with LM in ASR pipeline.

* Remove print.

* Nit change.
```
28 Feb, 2022 1 commit

Fixing the timestamps with chunking. (#15843) · 97f9b8a2

Nicolas Patry authored Feb 28, 2022



* Fixing the timestamps with chunking.

* The changes modified (and fixed) the striding tests.

* Adding a tokenizer test.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Defense -> comment.

* Update src/transformers/models/wav2vec2/tokenization_wav2vec2.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>


25 Feb, 2022 2 commits

Adding the option to return_timestamps on pure CTC ASR models. (#15792) · ad0d7d17

Nicolas Patry authored Feb 25, 2022



* Adding the option to return_timestamps on pure CTC ASR models.

* Remove `math.prod` which was introduced in Python 3.8

* int are not floats.

* Reworking the PR to support "char" vs "word" output.

* Fixup!

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Quality.
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

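
The difference between "char" and "word" timestamps for a pure CTC model (#15792) can be sketched from scratch. Assumptions, all ours: the per-frame argmax tokens are already decoded to strings, `<pad>` is the CTC blank, `|` is the word delimiter (as in wav2vec2-style vocabularies), and each logit frame spans 0.02 s (320 samples at 16 kHz, typical for wav2vec2-style models). The real pipeline derives these from its tokenizer and feature extractor; `ctc_offsets` is a hypothetical helper, not its code.

```python
from typing import Dict, List, Optional


def ctc_offsets(frame_tokens: List[str], blank: str = "<pad>", delimiter: str = "|",
                seconds_per_frame: float = 0.02) -> Dict[str, List[dict]]:
    """Hypothetical sketch: char and word timestamps from per-frame CTC tokens."""
    # 1) Collapse repeated frames and drop blanks, keeping each character's frame span.
    chars: List[dict] = []
    prev: Optional[str] = None
    for i, tok in enumerate(frame_tokens):
        if tok != blank:
            if tok == prev:
                chars[-1]["end"] = i + 1  # same token as the previous frame: extend it
            else:
                chars.append({"text": tok, "start": i, "end": i + 1})
        prev = tok
    # 2) Group characters into words; the delimiter token closes the current word.
    words: List[dict] = []
    current: Optional[dict] = None
    for c in chars:
        if c["text"] == delimiter:
            current = None
            continue
        if current is None:
            current = {"text": "", "start": c["start"], "end": c["end"]}
            words.append(current)
        current["text"] += c["text"]
        current["end"] = c["end"]

    # 3) Convert frame indices to seconds.
    def to_seconds(items: List[dict]) -> List[dict]:
        return [
            {"text": d["text"], "start": d["start"] * seconds_per_frame,
             "end": d["end"] * seconds_per_frame}
            for d in items
        ]

    return {
        "char": to_seconds([c for c in chars if c["text"] != delimiter]),
        "word": to_seconds(words),
    }


frames = ["h", "h", "<pad>", "i", "|", "|", "y", "o", "o"]
print(ctc_offsets(frames)["word"])
```

A blank between two identical tokens (`["a", "<pad>", "a"]`) yields two characters, which is the standard CTC collapsing rule.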

Fix semantic segmentation pipeline test (#15826) · 074645e3
Sylvain Gugger authored Feb 25, 2022


23 Feb, 2022 1 commit

[Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41

Lysandre Debut authored Feb 23, 2022



* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>
