Commits · c63fcabfe92229eac983ce743f4e9fce863c84dd · chenpangpang / transformers

10 Sep, 2021 1 commit

[Large PR] Entire rework of pipelines. (#13308) · c63fcabf

Nicolas Patry authored Sep 10, 2021



* Enabling dataset iteration on pipelines.

Enabling dataset iteration on pipelines.

Unifying parameters under `set_parameters` function.

Small fix.

Last fixes after rebase

Remove print.

Fixing text2text `generate_kwargs`

No more `self.max_length`.

Fixing tf only conversational.

Consistency in start/stop index over TF/PT.

Speeding up drastically on TF (nasty bug where max_length would increase
a ton.)

Adding test for support for non fast tokenizers.

Fixign GPU usage on zero-shot.

Fix working on Tf.

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Small cleanup.

Remove all asserts + simple format.

* Fixing audio-classification for large PR.

* Overly explicity null checking.

* Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.

* Removed internal state for parameters of the  pipeline.

Instead of overriding implicitly internal state, we moved
to real named arguments on every `preprocess`, `_forward`,
`postprocess` function.

Instead `_sanitize_parameters` will be used to split all kwargs
of both __init__ and __call__ into the 3 kinds of named parameters.

* Move import warnings.

* Small fixes.

* Quality.

* Another small fix, using the CI to debug faster.

* Last fixes.

* Last fix.

* Small cleanup of tensor moving.

* is not None.

* Adding a bunch of docs + a iteration test.

* Fixing doc style.

* KeyDataset = None guard.

* RRemoving the Cuda test for pipelines (was testing).

* Even more simple iteration test.

* Correct import .

* Long day.

* Fixes in docs.

* [WIP] migrating object detection.

* Fixed the target_size bug.

* Fixup.

* Bad variable name.

* Fixing `ensure_on_device` respects original ModelOutput.

c63fcabf

08 Sep, 2021 1 commit

Object detection pipeline (#12886) · 2a15e8cc

Mishig Davaadorj authored Sep 08, 2021



* Implement object-detection pipeline

* Define threshold const

* Add `threshold` argument

* Refactor

* Uncomment test inputs

* `rm
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Chore better doc
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Rm unnecessary lines
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Chore better naming
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fix typo

* Add `detr-tiny` for tests

* Add `ObjectDetectionPipeline` to `trnsfrmrs/init`

* Implement new bbox format

* Update detr post_process

* Update `load_img` method obj det pipeline

* make style

* Implement new testing format for obj det pipeln

* Add guard pytorch specific code in pipeline

* Add doc

* Make pipeline_obj_tet tests deterministic

* Revert some changes to `post_process` COCO api

* Chore

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/pipelines/object_detection.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Rm timm requirement

* make fixup

* Add timm requirement to test

* Make fixup

* Guard torch.Tensor

* Chore

* Delete unnecessary comment
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

2a15e8cc

01 Sep, 2021 1 commit

Add the `AudioClassificationPipeline` (#13342) · b9c6a976

Anton Lozhkov authored Sep 01, 2021

* Add the audio classification pipeline

* Remove autoconfig exception

* Mark ffmpeg test as slow

* Rearrange pipeline tests

* Add small test

* Replace asserts with ValueError

b9c6a976

13 May, 2021 1 commit
- Fix doc deployment · cbbf49f6
  Sylvain Gugger authored May 13, 2021
  
  cbbf49f6
07 May, 2021 1 commit

Add the ImageClassificationPipeline (#11598) · 39084ca6

Lysandre Debut authored May 07, 2021



* Add the ImageClassificationPipeline

* Code review
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Have `load_image` at the module level
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

39084ca6

30 Apr, 2021 1 commit

Adding `AutomaticSpeechRecognitionPipeline`. (#11337) · db9dd09c

Nicolas Patry authored Apr 30, 2021



* Adding `AutomaticSpeechRecognitionPipeline`.

- Because we added everything to enable this pipeline, we probably
should add it to `transformers`.
- This PR tries to limit the scope and focuses only on the pipeline part
(what should go in, and out).
- The tests are very specific for S2T and Wav2vec2 to make sure both
architectures are supported by the pipeline. We don't use the mixin for
tests right now, because that requires more work in the `pipeline`
function (will be done in a follow up PR).
- Unsure about the "helper" function `ffmpeg_read`. It makes a lot of
  sense from a user perspective, it does not add any additional
dependencies (as in hard dependency, because users can always use their
own load mechanism). Meanwhile, it feels slightly clunky to have so much
optional preprocessing.
- The pipeline is not done to support streaming audio right now.

Future work:

- Add `automatic-speech-recognition` as a `task`. And add the
FeatureExtractor.from_pretrained within `pipeline` function.
- Add small models within tests
- Add the Mixin to tests.
- Make the logic between ForCTC vs ForConditionalGeneration better.

* Update tests/test_pipelines_automatic_speech_recognition.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding docs + main import + type checking + LICENSE.

* Doc style !.

* Fixing TYPE_HINT.

* Specifying waveform shape in the docs.

* Adding asserts + specify in the documentation the shape of the input
np.ndarray.

* Update src/transformers/pipelines/automatic_speech_recognition.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Adding require to tests + move the `feature_extractor` doc.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

db9dd09c

16 Dec, 2020 1 commit

TableQuestionAnsweringPipeline (#9145) · 1c1a2ffb

Lysandre Debut authored Dec 16, 2020



* AutoModelForTableQuestionAnswering

* TableQuestionAnsweringPipeline

* Apply suggestions from Patrick's code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Sylvain and Patrick comments

* Better PyTorch/TF error message

* Add integration tests

* Argument Handler naming
Co-authored-by: patrickvonplaten <patrick.v.platen@gmail.com>

* Fix docs to appease the documentation gods
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

1c1a2ffb

10 Dec, 2020 1 commit
- Enforce all objects in the main init are documented (#9014) · 1310e1a7
  Sylvain Gugger authored Dec 10, 2020
  
  1310e1a7
07 Dec, 2020 1 commit
- Copyright (#8970) · 00aa9dbc
  Sylvain Gugger authored Dec 07, 2020
```
* Add copyright everywhere missing

* Style
```
  00aa9dbc
26 Oct, 2020 1 commit

Doc styling (#8067) · 08f534d2

Sylvain Gugger authored Oct 26, 2020

* Important files

* Styling them all

* Revert "Styling them all"

This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e.

* Syling them for realsies

* Fix syntax error

* Fix benchmark_utils

* More fixes

* Fix modeling auto and script

* Remove new line

* Fixes

* More fixes

* Fix more files

* Style

* Add FSMT

* More fixes

* More fixes

* More fixes

* More fixes

* Fixes

* More fixes

* More fixes

* Last fixes

* Make sphinx happy

08f534d2

23 Sep, 2020 1 commit

Models doc (#7345) · 3323146e

Sylvain Gugger authored Sep 23, 2020



* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FaluBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

3323146e

02 Sep, 2020 1 commit

[pipelines] Text2TextGenerationPipeline (#6744) · 4230d30f

Suraj Patil authored Sep 02, 2020

* add Text2TextGenerationPipeline

* remove max length warning

* remove comments

* remove input_length

* fix typo

* add tests

* use TFAutoModelForSeq2SeqLM

* doc

* typo

* add the doc below TextGenerationPipeline

* doc nit

* style

* delete comment

4230d30f

04 Aug, 2020 1 commit
- fix zero shot pipeline docs (#6245) · 972535ea
  Joe Davison authored Aug 04, 2020
  
  972535ea
03 Aug, 2020 1 commit

Doc pipelines (#6175) · e4920c92

Sylvain Gugger authored Aug 03, 2020



* Init work on pipelines doc

* Work in progress

* Work in progress

* Doc pipelines

* Rm unwanted default

* Apply suggestions from code review

Lysandre comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

e4920c92

30 Jul, 2020 1 commit

Addition of a DialoguePipeline (#5516) · e642c789

guillaume-be authored Jul 30, 2020



* initial commit for pipeline implementation

Addition of input processing and history concatenation

* Conversation pipeline tested and working for single & multiple conversation inputs

* Added docstrings for dialogue pipeline

* Addition of dialogue pipeline integration tests

* Delete test_t5.py

* Fixed max code length

* Updated styling

* Fixed test broken by formatting tools

* Removed unused import

* Added unit test for DialoguePipeline

* Fixed Tensorflow compatibility

* Fixed multi-framework support using framework flag

* - Fixed docstring
- Added `min_length_for_response` as an initialization parameter
- Renamed `*args` to `conversations`, `conversations` being a `Conversation` or a `List[Conversation]`
- Updated truncation to truncate entire segments of conversations, instead of cutting in the middle of a user/bot input

* - renamed pipeline name from dialogue to conversational
- removed hardcoded default value of 1000 and use config.max_length instead
- added `append_response` and `set_history` method to the Conversation class to avoid direct fields mutation
- fixed bug in history truncation method

* - Updated ConversationalPipeline to accept only active conversations (otherwise a ValueError is raised)

* - Simplified input tensor conversion

* - Updated attention_mask value for Tensorflow compatibility

* - Updated last dialogue reference to conversational & fixed integration tests

* Fixed conflict with master

* Updates following review comments

* Updated formatting

* Added Conversation and ConversationalPipeline to the library __init__, addition of docstrings for Conversation, added both to the docs

* Update src/transformers/pipelines.py

Updated docsting following review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

e642c789

22 Jun, 2020 1 commit

Quick tour (#5145) · 417e492f

Sylvain Gugger authored Jun 22, 2020



* Quicktour part 1

* Update

* All done

* Typos
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>

* Address comments in quick tour

* Update docs/source/quicktour.rst
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update from feedback
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

417e492f

16 Jun, 2020 1 commit
- Fix all sphynx warnings (#5068) · 011cc0be
  Sylvain Gugger authored Jun 16, 2020
  
  011cc0be
03 Jun, 2020 1 commit

Pipelines: miscellanea of QoL improvements and small features... (#4632) · 99207bd1

Julien Chaumond authored Jun 03, 2020

* [hf_api] Attach all unknown attributes for future-proof compatibility

* [Pipeline] NerPipeline is really a TokenClassificationPipeline

* modelcard.py: I don't think we need to force the download

* Remove config, tokenizer from SUPPORTED_TASKS as we're moving to one model = one weight + one tokenizer

* FillMaskPipeline: also output token in string form

* TextClassificationPipeline: option to return all scores, not just the argmax

* Update docs/source/main_classes/pipelines.rst

99207bd1

22 Apr, 2020 1 commit

Pipeline for Text Generation: GenerationPipeline (#3758) · f16540fc

Lorenzo Ampil authored Apr 22, 2020



* Add GenerationPipeline

* Fix parameter names

* Correct parameter __call__ parameters

* Add model type attribute and correct function calls for prepare_input

* Take out trailing commas from init attributes

* Remove unnecessary tokenization line

* Implement support for multiple text inputs

* Apply generation support for multiple input text prompts

* Take out tensor coersion

* Take out batch index

* Add text prompt to return sequence

* Squeeze token tensore before decoding

* Return only a single list of sequences if only one prompt was used

* Correct results variable name

* Add GenerationPipeline to SUPPORTED_TASKS with the alias , initalized w GPT2

* Registedred AutoModelWithLMHead for both pt and t

* Update docstring for GenerationPipeline

* Add kwargs parameter to mode.generate

* Take out kwargs parameter after all

* Add generation pipeline example in pipeline docstring

* Fix max length by squeezing tokens tensor

* Apply ensure_tensor_on_device to pytorch tensor

* Include generation step in torch.no_grad

* Take out input from prepare_xlm_input and set 'en' as default xlm_language

* Apply framework specific encoding during prepare_input

* Format w make style

* Move GenerationPipeline import to follow proper import sorting

* Take out training comma from generation dict

* Apply requested changes

* Change name to TextGenerationPipeline

* Apply TextGenerationPipeline rename to __init___

* Changing alias to

* Set input mapping as input to ensure_tensor_on_device

* Fix assertion placement

* Add test_text_generation

* Add TextGenerationPipeline to PipelineCommonTests

* Take out whitespace

* Format __init__ w black

* Fix __init__ style

* Forman __init___

* Add line to end of __init__

* Correct model tokenizer set for test_text_generation

* Ensure to return list of list, not list of string (to pass test)

* Limit test models to only 3 to limit runtime to address circleCI timeout error

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update tests/test_pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Remove argument docstring, __init__, add additional __call__ arguments, and reformat results to list of dict

* Fix blank result list

* Add TextGenerationPipeline to pipelines.rst

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Fix typos from adding PADDING_TEXT_TOKEN_LENGTH

* Fix incorrectly moved result list

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py

* Update src/transformers/pipelines.py
Co-Authored-By: Patrick von Platen <patrick.v.platen@gmail.com>

* Add back generation line and make style

* Take out blank whitespace

* Apply new alis, text-generation, to test_pipelines

* Fix text generation alias in test

* Update src/transformers/pipelines.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <chaumond@gmail.com>

f16540fc

17 Mar, 2020 1 commit

Add Summarization to Pipelines (#3128) · 38a555a8

Sam Shleifer authored Mar 17, 2020

* passing

* Undo stupid chg

* docs

* undo rename

* delete-cruft

* only import if you have torch

* Dont rely on dict ordering

* Fix dict ordering upstream

* docstring link

* docstring link

* remove trailing comma for 3.5 compat

* new name

* delegate kwarging

* Update kwargs

38a555a8

02 Mar, 2020 1 commit

Pipeline doc (#3055) · d3eb7d23

Lysandre Debut authored Mar 02, 2020

* Pipeline doc initial commit

* pipeline abstraction

* Remove modelcard argument from pipeline

* Task-specific pipelines can be instantiated with no model or tokenizer

* All pipelines doc

d3eb7d23