Unverified Commit c63fcabf authored by Nicolas Patry, committed by GitHub

[Large PR] Entire rework of pipelines. (#13308)



* Enabling dataset iteration on pipelines.

Enabling dataset iteration on pipelines.

Unifying parameters under `set_parameters` function.

Small fix.

Last fixes after rebase

Remove print.

Fixing text2text `generate_kwargs`

No more `self.max_length`.

Fixing tf only conversational.

Consistency in start/stop index over TF/PT.

Speeding up drastically on TF (nasty bug where max_length would increase
a ton.)

Adding a test for support of non-fast tokenizers.

Fixing GPU usage on zero-shot.

Fix working on TF.

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Update src/transformers/pipelines/base.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Small cleanup.

Remove all asserts + simple format.

* Fixing audio-classification for large PR.

* Overly explicit null checking.

* Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.

* Removed internal state for parameters of the pipeline.

Instead of implicitly overriding internal state, we moved
to real named arguments on every `preprocess`, `_forward`,
`postprocess` function.

Instead, `_sanitize_parameters` will be used to split all kwargs
of both `__init__` and `__call__` into the 3 kinds of named parameters.

* Move import warnings.

* Small fixes.

* Quality.

* Another small fix, using the CI to debug faster.

* Last fixes.

* Last fix.

* Small cleanup of tensor moving.

* is not None.

* Adding a bunch of docs + a iteration test.

* Fixing doc style.

* KeyDataset = None guard.

* Removing the CUDA test for pipelines (was testing).

* Even more simple iteration test.

* Correct import.

* Long day.

* Fixes in docs.

* [WIP] migrating object detection.

* Fixed the target_size bug.

* Fixup.

* Bad variable name.

* Fixing `ensure_on_device` so it respects the original ModelOutput.
parent 09549aa1
..
Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
How to add a pipeline to 🤗 Transformers?
=======================================================================================================================
First and foremost, you need to decide the raw entries the pipeline will be able to take. They can be strings, raw
bytes, dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as
possible, as that makes compatibility easier (even through other languages via JSON). Those will be the :obj:`inputs` of
the pipeline (:obj:`preprocess`).
Then define the :obj:`outputs`. Same policy as the :obj:`inputs`. The simpler, the better. Those will be the outputs of
the :obj:`postprocess` method.

Start by inheriting the base class :obj:`Pipeline` and implementing the 4 required methods: :obj:`preprocess`,
:obj:`_forward`, :obj:`postprocess` and :obj:`_sanitize_parameters`.
.. code-block::

    from transformers import Pipeline


    class MyPipeline(Pipeline):
        def _sanitize_parameters(self, **kwargs):
            preprocess_kwargs = {}
            if "maybe_arg" in kwargs:
                preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
            return preprocess_kwargs, {}, {}

        def preprocess(self, inputs, maybe_arg=2):
            model_input = Tensor(....)
            return {"model_input": model_input}

        def _forward(self, model_inputs):
            # model_inputs == {"model_input": model_input}
            outputs = self.model(**model_inputs)
            # Maybe {"logits": Tensor(...)}
            return outputs

        def postprocess(self, model_outputs):
            best_class = model_outputs["logits"].softmax(-1)
            return best_class
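Once those 4 methods are implemented, parameters can be passed either when the pipeline is instantiated or when it is
called, and :obj:`_sanitize_parameters` routes them to the right method. A minimal sketch (assuming :obj:`model` and
:obj:`tokenizer` are already loaded, and that :obj:`preprocess` above really builds model inputs rather than the
pseudocode shown):

.. code-block::

    # Both of these calls end up passing `maybe_arg=3` to `preprocess`,
    # courtesy of `_sanitize_parameters`.
    my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, maybe_arg=3)
    out = my_pipeline("This is a test")

    my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)
    out = my_pipeline("This is a test", maybe_arg=3)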
The structure of this breakdown is to support relatively seamless CPU/GPU handling, while also supporting doing the
pre/postprocessing on the CPU, on different threads.

:obj:`preprocess` will take the originally defined inputs and turn them into something feedable to the model. It might
contain more information and is usually a :obj:`Dict`.

:obj:`_forward` is the implementation detail and is not meant to be called directly. :obj:`forward` is the preferred
calling method as it contains safeguards to make sure everything is working on the expected device. Anything that is
linked to a real model belongs in the :obj:`_forward` method; anything else goes in preprocess/postprocess.

:obj:`postprocess` will take the output of :obj:`_forward` and turn it into the final output decided earlier.

:obj:`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization
time ``pipeline(...., maybe_arg=4)`` or at call time ``pipe = pipeline(...); output = pipe(...., maybe_arg=4)``. The
returns of :obj:`_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to :obj:`preprocess`,
:obj:`_forward` and :obj:`postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That
allows keeping the default arguments in the function definitions, which is always more "natural".

A classic example would be a :obj:`top_k` argument in the post processing of classification tasks.
.. code-block::

    >>> pipe = pipeline("my-new-task")
    >>> pipe("This is a test")
    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05},
     {"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]

    >>> pipe("This is a test", top_k=2)
    [{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]
In order to achieve that, we'll update our :obj:`postprocess` method with a default parameter of :obj:`5`, and edit
:obj:`_sanitize_parameters` to allow this new parameter.
.. code-block::

    def postprocess(self, model_outputs, top_k=5):
        best_class = model_outputs["logits"].softmax(-1)
        # Add logic to handle top_k
        return best_class


    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]

        postprocess_kwargs = {}
        if "top_k" in kwargs:
            postprocess_kwargs["top_k"] = kwargs["top_k"]
        return preprocess_kwargs, {}, postprocess_kwargs
Try to keep the inputs/outputs very simple and ideally JSON-serializable, as it makes the pipeline usage very easy
without requiring users to understand new kinds of objects. It's also relatively common to support many different types
of arguments for ease of use (audio files can be filenames, URLs or pure bytes).
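A hedged sketch of what such a permissive :obj:`preprocess` could look like for an audio pipeline, loosely mirroring
the audio pipelines touched in this PR (:obj:`decode_audio` is a hypothetical helper standing in for real audio
decoding, and URL support would need an extra download step):

.. code-block::

    import numpy as np

    # Inside your Pipeline subclass:
    def preprocess(self, inputs):
        # Accept a filename, raw bytes, or an already decoded waveform;
        # everything is normalized to a 1D numpy array before feature extraction.
        if isinstance(inputs, str):
            with open(inputs, "rb") as f:
                inputs = f.read()
        if isinstance(inputs, bytes):
            inputs = decode_audio(inputs)  # hypothetical decoding helper
        if not isinstance(inputs, np.ndarray):
            raise ValueError("We expect a filename, raw bytes or a numpy ndarray as input")
        return self.feature_extractor(
            inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
        )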
Adding it to the list of supported tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Go to ``src/transformers/pipelines/__init__.py`` and fill in :obj:`SUPPORTED_TASKS` with your newly created pipeline.
If possible it should provide a default model.
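A registration entry roughly follows the pattern below. This is a hedged sketch: the task name, model class and default
checkpoint are placeholders, and the exact keys expected by :obj:`SUPPORTED_TASKS` should be checked against the
existing entries in that file.

.. code-block::

    # Inside SUPPORTED_TASKS in src/transformers/pipelines/__init__.py
    "my-new-task": {
        "impl": MyPipeline,
        "tf": (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": "user/my-default-checkpoint"}},
    },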
Adding tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Create a new file ``tests/test_pipelines_MY_PIPELINE.py``, following the example of the other pipeline tests.
The :obj:`run_pipeline_test` function will be very generic and run on small random models on every possible
architecture as defined by :obj:`model_mapping` and :obj:`tf_model_mapping`.
This is very important to test future compatibility, meaning that if someone adds a new model for
:obj:`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random it's
impossible to check for actual values; that's why there is a helper :obj:`ANY` that will simply attempt to match the
TYPE of the pipeline output.
You also *need* to implement 2 (ideally 4) tests.
- :obj:`test_small_model_pt` : Define 1 small model for this pipeline (it doesn't matter if the results don't make
  sense) and test the pipeline outputs. The results should be the same as :obj:`test_small_model_tf` (a skeleton is
  sketched after this list).
- :obj:`test_small_model_tf` : Define 1 small model for this pipeline (it doesn't matter if the results don't make
  sense) and test the pipeline outputs. The results should be the same as :obj:`test_small_model_pt`.
- :obj:`test_large_model_pt` (:obj:`optional`): Tests the pipeline on a real model where the results are supposed to
  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
  sure there is no drift in future releases.
- :obj:`test_large_model_tf` (:obj:`optional`): Tests the pipeline on a real model where the results are supposed to
  make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make
  sure there is no drift in future releases.
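A hedged skeleton of what :obj:`test_small_model_pt` could look like; the class name, task string and checkpoint are
placeholders, and the real test files additionally wire in :obj:`model_mapping` / :obj:`tf_model_mapping` as described
above.

.. code-block::

    import unittest

    from transformers import pipeline
    from transformers.testing_utils import require_torch


    class MyNewTaskPipelineTests(unittest.TestCase):
        @require_torch
        def test_small_model_pt(self):
            # "user/tiny-random-my-model" is a placeholder for a tiny random checkpoint.
            pipe = pipeline("my-new-task", model="user/tiny-random-my-model")
            outputs = pipe("This is a test")
            # With a random model, only the structure of the output is meaningful.
            self.assertIsInstance(outputs, list)
            self.assertIn("label", outputs[0])
            self.assertIn("score", outputs[0])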
@@ -506,6 +506,7 @@ Flax), PyTorch, and/or TensorFlow.
   migration
   contributing
   add_new_model
   add_new_pipeline
   fast_tokenizers
   performance
   parallelism
......
@@ -46,12 +46,53 @@ The pipeline abstraction
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any other
pipeline but requires an additional argument which is the `task`.
Simple call on one item:

.. code-block::

    >>> pipe = pipeline("text-classification")
    >>> pipe("This restaurant is awesome")
    [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
To call a pipeline on many items, you can call it with a `list`:
.. code-block::

    >>> pipe = pipeline("text-classification")
    >>> pipe(["This restaurant is awesome", "This restaurant is aweful"])
    [{'label': 'POSITIVE', 'score': 0.9998743534088135},
     {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
To iterate over full datasets it is recommended to use a :obj:`dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.
.. code-block::

    pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
    dataset = datasets.load_dataset("superb", name="asr", split="test")

    # KeyDataset (only `pt`) will simply return the item in the dict returned by the dataset item
    # as we're not interested in the `target` part of the dataset.
    for out in tqdm.tqdm(pipe(KeyDataset(dataset, "file"))):
        print(out)
        # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
        # {"text": ....}
        # ....
.. autofunction:: transformers.pipeline
Implementing a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:doc:`Implementing a new pipeline <../add_new_pipeline>`
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

AudioClassificationPipeline
=======================================================================================================================
......
@@ -67,8 +67,8 @@ make them readable. For instance:
>>> classifier('We are very happy to show you the 🤗 Transformers library.')
[{'label': 'POSITIVE', 'score': 0.9998}]
That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model as a That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model, returning
`batch`, returning a list of dictionaries like this one: a list of dictionaries like this one:
.. code-block::
@@ -79,6 +79,8 @@ That's encouraging! You can use it on a list of sentences, which will be preproc
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.5309
To use with a large dataset, look at :doc:`iterating over a pipeline <./main_classes/pipelines>`
You can see the second sentence has been classified as negative (it needs to be positive or negative) but its score is
fairly neutral.
......
...@@ -249,18 +249,22 @@ def check_task(task: str) -> Tuple[Dict, Any]: ...@@ -249,18 +249,22 @@ def check_task(task: str) -> Tuple[Dict, Any]:
task (:obj:`str`):
The task defining which pipeline will be returned. Currently accepted tasks are:
- :obj:`"audio-classification"`
- :obj:`"automatic-speech-recognition"`
- :obj:`"conversational"`
- :obj:`"feature-extraction"` - :obj:`"feature-extraction"`
- :obj:`"text-classification"`
- :obj:`"sentiment-analysis"` (alias of :obj:`"text-classification")
- :obj:`"token-classification"`
- :obj:`"ner"` (alias of :obj:`"token-classification")
- :obj:`"question-answering"`
- :obj:`"fill-mask"` - :obj:`"fill-mask"`
- :obj:`"summarization"` - :obj:`"image-classification"`
- :obj:`"translation_xx_to_yy"` - :obj:`"question-answering"`
- :obj:`"translation"` - :obj:`"table-question-answering"`
- :obj:`"text2text-generation"`
- :obj:`"text-classification"` (alias :obj:`"sentiment-analysis" available)
- :obj:`"text-generation"` - :obj:`"text-generation"`
- :obj:`"conversational"` - :obj:`"token-classification"` (alias :obj:`"ner"` available)
- :obj:`"translation"`
- :obj:`"translation_xx_to_yy"`
- :obj:`"summarization"`
- :obj:`"zero-shot-classification"`
Returns:
(task_defaults:obj:`dict`, task_options: (:obj:`tuple`, None)) The actual dictionary required to initialize the
@@ -312,21 +316,26 @@ def pipeline(
task (:obj:`str`):
The task defining which pipeline will be returned. Currently accepted tasks are:
- :obj:`"feature-extraction"`: will return a :class:`~transformers.FeatureExtractionPipeline`. - :obj:`"audio-classification"`: will return a :class:`~transformers.AudioClassificationPipeline`:.
- :obj:`"text-classification"`: will return a :class:`~transformers.TextClassificationPipeline`. - :obj:`"automatic-speech-recognition"`: will return a
- :obj:`"sentiment-analysis"`: (alias of :obj:`"text-classification"`) will return a :class:`~transformers.AutomaticSpeechRecognitionPipeline`:.
:class:`~transformers.TextClassificationPipeline`. - :obj:`"conversational"`: will return a :class:`~transformers.ConversationalPipeline`:.
- :obj:`"token-classification"`: will return a :class:`~transformers.TokenClassificationPipeline`. - :obj:`"feature-extraction"`: will return a :class:`~transformers.FeatureExtractionPipeline`:.
- :obj:`"ner"` (alias of :obj:`"token-classification"`): will return a - :obj:`"fill-mask"`: will return a :class:`~transformers.FillMaskPipeline`:.
:class:`~transformers.TokenClassificationPipeline`. - :obj:`"image-classification"`: will return a :class:`~transformers.ImageClassificationPipeline`:.
- :obj:`"question-answering"`: will return a :class:`~transformers.QuestionAnsweringPipeline`. - :obj:`"question-answering"`: will return a :class:`~transformers.QuestionAnsweringPipeline`:.
- :obj:`"fill-mask"`: will return a :class:`~transformers.FillMaskPipeline`. - :obj:`"table-question-answering"`: will return a :class:`~transformers.TableQuestionAnsweringPipeline`:.
- :obj:`"summarization"`: will return a :class:`~transformers.SummarizationPipeline`. - :obj:`"text2text-generation"`: will return a :class:`~transformers.Text2TextGenerationPipeline`:.
- :obj:`"translation_xx_to_yy"`: will return a :class:`~transformers.TranslationPipeline`. - :obj:`"text-classification"` (alias :obj:`"sentiment-analysis" available): will return a
- :obj:`"text2text-generation"`: will return a :class:`~transformers.Text2TextGenerationPipeline`. :class:`~transformers.TextClassificationPipeline`:.
- :obj:`"text-generation"`: will return a :class:`~transformers.TextGenerationPipeline`. - :obj:`"text-generation"`: will return a :class:`~transformers.TextGenerationPipeline`:.
- :obj:`"zero-shot-classification:`: will return a :class:`~transformers.ZeroShotClassificationPipeline`. - :obj:`"token-classification"` (alias :obj:`"ner"` available): will return a
- :obj:`"conversational"`: will return a :class:`~transformers.ConversationalPipeline`. :class:`~transformers.TokenClassificationPipeline`:.
- :obj:`"translation"`: will return a :class:`~transformers.TranslationPipeline`:.
- :obj:`"translation_xx_to_yy"`: will return a :class:`~transformers.TranslationPipeline`:.
- :obj:`"summarization"`: will return a :class:`~transformers.SummarizationPipeline`:.
- :obj:`"zero-shot-classification"`: will return a :class:`~transformers.ZeroShotClassificationPipeline`:.
model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`):
The model that will be used by the pipeline to make predictions. This can be a model identifier or an
actual instance of a pretrained model inheriting from :class:`~transformers.PreTrainedModel` (for PyTorch)
......
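As a quick reminder of how the task strings documented above are consumed, a minimal usage sketch (this mirrors the
examples already shown in the pipeline docs; nothing here is new to this PR):

.. code-block::

    from transformers import pipeline

    # The task string selects the pipeline class; leaving `model` unset
    # falls back to the default checkpoint registered for that task.
    classifier = pipeline("text-classification")
    classifier("This restaurant is awesome")
    # e.g. [{'label': 'POSITIVE', 'score': 0.9998743534088135}], as in the docs above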
...@@ -12,23 +12,16 @@ ...@@ -12,23 +12,16 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import subprocess import subprocess
from typing import TYPE_CHECKING, Optional, Union from typing import Union
import numpy as np import numpy as np
from ..feature_extraction_utils import PreTrainedFeatureExtractor
from ..file_utils import add_end_docstrings, is_torch_available from ..file_utils import add_end_docstrings, is_torch_available
from ..utils import logging from ..utils import logging
from .base import PIPELINE_INIT_ARGS, Pipeline from .base import PIPELINE_INIT_ARGS, Pipeline
if TYPE_CHECKING:
from ..modeling_tf_utils import TFPreTrainedModel
from ..modeling_utils import PreTrainedModel
if is_torch_available(): if is_torch_available():
import torch
from ..models.auto.modeling_auto import MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING from ..models.auto.modeling_auto import MODEL_FOR_AUDIO_CLASSIFICATION_MAPPING
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)
...@@ -84,14 +77,10 @@ class AudioClassificationPipeline(Pipeline): ...@@ -84,14 +77,10 @@ class AudioClassificationPipeline(Pipeline):
<https://huggingface.co/models?filter=audio-classification>`__. <https://huggingface.co/models?filter=audio-classification>`__.
""" """
def __init__( def __init__(self, *args, **kwargs):
self, # Default, might be overriden by the model.config.
model: Union["PreTrainedModel", "TFPreTrainedModel"], kwargs["top_k"] = 5
feature_extractor: PreTrainedFeatureExtractor, super().__init__(*args, **kwargs)
framework: Optional[str] = None,
**kwargs
):
super().__init__(model, feature_extractor=feature_extractor, framework=framework, **kwargs)
if self.framework != "pt": if self.framework != "pt":
raise ValueError(f"The {self.__class__} is only available in PyTorch.") raise ValueError(f"The {self.__class__} is only available in PyTorch.")
...@@ -101,7 +90,6 @@ class AudioClassificationPipeline(Pipeline): ...@@ -101,7 +90,6 @@ class AudioClassificationPipeline(Pipeline):
def __call__( def __call__(
self, self,
inputs: Union[np.ndarray, bytes, str], inputs: Union[np.ndarray, bytes, str],
top_k: Optional[int] = None,
**kwargs, **kwargs,
): ):
""" """
...@@ -126,6 +114,18 @@ class AudioClassificationPipeline(Pipeline): ...@@ -126,6 +114,18 @@ class AudioClassificationPipeline(Pipeline):
- **label** (:obj:`str`) -- The label predicted. - **label** (:obj:`str`) -- The label predicted.
- **score** (:obj:`float`) -- The corresponding probability. - **score** (:obj:`float`) -- The corresponding probability.
""" """
return super().__call__(inputs, **kwargs)
def _sanitize_parameters(self, top_k=None, **kwargs):
# No parameters on this pipeline right now
postprocess_params = {}
if top_k is not None:
if top_k > self.model.config.num_labels:
top_k = self.model.config.num_labels
postprocess_params["top_k"] = top_k
return {}, {}, postprocess_params
def preprocess(self, inputs):
if isinstance(inputs, str): if isinstance(inputs, str):
with open(inputs, "rb") as f: with open(inputs, "rb") as f:
inputs = f.read() inputs = f.read()
...@@ -136,20 +136,19 @@ class AudioClassificationPipeline(Pipeline): ...@@ -136,20 +136,19 @@ class AudioClassificationPipeline(Pipeline):
if not isinstance(inputs, np.ndarray): if not isinstance(inputs, np.ndarray):
raise ValueError("We expect a numpy ndarray as input") raise ValueError("We expect a numpy ndarray as input")
if len(inputs.shape) != 1: if len(inputs.shape) != 1:
raise ValueError("We expect a single channel audio input for AudioClassificationPipeline") raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")
if top_k is None or top_k > self.model.config.num_labels:
top_k = self.model.config.num_labels
processed = self.feature_extractor( processed = self.feature_extractor(
inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt" inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
) )
processed = self.ensure_tensor_on_device(**processed) return processed
with torch.no_grad(): def _forward(self, model_inputs):
outputs = self.model(**processed) model_outputs = self.model(**model_inputs)
return model_outputs
probs = outputs.logits[0].softmax(-1) def postprocess(self, model_outputs, top_k=5):
probs = model_outputs.logits[0].softmax(-1)
scores, ids = probs.topk(top_k) scores, ids = probs.topk(top_k)
scores = scores.tolist() scores = scores.tolist()
......
...@@ -124,6 +124,13 @@ class AutomaticSpeechRecognitionPipeline(Pipeline): ...@@ -124,6 +124,13 @@ class AutomaticSpeechRecognitionPipeline(Pipeline):
- **text** (:obj:`str`) -- The recognized text. - **text** (:obj:`str`) -- The recognized text.
""" """
return super().__call__(inputs, **kwargs)
def _sanitize_parameters(self, **kwargs):
# No parameters on this pipeline right now
return {}, {}, {}
def preprocess(self, inputs):
if isinstance(inputs, str): if isinstance(inputs, str):
with open(inputs, "rb") as f: with open(inputs, "rb") as f:
inputs = f.read() inputs = f.read()
...@@ -131,27 +138,34 @@ class AutomaticSpeechRecognitionPipeline(Pipeline): ...@@ -131,27 +138,34 @@ class AutomaticSpeechRecognitionPipeline(Pipeline):
if isinstance(inputs, bytes): if isinstance(inputs, bytes):
inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate) inputs = ffmpeg_read(inputs, self.feature_extractor.sampling_rate)
assert isinstance(inputs, np.ndarray), "We expect a numpy ndarray as input" if not isinstance(inputs, np.ndarray):
assert len(inputs.shape) == 1, "We expect a single channel audio input for AutomaticSpeechRecognitionPipeline" raise ValueError("We expect a numpy ndarray as input")
if len(inputs.shape) != 1:
raise ValueError("We expect a single channel audio input for AutomaticSpeechRecognitionPipeline")
processed = self.feature_extractor( processed = self.feature_extractor(
inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt" inputs, sampling_rate=self.feature_extractor.sampling_rate, return_tensors="pt"
) )
processed = self.ensure_tensor_on_device(**processed) return processed
def _forward(self, model_inputs):
name = self.model.__class__.__name__ name = self.model.__class__.__name__
if name.endswith("ForConditionalGeneration") or name.endswith("EncoderDecoderModel"): if name.endswith("ForConditionalGeneration") or name.endswith("EncoderDecoderModel"):
encoder = self.model.get_encoder() encoder = self.model.get_encoder()
# we need to pass `processed.get("attention_mask")` here since audio encoder # we need to pass `processed.get("attention_mask")` here since audio encoder
# attention mask length is different from expected text decoder `encoder_attention_mask` length # attention mask length is different from expected text decoder `encoder_attention_mask` length
# `generate` magic to create the mask automatically won't work, we basically need to help
# it here.
tokens = self.model.generate( tokens = self.model.generate(
encoder_outputs=encoder(**processed), attention_mask=processed.get("attention_mask") encoder_outputs=encoder(**model_inputs), attention_mask=model_inputs.get("attention_mask")
) )
tokens = tokens.squeeze(0) tokens = tokens.squeeze(0)
elif name.endswith("ForCTC"): elif name.endswith("ForCTC"):
outputs = self.model(**processed) outputs = self.model(**model_inputs)
tokens = outputs.logits.squeeze(0).argmax(dim=-1) tokens = outputs.logits.squeeze(0).argmax(dim=-1)
return tokens
def postprocess(self, model_outputs):
skip_special_tokens = False if "CTC" in self.tokenizer.__class__.__name__ else True skip_special_tokens = False if "CTC" in self.tokenizer.__class__.__name__ else True
recognized_string = self.tokenizer.decode(tokens, skip_special_tokens=skip_special_tokens) recognized_string = self.tokenizer.decode(model_outputs, skip_special_tokens=skip_special_tokens)
return {"text": recognized_string} return {"text": recognized_string}
...@@ -20,18 +20,21 @@ import pickle ...@@ -20,18 +20,21 @@ import pickle
import sys import sys
import warnings import warnings
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from collections import UserDict
from contextlib import contextmanager from contextlib import contextmanager
from os.path import abspath, exists from os.path import abspath, exists
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
from ..feature_extraction_utils import PreTrainedFeatureExtractor from ..feature_extraction_utils import PreTrainedFeatureExtractor
from ..file_utils import add_end_docstrings, is_tf_available, is_torch_available from ..file_utils import ModelOutput, add_end_docstrings, is_tf_available, is_torch_available
from ..modelcard import ModelCard from ..modelcard import ModelCard
from ..models.auto.configuration_auto import AutoConfig from ..models.auto.configuration_auto import AutoConfig
from ..tokenization_utils import PreTrainedTokenizer, TruncationStrategy from ..tokenization_utils import PreTrainedTokenizer
from ..utils import logging from ..utils import logging
GenericTensor = Union[List["GenericTensor"], "torch.Tensor", "tf.Tensor"]
if is_tf_available(): if is_tf_available():
import tensorflow as tf import tensorflow as tf
...@@ -39,8 +42,12 @@ if is_tf_available(): ...@@ -39,8 +42,12 @@ if is_tf_available():
if is_torch_available(): if is_torch_available():
import torch import torch
from torch.utils.data import DataLoader, Dataset, IterableDataset
from ..models.auto.modeling_auto import AutoModel from ..models.auto.modeling_auto import AutoModel
else:
Dataset = None
KeyDataset = None
if TYPE_CHECKING: if TYPE_CHECKING:
from ..modeling_tf_utils import TFPreTrainedModel from ..modeling_tf_utils import TFPreTrainedModel
...@@ -50,6 +57,12 @@ if TYPE_CHECKING: ...@@ -50,6 +57,12 @@ if TYPE_CHECKING:
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)
def collate_fn(items):
if len(items) != 1:
raise ValueError("This collate_fn is meant to be used with batch_size=1")
return items[0]
def infer_framework_load_model( def infer_framework_load_model(
model, model,
config: AutoConfig, config: AutoConfig,
...@@ -585,6 +598,51 @@ PIPELINE_INIT_ARGS = r""" ...@@ -585,6 +598,51 @@ PIPELINE_INIT_ARGS = r"""
Flag indicating if the output the pipeline should happen in a binary format (i.e., pickle) or as raw text. Flag indicating if the output the pipeline should happen in a binary format (i.e., pickle) or as raw text.
""" """
if is_torch_available():
class PipelineDataset(Dataset):
def __init__(self, dataset, process, params):
self.dataset = dataset
self.process = process
self.params = params
def __len__(self):
return len(self.dataset)
def __getitem__(self, i):
item = self.dataset[i]
processed = self.process(item, **self.params)
return processed
class PipelineIterator(IterableDataset):
def __init__(self, loader, infer, params):
self.loader = loader
self.infer = infer
self.params = params
def __len__(self):
return len(self.loader)
def __iter__(self):
self.iterator = iter(self.loader)
return self
def __next__(self):
item = next(self.iterator)
processed = self.infer(item, **self.params)
return processed
class KeyDataset(Dataset):
def __init__(self, dataset: Dataset, key: str):
self.dataset = dataset
self.key = key
def __len__(self):
return len(self.dataset)
def __getitem__(self, i):
return self.dataset[i][self.key]
@add_end_docstrings(PIPELINE_INIT_ARGS) @add_end_docstrings(PIPELINE_INIT_ARGS)
class Pipeline(_ScikitCompat): class Pipeline(_ScikitCompat):
...@@ -618,6 +676,7 @@ class Pipeline(_ScikitCompat): ...@@ -618,6 +676,7 @@ class Pipeline(_ScikitCompat):
args_parser: ArgumentHandler = None, args_parser: ArgumentHandler = None,
device: int = -1, device: int = -1,
binary_output: bool = False, binary_output: bool = False,
**kwargs,
): ):
if framework is None: if framework is None:
...@@ -641,6 +700,9 @@ class Pipeline(_ScikitCompat): ...@@ -641,6 +700,9 @@ class Pipeline(_ScikitCompat):
if task_specific_params is not None and task in task_specific_params: if task_specific_params is not None and task in task_specific_params:
self.model.config.update(task_specific_params.get(task)) self.model.config.update(task_specific_params.get(task))
self.call_count = 0
self._preprocess_params, self._forward_params, self._postprocess_params = self._sanitize_parameters(**kwargs)
def save_pretrained(self, save_directory: str): def save_pretrained(self, save_directory: str):
""" """
Save the pipeline's model and tokenizer. Save the pipeline's model and tokenizer.
...@@ -707,15 +769,31 @@ class Pipeline(_ScikitCompat): ...@@ -707,15 +769,31 @@ class Pipeline(_ScikitCompat):
Ensure PyTorch tensors are on the specified device. Ensure PyTorch tensors are on the specified device.
Args: Args:
inputs (keyword arguments that should be :obj:`torch.Tensor`): The tensors to place on :obj:`self.device`. inputs (keyword arguments that should be :obj:`torch.Tensor`, the rest is ignored): The tensors to place on :obj:`self.device`.
Recursive on lists **only**.
Return: Return:
:obj:`Dict[str, torch.Tensor]`: The same as :obj:`inputs` but on the proper device. :obj:`Dict[str, torch.Tensor]`: The same as :obj:`inputs` but on the proper device.
""" """
return { return self._ensure_tensor_on_device(inputs, self.device)
name: tensor.to(self.device) if isinstance(tensor, torch.Tensor) else tensor
for name, tensor in inputs.items() def _ensure_tensor_on_device(self, inputs, device):
} if isinstance(inputs, ModelOutput):
return ModelOutput(
{name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()}
)
elif isinstance(inputs, dict):
return {name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()}
elif isinstance(inputs, UserDict):
return UserDict({name: self._ensure_tensor_on_device(tensor, device) for name, tensor in inputs.items()})
elif isinstance(inputs, list):
return [self._ensure_tensor_on_device(item, device) for item in inputs]
elif isinstance(inputs, tuple):
return tuple([self._ensure_tensor_on_device(item, device) for item in inputs])
elif isinstance(inputs, torch.Tensor):
return inputs.to(self.device)
else:
return inputs
def check_model_type(self, supported_models: Union[List[str], dict]): def check_model_type(self, supported_models: Union[List[str], dict]):
""" """
...@@ -739,65 +817,108 @@ class Pipeline(_ScikitCompat): ...@@ -739,65 +817,108 @@ class Pipeline(_ScikitCompat):
f"The model '{self.model.__class__.__name__}' is not supported for {self.task}. Supported models are {supported_models}." f"The model '{self.model.__class__.__name__}' is not supported for {self.task}. Supported models are {supported_models}."
) )
def _parse_and_tokenize( @abstractmethod
self, inputs, padding=True, add_special_tokens=True, truncation=TruncationStrategy.DO_NOT_TRUNCATE, **kwargs def _sanitize_parameters(self, **pipeline_parameters):
):
""" """
Parse arguments and tokenize _sanitize_parameters will be called with any excessive named arguments from either `__init__` or `__call__`
""" methods. It should return 3 dictionnaries of the resolved parameters used by the various `preprocess`,
# Parse arguments `forward` and `postprocess` methods. Do not fill dictionnaries if the caller didn't specify a kwargs. This
if getattr(self.tokenizer, "pad_token", None) is None: let's you keep defaults in function signatures, which is more "natural".
padding = False
inputs = self.tokenizer(
inputs,
add_special_tokens=add_special_tokens,
return_tensors=self.framework,
padding=padding,
truncation=truncation,
)
return inputs
def __call__(self, inputs, *args, **kwargs): It is not meant to be called directly, it will be automatically called and the final parameters resolved by
try: `__init__` and `__call__`
model_inputs = self._parse_and_tokenize(inputs, *args, **kwargs) """
outputs = self._forward(model_inputs) raise NotImplementedError("_sanitize_parameters not implemented")
return outputs
except ValueError:
# XXX: Some tokenizer do NOT have a pad token, hence we cannot run the inference
# in a batch, instead we run everything sequentially
if isinstance(inputs, list):
values = []
for input_ in inputs:
model_input = self._parse_and_tokenize(input_, padding=False, *args, **kwargs)
value = self._forward(model_input)
values.append(value.squeeze(0))
else:
model_input = self._parse_and_tokenize(inputs, padding=False, *args, **kwargs)
values = self._forward(model_input)
return values
def _forward(self, inputs, return_tensors=False): @abstractmethod
def preprocess(self, input_: Any, **preprocess_parameters: Dict) -> Dict[str, GenericTensor]:
""" """
Internal framework specific forward dispatching Preprocess will take the `input_` of a specific pipeline and return a dictionnary of everything necessary for
`_forward` to run properly. It should contain at least one tensor, but might have arbitrary other items.
"""
raise NotImplementedError("preprocess not implemented")
Args: @abstractmethod
inputs: dict holding all the keyword arguments for required by the model forward method. def _forward(self, input_tensors: Dict[str, GenericTensor], **forward_parameters: Dict) -> ModelOutput:
return_tensors: Whether to return native framework (pt/tf) tensors rather than numpy array """
_forward will receive the prepared dictionnary from `preprocess` and run it on the model. This method might
involve the GPU or the CPU and should be agnostic to it. Isolating this function is the reason for `preprocess`
and `postprocess` to exist, so that the hot path, this method generally can run as fast as possible.
Returns: It is not meant to be called directly, `forward` is preferred. It is basically the same but contains additional
Numpy array code surrounding `_forward` making sure tensors and models are on the same device, disabling the training part
of the code (leading to faster inference).
"""
raise NotImplementedError("_forward not implemented")
@abstractmethod
def postprocess(self, model_outputs: ModelOutput, **postprocess_parameters: Dict) -> Any:
"""
Postprocess will receive the raw outputs of the `_forward` method, generally tensors, and reformat them into
something more friendly. Generally it will output a list or a dict or results (containing just strings and
numbers).
""" """
# Encode for forward raise NotImplementedError("postprocess not implemented")
def forward(self, model_inputs, **forward_params):
with self.device_placement(): with self.device_placement():
if self.framework == "tf": if self.framework == "tf":
# TODO trace model model_inputs["training"] = False
predictions = self.model(inputs.data, training=False)[0] model_outputs = self._forward(model_inputs, **forward_params)
else: elif self.framework == "pt":
with torch.no_grad(): with torch.no_grad():
inputs = self.ensure_tensor_on_device(**inputs) model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
predictions = self.model(**inputs)[0].cpu() model_outputs = self._forward(model_inputs, **forward_params)
model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
if return_tensors:
return predictions
else: else:
return predictions.numpy() raise ValueError(f"Framework {self.framework} is not supported")
return model_outputs
def get_iterator(self, inputs, num_workers: int, preprocess_params, forward_params, postprocess_params):
if "TOKENIZERS_PARALLELISM" not in os.environ:
logger.info("Disabling tokenizer parallelism, we're using DataLoader multithreading already")
os.environ["TOKENIZERS_PARALLELISM"] = "false"
dataset = PipelineDataset(inputs, self.preprocess, preprocess_params)
dataloader = DataLoader(dataset, num_workers=num_workers, batch_size=1, collate_fn=collate_fn)
model_iterator = PipelineIterator(dataloader, self.forward, forward_params)
final_iterator = PipelineIterator(model_iterator, self.postprocess, postprocess_params)
return final_iterator
def __call__(self, inputs, *args, num_workers=8, **kwargs):
if args:
logger.warning(f"Ignoring args : {args}")
preprocess_params, forward_params, postprocess_params = self._sanitize_parameters(**kwargs)
# Fuse __init__ params and __call__ params without modifying the __init__ ones.
preprocess_params = {**self._preprocess_params, **preprocess_params}
forward_params = {**self._forward_params, **forward_params}
postprocess_params = {**self._postprocess_params, **postprocess_params}
self.call_count += 1
if self.call_count > 10 and self.framework == "pt" and self.device.type == "cuda":
warnings.warn(
"You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset",
UserWarning,
)
if isinstance(inputs, list):
if self.framework == "pt":
final_iterator = self.get_iterator(
inputs, num_workers, preprocess_params, forward_params, postprocess_params
)
outputs = [output for output in final_iterator]
return outputs
else:
return self.run_multi(inputs, preprocess_params, forward_params, postprocess_params)
elif Dataset is not None and isinstance(inputs, Dataset):
return self.get_iterator(inputs, num_workers, preprocess_params, forward_params, postprocess_params)
else:
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
return [self.run_single(item, preprocess_params, forward_params, postprocess_params) for item in inputs]
def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
model_inputs = self.preprocess(inputs, **preprocess_params)
model_outputs = self.forward(model_inputs, **forward_params)
outputs = self.postprocess(model_outputs, **postprocess_params)
return outputs
...@@ -190,23 +190,34 @@ class ConversationalPipeline(Pipeline): ...@@ -190,23 +190,34 @@ class ConversationalPipeline(Pipeline):
conversational_pipeline([conversation_1, conversation_2]) conversational_pipeline([conversation_1, conversation_2])
""" """
def __init__(self, min_length_for_response=32, minimum_tokens=10, *args, **kwargs): def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs) super().__init__(*args, **kwargs)
# We need at least an eos_token
# assert self.tokenizer.eos_token_id is not None, "ConversationalPipeline tokenizer should have an EOS token set"
if self.tokenizer.pad_token_id is None: if self.tokenizer.pad_token_id is None:
self.tokenizer.pad_token = self.tokenizer.eos_token self.tokenizer.pad_token = self.tokenizer.eos_token
self.min_length_for_response = min_length_for_response def _sanitize_parameters(
self.minimum_tokens = minimum_tokens self, min_length_for_response=None, minimum_tokens=None, clean_up_tokenization_spaces=None, **generate_kwargs
def __call__(
self,
conversations: Union[Conversation, List[Conversation]],
clean_up_tokenization_spaces=True,
**generate_kwargs
): ):
preprocess_params = {}
forward_params = {}
postprocess_params = {}
if min_length_for_response is not None:
preprocess_params["min_length_for_response"] = min_length_for_response
if minimum_tokens is not None:
forward_params["minimum_tokens"] = minimum_tokens
if "max_length" in generate_kwargs:
forward_params["max_length"] = generate_kwargs["max_length"]
# self.max_length = generate_kwargs.get("max_length", self.model.config.max_length)
if clean_up_tokenization_spaces is not None:
postprocess_params["clean_up_tokenization_spaces"] = clean_up_tokenization_spaces
if generate_kwargs:
forward_params.update(generate_kwargs)
return preprocess_params, forward_params, postprocess_params
def __call__(self, conversations: Union[Conversation, List[Conversation]], num_workers=0, **kwargs):
r""" r"""
Generate responses for the conversation(s) given as inputs. Generate responses for the conversation(s) given as inputs.
...@@ -223,117 +234,67 @@ class ConversationalPipeline(Pipeline): ...@@ -223,117 +234,67 @@ class ConversationalPipeline(Pipeline):
:class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`: Conversation(s) with :class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`: Conversation(s) with
updated generated responses for those containing a new user input. updated generated responses for those containing a new user input.
""" """
# XXX: num_workers==0 is required to be backward compatible
# Otherwise the threads will require a Conversation copy.
# This will definitely hinder performance on GPU, but has to be opted
# in because of this BC change.
outputs = super().__call__(conversations, num_workers=num_workers, **kwargs)
if isinstance(outputs, list) and len(outputs) == 1:
return outputs[0]
return outputs
if isinstance(conversations, Conversation): def preprocess(self, conversation: Conversation) -> Dict[str, Any]:
conversations = [conversations] if not isinstance(conversation, Conversation):
# Input validation raise ValueError("ConversationalPipeline, expects Conversation as inputs")
if isinstance(conversations, list):
for conversation in conversations:
assert isinstance(
conversation, Conversation
), "ConversationalPipeline expects a Conversation or list of Conversations as an input"
if conversation.new_user_input is None: if conversation.new_user_input is None:
raise ValueError( raise ValueError(
f"Conversation with UUID {type(conversation.uuid)} does not contain new user input to process. " f"Conversation with UUID {type(conversation.uuid)} does not contain new user input to process. "
"Add user inputs with the conversation's `add_user_input` method" "Add user inputs with the conversation's `add_user_input` method"
) )
assert ( if hasattr(self.tokenizer, "_build_conversation_input_ids"):
self.tokenizer.pad_token_id is not None or self.tokenizer.eos_token_id is not None input_ids = self.tokenizer._build_conversation_input_ids(conversation)
), "Please make sure that the tokenizer has a pad_token_id or eos_token_id when using a batch input"
else: else:
raise ValueError("ConversationalPipeline expects a Conversation or list of Conversations as an input") # If the tokenizer cannot handle conversations, we default to only the old version
input_ids = self._legacy_parse_and_tokenize(conversation)
with self.device_placement():
inputs = self._parse_and_tokenize(conversations)
if self.framework == "pt": if self.framework == "pt":
inputs = self.ensure_tensor_on_device(**inputs) input_ids = torch.LongTensor([input_ids])
input_length = inputs["input_ids"].shape[-1]
elif self.framework == "tf": elif self.framework == "tf":
input_length = tf.shape(inputs["input_ids"])[-1].numpy() input_ids = tf.constant([input_ids])
return {"input_ids": input_ids, "conversation": conversation}
def _forward(self, model_inputs, minimum_tokens=10, **generate_kwargs):
max_length = generate_kwargs.get("max_length", self.model.config.max_length) max_length = generate_kwargs.get("max_length", self.model.config.max_length)
n = inputs["input_ids"].shape[1]
if max_length - self.minimum_tokens < n:
logger.warning(
f"Conversation input is to long ({n}), trimming it to ({max_length} - {self.minimum_tokens})"
)
trim = max_length - self.minimum_tokens
inputs["input_ids"] = inputs["input_ids"][:, -trim:]
inputs["attention_mask"] = inputs["attention_mask"][:, -trim:]
generated_responses = self.model.generate(
inputs["input_ids"],
attention_mask=inputs["attention_mask"],
**generate_kwargs,
)
if self.model.config.is_encoder_decoder: n = model_inputs["input_ids"].shape[1]
if self.framework == "pt": if max_length - minimum_tokens < n:
history = torch.cat((inputs["input_ids"], generated_responses[:, 1:]), 1) logger.warning(f"Conversation input is to long ({n}), trimming it to ({max_length} - {minimum_tokens})")
elif self.framework == "tf": trim = max_length - minimum_tokens
history = tf.concat([inputs["input_ids"], generated_responses[:, 1:]], 1) model_inputs["input_ids"] = model_inputs["input_ids"][:, -trim:]
else: if "attention_mask" in model_inputs:
history = generated_responses model_inputs["attention_mask"] = model_inputs["attention_mask"][:, -trim:]
conversation = model_inputs.pop("conversation")
history = self._clean_padding_history(history) model_inputs["max_length"] = max_length
output_ids = self.model.generate(**model_inputs, **generate_kwargs)
if self.model.config.is_encoder_decoder: if self.model.config.is_encoder_decoder:
start_position = 1 start_position = 1
else: else:
start_position = input_length start_position = n
return {"output_ids": output_ids[0, start_position:], "conversation": conversation}
output = [] def postprocess(self, model_outputs, clean_up_tokenization_spaces=True):
for conversation_index, conversation in enumerate(conversations): output_ids = model_outputs["output_ids"]
conversation.mark_processed() answer = self.tokenizer.decode(
conversation.generated_responses.append( output_ids,
self.tokenizer.decode(
generated_responses[conversation_index][start_position:],
skip_special_tokens=True, skip_special_tokens=True,
clean_up_tokenization_spaces=clean_up_tokenization_spaces, clean_up_tokenization_spaces=clean_up_tokenization_spaces,
) )
) conversation = model_outputs["conversation"]
output.append(conversation) conversation.mark_processed()
if len(output) == 1: conversation.append_response(answer)
return output[0] return conversation
else:
return output
def _clean_padding_history(self, generated_tensor) -> List[List[int]]:
"""
Cleans the padding history. Padding may be generated in two places when multiple conversations are provided as
an input:
- at the end of the concatenated history and new user input, so that all input to the model have the same
length
- at the end of the generated response, as some responses will be longer than others
This method cleans up these padding token so that the history for each conversation is not impacted by the
batching process.
"""
outputs = []
for sequence in generated_tensor:
sequence_tokens = []
is_previous_pad = False
for token in sequence:
if token == self.tokenizer.pad_token_id:
if self.tokenizer.pad_token_id != self.tokenizer.eos_token_id:
continue
if is_previous_pad:
continue
else:
is_previous_pad = True
else:
is_previous_pad = False
if self.framework == "pt":
sequence_tokens.append(token.item())
else:
sequence_tokens.append(int(token.numpy()))
outputs.append(sequence_tokens)
return outputs
def _legacy_parse_and_tokenize(self, conversation: List[Conversation]) -> List[int]: def _legacy_parse_and_tokenize(self, conversation: Conversation) -> Dict:
eos_token_id = self.tokenizer.eos_token_id eos_token_id = self.tokenizer.eos_token_id
input_ids = [] input_ids = []
for is_user, text in conversation.iter_texts(): for is_user, text in conversation.iter_texts():
...@@ -345,14 +306,3 @@ class ConversationalPipeline(Pipeline): ...@@ -345,14 +306,3 @@ class ConversationalPipeline(Pipeline):
if len(input_ids) > self.tokenizer.model_max_length: if len(input_ids) > self.tokenizer.model_max_length:
input_ids = input_ids[-self.tokenizer.model_max_length :] input_ids = input_ids[-self.tokenizer.model_max_length :]
return input_ids return input_ids
def _parse_and_tokenize(self, conversations: List[Conversation]) -> Dict[str, Any]:
if hasattr(self.tokenizer, "_build_conversation_input_ids"):
input_ids = [self.tokenizer._build_conversation_input_ids(conversation) for conversation in conversations]
else:
# If the tokenizer cannot handle conversations, we default to only the old version
input_ids = [self._legacy_parse_and_tokenize(conversation) for conversation in conversations]
inputs = self.tokenizer.pad(
{"input_ids": input_ids}, padding="longest", return_attention_mask=True, return_tensors=self.framework
)
return inputs
from typing import TYPE_CHECKING, Optional, Union from typing import Dict
from ..feature_extraction_utils import PreTrainedFeatureExtractor from .base import GenericTensor, Pipeline
from ..modelcard import ModelCard
from ..tokenization_utils import PreTrainedTokenizer
from .base import ArgumentHandler, Pipeline
if TYPE_CHECKING:
from ..modeling_tf_utils import TFPreTrainedModel
from ..modeling_utils import PreTrainedModel
# Can't use @add_end_docstrings(PIPELINE_INIT_ARGS) here because this one does not accept `binary_output` # Can't use @add_end_docstrings(PIPELINE_INIT_ARGS) here because this one does not accept `binary_output`
...@@ -49,28 +41,24 @@ class FeatureExtractionPipeline(Pipeline): ...@@ -49,28 +41,24 @@ class FeatureExtractionPipeline(Pipeline):
the associated CUDA device id. the associated CUDA device id.
""" """
def __init__( def _sanitize_parameters(self, **kwargs):
self, return {}, {}, {}
model: Union["PreTrainedModel", "TFPreTrainedModel"],
tokenizer: PreTrainedTokenizer, def preprocess(self, inputs) -> Dict[str, GenericTensor]:
feature_extractor: Optional[PreTrainedFeatureExtractor] = None, return_tensors = self.framework
modelcard: Optional[ModelCard] = None, model_inputs = self.tokenizer(inputs, return_tensors=return_tensors)
framework: Optional[str] = None, return model_inputs
args_parser: ArgumentHandler = None,
device: int = -1, def _forward(self, model_inputs):
task: str = "", model_outputs = self.model(**model_inputs)
): return model_outputs
super().__init__(
model=model, def postprocess(self, model_outputs):
tokenizer=tokenizer, # [0] is the first available tensor, logits or last_hidden_state.
feature_extractor=feature_extractor, if self.framework == "pt":
modelcard=modelcard, return model_outputs[0].tolist()
framework=framework, elif self.framework == "tf":
args_parser=args_parser, return model_outputs[0].numpy().tolist()
device=device,
binary_output=True,
task=task,
)
def __call__(self, *args, **kwargs): def __call__(self, *args, **kwargs):
""" """
...@@ -82,10 +70,4 @@ class FeatureExtractionPipeline(Pipeline): ...@@ -82,10 +70,4 @@ class FeatureExtractionPipeline(Pipeline):
Return: Return:
A nested list of :obj:`float`: The features computed by the model. A nested list of :obj:`float`: The features computed by the model.
""" """
results = super().__call__(*args, **kwargs) return super().__call__(*args, **kwargs)
if isinstance(results, list):
# Sequential run
results = [r.tolist() for r in results]
else:
results = results.tolist()
return results
from typing import TYPE_CHECKING, Dict, List, Optional, Union from typing import Dict
import numpy as np import numpy as np
from ..file_utils import add_end_docstrings, is_tf_available, is_torch_available from ..file_utils import add_end_docstrings, is_tf_available, is_torch_available
from ..modelcard import ModelCard
from ..tokenization_utils import PreTrainedTokenizer
from ..utils import logging from ..utils import logging
from .base import PIPELINE_INIT_ARGS, ArgumentHandler, Pipeline, PipelineException from .base import PIPELINE_INIT_ARGS, GenericTensor, Pipeline, PipelineException
GenericTensor = Union[List["GenericTensor"], "torch.Tensor", "tf.Tensor"]
if TYPE_CHECKING:
from ..modeling_tf_utils import TFPreTrainedModel
from ..modeling_utils import PreTrainedModel
if is_tf_available(): if is_tf_available():
import tensorflow as tf import tensorflow as tf
from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_MASKED_LM_MAPPING
if is_torch_available(): if is_torch_available():
import torch import torch
from ..models.auto.modeling_auto import MODEL_FOR_MASKED_LM_MAPPING
logger = logging.get_logger(__name__) logger = logging.get_logger(__name__)
...@@ -58,39 +47,6 @@ class FillMaskPipeline(Pipeline): ...@@ -58,39 +47,6 @@ class FillMaskPipeline(Pipeline):
This pipeline only works for inputs with exactly one token masked. This pipeline only works for inputs with exactly one token masked.
""" """
def __init__(
self,
model: Union["PreTrainedModel", "TFPreTrainedModel"],
tokenizer: PreTrainedTokenizer,
modelcard: Optional[ModelCard] = None,
framework: Optional[str] = None,
args_parser: ArgumentHandler = None,
device: int = -1,
top_k=5,
targets=None,
task: str = "",
):
super().__init__(
model=model,
tokenizer=tokenizer,
modelcard=modelcard,
framework=framework,
args_parser=args_parser,
device=device,
binary_output=True,
task=task,
)
self.check_model_type(
TF_MODEL_FOR_MASKED_LM_MAPPING if self.framework == "tf" else MODEL_FOR_MASKED_LM_MAPPING
)
self.top_k = top_k
self.targets = targets
if self.tokenizer.mask_token_id is None:
raise PipelineException(
"fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token`."
)
def get_masked_index(self, input_ids: GenericTensor) -> np.ndarray: def get_masked_index(self, input_ids: GenericTensor) -> np.ndarray:
if self.framework == "tf": if self.framework == "tf":
masked_index = tf.where(input_ids == self.tokenizer.mask_token_id).numpy() masked_index = tf.where(input_ids == self.tokenizer.mask_token_id).numpy()
...@@ -124,63 +80,69 @@ class FillMaskPipeline(Pipeline): ...@@ -124,63 +80,69 @@ class FillMaskPipeline(Pipeline):
for input_ids in model_inputs["input_ids"]: for input_ids in model_inputs["input_ids"]:
self._ensure_exactly_one_mask_token(input_ids) self._ensure_exactly_one_mask_token(input_ids)
    def preprocess(self, inputs, return_tensors=None, **preprocess_parameters) -> Dict[str, GenericTensor]:
        if return_tensors is None:
            return_tensors = self.framework
        model_inputs = self.tokenizer(inputs, return_tensors=return_tensors)
        self.ensure_exactly_one_mask_token(model_inputs)
        return model_inputs

    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        model_outputs["input_ids"] = model_inputs["input_ids"][0]
        return model_outputs

    def postprocess(self, model_outputs, top_k=5, target_ids=None):
        # Cap top_k if there are targets
        if target_ids is not None and target_ids.shape[0] < top_k:
            top_k = target_ids.shape[0]
        input_ids = model_outputs["input_ids"]
        outputs = model_outputs["logits"]
        result = []

        if self.framework == "tf":
            masked_index = tf.where(input_ids == self.tokenizer.mask_token_id).numpy()
            # Fill mask pipeline supports only one ${mask_token} per sample
            logits = outputs[0, masked_index.item(), :]
            probs = tf.nn.softmax(logits)
            if target_ids is not None:
                probs = tf.gather_nd(probs, tf.reshape(target_ids, (-1, 1)))

            topk = tf.math.top_k(probs, k=top_k)
            values, predictions = topk.values.numpy(), topk.indices.numpy()
        else:
            masked_index = torch.nonzero(input_ids == self.tokenizer.mask_token_id, as_tuple=False)
            # Fill mask pipeline supports only one ${mask_token} per sample
            logits = outputs[0, masked_index.item(), :]
            probs = logits.softmax(dim=0)
            if target_ids is not None:
                probs = probs[..., target_ids]
            values, predictions = probs.topk(top_k)

        for v, p in zip(values.tolist(), predictions.tolist()):
            tokens = input_ids.numpy()
            if target_ids is not None:
                p = target_ids[p].tolist()
            tokens[masked_index] = p
            # Filter padding out:
            tokens = tokens[np.where(tokens != self.tokenizer.pad_token_id)]
            result.append(
                {
                    "sequence": self.tokenizer.decode(tokens, skip_special_tokens=True),
                    "score": v,
                    "token": p,
                    "token_str": self.tokenizer.decode(p),
                }
            )
        return result

    def get_target_ids(self, targets, top_k=None):
        if isinstance(targets, str):
            targets = [targets]
        try:
            vocab = self.tokenizer.get_vocab()
        except Exception:
@@ -217,65 +179,44 @@ class FillMaskPipeline(Pipeline):
        if len(target_ids) == 0:
            raise ValueError("At least one target must be provided when passed.")
        target_ids = np.array(target_ids)
        return target_ids

    def _sanitize_parameters(self, top_k=None, targets=None):
        postprocess_params = {}

        if targets is not None:
            target_ids = self.get_target_ids(targets, top_k)
            postprocess_params["target_ids"] = target_ids

        if top_k is not None:
            postprocess_params["top_k"] = top_k

        if self.tokenizer.mask_token_id is None:
            raise PipelineException(
                "fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token`."
            )
        return {}, {}, postprocess_params
values, predictions = probs.topk(top_k) def __call__(self, inputs, *args, **kwargs):
"""
Fill the masked token in the text(s) given as inputs.
for v, p in zip(values.tolist(), predictions.tolist()): Args:
tokens = input_ids.numpy() args (:obj:`str` or :obj:`List[str]`):
if targets is not None: One or several texts (or one list of prompts) with masked tokens.
p = target_ids[p].tolist() targets (:obj:`str` or :obj:`List[str]`, `optional`):
tokens[masked_index] = p When passed, the model will limit the scores to the passed targets instead of looking up in the whole
# Filter padding out: vocab. If the provided targets are not in the model vocab, they will be tokenized and the first
tokens = tokens[np.where(tokens != self.tokenizer.pad_token_id)] resulting token will be used (with a warning, and that might be slower).
result.append( top_k (:obj:`int`, `optional`):
{ When passed, overrides the number of predictions to return.
"sequence": self.tokenizer.decode(tokens, skip_special_tokens=True),
"score": v,
"token": p,
"token_str": self.tokenizer.decode(p),
}
)
# Append Return:
results += [result] A list or a list of list of :obj:`dict`: Each result comes as list of dictionaries with the following keys:
if len(results) == 1: - **sequence** (:obj:`str`) -- The corresponding input with the mask token prediction.
return results[0] - **score** (:obj:`float`) -- The corresponding probability.
return results - **token** (:obj:`int`) -- The predicted token id (to replace the masked one).
- **token** (:obj:`str`) -- The predicted token (to replace the masked one).
"""
return super().__call__(inputs, **kwargs)
import os
from typing import List, Union

import requests

from ..file_utils import add_end_docstrings, is_torch_available, is_vision_available, requires_backends
from ..utils import logging
from .base import PIPELINE_INIT_ARGS, Pipeline


if is_vision_available():
    from PIL import Image

if is_torch_available():
    import torch

    from ..models.auto.modeling_auto import MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING


logger = logging.get_logger(__name__)
@@ -37,24 +30,15 @@ class ImageClassificationPipeline(Pipeline):
    <https://huggingface.co/models?filter=image-classification>`__.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        if self.framework == "tf":
            raise ValueError(f"The {self.__class__} is only available in PyTorch.")

        requires_backends(self, "vision")
        self.check_model_type(MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING)

    @staticmethod
    def load_image(image: Union[str, "Image.Image"]):
        if isinstance(image, str):
@@ -77,7 +61,13 @@ class ImageClassificationPipeline(Pipeline):
            image = image.convert("RGB")
        return image
    def _sanitize_parameters(self, top_k=None):
        postprocess_params = {}
        if top_k is not None:
            postprocess_params["top_k"] = top_k
        return {}, {}, postprocess_params

    def __call__(self, images: Union[str, List[str], "Image", List["Image"]], **kwargs):
        """
        Assign labels to the image(s) passed as inputs.
@@ -106,34 +96,23 @@ class ImageClassificationPipeline(Pipeline):
            - **label** (:obj:`str`) -- The label identified by the model.
            - **score** (:obj:`float`) -- The score attributed by the model for that label.
        """
        return super().__call__(images, **kwargs)

    def preprocess(self, image):
        image = self.load_image(image)
        model_inputs = self.feature_extractor(images=image, return_tensors="pt")
        return model_inputs

    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        return model_outputs

    def postprocess(self, model_outputs, top_k=5):
        if top_k > self.model.config.num_labels:
            top_k = self.model.config.num_labels

        probs = model_outputs.logits.softmax(-1)[0]
        scores, ids = probs.topk(top_k)

        scores = scores.tolist()
        ids = ids.tolist()
        return [{"score": score, "label": self.model.config.id2label[_id]} for score, _id in zip(scores, ids)]
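A hedged usage sketch of the new single-image ``preprocess``/``_forward``/``postprocess`` flow; the checkpoint and file path below are placeholders, not values taken from this diff.

.. code-block::

    from transformers import pipeline

    # google/vit-base-patch16-224 is an example image-classification checkpoint.
    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    outputs = classifier("path/to/cat.png", top_k=3)
    # A list of at most top_k dicts, each with "score" and "label".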
import os
from typing import Any, Dict, List, Union

import requests

from ..file_utils import add_end_docstrings, is_torch_available, is_vision_available, requires_backends
from ..utils import logging
from .base import PIPELINE_INIT_ARGS, Pipeline


if is_vision_available():
    from PIL import Image
@@ -40,24 +36,15 @@ class ObjectDetectionPipeline(Pipeline):
    <https://huggingface.co/models?filter=object-detection>`__.
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        if self.framework == "tf":
            raise ValueError(f"The {self.__class__} is only available in PyTorch.")

        requires_backends(self, "vision")
        self.check_model_type(MODEL_FOR_OBJECT_DETECTION_MAPPING)

    @staticmethod
    def load_image(image: Union[str, "Image.Image"]):
        if isinstance(image, str):
@@ -80,11 +67,13 @@ class ObjectDetectionPipeline(Pipeline):
            image = image.convert("RGB")
        return image
    def _sanitize_parameters(self, **kwargs):
        postprocess_kwargs = {}
        if "threshold" in kwargs:
            postprocess_kwargs["threshold"] = kwargs["threshold"]
        return {}, {}, postprocess_kwargs

    def __call__(self, *args, **kwargs) -> Union[Predictions, List[Prediction]]:
        """
        Detect objects (bounding boxes & classes) in the image(s) passed as inputs.
@@ -112,47 +101,42 @@ class ObjectDetectionPipeline(Pipeline):
            - **score** (:obj:`float`) -- The score attributed by the model for that label.
            - **box** (:obj:`List[Dict[str, int]]`) -- The bounding box of detected object in image's original size.
        """
        return super().__call__(*args, **kwargs)

    def preprocess(self, image):
        image = self.load_image(image)
        target_size = torch.IntTensor([[image.height, image.width]])
        inputs = self.feature_extractor(images=[image], return_tensors="pt")
        inputs["target_size"] = target_size
        return inputs

    def _forward(self, model_inputs):
        target_size = model_inputs.pop("target_size")
        outputs = self.model(**model_inputs)
        model_outputs = {"outputs": outputs, "target_size": target_size}
        return model_outputs

    def postprocess(self, model_outputs, threshold=0.9):
        raw_annotations = self.feature_extractor.post_process(model_outputs["outputs"], model_outputs["target_size"])
        raw_annotation = raw_annotations[0]
        keep = raw_annotation["scores"] > threshold
        scores = raw_annotation["scores"][keep]
        labels = raw_annotation["labels"][keep]
        boxes = raw_annotation["boxes"][keep]

        raw_annotation["scores"] = scores.tolist()
        raw_annotation["labels"] = [self.model.config.id2label[label.item()] for label in labels]
        raw_annotation["boxes"] = [self._get_bounding_box(box) for box in boxes]

        # {"scores": [...], ...} --> [{"score":x, ...}, ...]
        keys = ["score", "label", "box"]
        annotation = [
            dict(zip(keys, vals))
            for vals in zip(raw_annotation["scores"], raw_annotation["labels"], raw_annotation["boxes"])
        ]

        return annotation

    def _get_bounding_box(self, box: "torch.Tensor") -> Dict[str, int]:
        """
...
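A minimal sketch of calling the migrated object-detection pipeline (PyTorch only); the checkpoint, image path and threshold value below are illustrative.

.. code-block::

    from transformers import pipeline

    # facebook/detr-resnet-50 is an example object-detection checkpoint.
    detector = pipeline("object-detection", model="facebook/detr-resnet-50")
    outputs = detector("path/to/street_scene.jpg", threshold=0.9)
    # Each entry carries "score", "label" and a "box" built by _get_bounding_box.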
import warnings
from collections.abc import Iterable
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union
@@ -109,6 +110,7 @@ class QuestionAnsweringPipeline(Pipeline):
    """

    default_input_names = "question,context"
    handle_impossible_answer = False

    def __init__(
        self,
@@ -158,6 +160,44 @@ class QuestionAnsweringPipeline(Pipeline):
        else:
            return SquadExample(None, question, context, None, None, None)
    def _sanitize_parameters(
        self,
        padding=None,
        topk=None,
        top_k=None,
        doc_stride=None,
        max_answer_len=None,
        max_seq_len=None,
        max_question_len=None,
        handle_impossible_answer=None,
        **kwargs
    ):
        # Set defaults values
        preprocess_params = {}
        if padding is not None:
            preprocess_params["padding"] = padding
        if doc_stride is not None:
            preprocess_params["doc_stride"] = doc_stride
        if max_question_len is not None:
            preprocess_params["max_question_len"] = max_question_len

        postprocess_params = {}
        if topk is not None and top_k is None:
            warnings.warn("topk parameter is deprecated, use top_k instead", UserWarning)
            top_k = topk
        if top_k is not None:
            if top_k < 1:
                raise ValueError(f"top_k parameter should be >= 1 (got {top_k})")
            postprocess_params["top_k"] = top_k
        if max_answer_len is not None:
            if max_answer_len < 1:
                raise ValueError(f"max_answer_len parameter should be >= 1 (got {max_answer_len})")
        if max_answer_len is not None:
            postprocess_params["max_answer_len"] = max_answer_len
        if handle_impossible_answer is not None:
            postprocess_params["handle_impossible_answer"] = handle_impossible_answer
        return preprocess_params, {}, postprocess_params
    def __call__(self, *args, **kwargs):
        """
        Answer the question(s) given as inputs by using the context(s).
@@ -201,50 +241,36 @@ class QuestionAnsweringPipeline(Pipeline):
            - **end** (:obj:`int`) -- The character end index of the answer (in the tokenized version of the input).
            - **answer** (:obj:`str`) -- The answer to the question.
        """
# Set defaults values
kwargs.setdefault("padding", "longest" if getattr(self.tokenizer, "pad_token", None) is not None else False)
kwargs.setdefault("topk", 1)
kwargs.setdefault("doc_stride", 128)
kwargs.setdefault("max_answer_len", 15)
kwargs.setdefault("max_seq_len", 384)
kwargs.setdefault("max_question_len", 64)
kwargs.setdefault("handle_impossible_answer", False)
if kwargs["topk"] < 1:
raise ValueError(f"topk parameter should be >= 1 (got {kwargs['topk']})")
if kwargs["max_answer_len"] < 1:
raise ValueError(f"max_answer_len parameter should be >= 1 (got {(kwargs['max_answer_len'])}")
        # Convert inputs to features
        examples = self._args_parser(*args, **kwargs)
        if len(examples) == 1:
            return super().__call__(examples[0], **kwargs)
        return super().__call__(examples, **kwargs)

    def preprocess(self, example, padding="do_not_pad", doc_stride=128, max_question_len=64, max_seq_len=384):
        if not self.tokenizer.is_fast:
            features = squad_convert_examples_to_features(
                examples=[example],
                tokenizer=self.tokenizer,
                max_seq_length=max_seq_len,
                doc_stride=doc_stride,
                max_query_length=max_question_len,
                padding_strategy=PaddingStrategy.MAX_LENGTH,
                is_training=False,
                tqdm_enabled=False,
            )
        else:
            # Define the side we want to truncate / pad and the text/pair sorting
            question_first = self.tokenizer.padding_side == "right"

            encoded_inputs = self.tokenizer(
                text=example.question_text if question_first else example.context_text,
                text_pair=example.context_text if question_first else example.question_text,
                padding=padding,
                truncation="only_second" if question_first else "only_first",
                max_length=max_seq_len,
                stride=doc_stride,
                return_tensors="np",
                return_token_type_ids=True,
                return_overflowing_tokens=True,
@@ -297,31 +323,40 @@ class QuestionAnsweringPipeline(Pipeline):
                    qas_id=None,
                )
            )
        return {"features": features, "example": example}
    def _forward(self, model_inputs):
        features = model_inputs["features"]
        example = model_inputs["example"]
        model_input_names = self.tokenizer.model_input_names
        fw_args = {k: [feature.__dict__[k] for feature in features] for k in model_input_names}

        if self.framework == "tf":
            fw_args = {k: tf.constant(v) for (k, v) in fw_args.items()}
            start, end = self.model(fw_args)[:2]
            start, end = start.numpy(), end.numpy()
        elif self.framework == "pt":
            # Retrieve the score for the context tokens only (removing question tokens)
            fw_args = {k: torch.tensor(v, device=self.device) for (k, v) in fw_args.items()}
            # On Windows, the default int type in numpy is np.int32 so we get some non-long tensors.
            fw_args = {k: v.long() if v.dtype == torch.int32 else v for (k, v) in fw_args.items()}
            start, end = self.model(**fw_args)[:2]
            start, end = start.cpu().numpy(), end.cpu().numpy()
        return {"start": start, "end": end, "features": features, "example": example}
    def postprocess(
        self,
        model_outputs,
        top_k=1,
        handle_impossible_answer=False,
        max_answer_len=15,
    ):
        min_null_score = 1000000  # large and positive
        answers = []
        start_ = model_outputs["start"][0]
        end_ = model_outputs["end"][0]
        feature = model_outputs["features"][0]
        example = model_outputs["example"]
        # Ensure padded tokens & question tokens cannot belong to the set of candidate answers.
        undesired_tokens = np.abs(np.array(feature.p_mask) - 1) & feature.attention_mask
@@ -336,15 +371,13 @@ class QuestionAnsweringPipeline(Pipeline):
        start_ = np.exp(start_ - np.log(np.sum(np.exp(start_), axis=-1, keepdims=True)))
        end_ = np.exp(end_ - np.log(np.sum(np.exp(end_), axis=-1, keepdims=True)))

        if handle_impossible_answer:
            min_null_score = min(min_null_score, (start_[0] * end_[0]).item())

        # Mask CLS
        start_[0] = end_[0] = 0.0

        starts, ends, scores = self.decode(start_, end_, top_k, max_answer_len, undesired_tokens)
        if not self.tokenizer.is_fast:
            char_to_word = np.array(example.char_to_word_offset)
@@ -397,15 +430,13 @@ class QuestionAnsweringPipeline(Pipeline):
                }
            )

        if handle_impossible_answer:
            answers.append({"score": min_null_score, "start": 0, "end": 0, "answer": ""})

        answers = sorted(answers, key=lambda x: x["score"], reverse=True)[:top_k]
        if len(answers) == 1:
            return answers[0]
        return answers

    def decode(
        self, start: np.ndarray, end: np.ndarray, topk: int, max_answer_len: int, undesired_tokens: np.ndarray
...
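A small usage sketch of the reworked question-answering pipeline, assuming an illustrative SQuAD checkpoint; ``topk`` still works but is now routed through ``_sanitize_parameters`` and emits a deprecation warning in favour of ``top_k``.

.. code-block::

    from transformers import pipeline

    # distilbert-base-cased-distilled-squad is an example extractive-QA checkpoint.
    qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
    answer = qa(question="Where do I live?", context="My name is Sarah and I live in London.")
    # A dict with "score", "start", "end" and "answer" for a single question/context pair.
    answers = qa(question="Where do I live?", context="My name is Sarah and I live in London.", top_k=2)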
@@ -17,7 +17,7 @@ class TableQuestionAnsweringArgumentHandler(ArgumentHandler):
    Handles arguments for the TableQuestionAnsweringPipeline
    """

    def __call__(self, table=None, query=None, **kwargs):
        # Returns tqa_pipeline_inputs of shape:
        # [
        #   {"table": pd.DataFrame, "query": List[str]},
@@ -60,7 +60,7 @@ class TableQuestionAnsweringArgumentHandler(ArgumentHandler):
            tqa_pipeline_input["table"] = pd.DataFrame(tqa_pipeline_input["table"])

        return tqa_pipeline_inputs


@add_end_docstrings(PIPELINE_INIT_ARGS)
@@ -235,20 +235,45 @@ class TableQuestionAnsweringPipeline(Pipeline):
            - **cells** (:obj:`List[str]`) -- List of strings made up of the answer cell values.
            - **aggregator** (:obj:`str`) -- If the model has an aggregator, this returns the aggregator.
        """
        pipeline_inputs = self._args_parser(*args, **kwargs)

        results = super().__call__(pipeline_inputs, **kwargs)
        if len(results) == 1:
            return results[0]
        return results

    def _sanitize_parameters(self, sequential=None, padding=None, truncation=None, **kwargs):
        preprocess_params = {}
        if padding is not None:
            preprocess_params["padding"] = padding
        if truncation is not None:
            preprocess_params["truncation"] = truncation

        forward_params = {}
        if sequential is not None:
            forward_params["sequential"] = sequential
        return preprocess_params, forward_params, {}

    def preprocess(self, pipeline_input, sequential=None, padding=True, truncation="drop_rows_to_fit"):
        table, query = pipeline_input["table"], pipeline_input["query"]
        if table.empty:
            raise ValueError("table is empty")
        if query is None or query == "":
            raise ValueError("query is empty")
        inputs = self.tokenizer(table, query, return_tensors=self.framework, truncation=truncation, padding=padding)
        inputs["table"] = table
        return inputs

    def _forward(self, model_inputs, sequential=False):
        table = model_inputs.pop("table")
        outputs = self.sequential_inference(**model_inputs) if sequential else self.batch_inference(**model_inputs)
        model_outputs = {"model_inputs": model_inputs, "table": table, "outputs": outputs}
        return model_outputs

    def postprocess(self, model_outputs):
        inputs = model_outputs["model_inputs"]
        table = model_outputs["table"]
        outputs = model_outputs["outputs"]
        if self.aggregate:
            logits, logits_agg = outputs[:2]
            predictions = self.tokenizer.convert_logits_to_predictions(inputs, logits.detach(), logits_agg)
@@ -282,5 +307,4 @@ class TableQuestionAnsweringPipeline(Pipeline):
            answers.append(answer)
        if len(answer) == 0:
            raise PipelineException("Empty answer")
        return answers if len(answers) > 1 else answers[0]
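A hedged sketch of the table-question-answering flow after the split into ``preprocess``, ``_forward`` and ``postprocess``; the TAPAS checkpoint and table content are illustrative only.

.. code-block::

    import pandas as pd

    from transformers import pipeline

    # google/tapas-base-finetuned-wtq is an example TAPAS checkpoint.
    table_qa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
    table = pd.DataFrame({"Repository": ["Transformers", "Datasets"], "Stars": ["36542", "4512"]})
    result = table_qa(table=table, query="How many stars does the Transformers repository have?")
    # `sequential=True` is now forwarded to `_forward` and switches to sequential_inference.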
import enum

from ..file_utils import add_end_docstrings, is_tf_available, is_torch_available
from ..tokenization_utils import TruncationStrategy
@@ -17,6 +17,11 @@ if is_torch_available():
logger = logging.get_logger(__name__)


class ReturnType(enum.Enum):
    TENSORS = 0
    TEXT = 1


@add_end_docstrings(PIPELINE_INIT_ARGS)
class Text2TextGenerationPipeline(Pipeline):
    """
@@ -46,6 +51,32 @@ class Text2TextGenerationPipeline(Pipeline):
            else MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING
        )
    def _sanitize_parameters(
        self,
        return_tensors=None,
        return_text=None,
        return_type=None,
        clean_up_tokenization_spaces=None,
        truncation=None,
        **generate_kwargs
    ):
        preprocess_params = {}
        if truncation is not None:
            preprocess_params["truncation"] = truncation

        forward_params = generate_kwargs

        postprocess_params = {}
        if return_tensors is not None and return_type is None:
            return_type = ReturnType.TENSORS if return_tensors else ReturnType.TEXT
        if return_type is not None:
            postprocess_params["return_type"] = return_type

        if clean_up_tokenization_spaces is not None:
            postprocess_params["clean_up_tokenization_spaces"] = clean_up_tokenization_spaces

        return preprocess_params, forward_params, postprocess_params
    def check_inputs(self, input_length: int, min_length: int, max_length: int):
        """
        Checks whether there might be something wrong with given input with regard to the model.
@@ -55,9 +86,8 @@ class Text2TextGenerationPipeline(Pipeline):

    def _parse_and_tokenize(self, *args, truncation):
        prefix = self.model.config.prefix if self.model.config.prefix is not None else ""
        if isinstance(args[0], list):
            if self.tokenizer.pad_token_id is None:
                raise ValueError("Please make sure that the tokenizer has a pad_token_id when using a batch input")
            args = ([prefix + arg for arg in args[0]],)
            padding = True
@@ -68,21 +98,13 @@ class Text2TextGenerationPipeline(Pipeline):
            raise ValueError(
                f" `args[0]`: {args[0]} has the wrong format. It should be either of type `str` or type `list`"
            )
        inputs = self.tokenizer(*args, padding=padding, truncation=truncation, return_tensors=self.framework)
        # This is produced by tokenizers but is an invalid generate kwargs
        if "token_type_ids" in inputs:
            del inputs["token_type_ids"]
        return inputs
    def __call__(self, *args, **kwargs):
        r"""
        Generate the output text(s) using text(s) given as inputs.
@@ -111,43 +133,40 @@ class Text2TextGenerationPipeline(Pipeline):
              -- The token ids of the generated text.
        """
        result = super().__call__(*args, **kwargs)
        if isinstance(result, dict):
            return [result]
        return result

    def preprocess(self, inputs, truncation=TruncationStrategy.DO_NOT_TRUNCATE, **kwargs):
        inputs = self._parse_and_tokenize(inputs, truncation=truncation, **kwargs)
        return inputs

    def _forward(self, model_inputs, **generate_kwargs):
        if self.framework == "pt":
            input_length = model_inputs["input_ids"].shape[-1]
        elif self.framework == "tf":
            input_length = tf.shape(model_inputs["input_ids"])[-1].numpy()

        generate_kwargs["min_length"] = generate_kwargs.get("min_length", self.model.config.min_length)
        generate_kwargs["max_length"] = generate_kwargs.get("max_length", self.model.config.max_length)
        self.check_inputs(input_length, generate_kwargs["min_length"], generate_kwargs["max_length"])
        output_ids = self.model.generate(**model_inputs, **generate_kwargs)
        return {"output_ids": output_ids}

    def postprocess(self, model_outputs, return_type=ReturnType.TEXT, clean_up_tokenization_spaces=False):
        record = {}
        if return_type == ReturnType.TENSORS:
            record = {f"{self.return_name}_token_ids": model_outputs}
        elif return_type == ReturnType.TEXT:
            record = {
                f"{self.return_name}_text": self.tokenizer.decode(
                    model_outputs["output_ids"][0],
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                )
            }
        return record
@add_end_docstrings(PIPELINE_INIT_ARGS)
@@ -239,23 +258,6 @@ class TranslationPipeline(Text2TextGenerationPipeline):
    # Used in the return key of the pipeline.
    return_name = "translation"
src_lang: Optional[str] = None
tgt_lang: Optional[str] = None
def __init__(self, *args, src_lang=None, tgt_lang=None, **kwargs):
super().__init__(*args, **kwargs)
if src_lang is not None:
self.src_lang = src_lang
if tgt_lang is not None:
self.tgt_lang = tgt_lang
if src_lang is None and tgt_lang is None:
# Backward compatibility, direct arguments use is preferred.
task = kwargs.get("task", "")
items = task.split("_")
if task and len(items) == 4:
# translation, XX, to YY
self.src_lang = items[1]
self.tgt_lang = items[3]
    def check_inputs(self, input_length: int, min_length: int, max_length: int):
        if input_length > 0.9 * max_length:
@@ -265,25 +267,31 @@ class TranslationPipeline(Text2TextGenerationPipeline):
            )
        return True
    def preprocess(self, *args, truncation=TruncationStrategy.DO_NOT_TRUNCATE, src_lang=None, tgt_lang=None):
        if getattr(self.tokenizer, "_build_translation_inputs", None):
            return self.tokenizer._build_translation_inputs(
                *args, return_tensors=self.framework, truncation=truncation, src_lang=src_lang, tgt_lang=tgt_lang
            )
        else:
            return super()._parse_and_tokenize(*args, truncation=truncation)

    def _sanitize_parameters(self, src_lang=None, tgt_lang=None, **kwargs):
        preprocess_params, forward_params, postprocess_params = super()._sanitize_parameters(**kwargs)
        if src_lang is not None:
            preprocess_params["src_lang"] = src_lang
        if tgt_lang is not None:
            preprocess_params["tgt_lang"] = tgt_lang
        if src_lang is None and tgt_lang is None:
            # Backward compatibility, direct arguments use is preferred.
            task = kwargs.get("task", self.task)
            items = task.split("_")
            if task and len(items) == 4:
                # translation, XX, to YY
                preprocess_params["src_lang"] = items[1]
                preprocess_params["tgt_lang"] = items[3]
        return preprocess_params, forward_params, postprocess_params

    def __call__(self, *args, **kwargs):
        r"""
        Translate the text(s) given as inputs.
@@ -313,10 +321,4 @@ class TranslationPipeline(Text2TextGenerationPipeline):
            - **translation_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
              -- The token ids of the translation.
        """
        return super().__call__(*args, **kwargs)
src_lang = src_lang if src_lang is not None else self.src_lang
tgt_lang = tgt_lang if tgt_lang is not None else self.tgt_lang
with self.device_placement():
inputs = self._parse_and_tokenize(*args, truncation=truncation, src_lang=src_lang, tgt_lang=tgt_lang)
return self._generate(inputs, return_tensors, return_text, clean_up_tokenization_spaces, generate_kwargs)
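A short sketch of the translation pipeline with the new call-time ``src_lang``/``tgt_lang`` handling; the T5 checkpoint and sentence below are illustrative.

.. code-block::

    from transformers import pipeline

    # t5-small is an example seq2seq checkpoint that understands the en->fr task prefix.
    translator = pipeline("translation_en_to_fr", model="t5-small")
    print(translator("How old are you?"))
    # [{"translation_text": "..."}]
    # When src_lang/tgt_lang are omitted, _sanitize_parameters falls back to parsing
    # the task name "translation_XX_to_YY".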
from typing import Dict

import numpy as np

from ..file_utils import ExplicitEnum, add_end_docstrings, is_tf_available, is_torch_available
from .base import PIPELINE_INIT_ARGS, GenericTensor, Pipeline


if is_tf_available():
@@ -61,9 +61,10 @@ class TextClassificationPipeline(Pipeline):
    <https://huggingface.co/models?filter=text-classification>`__.
    """
    return_all_scores = False
    function_to_apply = ClassificationFunction.NONE

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        self.check_model_type(
@@ -72,22 +73,24 @@ class TextClassificationPipeline(Pipeline):
            else MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING
        )

    def _sanitize_parameters(self, return_all_scores=None, function_to_apply=None, **tokenizer_kwargs):
        preprocess_params = tokenizer_kwargs

        postprocess_params = {}
        if hasattr(self.model.config, "return_all_scores") and return_all_scores is None:
            return_all_scores = self.model.config.return_all_scores

        if return_all_scores is not None:
            postprocess_params["return_all_scores"] = return_all_scores

        if isinstance(function_to_apply, str):
            function_to_apply = ClassificationFunction[function_to_apply.upper()]

        if function_to_apply is not None:
            postprocess_params["function_to_apply"] = function_to_apply
        return preprocess_params, {}, postprocess_params

    def __call__(self, *args, **kwargs):
        """
        Classify the text(s) given as inputs.
@@ -120,19 +123,32 @@ class TextClassificationPipeline(Pipeline):
            If ``self.return_all_scores=True``, one such dictionary is returned per label.
        """
        return super().__call__(*args, **kwargs)

    def preprocess(self, inputs, **tokenizer_kwargs) -> Dict[str, GenericTensor]:
        return_tensors = self.framework
        return self.tokenizer(inputs, return_tensors=return_tensors, **tokenizer_kwargs)
    def _forward(self, model_inputs):
        return self.model(**model_inputs)

    def postprocess(self, model_outputs, function_to_apply=None, return_all_scores=False):
        # Default value before `set_parameters`
        if function_to_apply is None:
            if self.model.config.problem_type == "multi_label_classification" or self.model.config.num_labels == 1:
                function_to_apply = ClassificationFunction.SIGMOID
            elif self.model.config.problem_type == "single_label_classification" or self.model.config.num_labels > 1:
                function_to_apply = ClassificationFunction.SOFTMAX
            elif hasattr(self.model.config, "function_to_apply") and function_to_apply is None:
                function_to_apply = self.model.config.function_to_apply
            else:
                function_to_apply = ClassificationFunction.NONE

        outputs = model_outputs["logits"][0]
        if self.framework == "pt":
            outputs = outputs.cpu().numpy()
        else:
            outputs = outputs.numpy()

        if function_to_apply == ClassificationFunction.SIGMOID:
            scores = sigmoid(outputs)
@@ -144,11 +160,13 @@ class TextClassificationPipeline(Pipeline):
            raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")

        if return_all_scores:
            return [{"label": self.model.config.id2label[i], "score": score.item()} for i, score in enumerate(scores)]
        else:
            return {"label": self.model.config.id2label[scores.argmax().item()], "score": scores.max().item()}

    def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
        return [self.run_single(item, preprocess_params, forward_params, postprocess_params)[0] for item in inputs]

    def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
        "This pipeline is odd and returns a list even when a single item is run."
        return [super().run_single(inputs, preprocess_params, forward_params, postprocess_params)]
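A minimal sketch of the text-classification pipeline after the rework; the sentiment checkpoint below is illustrative. Note the ``run_single`` override above, which keeps the historical behaviour of returning a list even for a single input.

.. code-block::

    from transformers import pipeline

    # distilbert-base-uncased-finetuned-sst-2-english is an example sentiment checkpoint.
    classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
    print(classifier("This movie was great!"))
    # [{"label": ..., "score": ...}]
    print(classifier("This movie was great!", return_all_scores=True))
    # [[{"label": ..., "score": ...}, {"label": ..., "score": ...}]]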
import enum

from transformers import MODEL_FOR_CAUSAL_LM_MAPPING, TF_MODEL_FOR_CAUSAL_LM_MAPPING

from ..file_utils import add_end_docstrings
from .base import PIPELINE_INIT_ARGS, Pipeline


class ReturnType(enum.Enum):
    TENSORS = 0
    NEW_TEXT = 1
    FULL_TEXT = 2


@add_end_docstrings(PIPELINE_INIT_ARGS)
class TextGenerationPipeline(Pipeline):
    """
@@ -32,29 +40,72 @@ class TextGenerationPipeline(Pipeline):
    begging for his blessing. <eod> </s> <eos>
    """
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.check_model_type(
            TF_MODEL_FOR_CAUSAL_LM_MAPPING if self.framework == "tf" else MODEL_FOR_CAUSAL_LM_MAPPING
        )
        if "prefix" not in self._preprocess_params:
            # This is very specific. The logic is quite complex and needs to be done
            # as a "default".
            # It also defines both some preprocess_kwargs and generate_kwargs
            # which is why we cannot put them in their respective methods.
            prefix = None
            if self.model.config.prefix is not None:
                prefix = self.model.config.prefix
            if prefix is None and self.model.__class__.__name__ in [
                "XLNetLMHeadModel",
                "TransfoXLLMHeadModel",
                "TFXLNetLMHeadModel",
                "TFTransfoXLLMHeadModel",
            ]:
                # For XLNet and TransformerXL we add an article to the prompt to give more state to the model.
                prefix = self.XL_PREFIX
            if prefix is not None:
                # Recalculate some generate_kwargs linked to prefix.
                preprocess_params, forward_params, _ = self._sanitize_parameters(prefix=prefix, **self._forward_params)
                self._preprocess_params = {**self._preprocess_params, **preprocess_params}
                self._forward_params = {**self._forward_params, **forward_params}
    def _sanitize_parameters(
        self,
        return_full_text=None,
        return_tensors=None,
        return_text=None,
        return_type=None,
        clean_up_tokenization_spaces=None,
        prefix=None,
        **generate_kwargs
    ):
        preprocess_params = {}
        if prefix is not None:
            preprocess_params["prefix"] = prefix
        if prefix:
            prefix_inputs = self.tokenizer(
                prefix, padding=False, add_special_tokens=False, return_tensors=self.framework
            )
            prefix_length = prefix_inputs["input_ids"].shape[-1]
            if "max_length" in generate_kwargs:
                generate_kwargs["max_length"] += prefix_length
            else:
                generate_kwargs["max_length"] = self.model.config.max_length + prefix_length

            if "min_length" in generate_kwargs:
                generate_kwargs["min_length"] += prefix_length

        forward_params = generate_kwargs

        postprocess_params = {}
        if return_full_text is not None and return_type is None:
            return_type = ReturnType.FULL_TEXT if return_full_text else ReturnType.NEW_TEXT
        if return_tensors is not None and return_type is None:
            return_type = ReturnType.TENSORS
        if return_type is not None:
            postprocess_params["return_type"] = return_type
        if clean_up_tokenization_spaces is not None:
            postprocess_params["clean_up_tokenization_spaces"] = clean_up_tokenization_spaces

        return preprocess_params, forward_params, postprocess_params
    # overriding _parse_and_tokenize to allow for unusual language-modeling tokenizer arguments
    def _parse_and_tokenize(self, *args, **kwargs):
@@ -67,16 +118,7 @@ class TextGenerationPipeline(Pipeline):
        return super()._parse_and_tokenize(*args, **kwargs)

    def __call__(self, text_inputs, **kwargs):
        """
        Complete the prompt(s) given as inputs.
@@ -105,68 +147,36 @@ class TextGenerationPipeline(Pipeline):
            - **generated_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
              -- The token ids of the generated text.
        """
        return super().__call__(text_inputs, **kwargs)
return_full_text = return_full_text if return_full_text is not None else self.return_full_text
if isinstance(text_inputs, str):
text_inputs = [text_inputs]
results = []
for prompt_text in text_inputs:
# Manage correct placement of the tensors
with self.device_placement():
if prefix is None and self.model.__class__.__name__ in [
"XLNetLMHeadModel",
"TransfoXLLMHeadModel",
"TFXLNetLMHeadModel",
"TFTransfoXLLMHeadModel",
]:
# For XLNet and TransformerXL we add an article to the prompt to give more state to the model.
prefix = self.XL_PREFIX
if prefix: def preprocess(self, prompt_text, prefix=""):
prefix_inputs = self._parse_and_tokenize(prefix, padding=False, add_special_tokens=False) inputs = self.tokenizer(
# This impacts max_length and min_length argument that need adjusting. prefix + prompt_text, padding=False, add_special_tokens=False, return_tensors=self.framework
prefix_length = prefix_inputs["input_ids"].shape[-1] )
if generate_kwargs.get("max_length", None) is not None: inputs["prompt_text"] = prompt_text
generate_kwargs["max_length"] += prefix_length return inputs
else:
generate_kwargs["max_length"] = self.model.config.max_length + prefix_length def _forward(self, model_inputs, **generate_kwargs):
input_ids = model_inputs["input_ids"]
if generate_kwargs.get("min_length", None) is not None: prompt_text = model_inputs.pop("prompt_text")
generate_kwargs["min_length"] += prefix_length generated_sequence = self.model.generate(input_ids=input_ids, **generate_kwargs) # BS x SL
return {"generated_sequence": generated_sequence, "input_ids": input_ids, "prompt_text": prompt_text}
prefix = prefix or ""
inputs = self._parse_and_tokenize(prefix + prompt_text, padding=False, add_special_tokens=False) def postprocess(self, model_outputs, return_type=ReturnType.FULL_TEXT, clean_up_tokenization_spaces=True):
generated_sequence = model_outputs["generated_sequence"]
# set input_ids to None to allow empty prompt input_ids = model_outputs["input_ids"]
if inputs["input_ids"].shape[-1] == 0: prompt_text = model_outputs["prompt_text"]
inputs["input_ids"] = None
inputs["attention_mask"] = None
if self.framework == "pt" and inputs["input_ids"] is not None:
inputs = self.ensure_tensor_on_device(**inputs)
input_ids = inputs["input_ids"]
# Ensure that batch size = 1 (batch generation not allowed for now)
assert (
input_ids is None or input_ids.shape[0] == 1
), "Batch generation is currently not supported. See https://github.com/huggingface/transformers/issues/3021 for more information."
output_sequences = self.model.generate(input_ids=input_ids, **generate_kwargs) # BS x SL
result = []
for generated_sequence in output_sequences:
if self.framework == "pt" and generated_sequence is not None: if self.framework == "pt" and generated_sequence is not None:
generated_sequence = generated_sequence.cpu() generated_sequence = generated_sequence.cpu()
generated_sequence = generated_sequence.numpy().tolist() generated_sequence = generated_sequence.numpy().tolist()
record = {} if return_type == ReturnType.TENSORS:
if return_tensors: record = {"generated_token_ids": generated_sequence}
record["generated_token_ids"] = generated_sequence elif return_type in {ReturnType.NEW_TEXT, ReturnType.FULL_TEXT}:
if return_text:
# Decode text # Decode text
record = []
for sequence in generated_sequence:
text = self.tokenizer.decode( text = self.tokenizer.decode(
generated_sequence, sequence,
skip_special_tokens=True, skip_special_tokens=True,
clean_up_tokenization_spaces=clean_up_tokenization_spaces, clean_up_tokenization_spaces=clean_up_tokenization_spaces,
) )
...@@ -183,17 +193,12 @@ class TextGenerationPipeline(Pipeline): ...@@ -183,17 +193,12 @@ class TextGenerationPipeline(Pipeline):
) )
) )
if return_full_text: if return_type == ReturnType.FULL_TEXT:
all_text = prompt_text + text[prompt_length:] all_text = prompt_text + text[prompt_length:]
else: else:
all_text = text[prompt_length:] all_text = text[prompt_length:]
record["generated_text"] = all_text item = {"generated_text": all_text}
record.append(item)
result.append(record)
results += [result]
if len(results) == 1:
return results[0]
return results return record
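For context, a minimal usage sketch of how the reworked hooks compose at call time (assuming the public `pipeline` factory and the `gpt2` checkpoint, both purely illustrative): `__call__` only dispatches, and as the signatures above suggest, `prefix` feeds `preprocess`, generate kwargs feed `_forward`, and `return_full_text`/`return_tensors` drive `postprocess`.

```python
# Usage sketch only; assumes the public `pipeline` factory and the "gpt2" checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Call-time kwargs are split once by `_sanitize_parameters`: `max_length` is forwarded
# to `generate` in `_forward`, while `return_full_text` selects the postprocess behavior.
outputs = generator("Hello, I'm a language model,", max_length=30, return_full_text=True)
print(outputs[0]["generated_text"])
```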
import warnings
-from typing import TYPE_CHECKING, List, Optional, Tuple, Union
+from typing import List, Optional, Tuple, Union

import numpy as np

from ..file_utils import ExplicitEnum, add_end_docstrings, is_tf_available, is_torch_available
-from ..modelcard import ModelCard
from ..models.bert.tokenization_bert import BasicTokenizer
-from ..tokenization_utils import PreTrainedTokenizer
from .base import PIPELINE_INIT_ARGS, ArgumentHandler, Pipeline

-if TYPE_CHECKING:
-    from ..modeling_tf_utils import TFPreTrainedModel
-    from ..modeling_utils import PreTrainedModel
-
if is_tf_available():
    from ..models.auto.modeling_tf_auto import TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING

if is_torch_available():
-    import torch
-
    from ..models.auto.modeling_auto import MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING
@@ -104,31 +95,9 @@ class TokenClassificationPipeline(Pipeline):

    default_input_names = "sequences"

-    def __init__(
-        self,
-        model: Union["PreTrainedModel", "TFPreTrainedModel"],
-        tokenizer: PreTrainedTokenizer,
-        modelcard: Optional[ModelCard] = None,
-        framework: Optional[str] = None,
-        args_parser: ArgumentHandler = TokenClassificationArgumentHandler(),
-        device: int = -1,
-        binary_output: bool = False,
-        ignore_labels=["O"],
-        task: str = "",
-        grouped_entities: Optional[bool] = None,
-        ignore_subwords: Optional[bool] = None,
-        aggregation_strategy: Optional[AggregationStrategy] = None,
-    ):
-        super().__init__(
-            model=model,
-            tokenizer=tokenizer,
-            modelcard=modelcard,
-            framework=framework,
-            device=device,
-            binary_output=binary_output,
-            task=task,
-        )
+    def __init__(self, args_parser=TokenClassificationArgumentHandler(), *args, **kwargs):
+        self.ignore_labels = ["O"]
+        super().__init__(*args, **kwargs)
        self.check_model_type(
            TF_MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING
            if self.framework == "tf"
@@ -137,12 +106,17 @@ class TokenClassificationPipeline(Pipeline):

        self._basic_tokenizer = BasicTokenizer(do_lower_case=False)
        self._args_parser = args_parser
-        self.ignore_labels = ignore_labels

-        if aggregation_strategy is None:
-            aggregation_strategy = AggregationStrategy.NONE
-
-        if grouped_entities is not None or ignore_subwords is not None:
+    def _sanitize_parameters(
+        self,
+        ignore_labels=None,
+        grouped_entities: Optional[bool] = None,
+        ignore_subwords: Optional[bool] = None,
+        aggregation_strategy: Optional[AggregationStrategy] = None,
+    ):
+        postprocess_params = {}
+        if grouped_entities is not None or ignore_subwords is not None:
            if grouped_entities and ignore_subwords:
                aggregation_strategy = AggregationStrategy.FIRST
            elif grouped_entities and not ignore_subwords:
@@ -158,19 +132,23 @@ class TokenClassificationPipeline(Pipeline):
                warnings.warn(
                    f'`ignore_subwords` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="{aggregation_strategy}"` instead.'
                )

+        if aggregation_strategy is not None:
            if isinstance(aggregation_strategy, str):
                aggregation_strategy = AggregationStrategy[aggregation_strategy.upper()]
            if (
-                aggregation_strategy in {AggregationStrategy.FIRST, AggregationStrategy.MAX, AggregationStrategy.AVERAGE}
+                aggregation_strategy
+                in {AggregationStrategy.FIRST, AggregationStrategy.MAX, AggregationStrategy.AVERAGE}
                and not self.tokenizer.is_fast
            ):
                raise ValueError(
                    "Slow tokenizers cannot handle subwords. Please set the `aggregation_strategy` option"
                    'to `"simple"` or use a fast tokenizer.'
                )
+            postprocess_params["aggregation_strategy"] = aggregation_strategy
-        self.aggregation_strategy = aggregation_strategy
+        if ignore_labels is not None:
+            postprocess_params["ignore_labels"] = ignore_labels
+        return {}, {}, postprocess_params

    def __call__(self, inputs: Union[str, List[str]], **kwargs):
        """
@@ -198,44 +176,57 @@ class TokenClassificationPipeline(Pipeline):
        """

        _inputs, offset_mappings = self._args_parser(inputs, **kwargs)
+        self.offset_mappings = offset_mappings

-        answers = []
-
-        for i, sentence in enumerate(_inputs):
-            # Manage correct placement of the tensors
-            with self.device_placement():
-                tokens = self.tokenizer(
-                    sentence,
-                    return_attention_mask=False,
-                    return_tensors=self.framework,
-                    truncation=True,
-                    return_special_tokens_mask=True,
-                    return_offsets_mapping=self.tokenizer.is_fast,
-                )
-                if self.tokenizer.is_fast:
-                    offset_mapping = tokens.pop("offset_mapping").cpu().numpy()[0]
-                elif offset_mappings:
-                    offset_mapping = offset_mappings[i]
-                else:
-                    offset_mapping = None
-
-                special_tokens_mask = tokens.pop("special_tokens_mask").cpu().numpy()[0]
-
-                # Forward
-                if self.framework == "tf":
-                    entities = self.model(tokens.data)[0][0].numpy()
-                    input_ids = tokens["input_ids"].numpy()[0]
-                else:
-                    with torch.no_grad():
-                        tokens = self.ensure_tensor_on_device(**tokens)
-                        entities = self.model(**tokens)[0][0].cpu().numpy()
-                        input_ids = tokens["input_ids"].cpu().numpy()[0]
-
-            scores = np.exp(entities) / np.exp(entities).sum(-1, keepdims=True)
-            pre_entities = self.gather_pre_entities(sentence, input_ids, scores, offset_mapping, special_tokens_mask)
-            grouped_entities = self.aggregate(pre_entities, self.aggregation_strategy)
+        return super().__call__(inputs, **kwargs)
+
+    def preprocess(self, sentence):
+        truncation = True if self.tokenizer.model_max_length and self.tokenizer.model_max_length > 0 else False
+        model_inputs = self.tokenizer(
+            sentence,
+            return_attention_mask=False,
+            return_tensors=self.framework,
+            truncation=truncation,
+            return_special_tokens_mask=True,
+            return_offsets_mapping=self.tokenizer.is_fast,
+        )
+        if self.offset_mappings:
+            offset_mapping = self.offset_mappings[0]
+            model_inputs["offset_mapping"] = offset_mapping
+
+        model_inputs["sentence"] = sentence
+
+        return model_inputs
+
+    def _forward(self, model_inputs):
+        # Forward
+        special_tokens_mask = model_inputs.pop("special_tokens_mask")
+        offset_mapping = model_inputs.pop("offset_mapping", None)
+        sentence = model_inputs.pop("sentence")
+        if self.framework == "tf":
+            outputs = self.model(model_inputs.data)[0][0].numpy()
+        else:
+            outputs = self.model(**model_inputs)[0][0].numpy()
+        return {
+            "outputs": outputs,
+            "special_tokens_mask": special_tokens_mask,
+            "offset_mapping": offset_mapping,
+            "sentence": sentence,
+            **model_inputs,
+        }
+
+    def postprocess(self, model_outputs, aggregation_strategy=AggregationStrategy.NONE):
+        outputs = model_outputs["outputs"]
+        sentence = model_outputs["sentence"]
+        input_ids = model_outputs["input_ids"][0]
+        offset_mapping = model_outputs["offset_mapping"][0] if model_outputs["offset_mapping"] is not None else None
+        special_tokens_mask = model_outputs["special_tokens_mask"][0].numpy()
+
+        scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
+        pre_entities = self.gather_pre_entities(
+            sentence, input_ids, scores, offset_mapping, special_tokens_mask, aggregation_strategy
+        )
+        grouped_entities = self.aggregate(pre_entities, aggregation_strategy)
        # Filter anything that is in self.ignore_labels
        entities = [
            entity
@@ -243,11 +234,7 @@ class TokenClassificationPipeline(Pipeline):
            if entity.get("entity", None) not in self.ignore_labels
            and entity.get("entity_group", None) not in self.ignore_labels
        ]
-            answers.append(entities)
-
-        if len(answers) == 1:
-            return answers[0]
-        return answers
+        return entities

    def gather_pre_entities(
        self,
@@ -256,6 +243,7 @@ class TokenClassificationPipeline(Pipeline):
        scores: np.ndarray,
        offset_mapping: Optional[List[Tuple[int, int]]],
        special_tokens_mask: np.ndarray,
+        aggregation_strategy: AggregationStrategy,
    ) -> List[dict]:
        """Fuse various numpy arrays into dicts with all the information needed for aggregation"""
        pre_entities = []
@@ -269,6 +257,12 @@ class TokenClassificationPipeline(Pipeline):
            word = self.tokenizer.convert_ids_to_tokens(int(input_ids[idx]))
            if offset_mapping is not None:
                start_ind, end_ind = offset_mapping[idx]
+                if self.framework == "pt":
+                    start_ind = start_ind.item()
+                    end_ind = end_ind.item()
+                else:
+                    start_ind = int(start_ind.numpy())
+                    end_ind = int(end_ind.numpy())
                word_ref = sentence[start_ind:end_ind]
                if getattr(self.tokenizer._tokenizer.model, "continuing_subword_prefix", None):
                    # This is a BPE, word aware tokenizer, there is a correct way
@@ -276,7 +270,7 @@ class TokenClassificationPipeline(Pipeline):
                    is_subword = len(word) != len(word_ref)
                else:
                    # This is a fallback heuristic. This will fail most likely on any kind of text + punctuation mixtures that will be considered "words". Non word aware models cannot do better than this unfortunately.
-                    if self.aggregation_strategy in {
+                    if aggregation_strategy in {
                        AggregationStrategy.FIRST,
                        AggregationStrategy.AVERAGE,
                        AggregationStrategy.MAX,
@@ -362,10 +356,11 @@ class TokenClassificationPipeline(Pipeline):
        Example: micro|soft| com|pany| B-ENT I-NAME I-ENT I-ENT will be rewritten with first strategy as microsoft|
        company| B-ENT I-ENT
        """
-        assert aggregation_strategy not in {
+        if aggregation_strategy in {
            AggregationStrategy.NONE,
            AggregationStrategy.SIMPLE,
-        }, "NONE and SIMPLE strategies are invalid"
+        }:
+            raise ValueError("NONE and SIMPLE strategies are invalid for word aggregation")
        word_entities = []
        word_group = None
...
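A quick sanity-check sketch for the token-classification rework (assuming the public `pipeline` factory and the `dslim/bert-base-NER` checkpoint, used purely as an example): `aggregation_strategy` is validated in `_sanitize_parameters` and handed to `postprocess`, while the deprecated `grouped_entities`/`ignore_subwords` flags are only translated into an equivalent strategy.

```python
# Usage sketch only; assumes the public `pipeline` factory and an example NER checkpoint.
from transformers import pipeline

ner = pipeline("token-classification", model="dslim/bert-base-NER")

# "simple" groups sub-tokens into whole entities; FIRST/MAX/AVERAGE additionally require
# a fast tokenizer, otherwise `_sanitize_parameters` raises the ValueError shown above.
entities = ner("My name is Wolfgang and I live in Berlin", aggregation_strategy="simple")
for entity in entities:
    print(entity["entity_group"], entity["word"], float(entity["score"]))
```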
@@ -2,15 +2,12 @@ from typing import List, Union

import numpy as np

-from ..file_utils import add_end_docstrings, is_torch_available
+from ..file_utils import add_end_docstrings
from ..tokenization_utils import TruncationStrategy
from ..utils import logging
from .base import PIPELINE_INIT_ARGS, ArgumentHandler, Pipeline

-if is_torch_available():
-    import torch
-
logger = logging.get_logger(__name__)
@@ -22,7 +19,7 @@ class ZeroShotClassificationArgumentHandler(ArgumentHandler):

    def _parse_labels(self, labels):
        if isinstance(labels, str):
-            labels = [label.strip() for label in labels.split(",")]
+            labels = [label.strip() for label in labels.split(",") if label.strip()]
        return labels

    def __call__(self, sequences, labels, hypothesis_template):
@@ -38,13 +35,12 @@ class ZeroShotClassificationArgumentHandler(ArgumentHandler):
        if isinstance(sequences, str):
            sequences = [sequences]
-        labels = self._parse_labels(labels)

        sequence_pairs = []
        for sequence in sequences:
            sequence_pairs.extend([[sequence, hypothesis_template.format(label)] for label in labels])

-        return sequence_pairs
+        return sequence_pairs, sequences


@add_end_docstrings(PIPELINE_INIT_ARGS)
@@ -66,8 +62,8 @@ class ZeroShotClassificationPipeline(Pipeline):
    """

    def __init__(self, args_parser=ZeroShotClassificationArgumentHandler(), *args, **kwargs):
-        super().__init__(*args, **kwargs)
        self._args_parser = args_parser
+        super().__init__(*args, **kwargs)
        if self.entailment_id == -1:
            logger.warning(
                "Failed to determine 'entailment' label id from the label2id mapping in the model config. Setting to "
@@ -82,19 +78,11 @@ class ZeroShotClassificationPipeline(Pipeline):
        return -1

    def _parse_and_tokenize(
-        self,
-        sequences,
-        candidate_labels,
-        hypothesis_template,
-        padding=True,
-        add_special_tokens=True,
-        truncation=TruncationStrategy.ONLY_FIRST,
-        **kwargs
+        self, sequence_pairs, padding=True, add_special_tokens=True, truncation=TruncationStrategy.ONLY_FIRST, **kwargs
    ):
        """
        Parse arguments and tokenize only_first so that hypothesis (label) is not truncated
        """
-        sequence_pairs = self._args_parser(sequences, candidate_labels, hypothesis_template)
        return_tensors = self.framework
        if getattr(self.tokenizer, "pad_token", None) is None:
            # XXX some tokenizers do not have a padding token, we use simple lists
@@ -141,55 +129,27 @@ class ZeroShotClassificationPipeline(Pipeline):

        return inputs

-    def _forward(self, inputs, return_tensors=False):
-        """
-        Internal framework specific forward dispatching
-
-        Args:
-            inputs: dict holding all the keyword arguments for required by the model forward method.
-            return_tensors: Whether to return native framework (pt/tf) tensors rather than numpy array
-
-        Returns:
-            Numpy array
-        """
-        # Encode for forward
-        with self.device_placement():
-            if self.framework == "tf":
-                if isinstance(inputs, list):
-                    predictions = []
-                    for input_ in inputs:
-                        prediction = self.model(input_.data, training=False)[0]
-                        predictions.append(prediction)
-                else:
-                    predictions = self.model(inputs.data, training=False)[0]
-            else:
-                with torch.no_grad():
-                    if isinstance(inputs, list):
-                        predictions = []
-                        for input_ in inputs:
-                            model_input = self.ensure_tensor_on_device(**input_)
-                            prediction = self.model(**model_input)[0].cpu()
-                            predictions.append(prediction)
-                    else:
-                        inputs = self.ensure_tensor_on_device(**inputs)
-                        predictions = self.model(**inputs)[0].cpu()
-
-        if return_tensors:
-            return predictions
-        else:
-            if isinstance(predictions, list):
-                predictions = np.array([p.numpy() for p in predictions])
-            else:
-                predictions = predictions.numpy()
-            return predictions
+    def _sanitize_parameters(self, **kwargs):
+        if kwargs.get("multi_class", None) is not None:
+            kwargs["multi_label"] = kwargs["multi_class"]
+            logger.warning(
+                "The `multi_class` argument has been deprecated and renamed to `multi_label`. "
+                "`multi_class` will be removed in a future version of Transformers."
+            )
+        preprocess_params = {}
+        if "candidate_labels" in kwargs:
+            preprocess_params["candidate_labels"] = self._args_parser._parse_labels(kwargs["candidate_labels"])
+        if "hypothesis_template" in kwargs:
+            preprocess_params["hypothesis_template"] = kwargs["hypothesis_template"]
+
+        postprocess_params = {}
+        if "multi_label" in kwargs:
+            postprocess_params["multi_label"] = kwargs["multi_label"]
+        return preprocess_params, {}, postprocess_params

    def __call__(
        self,
        sequences: Union[str, List[str]],
-        candidate_labels,
-        hypothesis_template="This example is {}.",
-        multi_label=False,
        **kwargs,
    ):
        """
@@ -222,53 +182,78 @@ class ZeroShotClassificationPipeline(Pipeline):
            - **labels** (:obj:`List[str]`) -- The labels sorted by order of likelihood.
            - **scores** (:obj:`List[float]`) -- The probabilities for each of the labels.
        """
-        if "multi_class" in kwargs and kwargs["multi_class"] is not None:
-            multi_label = kwargs.pop("multi_class")
-            logger.warning(
-                "The `multi_class` argument has been deprecated and renamed to `multi_label`. "
-                "`multi_class` will be removed in a future version of Transformers."
-            )
-
-        if sequences and isinstance(sequences, str):
-            sequences = [sequences]
-
-        outputs = super().__call__(sequences, candidate_labels, hypothesis_template)
-        if isinstance(outputs, list):
-            # XXX: Some tokenizers cannot handle batching because they don't
-            # have pad_token, so outputs will be a list, however, because outputs
-            # is only n logits and sequence_length is not present anymore, we
-            # can recreate a tensor out of outputs.
-            outputs = np.array(outputs)
-        num_sequences = len(sequences)
-        candidate_labels = self._args_parser._parse_labels(candidate_labels)
-        reshaped_outputs = outputs.reshape((num_sequences, len(candidate_labels), -1))
-
-        if len(candidate_labels) == 1:
-            multi_label = True
-
-        if not multi_label:
-            # softmax the "entailment" logits over all candidate labels
-            entail_logits = reshaped_outputs[..., self.entailment_id]
-            scores = np.exp(entail_logits) / np.exp(entail_logits).sum(-1, keepdims=True)
-        else:
-            # softmax over the entailment vs. contradiction dim for each label independently
-            entailment_id = self.entailment_id
-            contradiction_id = -1 if entailment_id == 0 else 0
-            entail_contr_logits = reshaped_outputs[..., [contradiction_id, entailment_id]]
-            scores = np.exp(entail_contr_logits) / np.exp(entail_contr_logits).sum(-1, keepdims=True)
-            scores = scores[..., 1]
-
-        result = []
-        for iseq in range(num_sequences):
-            top_inds = list(reversed(scores[iseq].argsort()))
-            result.append(
-                {
-                    "sequence": sequences if isinstance(sequences, str) else sequences[iseq],
-                    "labels": [candidate_labels[i] for i in top_inds],
-                    "scores": scores[iseq][top_inds].tolist(),
-                }
-            )
-
-        if len(result) == 1:
-            return result[0]
-        return result
+        result = super().__call__(sequences, **kwargs)
+        if len(result) == 1:
+            return result[0]
+        return result
+
+    def preprocess(self, inputs, candidate_labels=None, hypothesis_template="This example is {}."):
+        sequence_pairs, sequences = self._args_parser(inputs, candidate_labels, hypothesis_template)
+        model_inputs = self._parse_and_tokenize(sequence_pairs)
+
+        prepared_inputs = {
+            "candidate_labels": candidate_labels,
+            "sequences": sequences,
+            "inputs": model_inputs,
+        }
+        return prepared_inputs
+
+    def _forward(self, inputs):
+        candidate_labels = inputs["candidate_labels"]
+        sequences = inputs["sequences"]
+        model_inputs = inputs["inputs"]
+        if isinstance(model_inputs, list):
+            outputs = []
+            for input_ in model_inputs:
+                prediction = self.model(**input_)[0].cpu()
+                outputs.append(prediction)
+        else:
+            outputs = self.model(**model_inputs)
+
+        model_outputs = {"candidate_labels": candidate_labels, "sequences": sequences, "outputs": outputs}
+        return model_outputs
+
+    def postprocess(self, model_outputs, multi_label=False):
+        candidate_labels = model_outputs["candidate_labels"]
+        sequences = model_outputs["sequences"]
+        outputs = model_outputs["outputs"]
+
+        if self.framework == "pt":
+            if isinstance(outputs, list):
+                logits = np.concatenate([output.cpu().numpy() for output in outputs], axis=0)
+            else:
+                logits = outputs["logits"].cpu().numpy()
+        else:
+            if isinstance(outputs, list):
+                logits = np.concatenate([output.numpy() for output in outputs], axis=0)
+            else:
+                logits = outputs["logits"].numpy()
+
+        N = logits.shape[0]
+        n = len(candidate_labels)
+        num_sequences = N // n
+        reshaped_outputs = logits.reshape((num_sequences, n, -1))
+
+        if multi_label or len(candidate_labels) == 1:
+            # softmax over the entailment vs. contradiction dim for each label independently
+            entailment_id = self.entailment_id
+            contradiction_id = -1 if entailment_id == 0 else 0
+            entail_contr_logits = reshaped_outputs[..., [contradiction_id, entailment_id]]
+            scores = np.exp(entail_contr_logits) / np.exp(entail_contr_logits).sum(-1, keepdims=True)
+            scores = scores[..., 1]
+        else:
+            # softmax the "entailment" logits over all candidate labels
+            entail_logits = reshaped_outputs[..., self.entailment_id]
+            scores = np.exp(entail_logits) / np.exp(entail_logits).sum(-1, keepdims=True)
+
+        result = []
+        for iseq in range(num_sequences):
+            top_inds = list(reversed(scores[iseq].argsort()))
+            result.append(
+                {
+                    "sequence": sequences[iseq],
+                    "labels": [candidate_labels[i] for i in top_inds],
+                    "scores": scores[iseq, top_inds].tolist(),
+                }
+            )
+        return result
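To close the loop on the zero-shot rework, a minimal usage sketch (assuming the public `pipeline` factory and the `facebook/bart-large-mnli` checkpoint, both illustrative): `candidate_labels` and `hypothesis_template` are routed to `preprocess`, `multi_label` to `postprocess`, and a single input sequence is unwrapped by `__call__`.

```python
# Usage sketch only; assumes the public `pipeline` factory and an example NLI checkpoint.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "Who are you voting for in 2020?",
    candidate_labels=["politics", "economics", "public health"],
    multi_label=True,  # score each label independently via entailment vs. contradiction
)
# Single input, so __call__ returns one dict with "sequence", "labels" and "scores".
print(result["labels"], result["scores"])
```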