Unverified Commit e4920c92 authored by Sylvain Gugger and committed by GitHub

Doc pipelines (#6175)



* Init work on pipelines doc

* Work in progress

* Work in progress

* Doc pipelines

* Rm unwanted default

* Apply suggestions from code review

Lysandre comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
parent b6b2f227
@@ -207,3 +207,4 @@ conversion utilities for the following models:
model_doc/dpr
internal/modeling_utils
internal/tokenization_utils
internal/pipelines_utils
\ No newline at end of file
Utilities for pipelines
-----------------------

This page lists all the utility functions the library provides for pipelines.

Most of those are only useful if you are studying the code of the models in the library.

Argument handling
~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.pipelines.ArgumentHandler

.. autoclass:: transformers.pipelines.ZeroShotClassificationArgumentHandler

.. autoclass:: transformers.pipelines.QuestionAnsweringArgumentHandler

Data format
~~~~~~~~~~~

.. autoclass:: transformers.pipelines.PipelineDataFormat
    :members:

.. autoclass:: transformers.pipelines.CsvPipelineDataFormat
    :members:

.. autoclass:: transformers.pipelines.JsonPipelineDataFormat
    :members:

.. autoclass:: transformers.pipelines.PipedPipelineDataFormat
    :members:

Utilities
~~~~~~~~~

.. autofunction:: transformers.pipelines.get_framework

.. autoclass:: transformers.pipelines.PipelineException
@@ -41,3 +41,9 @@ The other methods that are common to each model are defined in :class:`~transfor

.. autoclass:: transformers.modeling_tf_utils.TFModelUtilsMixin
    :members:
Generative models
~~~~~~~~~~~~~~~~~

Coming soon
@@ -3,13 +3,23 @@ Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most
of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
:doc:`task summary <../task_summary>` for examples of use.

There are two categories of pipeline abstractions to be aware of:

- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines.
- The other task-specific pipelines:

    - :class:`~transformers.ConversationalPipeline`
    - :class:`~transformers.FeatureExtractionPipeline`
    - :class:`~transformers.FillMaskPipeline`
    - :class:`~transformers.QuestionAnsweringPipeline`
    - :class:`~transformers.SummarizationPipeline`
    - :class:`~transformers.TextClassificationPipeline`
    - :class:`~transformers.TextGenerationPipeline`
    - :class:`~transformers.TokenClassificationPipeline`
    - :class:`~transformers.TranslationPipeline`

The pipeline abstraction
~~~~~~~~~~~~~~~~~~~~~~~~

@@ -21,61 +31,75 @@ other pipeline but requires an additional argument which is the `task`.
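As a quick illustration of that `task` argument (an editor's sketch, not part of this diff; it assumes the default
model for the task gets downloaded on first use), the abstraction can be used like::

    from transformers import pipeline

    # The task identifier selects the pipeline class and a default model.
    nlp = pipeline("sentiment-analysis")
    print(nlp("We are very happy to show you the Transformers library."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]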
The task specific pipelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ConversationalPipeline
==========================================

.. autoclass:: transformers.Conversation

.. autoclass:: transformers.ConversationalPipeline
    :special-members: __call__
    :members:

FeatureExtractionPipeline
==========================================

.. autoclass:: transformers.FeatureExtractionPipeline
    :special-members: __call__
    :members:

FillMaskPipeline
==========================================

.. autoclass:: transformers.FillMaskPipeline
    :special-members: __call__
    :members:

NerPipeline
==========================================

This class is an alias of the :class:`~transformers.TokenClassificationPipeline` defined below. Please refer to that
pipeline for documentation and usage examples.

QuestionAnsweringPipeline
==========================================

.. autoclass:: transformers.QuestionAnsweringPipeline
    :special-members: __call__
    :members:

SummarizationPipeline
==========================================

.. autoclass:: transformers.SummarizationPipeline
    :special-members: __call__
    :members:

TextClassificationPipeline
==========================================

.. autoclass:: transformers.TextClassificationPipeline
    :special-members: __call__
    :members:

TextGenerationPipeline
==========================================

.. autoclass:: transformers.TextGenerationPipeline
    :special-members: __call__
    :members:

TokenClassificationPipeline
==========================================

.. autoclass:: transformers.TokenClassificationPipeline
    :special-members: __call__
    :members:

Parent class: :obj:`Pipeline`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Pipeline
    :members:
@@ -33,7 +33,7 @@ import numpy as np
from .configuration_auto import AutoConfig
from .configuration_utils import PretrainedConfig
from .data import SquadExample, squad_convert_examples_to_features
from .file_utils import add_end_docstrings, is_tf_available, is_torch_available
from .modelcard import ModelCard
from .tokenization_auto import AutoTokenizer
from .tokenization_bert import BasicTokenizer
@@ -82,8 +82,13 @@ logger = logging.getLogger(__name__)
def get_framework(model=None):
    """
    Select framework (TensorFlow or PyTorch) to use.

    Args:
        model (:obj:`str`, :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`, `optional`):
            If both frameworks are installed, picks the one corresponding to the model passed (either a model class or
            the model name). If no specific model is provided, defaults to using PyTorch.
    """
    if is_tf_available() and is_torch_available() and model is not None and not isinstance(model, str):
        # Both frameworks are available but the user supplied a model class instance.
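A minimal sketch of how this resolution behaves (editor's illustration; it assumes both backends are installed)::

    from transformers import AutoModel
    from transformers.pipelines import get_framework

    get_framework()  # "pt": PyTorch wins the tie when no model is given
    # Passing a (PyTorch) model instance infers the framework from its class.
    get_framework(AutoModel.from_pretrained("bert-base-cased"))  # "pt"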
@@ -103,7 +108,12 @@ def get_framework(model=None):
class PipelineException(Exception):
    """
    Raised by a :class:`~transformers.Pipeline` when handling __call__.

    Args:
        task (:obj:`str`): The task of the pipeline.
        model (:obj:`str`): The model used by the pipeline.
        reason (:obj:`str`): The error message to display.
    """

    def __init__(self, task: str, model: str, reason: str):
@@ -115,7 +125,7 @@ class PipelineException(Exception):
class ArgumentHandler(ABC):
    """
    Base interface for handling arguments for each :class:`~transformers.pipelines.Pipeline`.
    """

    @abstractmethod
@@ -125,7 +135,7 @@ class ArgumentHandler(ABC):
class DefaultArgumentHandler(ArgumentHandler):
    """
    Default argument parser handling parameters for each :class:`~transformers.pipelines.Pipeline`.
    """

    @staticmethod
@@ -178,18 +188,25 @@ class PipelineDataFormat:
    """
    Base class for all the pipeline supported data formats, both for reading and writing. Supported data formats
    currently include:

    - JSON
    - CSV
    - stdin/stdout (pipe)

    :obj:`PipelineDataFormat` also includes some utilities to work with multi-columns like mapping from datasets
    columns to pipelines keyword arguments through the :obj:`dataset_kwarg_1=dataset_column_1` format.

    Args:
        output_path (:obj:`str`, `optional`): Where to save the outgoing data.
        input_path (:obj:`str`, `optional`): Where to look for the input data.
        column (:obj:`str`, `optional`): The column to read.
        overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to overwrite the :obj:`output_path`.
    """

    SUPPORTED_FORMATS = ["json", "csv", "pipe"]

    def __init__(
        self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite: bool = False,
    ):
        self.output_path = output_path
        self.input_path = input_path
@@ -212,19 +229,25 @@ class PipelineDataFormat:
        raise NotImplementedError()

    @abstractmethod
    def save(self, data: Union[dict, List[dict]]):
        """
        Save the provided data object with the representation for the current
        :class:`~transformers.pipelines.PipelineDataFormat`.

        Args:
            data (:obj:`dict` or list of :obj:`dict`): The data to store.
        """
        raise NotImplementedError()

    def save_binary(self, data: Union[dict, List[dict]]) -> str:
        """
        Save the provided data object as pickle-formatted binary data on the disk.

        Args:
            data (:obj:`dict` or list of :obj:`dict`): The data to store.

        Returns:
            :obj:`str`: Path where the data has been saved.
        """
        path, _ = os.path.splitext(self.output_path)
        binary_path = os.path.extsep.join((path, "pickle"))
@@ -237,7 +260,26 @@ class PipelineDataFormat:
    @staticmethod
    def from_str(
        format: str, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False,
    ) -> "PipelineDataFormat":
        """
        Creates an instance of the right subclass of :class:`~transformers.pipelines.PipelineDataFormat` depending
        on :obj:`format`.

        Args:
            format (:obj:`str`):
                The format of the desired pipeline. Acceptable values are :obj:`"json"`, :obj:`"csv"` or :obj:`"pipe"`.
            output_path (:obj:`str`, `optional`):
                Where to save the outgoing data.
            input_path (:obj:`str`, `optional`):
                Where to look for the input data.
            column (:obj:`str`, `optional`):
                The column to read.
            overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to overwrite the :obj:`output_path`.

        Returns:
            :class:`~transformers.pipelines.PipelineDataFormat`: The proper data format.
        """
        if format == "json":
            return JsonPipelineDataFormat(output_path, input_path, column, overwrite=overwrite)
        elif format == "csv":
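As a short illustration (editor's sketch; the file names are hypothetical), this is the factory used, for instance,
when wiring input files to pipelines from the command line::

    from transformers.pipelines import PipelineDataFormat

    # Returns a CsvPipelineDataFormat reading the "text" column of reviews.csv.
    data_format = PipelineDataFormat.from_str(
        "csv", output_path="out.csv", input_path="reviews.csv", column="text", overwrite=True,
    )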
@@ -249,6 +291,17 @@ class PipelineDataFormat:
class CsvPipelineDataFormat(PipelineDataFormat):
    """
    Support for pipelines using CSV data format.

    Args:
        output_path (:obj:`str`, `optional`): Where to save the outgoing data.
        input_path (:obj:`str`, `optional`): Where to look for the input data.
        column (:obj:`str`, `optional`): The column to read.
        overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to overwrite the :obj:`output_path`.
    """

    def __init__(
        self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False,
    ):
@@ -264,6 +317,13 @@ class CsvPipelineDataFormat(PipelineDataFormat):
            yield row[self.column[0]]

    def save(self, data: List[dict]):
        """
        Save the provided data object with the representation for the current
        :class:`~transformers.pipelines.PipelineDataFormat`.

        Args:
            data (:obj:`List[dict]`): The data to store.
        """
        with open(self.output_path, "w") as f:
            if len(data) > 0:
                writer = csv.DictWriter(f, list(data[0].keys()))
@@ -272,6 +332,17 @@ class CsvPipelineDataFormat(PipelineDataFormat):
class JsonPipelineDataFormat(PipelineDataFormat):
    """
    Support for pipelines using JSON file format.

    Args:
        output_path (:obj:`str`, `optional`): Where to save the outgoing data.
        input_path (:obj:`str`, `optional`): Where to look for the input data.
        column (:obj:`str`, `optional`): The column to read.
        overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to overwrite the :obj:`output_path`.
    """

    def __init__(
        self, output_path: Optional[str], input_path: Optional[str], column: Optional[str], overwrite=False,
    ):
@@ -288,6 +359,12 @@ class JsonPipelineDataFormat(PipelineDataFormat):
            yield entry[self.column[0]]

    def save(self, data: dict):
        """
        Save the provided data object in a json file.

        Args:
            data (:obj:`dict`): The data to store.
        """
        with open(self.output_path, "w") as f:
            json.dump(data, f)
@@ -298,6 +375,13 @@ class PipedPipelineDataFormat(PipelineDataFormat):
    For multi-column data, columns should be separated by \t

    If columns are provided, then the output will be a dictionary with {column_x: value_x}

    Args:
        output_path (:obj:`str`, `optional`): Where to save the outgoing data.
        input_path (:obj:`str`, `optional`): Where to look for the input data.
        column (:obj:`str`, `optional`): The column to read.
        overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to overwrite the :obj:`output_path`.
    """

    def __iter__(self):
@@ -317,6 +401,12 @@ class PipedPipelineDataFormat(PipelineDataFormat):
            yield line

    def save(self, data: dict):
        """
        Print the data.

        Args:
            data (:obj:`dict`): The data to store.
        """
        print(data)

    def save_binary(self, data: Union[dict, List[dict]]) -> str:
@@ -343,24 +433,7 @@ class _ScikitCompat(ABC):
        raise NotImplementedError()

PIPELINE_INIT_ARGS = r"""
    Arguments:
        model (:obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`):
            The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
            :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
            TensorFlow.
        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
            :class:`~transformers.PreTrainedTokenizer`.
        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
            Model card attributed to the model for this pipeline.
        framework (:obj:`str`, `optional`):
            The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
            must be installed.

            If no framework is specified, will default to the one currently installed. If no framework is specified
            and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
            model is provided.
        task (:obj:`str`, defaults to :obj:`""`):
            A task-identifier for the pipeline.
        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
            Reference to the object in charge of parsing supplied pipeline parameters.
        device (:obj:`int`, `optional`, defaults to -1):
            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive value will run the
            model on the associated CUDA device id.
        binary_output (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Flag indicating if the output of the pipeline should happen in a binary format (i.e., pickle) or as
            raw text.
"""


@add_end_docstrings(PIPELINE_INIT_ARGS)
class Pipeline(_ScikitCompat):
    """
    The Pipeline class is the class from which all pipelines inherit. Refer to this class for methods shared across
    different pipelines.

    Base class implementing pipelined operations. Pipeline workflow is defined as a sequence of the following
    operations:

        Input -> Tokenization -> Model Inference -> Post-Processing (task dependent) -> Output

    Pipeline supports running on CPU or GPU through the device argument (see below).

    Some pipelines, like for instance :class:`~transformers.FeatureExtractionPipeline` (:obj:`'feature-extraction'`),
    output large tensor objects as nested lists. In order to avoid dumping such large structures as textual data we
    provide the :obj:`binary_output` constructor argument. If set to :obj:`True`, the output will be stored in the
    pickle format.
    """

    default_input_names = None
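The decorator used above comes from `file_utils`; roughly (an editor's paraphrase of its behavior, not the verbatim
implementation), it appends the shared argument documentation to each decorated object's docstring::

    def add_end_docstrings(*docstr):
        def docstring_decorator(fn):
            # Concatenate the shared docstring fragments after the object's own docstring.
            fn.__doc__ = (fn.__doc__ or "") + "".join(docstr)
            return fn
        return docstring_decorator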
@@ -408,7 +497,7 @@ class Pipeline(_ScikitCompat):
    ):
        if framework is None:
            framework = get_framework(model)

        self.task = task
        self.model = model
@@ -428,9 +517,13 @@ class Pipeline(_ScikitCompat):
        if task_specific_params is not None and task in task_specific_params:
            self.model.config.update(task_specific_params.get(task))

    def save_pretrained(self, save_directory: str):
        """
        Save the pipeline's model and tokenizer.

        Args:
            save_directory (:obj:`str`):
                A path to the directory where to save the pipeline. It will be created if it doesn't exist.
        """
        if os.path.isfile(save_directory):
            logger.error("Provided path ({}) should be a directory, not a file".format(save_directory))
@@ -458,14 +551,17 @@ class Pipeline(_ScikitCompat):
    def device_placement(self):
        """
        Context Manager allowing tensor allocation on the user-specified device in framework agnostic way.

        Returns:
            Context manager

        Examples::

            # Explicitly ask for tensor allocation on CUDA device :0
            pipe = pipeline(..., device=0)
            with pipe.device_placement():
                # Every framework specific tensor allocation will be done on the requested device
                output = pipe(...)
        """
        if self.framework == "tf":
            with tf.device("/CPU:0" if self.device == -1 else "/device:GPU:{}".format(self.device)):
@@ -479,14 +575,22 @@ class Pipeline(_ScikitCompat):
    def ensure_tensor_on_device(self, **inputs):
        """
        Ensure PyTorch tensors are on the specified device.

        Args:
            inputs (keyword arguments that should be :obj:`torch.Tensor`): The tensors to place on :obj:`self.device`.

        Return:
            :obj:`Dict[str, torch.Tensor]`: The same as :obj:`inputs` but on the proper device.
        """
        return {name: tensor.to(self.device) for name, tensor in inputs.items()}

    def check_model_type(self, supported_models: Union[List[str], dict]):
        """
        Check if the model class is supported by the pipeline.

        Args:
            supported_models (:obj:`List[str]` or :obj:`dict`):
                The list of models supported by the pipeline, or a dictionary with model class values.
        """
        if not isinstance(supported_models, list):  # Create from a model mapping
            supported_models = [item[1].__name__ for item in supported_models.items()]
@@ -538,15 +642,14 @@ class Pipeline(_ScikitCompat):
        return predictions.numpy()

# Can't use @add_end_docstrings(PIPELINE_INIT_ARGS) here because this one does not accept `binary_output`
class FeatureExtractionPipeline(Pipeline):
    """
    Feature extraction pipeline using no model head. This pipeline extracts the hidden states from the base
    transformer, which can be used as features in downstream tasks.

    This feature extraction pipeline can currently be loaded from :func:`~transformers.pipeline` using the task
    identifier: :obj:`"feature-extraction"`.

    All models may be used for this pipeline. See a list of all models, including community-contributed models on
    `huggingface.co/models <https://huggingface.co/models>`__.

@@ -559,18 +662,21 @@ class FeatureExtractionPipeline(Pipeline):
        tokenizer (:obj:`~transformers.PreTrainedTokenizer`):
            The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
            :class:`~transformers.PreTrainedTokenizer`.
        modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
            Model card attributed to the model for this pipeline.
        framework (:obj:`str`, `optional`):
            The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
            must be installed.

            If no framework is specified, will default to the one currently installed. If no framework is specified
            and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
            model is provided.
        task (:obj:`str`, defaults to :obj:`""`):
            A task-identifier for the pipeline.
        args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
            Reference to the object in charge of parsing supplied pipeline parameters.
        device (:obj:`int`, `optional`, defaults to -1):
            Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive value will run the
            model on the associated CUDA device id.
    """
@@ -596,20 +702,29 @@ class FeatureExtractionPipeline(Pipeline):
        )

    def __call__(self, *args, **kwargs):
        """
        Extract the features of the input(s).

        Args:
            args (:obj:`str` or :obj:`List[str]`): One or several texts (or one list of texts) to get the features of.

        Return:
            A nested list of :obj:`float`: The features computed by the model.
        """
        return super().__call__(*args, **kwargs).tolist()
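For instance (an editor's sketch, assuming the default model for the task is downloaded), the nested list has one
entry per token, each a vector of hidden-size floats::

    from transformers import pipeline

    extractor = pipeline("feature-extraction")
    features = extractor("Hello world")
    # features[0] is the sequence; features[0][i] is the hidden state of token i
    print(len(features[0]), len(features[0][0]))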
@add_end_docstrings(PIPELINE_INIT_ARGS)
class TextGenerationPipeline(Pipeline):
    """
    Language generation pipeline using any :obj:`ModelWithLMHead`. This pipeline predicts the words that will follow
    a specified text prompt.

    This language generation pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
    task identifier: :obj:`"text-generation"`.

    The models that this pipeline can use are models that have been trained with an autoregressive language modeling
    objective, which includes the uni-directional models in the library (e.g. gpt2).

    See the list of available community models on
    `huggingface.co/models <https://huggingface.co/models?search=&filter=lm-head>`__.
    """
@@ -673,7 +788,30 @@ class TextGenerationPipeline(Pipeline):
    def __call__(
        self, *args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
    ):
        """
        Complete the prompt(s) given as inputs.

        Args:
            args (:obj:`str` or :obj:`List[str]`):
                One or several prompts (or one list of prompts) to complete.
            return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to include the tensors of predictions (as token indices) in the outputs.
            return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
                Whether or not to include the decoded texts in the outputs.
            clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not to clean up the potential extra spaces in the text output.
            generate_kwargs:
                Additional keyword arguments to pass along to the generate method of the model (see the generate
                method corresponding to your framework `here <./model.html#generative-models>`__).

        Return:
            A list or a list of list of :obj:`dict`: Each result comes as a dictionary with the
            following keys:

            - **generated_text** (:obj:`str`, present when ``return_text=True``) -- The generated text.
            - **generated_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when
              ``return_tensors=True``) -- The token ids of the generated text.
        """
        text_inputs = self._args_parser(*args)

        results = []
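A quick usage sketch (editor's illustration; the downloaded default model and the exact completion will vary)::

    from transformers import pipeline

    generator = pipeline("text-generation")
    generator("Once upon a time,", max_length=30)
    # [{'generated_text': 'Once upon a time, ...'}]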
@@ -758,41 +896,25 @@ class TextGenerationPipeline(Pipeline):
        return results

@add_end_docstrings(
    PIPELINE_INIT_ARGS,
    r"""
        return_all_scores (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether to return all prediction scores or just the one of the predicted class.
    """,
)
class TextClassificationPipeline(Pipeline):
    """
    Text classification pipeline using any :obj:`ModelForSequenceClassification`. See the
    `sequence classification examples <../task_summary.html#sequence-classification>`__ for more information.

    This text classification pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
    task identifier: :obj:`"sentiment-analysis"` (for classifying sequences according to positive or negative
    sentiments).

    The models that this pipeline can use are models that have been fine-tuned on a sequence classification task.
    See the up-to-date list of available models on
    `huggingface.co/models <https://huggingface.co/models?filter=text-classification>`__.
    """

    def __init__(self, return_all_scores: bool = False, **kwargs):
@@ -807,6 +929,22 @@ class TextClassificationPipeline(Pipeline):
        self.return_all_scores = return_all_scores

    def __call__(self, *args, **kwargs):
        """
        Classify the text(s) given as inputs.

        Args:
            args (:obj:`str` or :obj:`List[str]`):
                One or several texts (or one list of texts) to classify.

        Return:
            A list or a list of list of :obj:`dict`: Each result comes as list of dictionaries with the
            following keys:

            - **label** (:obj:`str`) -- The label predicted.
            - **score** (:obj:`float`) -- The corresponding probability.

            If ``self.return_all_scores=True``, one such dictionary is returned per label.
        """
        outputs = super().__call__(*args, **kwargs)
        scores = np.exp(outputs) / np.exp(outputs).sum(-1, keepdims=True)
        if self.return_all_scores:
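For example (editor's sketch; the score shown is indicative)::

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    classifier("I love this movie!")
    # [{'label': 'POSITIVE', 'score': 0.99...}]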
@@ -853,46 +991,23 @@ class ZeroShotClassificationArgumentHandler(ArgumentHandler):
        return sequence_pairs

@add_end_docstrings(PIPELINE_INIT_ARGS)
class ZeroShotClassificationPipeline(Pipeline):
    """
    NLI-based zero-shot classification pipeline using a :obj:`ModelForSequenceClassification` trained on NLI (natural
    language inference) tasks.

    Any combination of sequences and labels can be passed and each combination will be posed as a premise/hypothesis
    pair and passed to the pretrained model. Then, the logit for `entailment` is taken as the logit for the
    candidate label being valid. Any NLI model can be used as long as the first output logit corresponds to
    `contradiction` and the last to `entailment`.

    This NLI pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
    task identifier: :obj:`"zero-shot-classification"`.

    The models that this pipeline can use are models that have been fine-tuned on an NLI task.
    See the up-to-date list of available models on
    `huggingface.co/models <https://huggingface.co/models?search=nli>`__.
    """

    def __init__(self, args_parser=ZeroShotClassificationArgumentHandler(), *args, **kwargs):
@@ -915,29 +1030,33 @@ class ZeroShotClassificationPipeline(Pipeline):
    def __call__(self, sequences, candidate_labels, hypothesis_template="This example is {}.", multi_class=False):
        """
        Classify the sequence(s) given as inputs.

        Args:
            sequences (:obj:`str` or :obj:`List[str]`):
                The sequence(s) to classify, will be truncated if the model input is too large.
            candidate_labels (:obj:`str` or :obj:`List[str]`):
                The set of possible class labels to classify each sequence into. Can be a single label, a string of
                comma-separated labels, or a list of labels.
            hypothesis_template (:obj:`str`, `optional`, defaults to :obj:`"This example is {}."`):
                The template used to turn each label into an NLI-style hypothesis. This template must include a {}
                or similar syntax for the candidate label to be inserted into the template. For example, the default
                template is :obj:`"This example is {}."` With the candidate label :obj:`"sports"`, this would be fed
                into the model like :obj:`"<cls> sequence to classify <sep> This example is sports . <sep>"`. The
                default template works well in many cases, but it may be worthwhile to experiment with different
                templates depending on the task setting.
            multi_class (:obj:`bool`, `optional`, defaults to :obj:`False`):
                Whether or not multiple candidate labels can be true. If :obj:`False`, the scores are normalized
                such that the sum of the label likelihoods for each sequence is 1. If :obj:`True`, the labels are
                considered independent and probabilities are normalized for each candidate by doing a softmax of
                the entailment score vs. the contradiction score.

        Return:
            A :obj:`dict` or a list of :obj:`dict`: Each result comes as a dictionary with the
            following keys:

            - **sequence** (:obj:`str`) -- The sequence for which this is the output.
            - **labels** (:obj:`List[str]`) -- The labels sorted by order of likelihood.
            - **scores** (:obj:`List[float]`) -- The probabilities for each of the labels.
        """
        outputs = super().__call__(sequences, candidate_labels, hypothesis_template)
        num_sequences = 1 if isinstance(sequences, str) else len(sequences)

@@ -973,42 +1092,28 @@ class ZeroShotClassificationPipeline(Pipeline):
        return result
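A usage sketch (editor's illustration; the default model and exact scores may vary)::

    from transformers import pipeline

    classifier = pipeline("zero-shot-classification")
    classifier(
        "Who are you voting for in 2020?",
        candidate_labels=["politics", "economics", "public health"],
    )
    # {'sequence': '...', 'labels': ['politics', ...], 'scores': [0.97..., ...]}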
@add_end_docstrings(
    PIPELINE_INIT_ARGS,
    r"""
        topk (:obj:`int`, defaults to 5): The number of predictions to return.
    """,
)
class FillMaskPipeline(Pipeline):
    """
    Masked language modeling prediction pipeline using any :obj:`ModelWithLMHead`. See the
    `masked language modeling examples <../task_summary.html#masked-language-modeling>`__ for more information.

    This mask filling pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
    task identifier: :obj:`"fill-mask"`.

    The models that this pipeline can use are models that have been trained with a masked language modeling
    objective, which includes the bi-directional models in the library.
    See the up-to-date list of available models on
    `huggingface.co/models <https://huggingface.co/models?filter=lm-head>`__.

    .. note::

        This pipeline only works for inputs with exactly one token masked.
    """

    def __init__(

@@ -1053,6 +1158,21 @@ class FillMaskPipeline(Pipeline):
        )
    def __call__(self, *args, **kwargs):
        """
        Fill the masked token in the text(s) given as inputs.

        Args:
            args (:obj:`str` or :obj:`List[str]`): One or several texts (or one list of texts) with masked tokens.

        Return:
            A list or a list of list of :obj:`dict`: Each result comes as list of dictionaries with the
            following keys:

            - **sequence** (:obj:`str`) -- The corresponding input with the mask token prediction.
            - **score** (:obj:`float`) -- The corresponding probability.
            - **token** (:obj:`int`) -- The predicted token id (to replace the masked one).
            - **token** (:obj:`str`) -- The predicted token (to replace the masked one).
        """
        inputs = self._parse_and_tokenize(*args, **kwargs)
        outputs = self._forward(inputs, return_tensors=True)
@@ -1105,41 +1225,27 @@ class FillMaskPipeline(Pipeline):
        return results
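A short sketch of typical usage (editor's illustration; the mask token placeholder depends on the tokenizer)::

    from transformers import pipeline

    fill = pipeline("fill-mask")
    fill(f"Paris is the capital of {fill.tokenizer.mask_token}.")
    # [{'sequence': 'Paris is the capital of France.', 'score': ..., 'token': ...}, ...]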
@add_end_docstrings(
    PIPELINE_INIT_ARGS,
    r"""
        ignore_labels (:obj:`List[str]`, defaults to :obj:`["O"]`):
            A list of labels to ignore.
        grouped_entities (:obj:`bool`, `optional`, defaults to :obj:`False`):
            Whether or not to group the tokens corresponding to the same entity together in the predictions.
    """,
)
class TokenClassificationPipeline(Pipeline):
    """
    Named Entity Recognition pipeline using any :obj:`ModelForTokenClassification`. See the
    `named entity recognition examples <../task_summary.html#named-entity-recognition>`__ for more information.

    This token recognition pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
    task identifier: :obj:`"ner"` (for predicting the classes of tokens in a sequence: person, organisation, location
    or miscellaneous).

    The models that this pipeline can use are models that have been fine-tuned on a token classification task.
    See the up-to-date list of available models on
    `huggingface.co/models <https://huggingface.co/models?filter=token-classification>`__.
    """

    default_input_names = "sequences"
@@ -1179,6 +1285,24 @@ class TokenClassificationPipeline(Pipeline):
        self.grouped_entities = grouped_entities

    def __call__(self, *args, **kwargs):
        """
        Classify each token of the text(s) given as inputs.

        Args:
            args (:obj:`str` or :obj:`List[str]`):
                One or several texts (or one list of texts) for token classification.

        Return:
            A list or a list of list of :obj:`dict`: Each result comes as a list of dictionaries (one for each token
            in the corresponding input, or each entity if this pipeline was instantiated with
            :obj:`grouped_entities=True`) with the following keys:

            - **word** (:obj:`str`) -- The token/word classified.
            - **score** (:obj:`float`) -- The corresponding probability for :obj:`entity`.
            - **entity** (:obj:`str`) -- The entity predicted for that token/word.
            - **index** (:obj:`int`, only present when ``self.grouped_entities=False``) -- The index of the
              corresponding token in the sentence.
        """
        inputs = self._args_parser(*args, **kwargs)
        answers = []

        for sentence in inputs:
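For instance (editor's sketch; the entity labels depend on the fine-tuned model)::

    from transformers import pipeline

    ner = pipeline("ner")
    ner("Hugging Face is based in New York City.")
    # [{'word': 'Hu', 'score': 0.99..., 'entity': 'I-ORG', 'index': 1}, ...]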
@@ -1235,7 +1359,10 @@ class TokenClassificationPipeline(Pipeline):
    def group_sub_entities(self, entities: List[dict]) -> dict:
        """
        Group together the adjacent tokens with the same entity predicted.

        Args:
            entities (:obj:`dict`): The entities predicted by the pipeline.
        """
        # Get the first entity in the entity group
        entity = entities[0]["entity"]
...@@ -1251,7 +1378,10 @@ class TokenClassificationPipeline(Pipeline): ...@@ -1251,7 +1378,10 @@ class TokenClassificationPipeline(Pipeline):
def group_entities(self, entities: List[dict]) -> List[dict]: def group_entities(self, entities: List[dict]) -> List[dict]:
""" """
Find and group together the adjacent tokens with the same entity predicted.
Args:
entities (:obj:`List[dict]`): The entities predicted by the pipeline.
"""
entity_groups = []
...@@ -1295,10 +1425,10 @@ NerPipeline = TokenClassificationPipeline
class QuestionAnsweringArgumentHandler(ArgumentHandler):
"""
QuestionAnsweringPipeline requires the user to provide multiple arguments (i.e. question & context) to be mapped
to internal :class:`~transformers.SquadExample`.
QuestionAnsweringArgumentHandler manages all the possible ways to create a :class:`~transformers.SquadExample`
from the command-line supplied arguments.
"""
def __call__(self, *args, **kwargs):
...@@ -1354,41 +1484,18 @@ class QuestionAnsweringArgumentHandler(ArgumentHandler):
return inputs
@add_end_docstrings(PIPELINE_INIT_ARGS)
class QuestionAnsweringPipeline(Pipeline):
"""
Question Answering pipeline using any :obj:`ModelForQuestionAnswering`. See the
`question answering examples <../task_summary.html#question-answering>`__ for more information.
This question answering pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"question-answering"`.
The models that this pipeline can use are models that have been fine-tuned on a question answering task.
See the up-to-date list of available models on
`huggingface.co/models <https://huggingface.co/models?filter=question-answering>`__.
""" """
default_input_names = "question,context" default_input_names = "question,context"
...@@ -1423,15 +1530,19 @@ class QuestionAnsweringPipeline(Pipeline): ...@@ -1423,15 +1530,19 @@ class QuestionAnsweringPipeline(Pipeline):
question: Union[str, List[str]], context: Union[str, List[str]] question: Union[str, List[str]], context: Union[str, List[str]]
) -> Union[SquadExample, List[SquadExample]]: ) -> Union[SquadExample, List[SquadExample]]:
""" """
QuestionAnsweringPipeline leverages the :class:`~transformers.SquadExample` internally.
This helper method encapsulates all the logic for converting question(s) and context(s) to
:class:`~transformers.SquadExample`.
We currently support extractive question answering.
Arguments:
question (:obj:`str` or :obj:`List[str]`): The question(s) asked.
context (:obj:`str` or :obj:`List[str]`): The context(s) in which we will look for the answer.
Returns:
One or a list of :class:`~transformers.SquadExample`: The corresponding
:class:`~transformers.SquadExample` grouping question and context.
"""
if isinstance(question, list):
return [SquadExample(None, q, c, None, None, None) for q, c in zip(question, context)]
...@@ -1440,18 +1551,45 @@ class QuestionAnsweringPipeline(Pipeline):
def __call__(self, *args, **kwargs):
"""
Answer the question(s) given as inputs by using the context(s).
Args:
args (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`):
One or several :class:`~transformers.SquadExample` containing the question and context.
X (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`, `optional`):
One or several :class:`~transformers.SquadExample` containing the question and context
(will be treated the same way as if passed as the first positional argument).
data (:class:`~transformers.SquadExample` or a list of :class:`~transformers.SquadExample`, `optional`):
One or several :class:`~transformers.SquadExample` containing the question and context
(will be treated the same way as if passed as the first positional argument).
question (:obj:`str` or :obj:`List[str]`):
One or several question(s) (must be used in conjunction with the :obj:`context` argument).
context (:obj:`str` or :obj:`List[str]`):
One or several context(s) associated with the question(s) (must be used in conjunction with the
:obj:`question` argument).
topk (:obj:`int`, `optional`, defaults to 1):
The number of answers to return (will be chosen by order of likelihood).
doc_stride (:obj:`int`, `optional`, defaults to 128):
If the context is too long to fit with the question for the model, it will be split in several chunks
with some overlap. This argument controls the size of that overlap.
max_answer_len (:obj:`int`, `optional`, defaults to 15):
The maximum length of predicted answers (e.g., only answers with a shorter length are considered).
max_seq_len (:obj:`int`, `optional`, defaults to 384):
The maximum length of the total sentence (context + question) after tokenization. The context will be
split in several chunks (using :obj:`doc_stride`) if needed.
max_question_len (:obj:`int`, `optional`, defaults to 64):
The maximum length of the question after tokenization. It will be truncated if needed.
handle_impossible_answer (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not we accept impossible as an answer.
Return:
A :obj:`dict` or a list of :obj:`dict`: Each result comes as a dictionary with the
following keys:
- **score** (:obj:`float`) -- The probability associated to the answer.
- **start** (:obj:`int`) -- The start index of the answer (in the tokenized version of the input).
- **end** (:obj:`int`) -- The end index of the answer (in the tokenized version of the input).
- **answer** (:obj:`str`) -- The answer to the question.
""" """
# Set default values
kwargs.setdefault("topk", 1)
...@@ -1551,17 +1689,18 @@ class QuestionAnsweringPipeline(Pipeline):
def decode(self, start: np.ndarray, end: np.ndarray, topk: int, max_answer_len: int) -> Tuple:
"""
Take the output of any :obj:`ModelForQuestionAnswering` and generate probabilities for each span to be
the actual answer.
In addition, it filters out some unwanted/impossible cases like answer lengths greater than
max_answer_len or answer end positions before the starting position.
The method supports outputting the k-best answers through the topk argument.
Args:
start (:obj:`np.ndarray`): Individual start probabilities for each token.
end (:obj:`np.ndarray`): Individual end probabilities for each token.
topk (:obj:`int`): Indicates how many possible answer span(s) to extract from the model output.
max_answer_len (:obj:`int`): Maximum size of the answer to extract from the model's output.
""" """
# Ensure we have batch axis
if start.ndim == 1:
...@@ -1589,18 +1728,18 @@ class QuestionAnsweringPipeline(Pipeline):
start, end = np.unravel_index(idx_sort, candidates.shape)[1:]
return start, end, candidates[0, start, end]
def span_to_answer(self, text: str, start: int, end: int) -> Dict[str, Union[str, int]]:
"""
When decoding from token probabilities, this method maps token indexes to actual words in
the initial context.
Args:
text (:obj:`str`): The actual context to extract the answer from.
start (:obj:`int`): The answer starting token index.
end (:obj:`int`): The answer end token index.
Returns:
Dictionary like :obj:`{'answer': str, 'start': int, 'end': int}`
""" """
words = [] words = []
token_idx = char_start_idx = char_end_idx = chars_idx = 0 token_idx = char_start_idx = char_end_idx = chars_idx = 0
...@@ -1634,9 +1773,18 @@ class QuestionAnsweringPipeline(Pipeline): ...@@ -1634,9 +1773,18 @@ class QuestionAnsweringPipeline(Pipeline):
} }
@add_end_docstrings(PIPELINE_INIT_ARGS)
class SummarizationPipeline(Pipeline): class SummarizationPipeline(Pipeline):
""" """
Summarize news articles and other documents.
This summarizing pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"summarization"`.
The models that this pipeline can use are models that have been fine-tuned on a summarization task,
currently '`bart-large-cnn`', '`t5-small`', '`t5-base`', '`t5-large`', '`t5-3b`', '`t5-11b`'.
See the up-to-date list of available models on
`huggingface.co/models <https://huggingface.co/models?filter=summarization>`__.
Usage::
...@@ -1647,39 +1795,6 @@ class SummarizationPipeline(Pipeline):
# use t5 in tf
summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
summarizer("Sam Shleifer writes the best docstring examples in the whole world.", min_length=5, max_length=20)
""" """
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
...@@ -1694,20 +1809,29 @@ class SummarizationPipeline(Pipeline): ...@@ -1694,20 +1809,29 @@ class SummarizationPipeline(Pipeline):
self, *documents, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs self, *documents, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
): ):
r""" r"""
Args: Summarize the text(s) given as inputs.
Args:
documents (:obj:`str` or :obj:`List[str]`):
One or several articles (or one list of articles) to summarize.
return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to include the decoded texts in the outputs.
return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to include the tensors of predictions (as token indices) in the outputs.
clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to clean up the potential extra spaces in the text output.
generate_kwargs:
Additional keyword arguments to pass along to the generate method of the model (see the generate
method corresponding to your framework `here <./model.html#generative-models>`__).
Return:
A list or a list of list of :obj:`dict`: Each result comes as a dictionary with the
following keys:
- **summary_text** (:obj:`str`, present when ``return_text=True``) -- The summary of the corresponding
input.
- **summary_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
-- The token ids of the summary.
""" """
assert return_tensors or return_text, "You must specify return_tensors=True or return_text=True"
assert len(documents) > 0, "Please provide a document to summarize"
...@@ -1779,43 +1903,21 @@ class SummarizationPipeline(Pipeline):
return results
@add_end_docstrings(PIPELINE_INIT_ARGS)
class TranslationPipeline(Pipeline):
"""
Translates from one language to another.
This translation pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"translation_xx_to_yy"`.
The models that this pipeline can use are models that have been fine-tuned on a translation task.
See the up-to-date list of available models on
`huggingface.co/models <https://huggingface.co/models?filter=translation>`__.
Usage::
en_fr_translator = pipeline("translation_en_to_fr")
en_fr_translator("How old are you?")
""" """
def __init__(self, *args, **kwargs): def __init__(self, *args, **kwargs):
...@@ -1829,17 +1931,28 @@ class TranslationPipeline(Pipeline): ...@@ -1829,17 +1931,28 @@ class TranslationPipeline(Pipeline):
self, *args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs self, *args, return_tensors=False, return_text=True, clean_up_tokenization_spaces=False, **generate_kwargs
): ):
r""" r"""
Translate the text(s) given as inputs.
Args: Args:
args (:obj:`str` or :obj:`List[str]`):
Texts to be translated.
return_tensors (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to include the tensors of predictions (as token indices) in the outputs.
return_text (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to include the decoded texts in the outputs.
clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to clean up the potential extra spaces in the text output.
generate_kwargs:
Additional keyword arguments to pass along to the generate method of the model (see the generate
method corresponding to your framework `here <./model.html#generative-models>`__).
Return:
A list or a list of list of :obj:`dict`: Each result comes as a dictionary with the
following keys:
- **translation_text** (:obj:`str`, present when ``return_text=True``) -- The translation.
- **translation_token_ids** (:obj:`torch.Tensor` or :obj:`tf.Tensor`, present when ``return_tensors=True``)
-- The token ids of the translation.
""" """
assert return_tensors or return_text, "You must specify return_tensors=True or return_text=True"
...@@ -1901,10 +2014,20 @@ class TranslationPipeline(Pipeline):
class Conversation:
"""
Utility class containing a conversation and its history. This class is meant to be used as an input to the
:class:`~transformers.ConversationalPipeline`. The conversation contains a number of utility functions to manage
the addition of new user input and generated model responses. A conversation needs to contain an unprocessed user
input before being passed to the :class:`~transformers.ConversationalPipeline`. This user input is either created
when the class is instantiated, or by calling :obj:`conversational_pipeline.append_response("input")` after a
conversation turn.
Arguments:
text (:obj:`str`, `optional`):
The initial user input to start the conversation. If not provided, a user input needs to be provided
manually using the :meth:`~transformers.Conversation.add_user_input` method before the conversation can
begin.
conversation_id (:obj:`uuid.UUID`, `optional`):
Unique identifier for the conversation. If not provided, a random UUID4 id will be assigned to the
conversation.
Usage::
...@@ -1917,14 +2040,6 @@ class Conversation:
conversation.append_response("The Big lebowski.")
conversation.add_user_input("Is it good?")
""" """
def __init__(self, text: str = None, conversation_id: UUID = None):
...@@ -1938,12 +2053,13 @@ class Conversation:
def add_user_input(self, text: str, overwrite: bool = False):
"""
Add a user input to the conversation for the next round. This populates the internal :obj:`new_user_input`
field.
Args:
text (:obj:`str`): The user input for the next conversation round.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not existing and unprocessed user input should be overwritten when this function is called.
""" """
if self.new_user_input: if self.new_user_input:
if overwrite: if overwrite:
...@@ -1963,8 +2079,8 @@ class Conversation: ...@@ -1963,8 +2079,8 @@ class Conversation:
def mark_processed(self): def mark_processed(self):
""" """
Mark the conversation as processed (moves the content of `new_user_input` to `past_user_inputs`) and empties the Mark the conversation as processed (moves the content of :obj:`new_user_input` to :obj:`past_user_inputs`) and
`new_user_input` field. empties the :obj:`new_user_input` field.
""" """
if self.new_user_input: if self.new_user_input:
self.past_user_inputs.append(self.new_user_input) self.past_user_inputs.append(self.new_user_input)
...@@ -1975,17 +2091,17 @@ class Conversation: ...@@ -1975,17 +2091,17 @@ class Conversation:
Append a response to the list of generated responses. Append a response to the list of generated responses.
Args: Args:
response: str, the model generated response response (:obj:`str`): The model generated response.
""" """
self.generated_responses.append(response) self.generated_responses.append(response)
def set_history(self, history: List[int]):
"""
Updates the value of the history of the conversation. The history is represented by a list of :obj:`token_ids`.
The history is used by the model to generate responses based on the previous conversation turns.
Args:
history (:obj:`List[int]`): History of tokens provided and generated for this conversation.
"""
self.history = history
...@@ -1994,7 +2110,7 @@ class Conversation:
Generates a string representation of the conversation.
Return:
:obj:`str`:
Example:
Conversation id: 7d15686b-dc94-49f2-9c4b-c9eac6a1f114
...@@ -2010,10 +2126,25 @@ class Conversation:
return output
@add_end_docstrings(
PIPELINE_INIT_ARGS,
r"""
min_length_for_response (:obj:`int`, `optional`, defaults to 32):
The minimum length (in number of tokens) for a response.
""",
)
class ConversationalPipeline(Pipeline):
"""
Multi-turn conversational pipeline.
This conversational pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"conversational"`.
The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task,
currently: `'microsoft/DialoGPT-small'`, `'microsoft/DialoGPT-medium'`, `'microsoft/DialoGPT-large'`.
See the up-to-date list of available models on
`huggingface.co/models <https://huggingface.co/models?filter=conversational>`__.
Usage::
conversational_pipeline = pipeline("conversational")
...@@ -2027,36 +2158,6 @@ class ConversationalPipeline(Pipeline):
conversation_2.add_user_input("What is the genre of this book?")
conversational_pipeline([conversation_1, conversation_2])
""" """
def __init__(self, min_length_for_response=32, *args, **kwargs): def __init__(self, min_length_for_response=32, *args, **kwargs):
...@@ -2075,12 +2176,20 @@ class ConversationalPipeline(Pipeline): ...@@ -2075,12 +2176,20 @@ class ConversationalPipeline(Pipeline):
**generate_kwargs **generate_kwargs
): ):
r""" r"""
Generate responses for the conversation(s) given as inputs.
Args: Args:
conversations (a :class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`):
Conversations to generate responses for.
clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to clean up the potential extra spaces in the text output.
generate_kwargs:
Additional keyword arguments to pass along to the generate method of the model (see the generate
method corresponding to your framework `here <./model.html#generative-models>`__).
Returns:
:class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`: Conversation(s) with
updated generated responses for those containing a new user input.
"""
# Input validation
...@@ -2315,56 +2424,58 @@ def pipeline(
**kwargs
) -> Pipeline:
"""
Utility factory method to build a :class:`~transformers.Pipeline`.
Pipelines are made of:
- A :doc:`tokenizer <tokenizer>` in charge of mapping raw textual input to tokens.
- A :doc:`model <model>` to make predictions from the inputs.
- Some (optional) post processing for enhancing model's output.
Args:
task (:obj:`str`):
The task defining which pipeline will be returned. Currently accepted tasks are:
- "feature-extraction": will return a :class:`~transformers.FeatureExtractionPipeline` - :obj:`"feature-extraction"`: will return a :class:`~transformers.FeatureExtractionPipeline`.
- "sentiment-analysis": will return a :class:`~transformers.TextClassificationPipeline` - :obj:`"sentiment-analysis"`: will return a :class:`~transformers.TextClassificationPipeline`.
- "ner": will return a :class:`~transformers.TokenClassificationPipeline` - :obj:`"ner"`: will return a :class:`~transformers.TokenClassificationPipeline`.
- "question-answering": will return a :class:`~transformers.QuestionAnsweringPipeline` - :obj:`"question-answering"`: will return a :class:`~transformers.QuestionAnsweringPipeline`.
- "fill-mask": will return a :class:`~transformers.FillMaskPipeline` - :obj:`"fill-mask"`: will return a :class:`~transformers.FillMaskPipeline`.
- "summarization": will return a :class:`~transformers.SummarizationPipeline` - :obj:`"summarization"`: will return a :class:`~transformers.SummarizationPipeline`.
- "translation_xx_to_yy": will return a :class:`~transformers.TranslationPipeline` - :obj:`"translation_xx_to_yy"`: will return a :class:`~transformers.TranslationPipeline`.
- "text-generation": will return a :class:`~transformers.TextGenerationPipeline` - :obj:`"text-generation"`: will return a :class:`~transformers.TextGenerationPipeline`.
model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`, defaults to :obj:`None`): - :obj:`"conversation"`: will return a :class:`~transformers.ConversationalPipeline`.
The model that will be used by the pipeline to make predictions. This can be :obj:`None`, model (:obj:`str` or :obj:`~transformers.PreTrainedModel` or :obj:`~transformers.TFPreTrainedModel`, `optional`):
a model identifier or an actual pre-trained model inheriting from The model that will be used by the pipeline to make predictions. This can be a model identifier or an
:class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for actual instance of a pretrained model inheriting from :class:`~transformers.PreTrainedModel` (for PyTorch)
TensorFlow. or :class:`~transformers.TFPreTrainedModel` (for TensorFlow).
If :obj:`None`, the default for this pipeline will be loaded. If not provided, the default for the :obj:`task` will be loaded.
config (:obj:`str` or :obj:`~transformers.PretrainedConfig`, `optional`, defaults to :obj:`None`): config (:obj:`str` or :obj:`~transformers.PretrainedConfig`, `optional`):
The configuration that will be used by the pipeline to instantiate the model. This can be :obj:`None`, The configuration that will be used by the pipeline to instantiate the model. This can be a model
a model identifier or an actual pre-trained model configuration inheriting from identifier or an actual pretrained model configuration inheriting from
:class:`~transformers.PretrainedConfig`. :class:`~transformers.PretrainedConfig`.
If :obj:`None`, the default for this pipeline will be loaded. If not provided, the default for the :obj:`task` will be loaded.
tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`, defaults to :obj:`None`): tokenizer (:obj:`str` or :obj:`~transformers.PreTrainedTokenizer`, `optional`):
The tokenizer that will be used by the pipeline to encode data for the model. This can be :obj:`None`, The tokenizer that will be used by the pipeline to encode data for the model. This can be a model
a model identifier or an actual pre-trained tokenizer inheriting from identifier or an actual pretrained tokenizer inheriting from
:class:`~transformers.PreTrainedTokenizer`. :class:`~transformers.PreTrainedTokenizer`.
If :obj:`None`, the default for this pipeline will be loaded. If not provided, the default for the :obj:`task` will be loaded.
framework (:obj:`str`, `optional`, defaults to :obj:`None`): framework (:obj:`str`, `optional`):
The framework to use, either "pt" for PyTorch or "tf" for TensorFlow. The specified framework must be The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
installed. must be installed.
If no framework is specified, will default to the one currently installed. If no framework is specified If no framework is specified, will default to the one currently installed. If no framework is specified
and both frameworks are installed, will default to PyTorch. and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no
model is provided.
kwargs:
Additional keyword arguments passed along to the specific pipeline init (see the documentation for the
corresponding pipeline class for possible values).
Returns:
:class:`~transformers.Pipeline`: A suitable pipeline for the task.
Examples::
...
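# --- Editor's illustration: the original Examples block is collapsed in this
# diff view ("..."); below is a sketch of typical factory usage per the Args
# documented above. The dbmdz checkpoint name is an assumption for the ner case.
from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer

analyzer = pipeline("sentiment-analysis")            # default model for the task
qa = pipeline("question-answering", framework="pt")  # force PyTorch

model = AutoModelForTokenClassification.from_pretrained(
    "dbmdz/bert-large-cased-finetuned-conll03-english"
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
ner = pipeline("ner", model=model, tokenizer=tokenizer)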