Unverified Commit 27b3031d authored by Sylvain Gugger, committed by GitHub

Mass conversion of documentation from rst to Markdown (#14866)

* Convert docstrings of all configurations and tokenizers

* Processors and fixes

* Last modeling files and fixes to models

* Pipeline modules

* Utils files

* Data submodule

* All the other files

* Style

* Missing examples

* Style again

* Fix copies

* Say bye bye to rst docstrings forever
parent 18587639
@@ -28,7 +28,7 @@ XLM_PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class XLMProphetNetConfig(ProphetNetConfig):
"""
This class overrides [`ProphetNetConfig`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""
@@ -56,64 +56,69 @@ def load_vocab(vocab_file):
class XLMProphetNetTokenizer(PreTrainedTokenizer):
"""
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
[SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
Path to the vocabulary file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

sep_token (`str`, *optional*, defaults to `"</s>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
cls_token (`str`, *optional*, defaults to `"<s>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
- `enable_sampling`: Enable subword regularization.
- `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
- `nbest_size = {0,1}`: No sampling is performed.
- `nbest_size > 1`: samples from the nbest_size results.
- `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
using forward-filtering-and-backward-sampling algorithm.
- `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout.

Attributes:
sp_model (`SentencePieceProcessor`):
The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
"""
vocab_files_names = VOCAB_FILES_NAMES
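The `sp_model_kwargs` argument described in the docstring above is forwarded straight to `SentencePieceProcessor.__init__()`. As a rough illustration (the checkpoint name and the sampling values below are placeholders, not a recommendation), enabling subword regularization might look like this:

```python
from transformers import XLMProphetNetTokenizer

# Hypothetical choice of checkpoint; any XLMProphetNet SentencePiece checkpoint would do.
tokenizer = XLMProphetNetTokenizer.from_pretrained(
    "microsoft/xprophetnet-large-wiki100-cased",
    sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1},
)

# With sampling enabled, repeated calls may segment the same string differently.
print(tokenizer.tokenize("Hello world"))
print(tokenizer.tokenize("Hello world"))
```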
@@ -208,18 +213,18 @@ class XLMProphetNetTokenizer(PreTrainedTokenizer):
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
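A minimal sketch of how this mask lines up with the IDs the tokenizer returns (the sentence and checkpoint are placeholders):

```python
from transformers import XLMProphetNetTokenizer

tokenizer = XLMProphetNetTokenizer.from_pretrained("microsoft/xprophetnet-large-wiki100-cased")

ids = tokenizer.encode("Hello world")  # adds the special tokens
mask = tokenizer.get_special_tokens_mask(ids, already_has_special_tokens=True)

# Positions flagged with 1 are special tokens, 0 are regular sequence tokens.
for token, flag in zip(tokenizer.convert_ids_to_tokens(ids), mask):
    print(token, flag)
```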
@@ -239,13 +244,13 @@ class XLMProphetNetTokenizer(PreTrainedTokenizer):
does not make use of token type ids, therefore a list of zeros is returned.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of zeros.
"""
@@ -307,17 +312,17 @@ class XLMProphetNetTokenizer(PreTrainedTokenizer):
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. An XLMProphetNet sequence has the following format:
- single sequence: `X [SEP]`
- pair of sequences: `A [SEP] B [SEP]`
Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
if token_ids_1 is None:
@@ -36,7 +36,7 @@ XLM_ROBERTA_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class XLMRobertaConfig(RobertaConfig):
"""
This class overrides [`RobertaConfig`]. Please check the superclass for the appropriate
documentation alongside usage examples.
"""
@@ -54,64 +54,69 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
class XLMRobertaTokenizer(PreTrainedTokenizer):
"""
Adapted from [`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on
[SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
Path to the vocabulary file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

sep_token (`str`, *optional*, defaults to `"</s>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
cls_token (`str`, *optional*, defaults to `"<s>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
- `enable_sampling`: Enable subword regularization.
- `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
- `nbest_size = {0,1}`: No sampling is performed.
- `nbest_size > 1`: samples from the nbest_size results.
- `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
using forward-filtering-and-backward-sampling algorithm.
- `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout.

Attributes:
sp_model (`SentencePieceProcessor`):
The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
"""
vocab_files_names = VOCAB_FILES_NAMES
@@ -191,17 +196,17 @@ class XLMRobertaTokenizer(PreTrainedTokenizer):
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. An XLM-RoBERTa sequence has the following format:
- single sequence: `<s> X </s>`
- pair of sequences: `<s> A </s></s> B </s>`
Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
if token_ids_1 is None:
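A small sketch of the format described above, using the public `xlm-roberta-base` checkpoint (the sentences are placeholders):

```python
from transformers import XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained("xlm-roberta-base")

ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello world"))
ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("How are you?"))

# Single sequence: <s> A </s>; pair of sequences: <s> A </s></s> B </s>
single = tokenizer.build_inputs_with_special_tokens(ids_a)
pair = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)

print(tokenizer.convert_ids_to_tokens(single))
print(tokenizer.convert_ids_to_tokens(pair))
```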
@@ -215,18 +220,18 @@ class XLMRobertaTokenizer(PreTrainedTokenizer):
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
@@ -246,13 +251,13 @@ class XLMRobertaTokenizer(PreTrainedTokenizer):
not make use of token type ids, therefore a list of zeros is returned.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of zeros.
"""
@@ -66,46 +66,51 @@ PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
class XLMRobertaTokenizerFast(PreTrainedTokenizerFast):
"""
Construct a "fast" XLM-RoBERTa tokenizer (backed by HuggingFace's `tokenizers` library). Adapted from
:class:`~transformers.RobertaTokenizer` and :class:`~transformers.XLNetTokenizer`. Based on `BPE
<https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=BPE#models>`__.
Construct a "fast" XLM-RoBERTa tokenizer (backed by HuggingFace's *tokenizers* library). Adapted from
[`RobertaTokenizer`] and [`XLNetTokenizer`]. Based on [BPE](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=BPE#models).
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
Path to the vocabulary file.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

sep_token (`str`, *optional*, defaults to `"</s>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
cls_token (`str`, *optional*, defaults to `"<s>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<s>NOTUSED", "</s>NOTUSED"]`):
Additional special tokens used by the tokenizer.
"""
@@ -154,17 +159,17 @@ class XLMRobertaTokenizerFast(PreTrainedTokenizerFast):
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. An XLM-RoBERTa sequence has the following format:
- single sequence: `<s> X </s>`
- pair of sequences: `<s> A </s></s> B </s>`
Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
if token_ids_1 is None:
@@ -181,13 +186,13 @@ class XLMRobertaTokenizerFast(PreTrainedTokenizerFast):
not make use of token type ids, therefore a list of zeros is returned.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of zeros.
"""
@@ -31,109 +31,110 @@ XLNET_PRETRAINED_CONFIG_ARCHIVE_MAP = {
class XLNetConfig(PretrainedConfig):
"""
This is the configuration class to store the configuration of a [`XLNetModel`] or a
[`TFXLNetModel`]. It is used to instantiate an XLNet model according to the specified arguments,
defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration
to that of the [xlnet-large-cased](https://huggingface.co/xlnet-large-cased) architecture.

Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model
outputs. Read the documentation from [`PretrainedConfig`] for more information.
Args:
vocab_size (`int`, *optional*, defaults to 32000):
Vocabulary size of the XLNet model. Defines the number of different tokens that can be represented by the
`inputs_ids` passed when calling [`XLNetModel`] or [`TFXLNetModel`].
d_model (`int`, *optional*, defaults to 1024):
Dimensionality of the encoder layers and the pooler layer.
n_layer (`int`, *optional*, defaults to 24):
Number of hidden layers in the Transformer encoder.
n_head (`int`, *optional*, defaults to 16):
Number of attention heads for each attention layer in the Transformer encoder.
d_inner (`int`, *optional*, defaults to 4096):
Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
ff_activation (`str` or `Callable`, *optional*, defaults to `"gelu"`):
The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, `"relu"`,
`"silu"` and `"gelu_new"` are supported.
untie_r (`bool`, *optional*, defaults to `True`):
Whether or not to untie relative position biases.
attn_type (`str`, *optional*, defaults to `"bi"`):
The attention type used by the model. Set `"bi"` for XLNet, `"uni"` for Transformer-XL.
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
dropout (`float`, *optional*, defaults to 0.1):
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
mem_len (`int` or `None`, *optional*):
The number of tokens to cache. The key/value pairs that have already been pre-computed in a previous
forward pass won't be re-computed. See the [quickstart](https://huggingface.co/transformers/quickstart.html#using-the-past) for more information.
reuse_len (`int`, *optional*):
The number of tokens in the current batch to be cached and reused in the future.
bi_data (`bool`, *optional*, defaults to `False`):
Whether or not to use bidirectional input pipeline. Usually set to `True` during pretraining and
`False` during finetuning.
clamp_len (`int`, *optional*, defaults to -1):
Clamp all relative distances larger than clamp_len. Setting this attribute to -1 means no clamping.
same_length (`bool`, *optional*, defaults to `False`):
Whether or not to use the same attention length for each token.
summary_type (`str`, *optional*, defaults to `"last"`):
Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Has to be one of the following options:
- `"last"`: Take the last token hidden state (like XLNet).
- `"first"`: Take the first token hidden state (like BERT).
- `"mean"`: Take the mean of all tokens hidden states.
- `"cls_index"`: Supply a Tensor of classification token position (like GPT/GPT-2).
- `"attn"`: Not implemented now, use multi-head attention.
summary_use_proj (`bool`, *optional*, defaults to `True`):
Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Whether or not to add a projection after the vector extraction.
summary_activation (`str`, *optional*):
Argument used when doing sequence summary. Used in the sequence classification and multiple choice models.
Pass `"tanh"` for a tanh activation to the output, any other value will result in no activation.
summary_proj_to_labels (`bool`, *optional*, defaults to `True`):
Used in the sequence classification and multiple choice models.
Whether the projection outputs should have `config.num_labels` or `config.hidden_size` classes.
summary_last_dropout (`float`, *optional*, defaults to 0.1):
Used in the sequence classification and multiple choice models.
The dropout ratio to be used after the projection and activation.
start_n_top (`int`, *optional*, defaults to 5):
Used in the SQuAD evaluation script.
end_n_top (`int`, *optional*, defaults to 5):
Used in the SQuAD evaluation script.
use_mems_eval (`bool`, *optional*, defaults to `True`):
Whether or not the model should make use of the recurrent memory mechanism in evaluation mode.
use_mems_train (`bool`, *optional*, defaults to `False`):
Whether or not the model should make use of the recurrent memory mechanism in train mode.
<Tip>

For pretraining, it is recommended to set `use_mems_train` to `True`. For fine-tuning, it is
recommended to set `use_mems_train` to `False` as discussed [here](https://github.com/zihangdai/xlnet/issues/41#issuecomment-505102587). If `use_mems_train` is set
to `True`, one has to make sure that the train batches are correctly pre-processed, *e.g.*
`batch_1 = [[This line is], [This is the]]` and `batch_2 = [[ the first line], [ second line]]` and that all batches are of equal size.

</Tip>

Examples:

```python
>>> from transformers import XLNetConfig, XLNetModel

>>> # Initializing an XLNet configuration
>>> configuration = XLNetConfig()

>>> # Initializing a model from the configuration
>>> model = XLNetModel(configuration)

>>> # Accessing the model configuration
>>> configuration = model.config
```"""
model_type = "xlnet"
keys_to_ignore_at_inference = ["mems"]
@@ -485,7 +485,7 @@ class TFXLNetMainLayer(tf.keras.layers.Layer):
qlen: TODO Lysandre didn't fill
mlen: TODO Lysandre didn't fill
```
same_length=False: same_length=True:
<mlen > < qlen > <mlen > < qlen >
@@ -494,7 +494,7 @@
qlen [0 0 0 0 0 0 0 1 1] [1 1 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1] [1 1 1 0 0 0 0 0 1]
v [0 0 0 0 0 0 0 0 0] [1 1 1 1 0 0 0 0 0]
```
"""
attn_mask = tf.ones([qlen, qlen])
mask_u = tf.linalg.band_part(attn_mask, 0, -1)
@@ -1069,15 +1069,15 @@ XLNET_START_DOCSTRING = r"""
XLNET_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `({0})`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`XLNetTokenizer`]. See
[`PreTrainedTokenizer.encode`] and [`PreTrainedTokenizer.__call__`] for
details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.FloatTensor` of shape `({0})`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
@@ -1089,8 +1089,8 @@ XLNET_INPUTS_DOCSTRING = r"""
decoding. The token ids which have their past given to this model should not be passed as `input_ids`
as they have already been computed.
`use_mems` has to be set to `True` to make use of `mems`.
perm_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length, sequence_length)`, *optional*):
Mask to indicate the attention pattern for each input token with values selected in `[0, 1]`:
- if `perm_mask[k, i, j] = 0`, i attend to j in batch k;
@@ -1098,17 +1098,18 @@ XLNET_INPUTS_DOCSTRING = r"""
If not set, each token attends to all the others (full bidirectional attention). Only used during
pretraining (to define factorization order) or for sequential decoding (generation).
target_mapping (`torch.FloatTensor` of shape `(batch_size, num_predict, sequence_length)`, *optional*):
Mask to indicate the output tokens to use. If `target_mapping[k, i, j] = 1`, the i-th predict in batch k
is on the j-th token. Only used during pretraining for partial prediction or for sequential decoding
(generation).
token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`:
- 0 corresponds to a *sentence A* token,
- 1 corresponds to a *sentence B* token.
[What are token type IDs?](../glossary#token-type-ids)
input_mask (`torch.FloatTensor` of shape `{0}`, *optional*):
Mask to avoid performing attention on padding token indices. Negative of `attention_mask`, i.e. with 0
for real tokens and 1 for padding which is kept for compatibility with the original code base.
@@ -1118,30 +1119,24 @@ XLNET_INPUTS_DOCSTRING = r"""
- 0 for tokens that are **not masked**.
You can only uses one of `input_mask` and `attention_mask`.
head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation.
This is useful if you want more control over how to convert `input_ids` indices into associated
vectors than the model's internal embedding lookup matrix.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
"""
@@ -53,70 +53,75 @@ SEG_ID_PAD = 4
class XLNetTokenizer(PreTrainedTokenizer):
"""
Construct an XLNet tokenizer. Based on [SentencePiece](https://github.com/google/sentencepiece).

This tokenizer inherits from [`PreTrainedTokenizer`] which contains most of the main methods.
Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
[SentencePiece](https://github.com/google/sentencepiece) file (generally has a .spm extension) that
contains the vocabulary necessary to instantiate a tokenizer.
do_lower_case (`bool`, *optional*, defaults to `True`):
Whether to lowercase the input when tokenizing.
remove_space (`bool`, *optional*, defaults to `True`):
Whether to strip the text when tokenizing (removing excess spaces before and after the string).
keep_accents (`bool`, *optional*, defaults to `False`):
Whether to keep accents when tokenizing.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (`str`, *optional*, defaults to `"<sep>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (`str`, *optional*, defaults to `"<cls>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<eop>", "<eod>"]`):
Additional special tokens used by the tokenizer.
sp_model_kwargs (`dict`, *optional*):
Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things, to set:
- `enable_sampling`: Enable subword regularization.
- `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
- `nbest_size = {0,1}`: No sampling is performed.
- `nbest_size > 1`: samples from the nbest_size results.
- `nbest_size < 0`: assuming that nbest_size is infinite and samples from the all hypothesis (lattice)
using forward-filtering-and-backward-sampling algorithm.
- `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
BPE-dropout.

Attributes:
sp_model (`SentencePieceProcessor`):
The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
"""
vocab_files_names = VOCAB_FILES_NAMES
@@ -251,17 +256,17 @@ class XLNetTokenizer(PreTrainedTokenizer):
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. An XLNet sequence has the following format:
- single sequence: `X <sep> <cls>`
- pair of sequences: `A <sep> B <sep> <cls>`
Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
@@ -274,18 +279,18 @@ class XLNetTokenizer(PreTrainedTokenizer):
) -> List[int]:
"""
Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer `prepare_for_model` method.
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
already_has_special_tokens (`bool`, *optional*, defaults to `False`):
Whether or not the token list is already formatted with special tokens for the model.
Returns:
`List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
"""
if already_has_special_tokens:
@@ -304,21 +309,21 @@ class XLNetTokenizer(PreTrainedTokenizer):
Create a mask from the two sequences passed to be used in a sequence-pair classification task. An XLNet
sequence pair mask has the following format:
```
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |
```

If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given
sequence(s).
"""
sep = [self.sep_token_id]
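A quick sketch of the mask shape described above, using the public `xlnet-base-cased` checkpoint (the sentences are placeholders); the exact ID assigned to each segment is whatever the tokenizer defines:

```python
from transformers import XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")

ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello world"))
ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("How are you?"))

# First-segment tokens map to 0, second-segment tokens to 1 (plus the trailing cls segment).
token_type_ids = tokenizer.create_token_type_ids_from_sequences(ids_a, ids_b)
print(token_type_ids)
```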
@@ -63,57 +63,62 @@ SEG_ID_PAD = 4
class XLNetTokenizerFast(PreTrainedTokenizerFast):
"""
Construct a "fast" XLNet tokenizer (backed by HuggingFace's `tokenizers` library). Based on `Unigram
<https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models>`__.
Construct a "fast" XLNet tokenizer (backed by HuggingFace's *tokenizers* library). Based on [Unigram](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models).
This tokenizer inherits from :class:`~transformers.PreTrainedTokenizerFast` which contains most of the main
This tokenizer inherits from [`PreTrainedTokenizerFast`] which contains most of the main
methods. Users should refer to this superclass for more information regarding those methods.
Args:
vocab_file (`str`):
[SentencePiece](https://github.com/google/sentencepiece) file (generally has a .spm extension) that
contains the vocabulary necessary to instantiate a tokenizer.
do_lower_case (`bool`, *optional*, defaults to `True`):
Whether to lowercase the input when tokenizing.
remove_space (`bool`, *optional*, defaults to `True`):
Whether to strip the text when tokenizing (removing excess spaces before and after the string).
keep_accents (`bool`, *optional*, defaults to `False`):
Whether to keep accents when tokenizing.
bos_token (`str`, *optional*, defaults to `"<s>"`):
The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the beginning of
sequence. The token used is the `cls_token`.

</Tip>

eos_token (`str`, *optional*, defaults to `"</s>"`):
The end of sequence token.

<Tip>

When building a sequence using special tokens, this is not the token that is used for the end of
sequence. The token used is the `sep_token`.

</Tip>

unk_token (`str`, *optional*, defaults to `"<unk>"`):
The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
token instead.
sep_token (`str`, *optional*, defaults to `"<sep>"`):
The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for
sequence classification or for a text and a question for question answering. It is also used as the last
token of a sequence built with special tokens.
pad_token (`str`, *optional*, defaults to `"<pad>"`):
The token used for padding, for example when batching sequences of different lengths.
cls_token (`str`, *optional*, defaults to `"<cls>"`):
The classifier token which is used when doing sequence classification (classification of the whole sequence
instead of per-token classification). It is the first token of the sequence when built with special tokens.
mask_token (`str`, *optional*, defaults to `"<mask>"`):
The token used for masking values. This is the token used when training this model with masked language
modeling. This is the token which the model will try to predict.
additional_special_tokens (`List[str]`, *optional*, defaults to `["<eop>", "<eod>"]`):
Additional special tokens used by the tokenizer.
Attributes:
sp_model (`SentencePieceProcessor`):
The *SentencePiece* processor that is used for every conversion (string, tokens and IDs).
"""
vocab_files_names = VOCAB_FILES_NAMES
@@ -173,17 +178,17 @@ class XLNetTokenizerFast(PreTrainedTokenizerFast):
Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and
adding special tokens. An XLNet sequence has the following format:
- single sequence: `X <sep> <cls>`
- pair of sequences: `A <sep> B <sep> <cls>`
Args:
token_ids_0 (`List[int]`):
List of IDs to which the special tokens will be added.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [input IDs](../glossary#input-ids) with the appropriate special tokens.
"""
sep = [self.sep_token_id]
cls = [self.cls_token_id]
@@ -198,21 +203,21 @@ class XLNetTokenizerFast(PreTrainedTokenizerFast):
Create a mask from the two sequences passed to be used in a sequence-pair classification task. An XLNet
sequence pair mask has the following format:
```
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |
```

If `token_ids_1` is `None`, this method only returns the first portion of the mask (0s).
Args:
token_ids_0 (`List[int]`):
List of IDs.
token_ids_1 (`List[int]`, *optional*):
Optional second list of IDs for sequence pairs.
Returns:
`List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given
sequence(s).
"""
sep = [self.sep_token_id]
@@ -35,13 +35,13 @@ def get_constant_schedule(optimizer: Optimizer, last_epoch: int = -1):
Create a schedule with a constant learning rate, using the learning rate set in optimizer.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
return LambdaLR(optimizer, lambda _: 1, last_epoch=last_epoch)
@@ -52,15 +52,15 @@ def get_constant_schedule_with_warmup(optimizer: Optimizer, num_warmup_steps: in
increases linearly between 0 and the initial lr set in the optimizer.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (`int`):
The number of steps for the warmup phase.
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
def lr_lambda(current_step: int):
@@ -77,17 +77,17 @@ def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_st
a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (`int`):
The number of steps for the warmup phase.
num_training_steps (`int`):
The total number of training steps.
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
def lr_lambda(current_step: int):
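A small sketch of wiring this warmup schedule to an optimizer (the model, step counts and learning rate are placeholders):

```python
import torch
from transformers import AdamW, get_linear_schedule_with_warmup

model = torch.nn.Linear(10, 2)  # stand-in for a real model
optimizer = AdamW(model.parameters(), lr=5e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1000
)

for step in range(1000):
    # ... forward/backward pass would go here ...
    optimizer.step()
    scheduler.step()  # update the learning rate after each optimizer step
    optimizer.zero_grad()
```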
@@ -109,20 +109,20 @@ def get_cosine_schedule_with_warmup(
initial lr set in the optimizer.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (`int`):
The number of steps for the warmup phase.
num_training_steps (`int`):
The total number of training steps.
num_cycles (`float`, *optional*, defaults to 0.5):
The number of waves in the cosine schedule (the default is to just decrease from the max value to 0
following a half-cosine).
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
def lr_lambda(current_step):
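For intuition, the multiplier this schedule applies to the base learning rate can be sketched roughly as follows (an illustration of the behaviour described above, not necessarily the exact library code):

```python
import math

def cosine_lr_multiplier(current_step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    # Linear warmup from 0 to 1...
    if current_step < num_warmup_steps:
        return current_step / max(1, num_warmup_steps)
    # ...then a cosine curve from 1 down to 0 over the remaining steps.
    progress = (current_step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

print([round(cosine_lr_multiplier(s, 10, 100), 3) for s in range(0, 101, 10)])
```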
@@ -143,19 +143,19 @@ def get_cosine_with_hard_restarts_schedule_with_warmup(
linearly between 0 and the initial lr set in the optimizer.
Args:
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (`int`):
The number of steps for the warmup phase.
num_training_steps (`int`):
The total number of training steps.
num_cycles (`int`, *optional*, defaults to 1):
The number of hard restarts to use.
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Return:
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
def lr_lambda(current_step):
@@ -174,29 +174,29 @@ def get_polynomial_decay_schedule_with_warmup(
):
"""
Create a schedule with a learning rate that decreases as a polynomial decay from the initial lr set in the
optimizer to end lr defined by *lr_end*, after a warmup period during which it increases linearly from 0 to the
initial lr set in the optimizer.
Args:
optimizer (:class:`~torch.optim.Optimizer`):
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
num_warmup_steps (:obj:`int`):
num_warmup_steps (`int`):
The number of steps for the warmup phase.
num_training_steps (:obj:`int`):
num_training_steps (`int`):
The total number of training steps.
lr_end (:obj:`float`, `optional`, defaults to 1e-7):
lr_end (`float`, *optional*, defaults to 1e-7):
The end LR.
power (:obj:`float`, `optional`, defaults to 1.0):
power (`float`, *optional*, defaults to 1.0):
Power factor.
last_epoch (:obj:`int`, `optional`, defaults to -1):
last_epoch (`int`, *optional*, defaults to -1):
The index of the last epoch when resuming training.
Note: `power` defaults to 1.0 as in the fairseq implementation, which in turn is based on the original BERT
Note: *power* defaults to 1.0 as in the fairseq implementation, which in turn is based on the original BERT
implementation at
https://github.com/google-research/bert/blob/f39e881b169b9d53bea03d2d341b31707a6c052b/optimization.py#L37
Return:
:obj:`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
`torch.optim.lr_scheduler.LambdaLR` with the appropriate schedule.
"""
......@@ -239,14 +239,14 @@ def get_scheduler(
Unified API to get any scheduler from its name.
Args:
name (:obj:`str` or `:obj:`SchedulerType`):
name (`str` or `SchedulerType`):
The name of the scheduler to use.
optimizer (:obj:`torch.optim.Optimizer`):
optimizer (`torch.optim.Optimizer`):
The optimizer that will be used during training.
num_warmup_steps (:obj:`int`, `optional`):
num_warmup_steps (`int`, *optional*):
The number of warmup steps to do. This is not required by all schedulers (hence the argument being
optional), the function will raise an error if it's unset and the scheduler type requires it.
num_training_steps (:obj:`int`, `optional`):
num_training_steps (`int`, *optional*):
The number of training steps to do. This is not required by all schedulers (hence the argument being
optional), the function will raise an error if it's unset and the scheduler type requires it.
"""
......@@ -271,22 +271,21 @@ def get_scheduler(
class AdamW(Optimizer):
"""
Implements Adam algorithm with weight decay fix as introduced in `Decoupled Weight Decay Regularization
<https://arxiv.org/abs/1711.05101>`__.
Implements Adam algorithm with weight decay fix as introduced in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101).
Parameters:
params (:obj:`Iterable[nn.parameter.Parameter]`):
params (`Iterable[nn.parameter.Parameter]`):
Iterable of parameters to optimize or dictionaries defining parameter groups.
lr (:obj:`float`, `optional`, defaults to 1e-3):
lr (`float`, *optional*, defaults to 1e-3):
The learning rate to use.
betas (:obj:`Tuple[float,float]`, `optional`, defaults to (0.9, 0.999)):
betas (`Tuple[float,float]`, *optional*, defaults to (0.9, 0.999)):
Adam's betas parameters (b1, b2).
eps (:obj:`float`, `optional`, defaults to 1e-6):
eps (`float`, *optional*, defaults to 1e-6):
Adam's epsilon for numerical stability.
weight_decay (:obj:`float`, `optional`, defaults to 0):
weight_decay (`float`, *optional*, defaults to 0):
Decoupled weight decay to apply.
correct_bias (:obj:`bool`, `optional`, defaults to `True`):
Whether or not to correct bias in Adam (for instance, in Bert TF repository they use :obj:`False`).
correct_bias (`bool`, *optional*, defaults to `True`):
Whether or not to correct bias in Adam (for instance, in Bert TF repository they use `False`).
"""
def __init__(
......@@ -315,7 +314,7 @@ class AdamW(Optimizer):
Performs a single optimization step.
Arguments:
closure (:obj:`Callable`, `optional`): A closure that reevaluates the model and returns the loss.
closure (`Callable`, *optional*): A closure that reevaluates the model and returns the loss.
"""
loss = None
if closure is not None:
......@@ -377,31 +376,31 @@ class Adafactor(Optimizer):
The AdaFactor PyTorch implementation can be used as a drop-in replacement for Adam; original fairseq code:
https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py
Paper: `Adafactor: Adaptive Learning Rates with Sublinear Memory Cost` https://arxiv.org/abs/1804.04235 Note that
this optimizer internally adjusts the learning rate depending on the *scale_parameter*, *relative_step* and
*warmup_init* options. To use a manual (external) learning rate schedule you should set `scale_parameter=False` and
Paper: *Adafactor: Adaptive Learning Rates with Sublinear Memory Cost* https://arxiv.org/abs/1804.04235 Note that
this optimizer internally adjusts the learning rate depending on the `scale_parameter`, `relative_step` and
`warmup_init` options. To use a manual (external) learning rate schedule you should set `scale_parameter=False` and
`relative_step=False`.
Arguments:
params (:obj:`Iterable[nn.parameter.Parameter]`):
params (`Iterable[nn.parameter.Parameter]`):
Iterable of parameters to optimize or dictionaries defining parameter groups.
lr (:obj:`float`, `optional`):
lr (`float`, *optional*):
The external learning rate.
eps (:obj:`Tuple[float, float]`, `optional`, defaults to (1e-30, 1e-3)):
eps (`Tuple[float, float]`, *optional*, defaults to (1e-30, 1e-3)):
Regularization constants for square gradient and parameter scale respectively
clip_threshold (:obj:`float`, `optional`, defaults 1.0):
clip_threshold (`float`, *optional*, defaults to 1.0):
Threshold of root mean square of final gradient update
decay_rate (:obj:`float`, `optional`, defaults to -0.8):
decay_rate (`float`, *optional*, defaults to -0.8):
Coefficient used to compute running averages of the squared gradient
beta1 (:obj:`float`, `optional`):
beta1 (`float`, *optional*):
Coefficient used for computing running averages of gradient
weight_decay (:obj:`float`, `optional`, defaults to 0):
weight_decay (`float`, *optional*, defaults to 0):
Weight decay (L2 penalty)
scale_parameter (:obj:`bool`, `optional`, defaults to :obj:`True`):
scale_parameter (`bool`, *optional*, defaults to `True`):
If True, learning rate is scaled by root mean square
relative_step (:obj:`bool`, `optional`, defaults to :obj:`True`):
relative_step (`bool`, *optional*, defaults to `True`):
If True, time-dependent learning rate is computed instead of external learning rate
warmup_init (:obj:`bool`, `optional`, defaults to :obj:`False`):
warmup_init (`bool`, *optional*, defaults to `False`):
Time-dependent learning rate computation depends on whether warm-up initialization is being used
This implementation handles low-precision (FP16, bfloat) values, but we have not thoroughly tested it.
......@@ -410,43 +409,50 @@ class Adafactor(Optimizer):
- Training without LR warmup or clip_threshold is not recommended.
* use scheduled LR warm-up to fixed LR
* use clip_threshold=1.0 (https://arxiv.org/abs/1804.04235)
- use scheduled LR warm-up to fixed LR
- use clip_threshold=1.0 (https://arxiv.org/abs/1804.04235)
- Disable relative updates
- Use scale_parameter=False
- Additional optimizer operations like gradient clipping should not be used alongside Adafactor
Example::
Example:
Adafactor(model.parameters(), scale_parameter=False, relative_step=False, warmup_init=False, lr=1e-3)
```python
Adafactor(model.parameters(), scale_parameter=False, relative_step=False, warmup_init=False, lr=1e-3)
```
Others reported the following combination to work well::
Others reported the following combination to work well:
Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
```python
Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
```
When using ``lr=None`` with :class:`~transformers.Trainer` you will most likely need to use :class:`~transformers.optimization.AdafactorSchedule` scheduler as following::
When using `lr=None` with [`Trainer`] you will most likely need to use [`~optimization.AdafactorSchedule`] scheduler as following:
from transformers.optimization import Adafactor, AdafactorSchedule
optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
lr_scheduler = AdafactorSchedule(optimizer)
trainer = Trainer(..., optimizers=(optimizer, lr_scheduler))
```python
from transformers.optimization import Adafactor, AdafactorSchedule
optimizer = Adafactor(model.parameters(), scale_parameter=True, relative_step=True, warmup_init=True, lr=None)
lr_scheduler = AdafactorSchedule(optimizer)
trainer = Trainer(..., optimizers=(optimizer, lr_scheduler))
```
Usage::
Usage:
# replace AdamW with Adafactor
optimizer = Adafactor(
model.parameters(),
lr=1e-3,
eps=(1e-30, 1e-3),
clip_threshold=1.0,
decay_rate=-0.8,
beta1=None,
weight_decay=0.0,
relative_step=False,
scale_parameter=False,
warmup_init=False
)
"""
```python
# replace AdamW with Adafactor
optimizer = Adafactor(
model.parameters(),
lr=1e-3,
eps=(1e-30, 1e-3),
clip_threshold=1.0,
decay_rate=-0.8,
beta1=None,
weight_decay=0.0,
relative_step=False,
scale_parameter=False,
warmup_init=False
)
```"""
def __init__(
self,
......@@ -605,11 +611,11 @@ class Adafactor(Optimizer):
class AdafactorSchedule(LambdaLR):
"""
Since :class:`~transformers.optimization.Adafactor` performs its own scheduling, if the training loop relies on a
Since [`~optimization.Adafactor`] performs its own scheduling, if the training loop relies on a
scheduler (e.g., for logging), this class creates a proxy object that retrieves the current lr values from the
optimizer.
It returns ``initial_lr`` during startup and the actual ``lr`` during stepping.
It returns `initial_lr` during startup and the actual `lr` during stepping.
"""
def __init__(self, optimizer, initial_lr=0.0):
......@@ -636,16 +642,16 @@ class AdafactorSchedule(LambdaLR):
def get_adafactor_schedule(optimizer, initial_lr=0.0):
"""
Get a proxy schedule for :class:`~transformers.optimization.Adafactor`
Get a proxy schedule for [`~optimization.Adafactor`]
Args:
optimizer (:class:`~torch.optim.Optimizer`):
optimizer ([`~torch.optim.Optimizer`]):
The optimizer for which to schedule the learning rate.
initial_lr (:obj:`float`, `optional`, defaults to 0.0):
initial_lr (`float`, *optional*, defaults to 0.0):
Initial lr
Return:
:class:`~transformers.optimization.Adafactor` proxy schedule object.
[`~optimization.Adafactor`] proxy schedule object.
"""
......
......@@ -26,16 +26,16 @@ class WarmUp(tf.keras.optimizers.schedules.LearningRateSchedule):
Applies a warmup schedule on a given learning rate decay schedule.
Args:
initial_learning_rate (:obj:`float`):
initial_learning_rate (`float`):
The initial learning rate for the schedule after the warmup (so this will be the learning rate at the end
of the warmup).
decay_schedule_fn (:obj:`Callable`):
decay_schedule_fn (`Callable`):
The schedule function to apply after the warmup for the rest of training.
warmup_steps (:obj:`int`):
warmup_steps (`int`):
The number of steps for the warmup part of training.
power (:obj:`float`, `optional`, defaults to 1):
power (`float`, *optional*, defaults to 1):
The power to use for the polynomial warmup (the default is a linear warmup).
name (:obj:`str`, `optional`):
name (`str`, *optional*):
Optional name prefix for the returned tensors during the schedule.
"""
......@@ -95,25 +95,25 @@ def create_optimizer(
Creates an optimizer with a learning rate schedule using a warmup phase followed by a linear decay.
Args:
init_lr (:obj:`float`):
init_lr (`float`):
The desired learning rate at the end of the warmup phase.
num_train_steps (:obj:`int`):
num_train_steps (`int`):
The total number of training steps.
num_warmup_steps (:obj:`int`):
num_warmup_steps (`int`):
The number of warmup steps.
min_lr_ratio (:obj:`float`, `optional`, defaults to 0):
The final learning rate at the end of the linear decay will be :obj:`init_lr * min_lr_ratio`.
adam_beta1 (:obj:`float`, `optional`, defaults to 0.9):
min_lr_ratio (`float`, *optional*, defaults to 0):
The final learning rate at the end of the linear decay will be `init_lr * min_lr_ratio`.
adam_beta1 (`float`, *optional*, defaults to 0.9):
The beta1 to use in Adam.
adam_beta2 (:obj:`float`, `optional`, defaults to 0.999):
adam_beta2 (`float`, *optional*, defaults to 0.999):
The beta2 to use in Adam.
adam_epsilon (:obj:`float`, `optional`, defaults to 1e-8):
adam_epsilon (`float`, *optional*, defaults to 1e-8):
The epsilon to use in Adam.
weight_decay_rate (:obj:`float`, `optional`, defaults to 0):
weight_decay_rate (`float`, *optional*, defaults to 0):
The weight decay to use.
power (:obj:`float`, `optional`, defaults to 1.0):
power (`float`, *optional*, defaults to 1.0):
The power to use for PolynomialDecay.
include_in_weight_decay (:obj:`List[str]`, `optional`):
include_in_weight_decay (`List[str]`, *optional*):
List of the parameter names (or re patterns) to apply weight decay to. If none is passed, weight decay is
applied to all parameters except bias and layer norm parameters.
"""
......@@ -153,39 +153,37 @@ class AdamWeightDecay(tf.keras.optimizers.Adam):
"""
Adam enables L2 weight decay and clip_by_global_norm on gradients. Just adding the square of the weights to the
loss function is *not* the correct way of using L2 regularization/weight decay with Adam, since that will interact
with the m and v parameters in strange ways as shown in `Decoupled Weight Decay Regularization
<https://arxiv.org/abs/1711.05101>`__.
with the m and v parameters in strange ways as shown in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101).
Instead we want to decay the weights in a manner that doesn't interact with the m/v parameters. This is equivalent
to adding the square of the weights to the loss with plain (non-momentum) SGD.
Args:
learning_rate (:obj:`Union[float, tf.keras.optimizers.schedules.LearningRateSchedule]`, `optional`, defaults to 1e-3):
learning_rate (`Union[float, tf.keras.optimizers.schedules.LearningRateSchedule]`, *optional*, defaults to 1e-3):
The learning rate to use or a schedule.
beta_1 (:obj:`float`, `optional`, defaults to 0.9):
beta_1 (`float`, *optional*, defaults to 0.9):
The beta1 parameter in Adam, which is the exponential decay rate for the 1st momentum estimates.
beta_2 (:obj:`float`, `optional`, defaults to 0.999):
beta_2 (`float`, *optional*, defaults to 0.999):
The beta2 parameter in Adam, which is the exponential decay rate for the 2nd momentum estimates.
epsilon (:obj:`float`, `optional`, defaults to 1e-7):
epsilon (`float`, *optional*, defaults to 1e-7):
The epsilon parameter in Adam, which is a small constant for numerical stability.
amsgrad (:obj:`bool`, `optional`, default to `False`):
Whether to apply AMSGrad variant of this algorithm or not, see `On the Convergence of Adam and Beyond
<https://arxiv.org/abs/1904.09237>`__.
weight_decay_rate (:obj:`float`, `optional`, defaults to 0):
amsgrad (`bool`, *optional*, defaults to `False`):
Whether to apply AMSGrad variant of this algorithm or not, see [On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237).
weight_decay_rate (`float`, *optional*, defaults to 0):
The weight decay to apply.
include_in_weight_decay (:obj:`List[str]`, `optional`):
include_in_weight_decay (`List[str]`, *optional*):
List of the parameter names (or re patterns) to apply weight decay to. If none is passed, weight decay is
applied to all parameters by default (unless they are in :obj:`exclude_from_weight_decay`).
exclude_from_weight_decay (:obj:`List[str]`, `optional`):
applied to all parameters by default (unless they are in `exclude_from_weight_decay`).
exclude_from_weight_decay (`List[str]`, *optional*):
List of the parameter names (or re patterns) to exclude from applying weight decay to. If a
:obj:`include_in_weight_decay` is passed, the names in it will supersede this list.
name (:obj:`str`, `optional`, defaults to 'AdamWeightDecay'):
`include_in_weight_decay` is passed, the names in it will supersede this list.
name (`str`, *optional*, defaults to 'AdamWeightDecay'):
Optional name for the operations created when applying gradients.
kwargs:
Keyword arguments. Allowed to be {``clipnorm``, ``clipvalue``, ``lr``, ``decay``}. ``clipnorm`` is clip
gradients by norm; ``clipvalue`` is clip gradients by value, ``decay`` is included for backward
compatibility to allow time inverse decay of learning rate. ``lr`` is included for backward compatibility,
recommended to use ``learning_rate`` instead.
Keyword arguments. Allowed to be {`clipnorm`, `clipvalue`, `lr`, `decay`}. `clipnorm` is clip
gradients by norm; `clipvalue` is clip gradients by value, `decay` is included for backward
compatibility to allow time inverse decay of learning rate. `lr` is included for backward compatibility,
recommended to use `learning_rate` instead.
"""
def __init__(
......@@ -283,7 +281,7 @@ class GradientAccumulator(object):
"""
Gradient accumulation utility. When used with a distribution strategy, the accumulator should be called in a
replica context. Gradients will be accumulated locally on each replica and without synchronization. Users should
then call ``.gradients``, scale the gradients if required, and pass the result to ``apply_gradients``.
then call `.gradients`, scale the gradients if required, and pass the result to `apply_gradients`.
"""
# We use the ON_READ synchronization policy so that no synchronization is
......@@ -316,7 +314,7 @@ class GradientAccumulator(object):
return list(gradient.value() if gradient is not None else gradient for gradient in self._gradients)
def __call__(self, gradients):
"""Accumulates :obj:`gradients` on the current replica."""
"""Accumulates `gradients` on the current replica."""
if not self._gradients:
_ = self.step # Create the step variable.
self._gradients.extend(
......
......@@ -317,28 +317,28 @@ def check_task(task: str) -> Tuple[Dict, Any]:
default models if they exist.
Args:
task (:obj:`str`):
task (`str`):
The task defining which pipeline will be returned. Currently accepted tasks are:
- :obj:`"audio-classification"`
- :obj:`"automatic-speech-recognition"`
- :obj:`"conversational"`
- :obj:`"feature-extraction"`
- :obj:`"fill-mask"`
- :obj:`"image-classification"`
- :obj:`"question-answering"`
- :obj:`"table-question-answering"`
- :obj:`"text2text-generation"`
- :obj:`"text-classification"` (alias :obj:`"sentiment-analysis" available)
- :obj:`"text-generation"`
- :obj:`"token-classification"` (alias :obj:`"ner"` available)
- :obj:`"translation"`
- :obj:`"translation_xx_to_yy"`
- :obj:`"summarization"`
- :obj:`"zero-shot-classification"`
- `"audio-classification"`
- `"automatic-speech-recognition"`
- `"conversational"`
- `"feature-extraction"`
- `"fill-mask"`
- `"image-classification"`
- `"question-answering"`
- `"table-question-answering"`
- `"text2text-generation"`
- `"text-classification"` (alias `"sentiment-analysis"` available)
- `"text-generation"`
- `"token-classification"` (alias `"ner"` available)
- `"translation"`
- `"translation_xx_to_yy"`
- `"summarization"`
- `"zero-shot-classification"`
Returns:
(task_defaults:obj:`dict`, task_options: (:obj:`tuple`, None)) The actual dictionary required to initialize the
(task_defaults: `dict`, task_options: (`tuple`, None)) The actual dictionary required to initialize the
pipeline and some extra task options for parametrized tasks like "translation_XX_to_YY"
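An illustrative call, under the assumption that `check_task` is reachable from `transformers.pipelines` as in this module:

```python
from transformers.pipelines import check_task

# plain tasks return (defaults_dict, None); parametrized ones also carry their options,
# e.g. the (src, tgt) language pair for "translation_en_to_fr"
task_defaults, task_options = check_task("text-classification")
```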
......@@ -374,114 +374,114 @@ def pipeline(
**kwargs
) -> Pipeline:
"""
Utility factory method to build a :class:`~transformers.Pipeline`.
Utility factory method to build a [`Pipeline`].
Pipelines are made of:
- A :doc:`tokenizer <tokenizer>` in charge of mapping raw textual input to token.
- A :doc:`model <model>` to make predictions from the inputs.
- A [tokenizer](tokenizer) in charge of mapping raw textual input to tokens.
- A [model](model) to make predictions from the inputs.
- Some (optional) post processing for enhancing model's output.
Args:
task (:obj:`str`):
task (`str`):
The task defining which pipeline will be returned. Currently accepted tasks are:
- :obj:`"audio-classification"`: will return a :class:`~transformers.AudioClassificationPipeline`:.
- :obj:`"automatic-speech-recognition"`: will return a
:class:`~transformers.AutomaticSpeechRecognitionPipeline`:.
- :obj:`"conversational"`: will return a :class:`~transformers.ConversationalPipeline`:.
- :obj:`"feature-extraction"`: will return a :class:`~transformers.FeatureExtractionPipeline`:.
- :obj:`"fill-mask"`: will return a :class:`~transformers.FillMaskPipeline`:.
- :obj:`"image-classification"`: will return a :class:`~transformers.ImageClassificationPipeline`:.
- :obj:`"question-answering"`: will return a :class:`~transformers.QuestionAnsweringPipeline`:.
- :obj:`"table-question-answering"`: will return a :class:`~transformers.TableQuestionAnsweringPipeline`:.
- :obj:`"text2text-generation"`: will return a :class:`~transformers.Text2TextGenerationPipeline`:.
- :obj:`"text-classification"` (alias :obj:`"sentiment-analysis" available): will return a
:class:`~transformers.TextClassificationPipeline`:.
- :obj:`"text-generation"`: will return a :class:`~transformers.TextGenerationPipeline`:.
- :obj:`"token-classification"` (alias :obj:`"ner"` available): will return a
:class:`~transformers.TokenClassificationPipeline`:.
- :obj:`"translation"`: will return a :class:`~transformers.TranslationPipeline`:.
- :obj:`"translation_xx_to_yy"`: will return a :class:`~transformers.TranslationPipeline`:.
- :obj:`"summarization"`: will return a :class:`~transformers.SummarizationPipeline`:.
- :obj:`"zero-shot-classification"`: will return a :class:`~transformers.ZeroShotClassificationPipeline`:.
model (:obj:`str` or :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`, `optional`):
- `"audio-classification"`: will return a [`AudioClassificationPipeline`].
- `"automatic-speech-recognition"`: will return a
[`AutomaticSpeechRecognitionPipeline`].
- `"conversational"`: will return a [`ConversationalPipeline`].
- `"feature-extraction"`: will return a [`FeatureExtractionPipeline`].
- `"fill-mask"`: will return a [`FillMaskPipeline`]:.
- `"image-classification"`: will return a [`ImageClassificationPipeline`].
- `"question-answering"`: will return a [`QuestionAnsweringPipeline`].
- `"table-question-answering"`: will return a [`TableQuestionAnsweringPipeline`].
- `"text2text-generation"`: will return a [`Text2TextGenerationPipeline`].
- `"text-classification"` (alias `"sentiment-analysis"` available): will return a
[`TextClassificationPipeline`].
- `"text-generation"`: will return a [`TextGenerationPipeline`]:.
- `"token-classification"` (alias `"ner"` available): will return a
[`TokenClassificationPipeline`].
- `"translation"`: will return a [`TranslationPipeline`].
- `"translation_xx_to_yy"`: will return a [`TranslationPipeline`].
- `"summarization"`: will return a [`SummarizationPipeline`].
- `"zero-shot-classification"`: will return a [`ZeroShotClassificationPipeline`].
model (`str` or [`PreTrainedModel`] or [`TFPreTrainedModel`], *optional*):
The model that will be used by the pipeline to make predictions. This can be a model identifier or an
actual instance of a pretrained model inheriting from :class:`~transformers.PreTrainedModel` (for PyTorch)
or :class:`~transformers.TFPreTrainedModel` (for TensorFlow).
actual instance of a pretrained model inheriting from [`PreTrainedModel`] (for PyTorch)
or [`TFPreTrainedModel`] (for TensorFlow).
If not provided, the default for the :obj:`task` will be loaded.
config (:obj:`str` or :class:`~transformers.PretrainedConfig`, `optional`):
If not provided, the default for the `task` will be loaded.
config (`str` or [`PretrainedConfig`], *optional*):
The configuration that will be used by the pipeline to instantiate the model. This can be a model
identifier or an actual pretrained model configuration inheriting from
:class:`~transformers.PretrainedConfig`.
[`PretrainedConfig`].
If not provided, the default configuration file for the requested model will be used. That means that if
:obj:`model` is given, its default configuration will be used. However, if :obj:`model` is not supplied,
this :obj:`task`'s default model's config is used instead.
tokenizer (:obj:`str` or :class:`~transformers.PreTrainedTokenizer`, `optional`):
`model` is given, its default configuration will be used. However, if `model` is not supplied,
this `task`'s default model's config is used instead.
tokenizer (`str` or [`PreTrainedTokenizer`], *optional*):
The tokenizer that will be used by the pipeline to encode data for the model. This can be a model
identifier or an actual pretrained tokenizer inheriting from :class:`~transformers.PreTrainedTokenizer`.
identifier or an actual pretrained tokenizer inheriting from [`PreTrainedTokenizer`].
If not provided, the default tokenizer for the given :obj:`model` will be loaded (if it is a string). If
:obj:`model` is not specified or not a string, then the default tokenizer for :obj:`config` is loaded (if
it is a string). However, if :obj:`config` is also not given or not a string, then the default tokenizer
for the given :obj:`task` will be loaded.
feature_extractor (:obj:`str` or :class:`~transformers.PreTrainedFeatureExtractor`, `optional`):
If not provided, the default tokenizer for the given `model` will be loaded (if it is a string). If
`model` is not specified or not a string, then the default tokenizer for `config` is loaded (if
it is a string). However, if `config` is also not given or not a string, then the default tokenizer
for the given `task` will be loaded.
feature_extractor (`str` or [`PreTrainedFeatureExtractor`], *optional*):
The feature extractor that will be used by the pipeline to encode data for the model. This can be a model
identifier or an actual pretrained feature extractor inheriting from
:class:`~transformers.PreTrainedFeatureExtractor`.
[`PreTrainedFeatureExtractor`].
Feature extractors are used for non-NLP models, such as Speech or Vision models as well as multi-modal
models. Multi-modal models will also require a tokenizer to be passed.
If not provided, the default feature extractor for the given :obj:`model` will be loaded (if it is a
string). If :obj:`model` is not specified or not a string, then the default feature extractor for
:obj:`config` is loaded (if it is a string). However, if :obj:`config` is also not given or not a string,
then the default feature extractor for the given :obj:`task` will be loaded.
framework (:obj:`str`, `optional`):
The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
If not provided, the default feature extractor for the given `model` will be loaded (if it is a
string). If `model` is not specified or not a string, then the default feature extractor for
`config` is loaded (if it is a string). However, if `config` is also not given or not a string,
then the default feature extractor for the given `task` will be loaded.
framework (`str`, *optional*):
The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified framework
must be installed.
If no framework is specified, will default to the one currently installed. If no framework is specified and
both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no model
both frameworks are installed, will default to the framework of the `model`, or to PyTorch if no model
is provided.
revision(:obj:`str`, `optional`, defaults to :obj:`"main"`):
revision(`str`, *optional*, defaults to `"main"`):
When passing a task name or a string model identifier: The specific model version to use. It can be a
branch name, a tag name, or a commit id, since we use a git-based system for storing models and other
artifacts on huggingface.co, so ``revision`` can be any identifier allowed by git.
use_fast (:obj:`bool`, `optional`, defaults to :obj:`True`):
Whether or not to use a Fast tokenizer if possible (a :class:`~transformers.PreTrainedTokenizerFast`).
use_auth_token (:obj:`str` or `bool`, `optional`):
The token to use as HTTP bearer authorization for remote files. If :obj:`True`, will use the token
generated when running :obj:`transformers-cli login` (stored in :obj:`~/.huggingface`).
revision(:obj:`str`, `optional`, defaults to :obj:`"main"`):
artifacts on huggingface.co, so `revision` can be any identifier allowed by git.
use_fast (`bool`, *optional*, defaults to `True`):
Whether or not to use a Fast tokenizer if possible (a [`PreTrainedTokenizerFast`]).
use_auth_token (`str` or `bool`, *optional*):
The token to use as HTTP bearer authorization for remote files. If `True`, will use the token
generated when running `transformers-cli login` (stored in `~/.huggingface`).
model_kwargs:
Additional dictionary of keyword arguments passed along to the model's :obj:`from_pretrained(...,
**model_kwargs)` function.
Additional dictionary of keyword arguments passed along to the model's `from_pretrained(..., **model_kwargs)` function.
kwargs:
Additional keyword arguments passed along to the specific pipeline init (see the documentation for the
corresponding pipeline class for possible values).
Returns:
:class:`~transformers.Pipeline`: A suitable pipeline for the task.
[`Pipeline`]: A suitable pipeline for the task.
Examples::
Examples:
>>> from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
```python
>>> from transformers import pipeline, AutoModelForTokenClassification, AutoTokenizer
>>> # Sentiment analysis pipeline
>>> pipeline('sentiment-analysis')
>>> # Sentiment analysis pipeline
>>> pipeline('sentiment-analysis')
>>> # Question answering pipeline, specifying the checkpoint identifier
>>> pipeline('question-answering', model='distilbert-base-cased-distilled-squad', tokenizer='bert-base-cased')
>>> # Question answering pipeline, specifying the checkpoint identifier
>>> pipeline('question-answering', model='distilbert-base-cased-distilled-squad', tokenizer='bert-base-cased')
>>> # Named entity recognition pipeline, passing in a specific model and tokenizer
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> pipeline('ner', model=model, tokenizer=tokenizer)
"""
>>> # Named entity recognition pipeline, passing in a specific model and tokenizer
>>> model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
>>> pipeline('ner', model=model, tokenizer=tokenizer)
```"""
if model_kwargs is None:
model_kwargs = {}
......
......@@ -66,15 +66,14 @@ def ffmpeg_read(bpayload: bytes, sampling_rate: int) -> np.array:
@add_end_docstrings(PIPELINE_INIT_ARGS)
class AudioClassificationPipeline(Pipeline):
"""
Audio classification pipeline using any :obj:`AutoModelForAudioClassification`. This pipeline predicts the class of
Audio classification pipeline using any `AutoModelForAudioClassification`. This pipeline predicts the class of
a raw waveform or an audio file. In case of an audio file, ffmpeg should be installed to support multiple audio
formats.
This pipeline can currently be loaded from :func:`~transformers.pipeline` using the following task identifier:
:obj:`"audio-classification"`.
This pipeline can currently be loaded from [`pipeline`] using the following task identifier:
`"audio-classification"`.
See the list of available models on `huggingface.co/models
<https://huggingface.co/models?filter=audio-classification>`__.
See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=audio-classification).
"""
def __init__(self, *args, **kwargs):
......@@ -93,26 +92,26 @@ class AudioClassificationPipeline(Pipeline):
**kwargs,
):
"""
Classify the sequence(s) given as inputs. See the :class:`~transformers.AutomaticSpeechRecognitionPipeline`
Classify the sequence(s) given as inputs. See the [`AudioClassificationPipeline`]
documentation for more information.
Args:
inputs (:obj:`np.ndarray` or :obj:`bytes` or :obj:`str`):
The inputs is either a raw waveform (:obj:`np.ndarray` of shape (n, ) of type :obj:`np.float32` or
:obj:`np.float64`) at the correct sampling rate (no further check will be done) or a :obj:`str` that is
inputs (`np.ndarray` or `bytes` or `str`):
The input is either a raw waveform (`np.ndarray` of shape (n, ) of type `np.float32` or
`np.float64`) at the correct sampling rate (no further check will be done) or a `str` that is
the filename of the audio file, the file will be read at the correct sampling rate to get the waveform
using `ffmpeg`. This requires `ffmpeg` to be installed on the system. If `inputs` is :obj:`bytes` it is
supposed to be the content of an audio file and is interpreted by `ffmpeg` in the same way.
top_k (:obj:`int`, `optional`, defaults to None):
The number of top labels that will be returned by the pipeline. If the provided number is `None` or
using *ffmpeg*. This requires *ffmpeg* to be installed on the system. If *inputs* is `bytes` it is
supposed to be the content of an audio file and is interpreted by *ffmpeg* in the same way.
top_k (`int`, *optional*, defaults to None):
The number of top labels that will be returned by the pipeline. If the provided number is *None* or
higher than the number of labels available in the model configuration, it will default to the number of
labels.
Return:
A list of :obj:`dict` with the following keys:
A list of `dict` with the following keys:
- **label** (:obj:`str`) -- The label predicted.
- **score** (:obj:`float`) -- The corresponding probability.
- **label** (`str`) -- The label predicted.
- **score** (`float`) -- The corresponding probability.
"""
return super().__call__(inputs, **kwargs)
......
......@@ -77,25 +77,25 @@ class AutomaticSpeechRecognitionPipeline(Pipeline):
def __init__(self, feature_extractor: Union["SequenceFeatureExtractor", str], *args, **kwargs):
"""
Arguments:
feature_extractor (:class:`~transformers.SequenceFeatureExtractor`):
feature_extractor ([`SequenceFeatureExtractor`]):
The feature extractor that will be used by the pipeline to encode waveform for the model.
model (:class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
The model that will be used by the pipeline to make predictions. This needs to be a model inheriting
from :class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel`
from [`PreTrainedModel`] for PyTorch and [`TFPreTrainedModel`]
for TensorFlow.
tokenizer (:class:`~transformers.PreTrainedTokenizer`):
tokenizer ([`PreTrainedTokenizer`]):
The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
:class:`~transformers.PreTrainedTokenizer`.
modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
[`PreTrainedTokenizer`].
modelcard (`str` or [`ModelCard`], *optional*):
Model card attributed to the model for this pipeline.
framework (:obj:`str`, `optional`):
The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified
framework (`str`, *optional*):
The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified
framework must be installed.
If no framework is specified, will default to the one currently installed. If no framework is specified
and both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if
and both frameworks are installed, will default to the framework of the `model`, or to PyTorch if
no model is provided.
device (:obj:`int`, `optional`, defaults to -1):
device (`int`, *optional*, defaults to -1):
Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive will run the
model on the associated CUDA device id.
"""
......@@ -114,21 +114,21 @@ class AutomaticSpeechRecognitionPipeline(Pipeline):
**kwargs,
):
"""
Classify the sequence(s) given as inputs. See the :class:`~transformers.AutomaticSpeechRecognitionPipeline`
Classify the sequence(s) given as inputs. See the [`AutomaticSpeechRecognitionPipeline`]
documentation for more information.
Args:
inputs (:obj:`np.ndarray` or :obj:`bytes` or :obj:`str`):
The inputs is either a raw waveform (:obj:`np.ndarray` of shape (n, ) of type :obj:`np.float32` or
:obj:`np.float64`) at the correct sampling rate (no further check will be done) or a :obj:`str` that is
inputs (`np.ndarray` or `bytes` or `str`):
The input is either a raw waveform (`np.ndarray` of shape (n, ) of type `np.float32` or
`np.float64`) at the correct sampling rate (no further check will be done) or a `str` that is
the filename of the audio file, the file will be read at the correct sampling rate to get the waveform
using `ffmpeg`. This requires `ffmpeg` to be installed on the system. If `inputs` is :obj:`bytes` it is
supposed to be the content of an audio file and is interpreted by `ffmpeg` in the same way.
using *ffmpeg*. This requires *ffmpeg* to be installed on the system. If *inputs* is `bytes` it is
supposed to be the content of an audio file and is interpreted by *ffmpeg* in the same way.
Return:
A :obj:`dict` with the following keys:
A `dict` with the following keys:
- **text** (:obj:`str`) -- The recognized text.
- **text** (`str`) -- The recognized text.
"""
return super().__call__(inputs, **kwargs)
......
......@@ -145,30 +145,29 @@ def infer_framework_load_model(
**model_kwargs
):
"""
Select framework (TensorFlow or PyTorch) to use from the :obj:`model` passed. Returns a tuple (framework, model).
Select framework (TensorFlow or PyTorch) to use from the `model` passed. Returns a tuple (framework, model).
If :obj:`model` is instantiated, this function will just infer the framework from the model class. Otherwise
:obj:`model` is actually a checkpoint name and this method will try to instantiate it using :obj:`model_classes`.
If `model` is instantiated, this function will just infer the framework from the model class. Otherwise
`model` is actually a checkpoint name and this method will try to instantiate it using `model_classes`.
Since we don't want to instantiate the model twice, this model is returned for use by the pipeline.
If both frameworks are installed and available for :obj:`model`, PyTorch is selected.
If both frameworks are installed and available for `model`, PyTorch is selected.
Args:
model (:obj:`str`, :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
The model to infer the framework from. If :obj:`str`, a checkpoint name. The model to infer the framewrok
model (`str`, [`PreTrainedModel`] or [`TFPreTrainedModel`]):
The model to infer the framework from. If `str`, a checkpoint name. The model to infer the framework
from.
config (:class:`~transformers.AutoConfig`):
config ([`AutoConfig`]):
The config associated with the model to help using the correct class
model_classes (dictionary :obj:`str` to :obj:`type`, `optional`):
model_classes (dictionary `str` to `type`, *optional*):
A mapping framework to class.
task (:obj:`str`):
task (`str`):
The task defining which pipeline will be returned.
model_kwargs:
Additional dictionary of keyword arguments passed along to the model's :obj:`from_pretrained(...,
**model_kwargs)` function.
Additional dictionary of keyword arguments passed along to the model's `from_pretrained(..., **model_kwargs)` function.
Returns:
:obj:`Tuple`: A tuple framework, model.
`Tuple`: A tuple framework, model.
"""
if not is_tf_available() and not is_torch_available():
raise RuntimeError(
......@@ -242,28 +241,27 @@ def infer_framework_from_model(
**model_kwargs
):
"""
Select framework (TensorFlow or PyTorch) to use from the :obj:`model` passed. Returns a tuple (framework, model).
Select framework (TensorFlow or PyTorch) to use from the `model` passed. Returns a tuple (framework, model).
If :obj:`model` is instantiated, this function will just infer the framework from the model class. Otherwise
:obj:`model` is actually a checkpoint name and this method will try to instantiate it using :obj:`model_classes`.
If `model` is instantiated, this function will just infer the framework from the model class. Otherwise
`model` is actually a checkpoint name and this method will try to instantiate it using `model_classes`.
Since we don't want to instantiate the model twice, this model is returned for use by the pipeline.
If both frameworks are installed and available for :obj:`model`, PyTorch is selected.
If both frameworks are installed and available for `model`, PyTorch is selected.
Args:
model (:obj:`str`, :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
The model to infer the framework from. If :obj:`str`, a checkpoint name. The model to infer the framewrok
model (`str`, [`PreTrainedModel`] or [`TFPreTrainedModel`]):
The model to infer the framework from. If `str`, a checkpoint name. The model to infer the framework
from.
model_classes (dictionary :obj:`str` to :obj:`type`, `optional`):
model_classes (dictionary `str` to `type`, *optional*):
A mapping framework to class.
task (:obj:`str`):
task (`str`):
The task defining which pipeline will be returned.
model_kwargs:
Additional dictionary of keyword arguments passed along to the model's :obj:`from_pretrained(...,
**model_kwargs)` function.
Additional dictionary of keyword arguments passed along to the model's `from_pretrained(..., **model_kwargs)` function.
Returns:
:obj:`Tuple`: A tuple framework, model.
`Tuple`: A tuple framework, model.
"""
if isinstance(model, str):
config = AutoConfig.from_pretrained(model, _from_pipeline=task, **model_kwargs)
......@@ -279,7 +277,7 @@ def get_framework(model, revision: Optional[str] = None):
Select framework (TensorFlow or PyTorch) to use.
Args:
model (:obj:`str`, :class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
model (`str`, [`PreTrainedModel`] or [`TFPreTrainedModel`]):
If both frameworks are installed, picks the one corresponding to the model passed (either a model class or
the model name). If no specific model is provided, defaults to using PyTorch.
"""
......@@ -313,19 +311,19 @@ def get_default_model(targeted_task: Dict, framework: Optional[str], task_option
Select a default model to use for a given task. Defaults to pytorch if ambiguous.
Args:
targeted_task (:obj:`Dict` ):
targeted_task (`Dict`):
Dictionary representing the given task, that should contain default models
framework (:obj:`str`, None)
framework (`str`, None):
"pt", "tf" or None, representing a specific framework if it was specified, or None if we don't know yet.
task_options (:obj:`Any`, None)
task_options (`Any`, None):
Any further value required by the task to get fully specified, for instance (SRC, TGT) languages for
translation task.
Returns:
:obj:`str` The model string representing the default model for this pipeline
`str`: The model string representing the default model for this pipeline.
"""
if is_torch_available() and not is_tf_available():
framework = "pt"
......@@ -352,12 +350,12 @@ def get_default_model(targeted_task: Dict, framework: Optional[str], task_option
class PipelineException(Exception):
"""
Raised by a :class:`~transformers.Pipeline` when handling __call__.
Raised by a [`Pipeline`] when handling `__call__`.
Args:
task (:obj:`str`): The task of the pipeline.
model (:obj:`str`): The model used by the pipeline.
reason (:obj:`str`): The error message to display.
task (`str`): The task of the pipeline.
model (`str`): The model used by the pipeline.
reason (`str`): The error message to display.
"""
def __init__(self, task: str, model: str, reason: str):
......@@ -369,7 +367,7 @@ class PipelineException(Exception):
class ArgumentHandler(ABC):
"""
Base interface for handling arguments for each :class:`~transformers.pipelines.Pipeline`.
Base interface for handling arguments for each [`~pipelines.Pipeline`].
"""
@abstractmethod
......@@ -386,15 +384,15 @@ class PipelineDataFormat:
- CSV
- stdin/stdout (pipe)
:obj:`PipelineDataFormat` also includes some utilities to work with multi-columns like mapping from datasets
columns to pipelines keyword arguments through the :obj:`dataset_kwarg_1=dataset_column_1` format.
`PipelineDataFormat` also includes some utilities to work with multi-columns like mapping from datasets
columns to pipelines keyword arguments through the `dataset_kwarg_1=dataset_column_1` format.
Args:
output_path (:obj:`str`, `optional`): Where to save the outgoing data.
input_path (:obj:`str`, `optional`): Where to look for the input data.
column (:obj:`str`, `optional`): The column to read.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to overwrite the :obj:`output_path`.
output_path (`str`, *optional*): Where to save the outgoing data.
input_path (`str`, *optional*): Where to look for the input data.
column (`str`, *optional*): The column to read.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not to overwrite the `output_path`.
"""
SUPPORTED_FORMATS = ["json", "csv", "pipe"]
......@@ -430,10 +428,10 @@ class PipelineDataFormat:
def save(self, data: Union[dict, List[dict]]):
"""
Save the provided data object with the representation for the current
:class:`~transformers.pipelines.PipelineDataFormat`.
[`~pipelines.PipelineDataFormat`].
Args:
data (:obj:`dict` or list of :obj:`dict`): The data to store.
data (`dict` or list of `dict`): The data to store.
"""
raise NotImplementedError()
......@@ -442,10 +440,10 @@ class PipelineDataFormat:
Save the provided data object as a pickle-formatted binary data on the disk.
Args:
data (:obj:`dict` or list of :obj:`dict`): The data to store.
data (`dict` or list of `dict`): The data to store.
Returns:
:obj:`str`: Path where the data has been saved.
`str`: Path where the data has been saved.
"""
path, _ = os.path.splitext(self.output_path)
binary_path = os.path.extsep.join((path, "pickle"))
......@@ -464,23 +462,23 @@ class PipelineDataFormat:
overwrite=False,
) -> "PipelineDataFormat":
"""
Creates an instance of the right subclass of :class:`~transformers.pipelines.PipelineDataFormat` depending on
:obj:`format`.
Creates an instance of the right subclass of [`~pipelines.PipelineDataFormat`] depending on
`format`.
Args:
format: (:obj:`str`):
The format of the desired pipeline. Acceptable values are :obj:`"json"`, :obj:`"csv"` or :obj:`"pipe"`.
output_path (:obj:`str`, `optional`):
format (`str`):
The format of the desired pipeline. Acceptable values are `"json"`, `"csv"` or `"pipe"`.
output_path (`str`, *optional*):
Where to save the outgoing data.
input_path (:obj:`str`, `optional`):
input_path (`str`, *optional*):
Where to look for the input data.
column (:obj:`str`, `optional`):
column (`str`, *optional*):
The column to read.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to overwrite the :obj:`output_path`.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not to overwrite the `output_path`.
Returns:
:class:`~transformers.pipelines.PipelineDataFormat`: The proper data format.
[`~pipelines.PipelineDataFormat`]: The proper data format.
"""
if format == "json":
return JsonPipelineDataFormat(output_path, input_path, column, overwrite=overwrite)
......@@ -497,11 +495,11 @@ class CsvPipelineDataFormat(PipelineDataFormat):
Support for pipelines using CSV data format.
Args:
output_path (:obj:`str`, `optional`): Where to save the outgoing data.
input_path (:obj:`str`, `optional`): Where to look for the input data.
column (:obj:`str`, `optional`): The column to read.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to overwrite the :obj:`output_path`.
output_path (`str`, *optional*): Where to save the outgoing data.
input_path (`str`, *optional*): Where to look for the input data.
column (`str`, *optional*): The column to read.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not to overwrite the `output_path`.
"""
def __init__(
......@@ -525,10 +523,10 @@ class CsvPipelineDataFormat(PipelineDataFormat):
def save(self, data: List[dict]):
"""
Save the provided data object with the representation for the current
:class:`~transformers.pipelines.PipelineDataFormat`.
[`~pipelines.PipelineDataFormat`].
Args:
data (:obj:`List[dict]`): The data to store.
data (`List[dict]`): The data to store.
"""
with open(self.output_path, "w") as f:
if len(data) > 0:
......@@ -542,11 +540,11 @@ class JsonPipelineDataFormat(PipelineDataFormat):
Support for pipelines using JSON file format.
Args:
output_path (:obj:`str`, `optional`): Where to save the outgoing data.
input_path (:obj:`str`, `optional`): Where to look for the input data.
column (:obj:`str`, `optional`): The column to read.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to overwrite the :obj:`output_path`.
output_path (`str`, *optional*): Where to save the outgoing data.
input_path (`str`, *optional*): Where to look for the input data.
column (`str`, *optional*): The column to read.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not to overwrite the `output_path`.
"""
def __init__(
......@@ -573,7 +571,7 @@ class JsonPipelineDataFormat(PipelineDataFormat):
Save the provided data object in a json file.
Args:
data (:obj:`dict`): The data to store.
data (`dict`): The data to store.
"""
with open(self.output_path, "w") as f:
json.dump(data, f)
......@@ -586,11 +584,11 @@ class PipedPipelineDataFormat(PipelineDataFormat):
If columns are provided, then the output will be a dictionary with {column_x: value_x}
Args:
output_path (:obj:`str`, `optional`): Where to save the outgoing data.
input_path (:obj:`str`, `optional`): Where to look for the input data.
column (:obj:`str`, `optional`): The column to read.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
Whether or not to overwrite the :obj:`output_path`.
output_path (`str`, *optional*): Where to save the outgoing data.
input_path (`str`, *optional*): Where to look for the input data.
column (`str`, *optional*): The column to read.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not to overwrite the `output_path`.
"""
def __iter__(self):
......@@ -614,7 +612,7 @@ class PipedPipelineDataFormat(PipelineDataFormat):
Print the data.
Args:
data (:obj:`dict`): The data to store.
data (`dict`): The data to store.
"""
print(data)
......@@ -644,37 +642,36 @@ class _ScikitCompat(ABC):
PIPELINE_INIT_ARGS = r"""
Arguments:
model (:class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
:class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
[`PreTrainedModel`] for PyTorch and [`TFPreTrainedModel`] for
TensorFlow.
tokenizer (:class:`~transformers.PreTrainedTokenizer`):
tokenizer ([`PreTrainedTokenizer`]):
The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
:class:`~transformers.PreTrainedTokenizer`.
modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
[`PreTrainedTokenizer`].
modelcard (`str` or [`ModelCard`], *optional*):
Model card attributed to the model for this pipeline.
framework (:obj:`str`, `optional`):
The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
framework (`str`, *optional*):
The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified framework
must be installed.
If no framework is specified, will default to the one currently installed. If no framework is specified and
both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no model
both frameworks are installed, will default to the framework of the `model`, or to PyTorch if no model
is provided.
task (:obj:`str`, defaults to :obj:`""`):
task (`str`, defaults to `""`):
A task-identifier for the pipeline.
num_workers (:obj:`int`, `optional`, defaults to 8):
When the pipeline will use `DataLoader` (when passing a dataset, on GPU for a Pytorch model), the number of
num_workers (`int`, *optional*, defaults to 8):
When the pipeline will use *DataLoader* (when passing a dataset, on GPU for a Pytorch model), the number of
workers to be used.
batch_size (:obj:`int`, `optional`, defaults to 1):
When the pipeline will use `DataLoader` (when passing a dataset, on GPU for a Pytorch model), the size of
the batch to use, for inference this is not always beneficial, please read `Batching with pipelines
<https://huggingface.co/transformers/main_classes/pipelines.html#pipeline-batching>`_ .
args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
batch_size (`int`, *optional*, defaults to 1):
When the pipeline will use *DataLoader* (when passing a dataset, on GPU for a Pytorch model), the size of
the batch to use; for inference this is not always beneficial, please read [Batching with pipelines](https://huggingface.co/transformers/main_classes/pipelines.html#pipeline-batching).
args_parser ([`~pipelines.ArgumentHandler`], *optional*):
Reference to the object in charge of parsing supplied pipeline parameters.
device (:obj:`int`, `optional`, defaults to -1):
device (`int`, *optional*, defaults to -1):
Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive will run the model on
the associated CUDA device id.
binary_output (:obj:`bool`, `optional`, defaults to :obj:`False`):
binary_output (`bool`, *optional*, defaults to `False`):
Flag indicating if the output the pipeline should happen in a binary format (i.e., pickle) or as raw text.
"""
......@@ -699,29 +696,29 @@ if is_torch_available():
"""
Roughly equivalent to
.. code-block::
for item in loader:
yield infer(item, **params)
```python
for item in loader:
yield infer(item, **params)
```
Arguments:
loader (:obj:`torch.utils.data.DataLoader` or any iterator):
The iterator that will be used to apply :obj:`infer` on.
loader (`torch.utils.data.DataLoader` or any iterator):
The iterator that will be used to apply `infer` on.
infer (any function):
The function to apply of each element of :obj:`loader`.
params (:obj:`dict`):
The parameters passed to :obj:`infer` along with every item
loader_batch_size (:obj:`int`, `optional`):
If specified, the items of :obj:`loader` are supposed to come as batch, and are loader_batched here
The function to apply to each element of `loader`.
params (`dict`):
The parameters passed to `infer` along with every item
loader_batch_size (`int`, *optional*):
If specified, the items of `loader` are supposed to come as batch, and are loader_batched here
making it roughly behave as
.. code-block::
for items in loader:
for i in loader_batch_size:
item = items[i]
yield infer(item, **params)
"""
```python
for items in loader:
for i in loader_batch_size:
item = items[i]
yield infer(item, **params)
```"""
self.loader = loader
self.infer = infer
self.params = params
......@@ -815,9 +812,9 @@ class Pipeline(_ScikitCompat):
Pipeline supports running on CPU or GPU through the device argument (see below).
Some pipeline, like for instance :class:`~transformers.FeatureExtractionPipeline` (:obj:`'feature-extraction'` )
Some pipelines, like for instance [`FeatureExtractionPipeline`] (`'feature-extraction'`),
output large tensor objects as nested-lists. In order to avoid dumping such large structures as textual data we
provide the :obj:`binary_output` constructor argument. If set to :obj:`True`, the output will be stored in the
provide the `binary_output` constructor argument. If set to `True`, the output will be stored in the
pickle format.
"""
......@@ -866,7 +863,7 @@ class Pipeline(_ScikitCompat):
Save the pipeline's model and tokenizer.
Args:
save_directory (:obj:`str`):
save_directory (`str`):
A path to the directory in which to save the pipeline. It will be created if it doesn't exist.
"""
if os.path.isfile(save_directory):
......@@ -905,14 +902,15 @@ class Pipeline(_ScikitCompat):
Returns:
Context manager
Examples::
Examples:
# Explicitly ask for tensor allocation on CUDA device :0
pipe = pipeline(..., device=0)
with pipe.device_placement():
# Every framework specific tensor allocation will be done on the request device
output = pipe(...)
"""
```python
# Explicitly ask for tensor allocation on CUDA device :0
pipe = pipeline(..., device=0)
with pipe.device_placement():
# Every framework specific tensor allocation will be done on the request device
output = pipe(...)
```"""
if self.framework == "tf":
with tf.device("/CPU:0" if self.device == -1 else f"/device:GPU:{self.device}"):
yield
......@@ -927,11 +925,11 @@ class Pipeline(_ScikitCompat):
Ensure PyTorch tensors are on the specified device.
Args:
inputs (keyword arguments that should be :obj:`torch.Tensor`, the rest is ignored): The tensors to place on :obj:`self.device`.
inputs (keyword arguments that should be `torch.Tensor`, the rest is ignored): The tensors to place on `self.device`.
Recursive on lists **only**.
Return:
:obj:`Dict[str, torch.Tensor]`: The same as :obj:`inputs` but on the proper device.
`Dict[str, torch.Tensor]`: The same as `inputs` but on the proper device.
"""
return self._ensure_tensor_on_device(inputs, self.device)
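A short sketch of the behavior described above, assuming PyTorch and a CUDA-capable machine; the task, device index, and tensors are illustrative assumptions.

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)  # the device index is an assumption

# Every keyword argument that is a torch.Tensor is moved to pipe.device;
# anything else is left untouched.
batch = pipe.ensure_tensor_on_device(
    input_ids=torch.tensor([[101, 2023, 2003, 102]]),
    attention_mask=torch.tensor([[1, 1, 1, 1]]),
)
print({name: tensor.device for name, tensor in batch.items()})
```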
......@@ -958,7 +956,7 @@ class Pipeline(_ScikitCompat):
Check if the model class is supported by the pipeline.
Args:
supported_models (:obj:`List[str]` or :obj:`dict`):
supported_models (`List[str]` or `dict`):
The list of models supported by the pipeline, or a dictionary with model class values.
"""
if not isinstance(supported_models, list): # Create from a model mapping
......
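A hedged sketch of how `check_model_type` might be called with a plain list of class names; the names below are illustrative, and the library's own pipelines typically pass an auto-model mapping instead.

```python
from transformers import pipeline

pipe = pipeline("fill-mask")

# With a plain list, the loaded model's class name is checked against the entries;
# an error is logged if it is not found. The class names below are illustrative.
pipe.check_model_type(["BertForMaskedLM", "DistilBertForMaskedLM", "RobertaForMaskedLM"])
```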
......@@ -19,41 +19,42 @@ logger = logging.get_logger(__name__)
class Conversation:
"""
Utility class containing a conversation and its history. This class is meant to be used as an input to the
:class:`~transformers.ConversationalPipeline`. The conversation contains a number of utility function to manage the
[`ConversationalPipeline`]. The conversation contains a number of utility functions to manage the
addition of new user input and generated model responses. A conversation needs to contain an unprocessed user input
before being passed to the :class:`~transformers.ConversationalPipeline`. This user input is either created when
the class is instantiated, or by calling :obj:`conversational_pipeline.append_response("input")` after a
before being passed to the [`ConversationalPipeline`]. This user input is either created when
the class is instantiated, or by calling `conversation.add_user_input("input")` after a
conversation turn.
Arguments:
text (:obj:`str`, `optional`):
text (`str`, *optional*):
The initial user input to start the conversation. If not provided, a user input needs to be provided
manually using the :meth:`~transformers.Conversation.add_user_input` method before the conversation can
manually using the [`~Conversation.add_user_input`] method before the conversation can
begin.
conversation_id (:obj:`uuid.UUID`, `optional`):
conversation_id (`uuid.UUID`, *optional*):
Unique identifier for the conversation. If not provided, a random UUID4 id will be assigned to the
conversation.
past_user_inputs (:obj:`List[str]`, `optional`):
past_user_inputs (`List[str]`, *optional*):
Eventual past history of the conversation of the user. You don't need to pass it manually if you use the
pipeline interactively but if you want to recreate history you need to set both :obj:`past_user_inputs` and
:obj:`generated_responses` with equal length lists of strings
generated_responses (:obj:`List[str]`, `optional`):
pipeline interactively but if you want to recreate history you need to set both `past_user_inputs` and
`generated_responses` with equal length lists of strings
generated_responses (`List[str]`, *optional*):
Eventual past history of the conversation of the model. You don't need to pass it manually if you use the
pipeline interactively but if you want to recreate history you need to set both :obj:`past_user_inputs` and
:obj:`generated_responses` with equal length lists of strings
pipeline interactively but if you want to recreate history you need to set both `past_user_inputs` and
`generated_responses` with equal length lists of strings
Usage::
Usage:
conversation = Conversation("Going to the movies tonight - any suggestions?")
```python
conversation = Conversation("Going to the movies tonight - any suggestions?")
# Steps usually performed by the model when generating a response:
# 1. Mark the user input as processed (moved to the history)
conversation.mark_processed()
# 2. Append a mode response
conversation.append_response("The Big lebowski.")
# Steps usually performed by the model when generating a response:
# 1. Mark the user input as processed (moved to the history)
conversation.mark_processed()
# 2. Append a model response
conversation.append_response("The Big Lebowski.")
conversation.add_user_input("Is it good?")
"""
conversation.add_user_input("Is it good?")
```"""
def __init__(
self, text: str = None, conversation_id: uuid.UUID = None, past_user_inputs=None, generated_responses=None
......@@ -83,12 +84,12 @@ class Conversation:
def add_user_input(self, text: str, overwrite: bool = False):
"""
Add a user input to the conversation for the next round. This populates the internal :obj:`new_user_input`
Add a user input to the conversation for the next round. This populates the internal `new_user_input`
field.
Args:
text (:obj:`str`): The user input for the next conversation round.
overwrite (:obj:`bool`, `optional`, defaults to :obj:`False`):
text (`str`): The user input for the next conversation round.
overwrite (`bool`, *optional*, defaults to `False`):
Whether or not existing and unprocessed user input should be overwritten when this function is called.
"""
if self.new_user_input:
......@@ -108,8 +109,8 @@ class Conversation:
def mark_processed(self):
"""
Mark the conversation as processed (moves the content of :obj:`new_user_input` to :obj:`past_user_inputs`) and
empties the :obj:`new_user_input` field.
Mark the conversation as processed (moves the content of `new_user_input` to `past_user_inputs`) and
empties the `new_user_input` field.
"""
if self.new_user_input:
self.past_user_inputs.append(self.new_user_input)
......@@ -120,7 +121,7 @@ class Conversation:
Append a response to the list of generated responses.
Args:
response (:obj:`str`): The model generated response.
response (`str`): The model generated response.
"""
self.generated_responses.append(response)
......@@ -128,8 +129,8 @@ class Conversation:
"""
Iterates over all blobs of the conversation.
Returns: Iterator of (is_user, text_chunk) in chronological order of the conversation. ``is_user`` is a
:obj:`bool`, ``text_chunks`` is a :obj:`str`.
Returns: Iterator of (is_user, text_chunk) in chronological order of the conversation. `is_user` is a
`bool`, `text_chunk` is a `str`.
"""
for user_input, generated_response in zip(self.past_user_inputs, self.generated_responses):
yield True, user_input
......@@ -142,7 +143,7 @@ class Conversation:
Generates a string representation of the conversation.
Return:
:obj:`str`:
`str`:
Example: Conversation id: 7d15686b-dc94-49f2-9c4b-c9eac6a1f114 user >> Going to the movies tonight - any
suggestions? bot >> The Big Lebowski
......@@ -157,9 +158,9 @@ class Conversation:
@add_end_docstrings(
PIPELINE_INIT_ARGS,
r"""
min_length_for_response (:obj:`int`, `optional`, defaults to 32):
min_length_for_response (`int`, *optional*, defaults to 32):
The minimum length (in number of tokens) for a response.
minimum_tokens (:obj:`int`, `optional`, defaults to 10):
minimum_tokens (`int`, *optional*, defaults to 10):
The minimum length of tokens to leave for a response.
""",
)
......@@ -167,28 +168,28 @@ class ConversationalPipeline(Pipeline):
"""
Multi-turn conversational pipeline.
This conversational pipeline can currently be loaded from :func:`~transformers.pipeline` using the following task
identifier: :obj:`"conversational"`.
This conversational pipeline can currently be loaded from [`pipeline`] using the following task
identifier: `"conversational"`.
The models that this pipeline can use are models that have been fine-tuned on a multi-turn conversational task,
currently: `'microsoft/DialoGPT-small'`, `'microsoft/DialoGPT-medium'`, `'microsoft/DialoGPT-large'`. See the
up-to-date list of available models on `huggingface.co/models
<https://huggingface.co/models?filter=conversational>`__.
currently: *'microsoft/DialoGPT-small'*, *'microsoft/DialoGPT-medium'*, *'microsoft/DialoGPT-large'*. See the
up-to-date list of available models on [huggingface.co/models](https://huggingface.co/models?filter=conversational).
Usage::
Usage:
conversational_pipeline = pipeline("conversational")
```python
conversational_pipeline = pipeline("conversational")
conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")
conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")
conversational_pipeline([conversation_1, conversation_2])
conversational_pipeline([conversation_1, conversation_2])
conversation_1.add_user_input("Is it an action movie?")
conversation_2.add_user_input("What is the genre of this book?")
conversation_1.add_user_input("Is it an action movie?")
conversation_2.add_user_input("What is the genre of this book?")
conversational_pipeline([conversation_1, conversation_2])
"""
conversational_pipeline([conversation_1, conversation_2])
```"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
......@@ -222,16 +223,16 @@ class ConversationalPipeline(Pipeline):
Generate responses for the conversation(s) given as inputs.
Args:
conversations (a :class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`):
conversations (a [`Conversation`] or a list of [`Conversation`]):
Conversations to generate responses for.
clean_up_tokenization_spaces (:obj:`bool`, `optional`, defaults to :obj:`False`):
clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
Whether or not to clean up the potential extra spaces in the text output.
generate_kwargs:
Additional keyword arguments to pass along to the generate method of the model (see the generate method
corresponding to your framework `here <./model.html#generative-models>`__).
corresponding to your framework [here](./model#generative-models)).
Returns:
:class:`~transformers.Conversation` or a list of :class:`~transformers.Conversation`: Conversation(s) with
[`Conversation`] or a list of [`Conversation`]: Conversation(s) with
updated generated responses for those containing a new user input.
"""
# XXX: num_workers==0 is required to be backward compatible
......
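A hedged sketch of passing generation keyword arguments through the call; the sampling settings below are assumptions, and the default conversational checkpoint is used.

```python
from transformers import Conversation, pipeline

conversational_pipeline = pipeline("conversational")
conversation = Conversation("Going to the movies tonight - any suggestions?")

# Extra keyword arguments are forwarded to the model's generate() method;
# the values below are illustrative assumptions.
result = conversational_pipeline(conversation, do_sample=True, top_k=50, max_length=1000)
print(result.generated_responses[-1])
```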
......@@ -9,34 +9,34 @@ class FeatureExtractionPipeline(Pipeline):
Feature extraction pipeline using no model head. This pipeline extracts the hidden states from the base
transformer, which can be used as features in downstream tasks.
This feature extraction pipeline can currently be loaded from :func:`~transformers.pipeline` using the task
identifier: :obj:`"feature-extraction"`.
This feature extraction pipeline can currently be loaded from [`pipeline`] using the task
identifier: `"feature-extraction"`.
All models may be used for this pipeline. See a list of all models, including community-contributed models on
`huggingface.co/models <https://huggingface.co/models>`__.
[huggingface.co/models](https://huggingface.co/models).
Arguments:
model (:class:`~transformers.PreTrainedModel` or :class:`~transformers.TFPreTrainedModel`):
model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
:class:`~transformers.PreTrainedModel` for PyTorch and :class:`~transformers.TFPreTrainedModel` for
[`PreTrainedModel`] for PyTorch and [`TFPreTrainedModel`] for
TensorFlow.
tokenizer (:class:`~transformers.PreTrainedTokenizer`):
tokenizer ([`PreTrainedTokenizer`]):
The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
:class:`~transformers.PreTrainedTokenizer`.
modelcard (:obj:`str` or :class:`~transformers.ModelCard`, `optional`):
[`PreTrainedTokenizer`].
modelcard (`str` or [`ModelCard`], *optional*):
Model card attributed to the model for this pipeline.
framework (:obj:`str`, `optional`):
The framework to use, either :obj:`"pt"` for PyTorch or :obj:`"tf"` for TensorFlow. The specified framework
framework (`str`, *optional*):
The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified framework
must be installed.
If no framework is specified, will default to the one currently installed. If no framework is specified and
both frameworks are installed, will default to the framework of the :obj:`model`, or to PyTorch if no model
both frameworks are installed, will default to the framework of the `model`, or to PyTorch if no model
is provided.
task (:obj:`str`, defaults to :obj:`""`):
task (`str`, defaults to `""`):
A task-identifier for the pipeline.
args_parser (:class:`~transformers.pipelines.ArgumentHandler`, `optional`):
args_parser ([`~pipelines.ArgumentHandler`], *optional*):
Reference to the object in charge of parsing supplied pipeline parameters.
device (:obj:`int`, `optional`, defaults to -1):
device (`int`, *optional*, defaults to -1):
Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive will run the model on
the associated CUDA device id.
"""
......@@ -72,9 +72,9 @@ class FeatureExtractionPipeline(Pipeline):
Extract the features of the input(s).
Args:
args (:obj:`str` or :obj:`List[str]`): One or several texts (or one list of texts) to get the features of.
args (`str` or `List[str]`): One or several texts (or one list of texts) to get the features of.
Return:
A nested list of :obj:`float`: The features computed by the model.
A nested list of `float`: The features computed by the model.
"""
return super().__call__(*args, **kwargs)
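A hedged usage sketch of the feature-extraction call; the checkpoint name is an assumption and any encoder checkpoint would do.

```python
from transformers import pipeline

# The checkpoint is an illustrative assumption; any encoder model can be used here.
feature_extractor = pipeline("feature-extraction", model="distilbert-base-uncased")

features = feature_extractor("Transformers provides pretrained models.")
# Nested list of floats shaped roughly as [batch][tokens][hidden_size].
print(len(features), len(features[0]), len(features[0][0]))
```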
......@@ -21,9 +21,9 @@ logger = logging.get_logger(__name__)
@add_end_docstrings(
PIPELINE_INIT_ARGS,
r"""
top_k (:obj:`int`, defaults to 5):
top_k (`int`, defaults to 5):
The number of predictions to return.
targets (:obj:`str` or :obj:`List[str]`, `optional`):
targets (`str` or `List[str]`, *optional*):
When passed, the model will limit the scores to the passed targets instead of looking up in the whole
vocab. If the provided targets are not in the model vocab, they will be tokenized and the first resulting
token will be used (with a warning, and that might be slower).
......@@ -32,22 +32,23 @@ logger = logging.get_logger(__name__)
)
class FillMaskPipeline(Pipeline):
"""
Masked language modeling prediction pipeline using any :obj:`ModelWithLMHead`. See the `masked language modeling
examples <../task_summary.html#masked-language-modeling>`__ for more information.
Masked language modeling prediction pipeline using any `ModelWithLMHead`. See the [masked language modeling
examples](../task_summary#masked-language-modeling) for more information.
This mask filling pipeline can currently be loaded from :func:`~transformers.pipeline` using the following task
identifier: :obj:`"fill-mask"`.
This mask filling pipeline can currently be loaded from [`pipeline`] using the following task
identifier: `"fill-mask"`.
The models that this pipeline can use are models that have been trained with a masked language modeling objective,
which includes the bi-directional models in the library. See the up-to-date list of available models on
`huggingface.co/models <https://huggingface.co/models?filter=fill-mask>`__.
[huggingface.co/models](https://huggingface.co/models?filter=fill-mask).
.. note::
<Tip>
This pipeline only works for inputs with exactly one token masked. Experimental: We added support for multiple
masks. The returned values are raw model output, and correspond to disjoint probabilities where one might
expect joint probabilities (See `discussion <https://github.com/huggingface/transformers/pull/10222>`__).
"""
This pipeline only works for inputs with exactly one token masked. Experimental: We added support for multiple
masks. The returned values are raw model output, and correspond to disjoint probabilities where one might
expect joint probabilities (See [discussion](https://github.com/huggingface/transformers/pull/10222)).
</Tip>"""
def get_masked_index(self, input_ids: GenericTensor) -> np.ndarray:
if self.framework == "tf":
......@@ -205,22 +206,22 @@ class FillMaskPipeline(Pipeline):
Fill the masked token in the text(s) given as inputs.
Args:
args (:obj:`str` or :obj:`List[str]`):
args (`str` or `List[str]`):
One or several texts (or one list of prompts) with masked tokens.
targets (:obj:`str` or :obj:`List[str]`, `optional`):
targets (`str` or `List[str]`, *optional*):
When passed, the model will limit the scores to the passed targets instead of looking up in the whole
vocab. If the provided targets are not in the model vocab, they will be tokenized and the first
resulting token will be used (with a warning, and that might be slower).
top_k (:obj:`int`, `optional`):
top_k (`int`, *optional*):
When passed, overrides the number of predictions to return.
Return:
A list or a list of list of :obj:`dict`: Each result comes as list of dictionaries with the following keys:
A list or a list of list of `dict`: Each result comes as list of dictionaries with the following keys:
- **sequence** (:obj:`str`) -- The corresponding input with the mask token prediction.
- **score** (:obj:`float`) -- The corresponding probability.
- **token** (:obj:`int`) -- The predicted token id (to replace the masked one).
- **token** (:obj:`str`) -- The predicted token (to replace the masked one).
- **sequence** (`str`) -- The corresponding input with the mask token prediction.
- **score** (`float`) -- The corresponding probability.
- **token** (`int`) -- The predicted token id (to replace the masked one).
- **token_str** (`str`) -- The predicted token (to replace the masked one).
"""
outputs = super().__call__(inputs, **kwargs)
if isinstance(inputs, list) and len(inputs) == 1:
......
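A hedged usage sketch of the mask-filling call; the checkpoint is an assumption, and the tokenizer's own mask token is used so the prompt matches whatever checkpoint is loaded.

```python
from transformers import pipeline

# The checkpoint is an illustrative assumption; any masked-language-modeling checkpoint works.
fill_mask = pipeline("fill-mask", model="distilroberta-base")

prompt = f"The capital of France is {fill_mask.tokenizer.mask_token}."
for prediction in fill_mask(prompt, top_k=3):
    # Each prediction is a dict with the keys documented above.
    print(prediction)
```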
......@@ -19,14 +19,13 @@ logger = logging.get_logger(__name__)
@add_end_docstrings(PIPELINE_INIT_ARGS)
class ImageClassificationPipeline(Pipeline):
"""
Image classification pipeline using any :obj:`AutoModelForImageClassification`. This pipeline predicts the class of
Image classification pipeline using any `AutoModelForImageClassification`. This pipeline predicts the class of
an image.
This image classification pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"image-classification"`.
This image classification pipeline can currently be loaded from [`pipeline`] using the following
task identifier: `"image-classification"`.
See the list of available models on `huggingface.co/models
<https://huggingface.co/models?filter=image-classification>`__.
See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=image-classification).
"""
def __init__(self, *args, **kwargs):
......@@ -49,7 +48,7 @@ class ImageClassificationPipeline(Pipeline):
Assign labels to the image(s) passed as inputs.
Args:
images (:obj:`str`, :obj:`List[str]`, :obj:`PIL.Image` or :obj:`List[PIL.Image]`):
images (`str`, `List[str]`, `PIL.Image` or `List[PIL.Image]`):
The pipeline handles three types of images:
- A string containing an HTTP link pointing to an image
......@@ -59,7 +58,7 @@ class ImageClassificationPipeline(Pipeline):
The pipeline accepts either a single image or a batch of images, which must then be passed as a string.
Images in a batch must all be in the same format: all as http links, all as local paths, or all as PIL
images.
top_k (:obj:`int`, `optional`, defaults to 5):
top_k (`int`, *optional*, defaults to 5):
The number of top labels that will be returned by the pipeline. If the provided number is higher than
the number of labels available in the model configuration, it will default to the number of labels.
......@@ -70,8 +69,8 @@ class ImageClassificationPipeline(Pipeline):
The dictionaries contain the following keys:
- **label** (:obj:`str`) -- The label identified by the model.
- **score** (:obj:`int`) -- The score attributed by the model for that label.
- **label** (`str`) -- The label identified by the model.
- **score** (`float`) -- The score attributed by the model for that label.
"""
return super().__call__(images, **kwargs)
......
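A hedged usage sketch of the image-classification call; the checkpoint and image URL are assumptions, and the vision extras (Pillow) need to be installed.

```python
from transformers import pipeline

# The checkpoint and image URL are illustrative assumptions; Pillow must be installed.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
for prediction in classifier(url, top_k=3):
    print(prediction["label"], prediction["score"])
```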
......@@ -29,14 +29,13 @@ Predictions = List[Prediction]
@add_end_docstrings(PIPELINE_INIT_ARGS)
class ImageSegmentationPipeline(Pipeline):
"""
Image segmentation pipeline using any :obj:`AutoModelForImageSegmentation`. This pipeline predicts masks of objects
Image segmentation pipeline using any `AutoModelForImageSegmentation`. This pipeline predicts masks of objects
and their classes.
This image segmntation pipeline can currently be loaded from :func:`~transformers.pipeline` using the following
task identifier: :obj:`"image-segmentation"`.
This image segmentation pipeline can currently be loaded from [`pipeline`] using the following
task identifier: `"image-segmentation"`.
See the list of available models on `huggingface.co/models
<https://huggingface.co/models?filter=image-segmentation>`__.
See the list of available models on [huggingface.co/models](https://huggingface.co/models?filter=image-segmentation).
"""
def __init__(self, *args, **kwargs):
......@@ -61,7 +60,7 @@ class ImageSegmentationPipeline(Pipeline):
Perform segmentation (detect masks & classes) in the image(s) passed as inputs.
Args:
images (:obj:`str`, :obj:`List[str]`, :obj:`PIL.Image` or :obj:`List[PIL.Image]`):
images (`str`, `List[str]`, `PIL.Image` or `List[PIL.Image]`):
The pipeline handles three types of images:
- A string containing an HTTP(S) link pointing to an image
......@@ -70,9 +69,9 @@ class ImageSegmentationPipeline(Pipeline):
The pipeline accepts either a single image or a batch of images. Images in a batch must all be in the
same format: all as HTTP(S) links, all as local paths, or all as PIL images.
threshold (:obj:`float`, `optional`, defaults to 0.9):
threshold (`float`, *optional*, defaults to 0.9):
The probability necessary to make a prediction.
mask_threshold (:obj:`float`, `optional`, defaults to 0.5):
mask_threshold (`float`, *optional*, defaults to 0.5):
Threshold to use when turning the predicted masks into binary values.
Return:
......@@ -82,9 +81,9 @@ class ImageSegmentationPipeline(Pipeline):
The dictionaries contain the following keys:
- **label** (:obj:`str`) -- The class label identified by the model.
- **score** (:obj:`float`) -- The score attributed by the model for that label.
- **mask** (:obj:`str`) -- base64 string of a grayscale (single-channel) PNG image that contain masks
- **label** (`str`) -- The class label identified by the model.
- **score** (`float`) -- The score attributed by the model for that label.
- **mask** (`str`) -- base64 string of a grayscale (single-channel) PNG image that contain masks
information. The PNG image has the size (height, width) of the original image. Pixel values in the image are
either 0 or 255 (i.e. mask is absent VS mask is present).
"""
......@@ -130,7 +129,8 @@ class ImageSegmentationPipeline(Pipeline):
Turns a mask numpy array into a base64 mask string.
Args:
mask (np.array): Numpy array (with shape (heigth, width) of the original image) containing masks information. Values in the array are either 0 or 255 (i.e. mask is absent VS mask is present).
mask (`np.array`): Numpy array (with shape (height, width) of the original image) containing masks
information. Values in the array are either 0 or 255 (i.e. mask is absent VS mask is present).
Returns:
A base64 string of a single-channel PNG image that contain masks information.
......
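A hedged usage sketch of the image-segmentation call; the checkpoint and image URL are assumptions. Since the mask is returned as a base64-encoded PNG (see above), it is decoded back into a PIL image for inspection.

```python
import base64
import io

from PIL import Image
from transformers import pipeline

# The checkpoint and image URL are illustrative assumptions; Pillow and timm must be installed.
segmenter = pipeline("image-segmentation", model="facebook/detr-resnet-50-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
for result in segmenter(url, threshold=0.9):
    print(result["label"], result["score"])
    # Decode the base64 single-channel PNG mask described in the docstring above.
    mask = Image.open(io.BytesIO(base64.b64decode(result["mask"])))
```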