Unverified Commit bcc87c63 authored by Connor Brinton, committed by GitHub

Minor documentation revisions from copyediting (#9266)

* typo: Revise "checkout" to "check out"

* typo: Change "seemlessly" to "seamlessly"

* typo: Close parentheses in "Using the tokenizer"

* typo: Add closing parenthesis to supported models aside

* docs: Treat ``position_ids`` as plural

Alternatively, the word "argument" could be added to make the subject singular.

* docs: Remove comma, making subordinate clause

* docs: Remove comma separating verb and direct object

* docs: Fix typo ("next" -> "text")

* docs: Reverse phrase order to simplify sentence

* docs: "quicktour" -> "quick tour"

* docs: "to throw" -> "from throwing"

* docs: Remove disruptive newline in padding/truncation section

* docs: "show exemplary" -> "show examples of"

* docs: "much harder as" -> "much harder than"

* docs: Fix typo "seach" -> "search"

* docs: Fix subject-verb disagreement in WordPiece description

* docs: Fix style in preprocessing.rst
parent d5db6c37

@@ -226,7 +226,7 @@ Contrary to RNNs that have the position of each token embedded within them, tran
 each token. Therefore, the position IDs (``position_ids``) are used by the model to identify each token's position in
 the list of tokens.
-They are an optional parameter. If no ``position_ids`` is passed to the model, the IDs are automatically created as
+They are an optional parameter. If no ``position_ids`` are passed to the model, the IDs are automatically created as
 absolute positional embeddings.
 Absolute positional embeddings are selected in the range ``[0, config.max_position_embeddings - 1]``. Some models use
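
For context on the behaviour this hunk documents, here is a minimal sketch (not part of the commit; it assumes the ``bert-base-uncased`` checkpoint can be downloaded) of passing explicit position IDs:

```python
# Illustrative sketch only: explicit position IDs that match what the model
# would create by default when the argument is omitted.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Position IDs are optional.", return_tensors="pt")
seq_len = inputs["input_ids"].shape[1]
position_ids = torch.arange(seq_len).unsqueeze(0)  # [[0, 1, 2, ..., seq_len - 1]]

outputs = model(**inputs, position_ids=position_ids)  # same result as omitting position_ids
```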
@@ -16,7 +16,7 @@ Summary of the models
 This is a summary of the models available in 🤗 Transformers. It assumes you're familiar with the original `transformer
 model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction check the `annotated transformer
 <http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
-models. You can check them more in detail in their respective documentation. Also checkout the :doc:`pretrained model
+models. You can check them more in detail in their respective documentation. Also check out the :doc:`pretrained model
 page </pretrained_models>` to see the checkpoints available for each type of model and all `the community models
 <https://huggingface.co/models>`_.

@@ -30,7 +30,7 @@ Each one of the models in the library falls into one of the following categories
 Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
 previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
-sentence so that the attention heads can only see what was before in the next, and not what's after. Although those
+sentence so that the attention heads can only see what was before in the text, and not what's after. Although those
 models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation. A
 typical example of such models is GPT.
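
As a rough illustration of the text-generation use case this hunk mentions (a sketch, not part of the commit; it assumes the ``gpt2`` checkpoint can be downloaded):

```python
# Sketch: autoregressive text generation via the pipeline API.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("The most natural application of autoregressive models is", max_length=30))
```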

@@ -512,8 +512,8 @@ BART
 <https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.
 Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
-fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). For the encoder
-, on the pretraining tasks, a composition of the following transformations are applied:
+fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of
+the following transformations are applied on the pretraining tasks for the encoder:
 * mask random tokens (like in BERT)
 * delete random tokens

@@ -78,7 +78,7 @@ The library is built around three types of classes for each model:
 All these classes can be instantiated from pretrained instances and saved locally using two methods:
 - :obj:`from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either
-  provided by the library itself (the supported models are provided in the list :doc:`here <pretrained_models>` or
+  provided by the library itself (the supported models are provided in the list :doc:`here <pretrained_models>`) or
   stored locally (or on a server) by the user,
 - :obj:`save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using
   :obj:`from_pretrained()`.
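
A minimal sketch of the two methods described in this hunk (illustrative only, not part of the commit; the ``./my-bert`` directory name is made up):

```python
# Sketch: round-tripping a model and tokenizer with from_pretrained()/save_pretrained().
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

model.save_pretrained("./my-bert")      # writes the config and weights locally
tokenizer.save_pretrained("./my-bert")  # writes the vocabulary and tokenizer config

reloaded = AutoModel.from_pretrained("./my-bert")  # reload from the local copy
```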
@@ -17,10 +17,10 @@ In this tutorial, we'll explore how to preprocess your data using 🤗 Transform
 call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
 you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.
-As we saw in the :doc:`quicktour </quicktour>`, the tokenizer will first split a given text in words (or part of words,
-punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able to
-build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect to
-work properly.
+As we saw in the :doc:`quick tour </quicktour>`, the tokenizer will first split a given text in words (or part of
+words, punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able
+to build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect
+to work properly.
 .. note::

@@ -131,7 +131,7 @@ ones it should not (because they represent padding in this case).
 Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
-can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer to throw those kinds of warnings.
+can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer from throwing those kinds of warnings.
 .. _sentence-pairs:

@@ -216,7 +216,6 @@ Everything you always wanted to know about padding and truncation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 We have seen the commands that will work for most cases (pad your batch to the length of the maximum sentence and
 truncate to the maximum length the mode can accept). However, the API supports more strategies if you need them. The
 three arguments you need to know for this are :obj:`padding`, :obj:`truncation` and :obj:`max_length`.
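
A short sketch of the three arguments named in this hunk (illustrative only, not part of the commit; assumes ``bert-base-uncased``):

```python
# Sketch: padding, truncation and max_length on a small batch of sentences.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = ["A short sentence.", "A noticeably longer sentence that may well need to be truncated."]

encoded = tokenizer(
    batch,
    padding=True,       # pad every sequence to the longest one in the batch
    truncation=True,    # truncate sequences that exceed max_length
    max_length=16,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # (2, at most 16)
```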

@@ -158,7 +158,7 @@ Using the tokenizer
 We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text in
 words (or part of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern
-that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`, which is why we need
+that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`), which is why we need
 to instantiate the tokenizer using the name of the model, to make sure we use the same rules as when the model was
 pretrained.
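
A minimal sketch of the behaviour this hunk describes, i.e. building the tokenizer from the model name so the pretraining rules are reused (not part of the commit; assumes ``bert-base-uncased``):

```python
# Sketch: the tokenizer instantiated from the model name reuses that model's preprocessing rules.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
sentence = "Instantiate the tokenizer with the name of the model."
print(tokenizer.tokenize(sentence))      # the tokens produced by the pretraining rules
print(tokenizer(sentence)["input_ids"])  # the corresponding vocabulary ids
```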

@@ -327,7 +327,7 @@ Masked Language Modeling
 Masked language modeling is the task of masking tokens in a sequence with a masking token, and prompting the model to
 fill that mask with an appropriate token. This allows the model to attend to both the right context (tokens on the
 right of the mask) and the left context (tokens on the left of the mask). Such a training creates a strong basis for
-downstream tasks, requiring bi-directional context such as SQuAD (question answering, see `Lewis, Lui, Goyal et al.
+downstream tasks requiring bi-directional context, such as SQuAD (question answering, see `Lewis, Lui, Goyal et al.
 <https://arxiv.org/abs/1910.13461>`__, part 4.2).
 Here is an example of using pipelines to replace a mask from a sequence:
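
The example the last line refers to is not included in this hunk; a minimal fill-mask sketch (illustrative only, using the pipeline's default checkpoint) could look like:

```python
# Sketch: replacing a masked token with the fill-mask pipeline.
from transformers import pipeline

unmasker = pipeline("fill-mask")
masked = f"Paris is the {unmasker.tokenizer.mask_token} of France."
for prediction in unmasker(masked):
    print(prediction["sequence"], round(prediction["score"], 3))
```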

@@ -657,7 +657,7 @@ Here are the expected results:
 {'word': 'Bridge', 'score': 0.990249514579773, 'entity': 'I-LOC'}
 ]
-Note, how the tokens of the sequence "Hugging Face" have been identified as an organisation, and "New York City",
+Note how the tokens of the sequence "Hugging Face" have been identified as an organisation, and "New York City",
 "DUMBO" and "Manhattan Bridge" have been identified as locations.
 Here is an example of doing named entity recognition, using a model and a tokenizer. The process is the following:
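
The model-and-tokenizer walkthrough referenced in the last line is outside this hunk; a pipeline-based sketch of the same task (illustrative only, default NER checkpoint assumed) is:

```python
# Sketch: named entity recognition with the pipeline API; output dicts carry
# "word", "entity" and "score" keys like the expected results shown above.
from transformers import pipeline

ner = pipeline("ner")
sequence = ("Hugging Face Inc. is a company based in New York City. "
            "Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge.")
for token in ner(sequence):
    print(token["word"], token["entity"], round(token["score"], 3))
```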

@@ -18,7 +18,7 @@ On this page, we will have a closer look at tokenization. As we saw in :doc:`the
 look-up table. Converting words or subwords to ids is straightforward, so in this summary, we will focus on splitting a
 text into words or subwords (i.e. tokenizing a text). More specifically, we will look at the three main types of
 tokenizers used in 🤗 Transformers: :ref:`Byte-Pair Encoding (BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>`,
-and :ref:`SentencePiece <sentencepiece>`, and show exemplary which tokenizer type is used by which model.
+and :ref:`SentencePiece <sentencepiece>`, and show examples of which tokenizer type is used by which model.
 Note that on each model page, you can look at the documentation of the associated tokenizer to know which tokenizer
 type was used by the pretrained model. For instance, if we look at :class:`~transformers.BertTokenizer`, we can see

@@ -72,7 +72,7 @@ greater than 50,000, especially if they are pretrained only on a single language
 So if simple space and punctuation tokenization is unsatisfactory, why not simply tokenize on characters? While
 character tokenization is very simple and would greatly reduce memory and time complexity it makes it much harder for
 the model to learn meaningful input representations. *E.g.* learning a meaningful context-independent representation
-for the letter ``"t"`` is much harder as learning a context-independent representation for the word ``"today"``.
+for the letter ``"t"`` is much harder than learning a context-independent representation for the word ``"today"``.
 Therefore, character tokenization is often accompanied by a loss of performance. So to get the best of both worlds,
 transformers models use a hybrid between word-level and character-level tokenization called **subword** tokenization.
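
A quick sketch of the subword behaviour described here (illustrative, not part of the commit; assumes ``bert-base-uncased``, and the exact splits depend on that checkpoint's vocabulary):

```python
# Sketch: subword tokenization tends to keep frequent words whole and split rarer ones.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("today"))         # typically ['today'] -- frequent word stays whole
print(tokenizer.tokenize("tokenization"))  # typically ['token', '##ization'] -- rarer word is split
```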

@@ -202,10 +202,10 @@ WordPiece
 WordPiece is the subword tokenization algorithm used for :doc:`BERT <model_doc/bert>`, :doc:`DistilBERT
 <model_doc/distilbert>`, and :doc:`Electra <model_doc/electra>`. The algorithm was outlined in `Japanese and Korean
-Voice Seach (Schuster et al., 2012)
+Voice Search (Schuster et al., 2012)
 <https://static.googleusercontent.com/media/research.google.com/ja//pubs/archive/37842.pdf>`__ and is very similar to
 BPE. WordPiece first initializes the vocabulary to include every character present in the training data and
-progressively learn a given number of merge rules. In contrast to BPE, WordPiece does not choose the most frequent
+progressively learns a given number of merge rules. In contrast to BPE, WordPiece does not choose the most frequent
 symbol pair, but the one that maximizes the likelihood of the training data once added to the vocabulary.
 So what does this mean exactly? Referring to the previous example, maximizing the likelihood of the training data is
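
For reference, the merge criterion described in this hunk is commonly expressed as a score per candidate pair, the pair frequency divided by the product of its symbols' frequencies; a toy sketch (illustrative numbers, not taken from the commit):

```python
# Sketch: scoring candidate merges the WordPiece way (toy counts, purely illustrative).
def wordpiece_score(pair_freq: int, first_freq: int, second_freq: int) -> float:
    """freq(pair) / (freq(first) * freq(second)); the highest-scoring pair is merged."""
    return pair_freq / (first_freq * second_freq)

# A very frequent pair made of very frequent symbols...
print(wordpiece_score(pair_freq=20, first_freq=36, second_freq=20))  # ~0.028
# ...can lose to a rarer pair whose symbols almost always occur together.
print(wordpiece_score(pair_freq=16, first_freq=16, second_freq=20))  # 0.05
```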

@@ -14,7 +14,7 @@ Training and fine-tuning
 =======================================================================================================================
 Model classes in 🤗 Transformers are designed to be compatible with native PyTorch and TensorFlow 2 and can be used
-seemlessly with either. In this quickstart, we will show how to fine-tune (or train from scratch) a model using the
+seamlessly with either. In this quickstart, we will show how to fine-tune (or train from scratch) a model using the
 standard training tools available in either framework. We will also show how to use our included
 :func:`~transformers.Trainer` class which handles much of the complexity of training for you.
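
A compact sketch of the ``Trainer`` workflow mentioned in this hunk (illustrative only; the two-sentence dataset and the ``./results`` directory are made up for the example, a real run would use an actual corpus):

```python
# Sketch: fine-tuning a sequence classifier with the Trainer class on toy data.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A tiny in-memory dataset, for illustration only.
texts = ["I loved this movie.", "What a terrible film."]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

training_args = TrainingArguments(output_dir="./results",  # hypothetical output directory
                                  num_train_epochs=1,
                                  per_device_train_batch_size=2)
trainer = Trainer(model=model, args=training_args, train_dataset=ToyDataset(encodings, labels))
trainer.train()
```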