Unverified Commit a55dc157 authored by Sylvain Gugger, committed by GitHub

Add video links to the documentation (#12162)

parent 04028317
@@ -55,6 +55,12 @@ Input IDs
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a `WordPiece <https://arxiv.org/pdf/1609.08144.pdf>`__ tokenizer:
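A minimal sketch of such usage (the checkpoint name and example sentence are only placeholders, not necessarily the
snippet elided from this diff):

.. code-block:: python

    # Minimal sketch: tokenize a sentence with a WordPiece (BERT) tokenizer and
    # inspect the resulting input ids. Checkpoint and sentence are placeholders.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    sequence = "Using a Transformer network is simple"

    tokens = tokenizer.tokenize(sequence)   # list of WordPiece tokens
    inputs = tokenizer(sequence)            # dict with "input_ids", "attention_mask", ...
    print(tokens)
    print(inputs["input_ids"])              # token indices, including [CLS] and [SEP]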
@@ -120,8 +126,15 @@ because this is the way a :class:`~transformers.BertModel` is going to expect it
Attention mask
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The attention mask is an optional argument used when batching sequences together.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/M6adb1j2jPI" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
This argument indicates to the model which tokens should be attended to, and which should not.
For example, consider these two sequences:
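A minimal, hedged sketch with placeholder sentences:

.. code-block:: python

    # Minimal sketch: batch two sequences of different lengths; the attention mask
    # marks which positions are real tokens (1) and which are padding (0).
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    batch = tokenizer(
        ["This is a short sequence.",
         "This is a rather long sequence, at least compared to the first one."],
        padding=True,
    )
    print(batch["input_ids"])       # the shorter sequence is padded to the same length
    print(batch["attention_mask"])  # 1 for real tokens, 0 for the padding positions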
@@ -175,10 +188,17 @@ in the dictionary returned by the tokenizer under the key "attention_mask":
Token Type IDs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Some models' purpose is to do classification on pairs of sentences or question answering.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
help of special tokens, such as the classifier (``[CLS]``) and separator (``[SEP]``) tokens. For example, the BERT
model builds its two sequence input as such:
.. code-block::
...
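For reference, a hedged sketch of how the token type ids show up in practice (the sentences below are placeholders):

.. code-block:: python

    # Minimal sketch: encoding two sequences together produces a
    # "[CLS] A [SEP] B [SEP]" input and a matching "token_type_ids" entry
    # (0 for the first segment, 1 for the second). Sentences are placeholders.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
    encoded = tokenizer("How old are you?", "I'm 6 years old")

    print(tokenizer.decode(encoded["input_ids"]))  # shows the [CLS]/[SEP] structure
    print(encoded["token_type_ids"])               # segment ids for each token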
@@ -16,6 +16,12 @@ Model sharing and uploading
On this page, we will show you how to share a model you have trained or fine-tuned on new data with the community on
the `model hub <https://huggingface.co/models>`__.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/XvSGPZFEjDY" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
.. note::
You will need to create an account on `huggingface.co <https://huggingface.co/join>`__ for this.
@@ -77,6 +83,12 @@ token that you can just copy.
Directly push your model to the hub
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Z1-XMy-GNLQ" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Once you have an API token (either stored in the cache or copied and pasted in your notebook), you can directly push a
fine-tuned model you saved in :obj:`save_directory` by calling:
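A hedged sketch (the local path and the repository name are placeholders, and you need to be authenticated):

.. code-block:: python

    # Minimal sketch: reload a fine-tuned model/tokenizer from a local save directory
    # and push both to the hub. Paths and the repo name are placeholders.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model = AutoModelForSequenceClassification.from_pretrained("path/to/save_directory")
    tokenizer = AutoTokenizer.from_pretrained("path/to/save_directory")

    model.push_to_hub("my-finetuned-model")      # creates or updates the repo on the hub
    tokenizer.push_to_hub("my-finetuned-model")  # uploads the tokenizer files as well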
@@ -152,6 +164,12 @@ or
Use your terminal and git
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/rkCly_cbMBk" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Basic steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
@@ -28,6 +28,12 @@ Each one of the models in the library falls into one of the following categories
* :ref:`multimodal-models`
* :ref:`retrieval-based-models`
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/H39Z_720T5s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
sentence so that the attention heads can only see what was before in the text, and not what's after. Although those
@@ -54,12 +60,18 @@ Multimodal models mix text inputs with other kinds (e.g. images) and are more sp
.. _autoregressive-models:
Decoders or autoregressive models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the decoder part of the original transformer and use an attention mask so
that at each position, the model can only look at the tokens before it in the attention heads.
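To make the mask idea concrete, here is a small sketch (an illustration of the principle, not code from the library):

.. code-block:: python

    # Minimal sketch of a causal (lower-triangular) attention mask: position i may
    # attend to positions <= i only.
    import torch

    seq_len = 5
    causal_mask = torch.tril(torch.ones(seq_len, seq_len))
    print(causal_mask)
    # Row i has ones up to column i: token i "sees" itself and the tokens before it,
    # while the zeros block attention to future positions.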
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/d_ixlCubqQw" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Original GPT
-----------------------------------------------------------------------------------------------------------------------
@@ -215,13 +227,19 @@ multiple choice classification and question answering.
.. _autoencoding-models:
Encoders or autoencoding models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As mentioned before, these models rely on the encoder part of the original transformer and use no mask so the model can
look at all the tokens in the attention heads. For pretraining, targets are the original sentences and inputs are their
corrupted versions.
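As a quick, hedged illustration (the checkpoint and sentence are placeholders), the fill-mask pipeline shows this
reconstruction objective in action:

.. code-block:: python

    # Minimal sketch: corrupt a sentence with a mask token and let an autoencoding
    # model propose reconstructions.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for prediction in unmasker("Paris is the [MASK] of France."):
        print(prediction["token_str"], prediction["score"])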
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/MUqNwgPjJvQ" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
BERT
-----------------------------------------------------------------------------------------------------------------------
@@ -526,6 +544,12 @@ Sequence-to-sequence models
As mentioned before, these models keep both the encoder and the decoder of the original transformer.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0_4KEb08xrE" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
BART
-----------------------------------------------------------------------------------------------------------------------
...
@@ -39,6 +39,12 @@ To automatically download the vocab used during pretraining or fine-tuning a giv
Base use
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Yffk5aydLzg" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
A :class:`~transformers.PreTrainedTokenizer` has many methods, but the only one you need to remember for preprocessing
is its ``__call__``: you just need to feed your sentence to your tokenizer object.
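A minimal sketch of that call (the checkpoint and sentence are placeholders):

.. code-block:: python

    # Minimal sketch: calling the tokenizer directly returns everything the model
    # expects as input.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    encoded_input = tokenizer("This is a sentence to preprocess.")
    print(encoded_input)  # dict with "input_ids", "token_type_ids" and "attention_mask"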
@@ -138,6 +144,12 @@ can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer
Preprocessing pairs of sentences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/0u3ioSwev3s" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Sometimes you need to feed a pair of sentences to your model. For instance, if you want to classify if two sentences in
a pair are similar, or for question-answering models, which take a context and a question. For BERT models, the input
is then represented like this: :obj:`[CLS] Sequence A [SEP] Sequence B [SEP]`
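As a hedged sketch (all sentences below are placeholders), the two sequences are passed as two arguments, and batches
of pairs as two lists:

.. code-block:: python

    # Minimal sketch: pass the two sentences as two arguments (a pair), or two lists
    # of sentences for a batch of pairs.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    pair = tokenizer("How old are you?", "I'm 6 years old")
    batch_of_pairs = tokenizer(
        ["How old are you?", "What is your name?"],   # first sentences
        ["I'm 6 years old", "My name is Sylvain"],    # second sentences
        padding=True,
    )
    print(tokenizer.decode(pair["input_ids"]))  # [CLS] ... [SEP] ... [SEP] structure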
...
@@ -28,8 +28,15 @@ will dig a little bit more and see how the library gives you access to those mod
Getting started on a task with a pipeline
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/tiZFewofSLM" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
🤗 Transformers provides the following tasks out of the box:
- Sentiment analysis: is a text positive or negative?
- Text generation (in English): provide a prompt and the model will generate what follows.
@@ -137,8 +144,15 @@ to share your fine-tuned model on the hub with the community, using :doc:`this t
Under the hood: pretrained models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Let's now see what happens beneath the hood when using those pipelines.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/AhChOFRegn4" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
As we saw, the model and tokenizer are created using the :obj:`from_pretrained` method:
.. code-block::
...
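In essence (a minimal sketch; the checkpoint name is only used for illustration), this amounts to:

.. code-block:: python

    # Minimal sketch: download/cache a pretrained model and its tokenizer by name.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)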
@@ -13,12 +13,20 @@
Summary of the tokenizers
-----------------------------------------------------------------------------------------------------------------------
On this page, we will have a closer look at tokenization.
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/VFp38yj8h3A" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
As we saw in :doc:`the preprocessing tutorial <preprocessing>`, tokenizing a text is splitting it into words or
subwords, which then are converted to ids through a look-up table. Converting words or subwords to ids is
straightforward, so in this summary, we will focus on splitting a text into words or subwords (i.e. tokenizing a text).
More specifically, we will look at the three main types of tokenizers used in 🤗 Transformers: :ref:`Byte-Pair Encoding
(BPE) <byte-pair-encoding>`, :ref:`WordPiece <wordpiece>`, and :ref:`SentencePiece <sentencepiece>`, and show examples
of which tokenizer type is used by which model.
Note that on each model page, you can look at the documentation of the associated tokenizer to know which tokenizer
type was used by the pretrained model. For instance, if we look at :class:`~transformers.BertTokenizer`, we can see
@@ -28,8 +36,15 @@ Introduction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Splitting a text into smaller chunks is a task that is harder than it looks, and there are multiple ways of doing so.
For instance, let's look at the sentence ``"Don't you love 🤗 Transformers? We sure do."``
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/nhJxYji1aho" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
A simple way of tokenizing this text is to split it by spaces, which would give:
.. code-block::
@@ -69,16 +84,30 @@ Such a big vocabulary size forces the model to have an enormous embedding matrix
causes both an increased memory and time complexity. In general, transformers models rarely have a vocabulary size
greater than 50,000, especially if they are pretrained only on a single language.
So if simple space and punctuation tokenization is unsatisfactory, why not simply tokenize on characters?
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/ssLq_EK2jLE" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
While character tokenization is very simple and would greatly reduce memory and time complexity, it makes it much harder
for the model to learn meaningful input representations. *E.g.* learning a meaningful context-independent
representation for the letter ``"t"`` is much harder than learning a context-independent representation for the word
``"today"``. Therefore, character tokenization is often accompanied by a loss of performance. So to get the best of
both worlds, transformers models use a hybrid between word-level and character-level tokenization called **subword**
tokenization.
Subword tokenization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/zHvTiHr506c" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Subword tokenization algorithms rely on the principle that frequently used words should not be split into smaller
subwords, but rare words should be decomposed into meaningful subwords. For instance ``"annoyingly"`` might be
considered a rare word and could be decomposed into ``"annoying"`` and ``"ly"``. Both ``"annoying"`` and ``"ly"`` as
...
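As a hedged illustration of the principle (the checkpoint and sentence are placeholders, and the exact pieces depend on
the learned vocabulary):

.. code-block:: python

    # Minimal sketch: frequent words tend to stay whole, while rarer words may be
    # split into subword pieces (WordPiece marks continuation pieces with "##").
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    print(tokenizer.tokenize("I have a new GPU!"))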
@@ -27,6 +27,12 @@ negative. For examples of other tasks, refer to the :ref:`additional-resources`
Preparing the datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/_BZearw7f0w" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
We will use the `🤗 Datasets <https://github.com/huggingface/datasets/>`__ library to download and preprocess the IMDB
datasets. We will go over this part pretty quickly. Since the focus of this tutorial is on training, you should refer
to the 🤗 Datasets `documentation <https://huggingface.co/docs/datasets/>`__ or the :doc:`preprocessing` tutorial for
@@ -95,6 +101,12 @@ them by their `full` equivalent to train or evaluate on the full dataset.
Fine-tuning in PyTorch with the Trainer API
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/nvBXf7s7vTI" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Since PyTorch does not provide a training loop, the 🤗 Transformers library provides a :class:`~transformers.Trainer`
API that is optimized for 🤗 Transformers models, with a wide range of training options and with built-in features like
logging, gradient accumulation, and mixed precision.
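A minimal sketch of how this fits together (it assumes the ``model`` and the tokenized ``train_dataset`` /
``eval_dataset`` objects built earlier in this tutorial; the argument values are placeholders):

.. code-block:: python

    # Minimal sketch: wrap the model and datasets in a Trainer and launch training.
    # `model`, `train_dataset` and `eval_dataset` are assumed to exist already.
    from transformers import Trainer, TrainingArguments

    training_args = TrainingArguments(
        output_dir="test_trainer",          # where checkpoints and logs are written
        num_train_epochs=3,
        per_device_train_batch_size=16,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()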
@@ -200,6 +212,12 @@ See the documentation of :class:`~transformers.TrainingArguments` for more optio
Fine-tuning with Keras
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/rnTGBy2ax1c" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
Models can also be trained natively in TensorFlow using the Keras API. First, let's define our model:
.. code-block:: python
@@ -257,6 +275,12 @@ as a PyTorch model (or vice-versa):
Fine-tuning in native PyTorch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. raw:: html
<iframe width="560" height="315" src="https://www.youtube.com/embed/Dh9CL8fyG80" title="YouTube video player"
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
You might need to restart your notebook at this stage to free some memory, or execute the following code:
.. code-block:: python
...
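For orientation, a minimal sketch of such a native loop (it assumes a tokenized ``train_dataset`` that yields dicts of
tensors with labels, as built earlier; every name and hyperparameter below is illustrative):

.. code-block:: python

    # Minimal sketch of a native PyTorch fine-tuning loop. `train_dataset` is assumed
    # to yield dicts of tensors (input_ids, attention_mask, labels).
    import torch
    from torch.optim import AdamW
    from torch.utils.data import DataLoader
    from transformers import AutoModelForSequenceClassification

    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
    model.to(device)
    model.train()

    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    optimizer = AdamW(model.parameters(), lr=5e-5)

    for epoch in range(3):
        for batch in train_loader:
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)   # the "labels" key makes the model return a loss
            outputs.loss.backward()
            optimizer.step()
            optimizer.zero_grad()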