Unverified Commit 207594be authored by Sylvain Gugger, committed by GitHub

Convert rst files (#14888)

* Convert all tutorials and guides

* Convert all remaining rst to mdx

* Track and fix bad links
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# How to add a model to 🤗 Transformers?
Adding a new model is often difficult and requires an in-depth knowledge of the 🤗 Transformers library and ideally also
of the model's original repository. At Hugging Face, we are trying to empower the community more and more to add models
independently. Thus, for some new models that the community wants to be added to 🤗 Transformers, we create a customized
*call-for-model-addition* that explains step-by-step how to port the requested
model to 🤗 Transformers.
If this sounds like something you would be interested in, feel free to check out the currently open
“calls-for-model-addition” [here](https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model/open_model_proposals/README.md)
and to contact us.
If selected, you will then work closely with one member of the Hugging Face team to integrate the model into 🤗
Transformers. More importantly, you will have made a major open-source contribution to 🤗 Transformers. Along the way, you will:
- get insights into open-source best practices
- understand the design principles of one of the most popular NLP libraries
- learn how to efficiently test large NLP models
- learn how to integrate Python utilities like `black`, `isort`, and `make fix-copies` into a library to always
  ensure clean and readable code
We are also more than happy if you want to add a model that cannot be found in the “calls-for-model-addition” folder.
The following sections explain in detail how to add a new model. It might also be very helpful to check out already
added models to see if those resemble the model you would like to add [here](https://github.com/huggingface/transformers/pulls?q=is%3Apr+label%3A%22PR+for+Model+Addition%22+is%3Aclosed).

To start, let's try to get a general overview of the Transformers library.
## General overview of 🤗 Transformers
First, you should get a general overview of 🤗 Transformers. 🤗 Transformers is a very opinionated library, so there is a
chance that you don't agree with some of the library's philosophies or design choices. From our experience, however, we
found that the fundamental design choices and philosophies of the library are crucial to efficiently scale 🤗
Transformers while keeping maintenance costs at a reasonable level.
A good first starting point to better understand the library is to read the [documentation of our philosophy](philosophy).
As a result of our way of working, there are some choices that we try to apply to all models:
- Composition is generally favored over abstraction
- Duplicating code is not always bad if it strongly improves the readability or accessibility of a model
- Model files are as self-contained as possible so that when you read the code of a specific model, you ideally only
  have to look into the respective `modeling_....py` file.
In our opinion, the library's code is not just a means to provide a product, *e.g.* the ability to use BERT for
inference, but also the very product that we want to improve. Hence, when adding a model, the user is not only the
person that will use your model, but also everybody that will read, try to understand, and possibly tweak your code.
With this in mind, let's go a bit deeper into the general library design.
### Overview of models
To successfully add a model, it is important to understand the interaction between your model and its config,
[`PreTrainedModel`], and [`PretrainedConfig`]. For exemplary purposes, we will call the model to be added to 🤗
Transformers `BrandNewBert`.

Let's take a look:
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers_overview.png"/>
As you can see, we do make use of inheritance in 🤗 Transformers, but we keep the level of abstraction to an absolute
minimum. There are never more than two levels of abstraction for any model in the library. `BrandNewBertModel`
inherits from `BrandNewBertPreTrainedModel`, which in turn inherits from [`PreTrainedModel`], and that's it. As a
general rule, we want to make sure that a new model only depends on [`PreTrainedModel`]. The important functionalities
that are automatically provided to every new model are [`~PreTrainedModel.from_pretrained`] and
[`~PreTrainedModel.save_pretrained`], which are used for serialization and deserialization. All of the other important
functionalities, such as `BrandNewBertModel.forward`, should be completely defined in the new
`modeling_brand_new_bert.py` script. Next, we want to make sure that a model with a specific head layer, such as
`BrandNewBertForMaskedLM`, does not inherit from `BrandNewBertModel`, but rather uses `BrandNewBertModel` as a
component that can be called in its forward pass to keep the level of abstraction low. Every new model requires a
configuration class, called `BrandNewBertConfig`. This configuration is always stored as an attribute in
[`PreTrainedModel`], and thus can be accessed via the `config` attribute for all classes inheriting from
`BrandNewBertPreTrainedModel`:
```python
model = BrandNewBertModel.from_pretrained("brandy/brand_new_bert")
model.config  # model has access to its config
```
Similar to the model, the configuration inherits basic serialization and deserialization functionalities from
[`PretrainedConfig`]. Note that the configuration and the model are always serialized into two different formats - the
model to a *pytorch_model.bin* file and the configuration to a *config.json* file. Calling
[`~PreTrainedModel.save_pretrained`] will automatically call [`~PretrainedConfig.save_pretrained`], so that both model
and configuration are saved.
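To make this concrete, here is a minimal sketch of the save/load round trip, using the guide's placeholder
`BrandNewBertModel` class and the fictitious `brandy/brand_new_bert` checkpoint name (neither exists until your model is added):

```python
from transformers import BrandNewBertModel

# `BrandNewBertModel` and "brandy/brand_new_bert" are the placeholders used throughout this guide.
model = BrandNewBertModel.from_pretrained("brandy/brand_new_bert")

model.save_pretrained("./brand_new_bert_local")  # writes the weights (pytorch_model.bin) and the config (config.json)
model = BrandNewBertModel.from_pretrained("./brand_new_bert_local")  # restores model and configuration together
```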
### Overview of tokenizers
Not quite ready yet :-( This section will be added soon!
## Step-by-step recipe to add a model to 🤗 Transformers
Everyone has different preferences for how to port a model, so it can be very helpful for you to take a look at
summaries of how other contributors ported models to Hugging Face. Here is a list of community blog posts on how to
port a model:
1. [Porting GPT2 Model](https://medium.com/huggingface/from-tensorflow-to-pytorch-265f40ef2a28) by [Thomas](https://huggingface.co/thomwolf)
2. [Porting WMT19 MT Model](https://huggingface.co/blog/porting-fsmt) by [Stas](https://huggingface.co/stas)
From experience, we can tell you that the most important things to keep in mind when adding a model are:
- Don't reinvent the wheel! Most parts of the code you will add for the new 🤗 Transformers model already exist
  somewhere in 🤗 Transformers. Take some time to find similar, already existing models and tokenizers you can copy
  from. [grep](https://www.gnu.org/software/grep/) and [rg](https://github.com/BurntSushi/ripgrep) are your friends.
  Note that it might very well happen that your model's tokenizer is based on one model implementation, and your
  model's modeling code on another one. *E.g.* FSMT's modeling code is based on BART, while FSMT's tokenizer code is
  based on XLM.
- It's more of an engineering challenge than a scientific challenge. You should spend more time on creating an
  efficient debugging environment than trying to understand all theoretical aspects of the model in the paper.
- Ask for help when you're stuck! Models are the core component of 🤗 Transformers, so we at Hugging Face are more
  than happy to help you at every step of adding your model. Don't hesitate to ask if you notice you are not making
  progress.
In the following, we try to give you a general recipe that we found most useful when porting a model to 🤗 Transformers.
The following list is a summary of everything that has to be done to add a model and can be used by you as a To-Do
List:
- 13. Submitted the pull request
- 14. (Optional) Added a demo notebook
To begin with, we usually recommend starting by getting a good theoretical understanding of `BrandNewBert`. However,
if you prefer to understand the theoretical aspects of the model *on-the-job*, then it is totally fine to directly dive
into `BrandNewBert`'s code-base. This option might suit you better if your engineering skills are better than your
theoretical skills, if you have trouble understanding `BrandNewBert`'s paper, or if you just enjoy programming much
more than reading scientific papers.
### 1. (Optional) Theoretical aspects of BrandNewBert
You should take some time to read *BrandNewBert's* paper, if such descriptive work exists. There might be large
sections of the paper that are difficult to understand. If this is the case, this is fine - don't worry! The goal is
not to get a deep theoretical understanding of the paper, but to extract the necessary information required to
effectively re-implement the model in 🤗 Transformers. That being said, you don't have to spend too much time on the
theoretical aspects, but rather focus on the practical ones, namely:
- What type of model is *brand_new_bert*? BERT-like encoder-only model? GPT2-like decoder-only model? BART-like
  encoder-decoder model? Look at the [model_summary](model_summary) if you're not familiar with the differences between those.
- What are the applications of *brand_new_bert*? Text classification? Text generation? Seq2Seq tasks, *e.g.,*
  summarization?
- What is the novel feature of the model making it different from BERT/GPT-2/BART?
- Which of the already existing [🤗 Transformers models](https://huggingface.co/transformers/#contents) is most
  similar to *brand_new_bert*?
- What type of tokenizer is used? A sentencepiece tokenizer? Word piece tokenizer? Is it the same tokenizer as used
  for BERT or BART?
After you feel like you have gotten a good overview of the architecture of the model, you might want to write to the
Hugging Face team with any questions you might have. This might include questions regarding the model's architecture,
its attention layer, etc. We will be more than happy to help you.
### 2. Next prepare your environment
1. Fork the [repository](https://github.com/huggingface/transformers) by clicking on the 'Fork' button on the
   repository's page. This creates a copy of the code under your GitHub user account.
2. Clone your `transformers` fork to your local disk, and add the base repository as a remote:
```bash
git clone https://github.com/[your Github handle]/transformers.git
cd transformers
git remote add upstream https://github.com/huggingface/transformers.git
```
3. Set up a development environment, for instance by running the following command:
```bash
python -m venv .env
source .env/bin/activate
pip install -e ".[dev]"
```
and return to the parent directory
```bash
cd ..
```
4. We recommend adding the PyTorch version of *brand_new_bert* to Transformers. To install PyTorch, please follow the
   instructions on https://pytorch.org/get-started/locally/.
5. To port *brand_new_bert*, you will also need access to its original repository:
```bash
git clone https://github.com/org_that_created_brand_new_bert_org/brand_new_bert.git
cd brand_new_bert
pip install -e .
```
Now you have set up a development environment to port *brand_new_bert* to 🤗 Transformers.
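As a quick sanity check of the environment (a small sketch, not specific to *brand_new_bert*), you can confirm that
PyTorch is importable; a CPU-only install is typically sufficient for the porting and debugging work:

```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())  # False is fine; a GPU is usually not required for porting
```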
### 3.-4. Run a pretrained checkpoint using the original repository
At first, you will work on the original *brand_new_bert* repository. Often, the original implementation is very
“researchy”, meaning that documentation might be lacking and the code can be difficult to understand. But this should
be exactly your motivation to reimplement *brand_new_bert*.
From our experience, it is very important to spend some time getting familiar with the original code-base. You need to
figure out the following:
- Where to find the pretrained weights?
- How to load the pretrained weights into the corresponding model?
- How to run the tokenizer independently from the model?
- Trace one forward pass so that you know which classes and functions are required for a simple forward pass. Usually,
  you only have to reimplement those functions.
- Be able to locate the important components of the model: Where is the model's class? Are there model sub-classes,
  *e.g.* EncoderModel, DecoderModel? Where is the self-attention layer? Are there multiple different attention layers,
  *e.g.* *self-attention*, *cross-attention*...?
- How can you debug the model in the original environment of the repo? Do you have to add *print* statements, can you
  work with an interactive debugger like *ipdb*, or should you use an efficient IDE to debug the model, like PyCharm?
It is very important that, before you start the porting process, you can **efficiently** debug code in the original
repository! Also, remember that you are working with an open-source library, so do not hesitate to open an issue or
even a pull request in the original repository.
In general, there are two possible debugging environments for running the original model:
- [Jupyter notebooks](https://jupyter.org/) / [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb)
- Local Python scripts.
Jupyter notebooks have the advantage that they allow for cell-by-cell execution, which can be helpful to better split
logical components from one another and to store intermediate results. Notebooks are also easier to share with the Hugging
Face team for help. If you are familiar with Jupyter notebooks, we strongly recommend you to work with them.
The obvious disadvantage of Jupyter notebooks is that, if you are not used to working with them, you will have to
spend some time adjusting to the new programming environment and you might not be able to use your known debugging
tools anymore, like `ipdb`.
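If you go the script route, a plain `breakpoint()` call (which drops into `pdb`, or into `ipdb` if you set
`PYTHONBREAKPOINT=ipdb.set_trace`) is often all you need. A toy sketch, with a made-up `toy_forward` standing in for
whatever function of the original code-base you are tracing:

```python
import torch


def toy_forward(hidden_states):
    # Made-up computation standing in for a layer of the original model.
    intermediate = hidden_states * 2
    # breakpoint()  # uncomment to pause here and inspect `intermediate` interactively
    return intermediate


print(toy_forward(torch.ones(2, 3)))
```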
For each code-base, a good first step is always to load a **small** pretrained checkpoint and to be able to reproduce a
single forward pass using a dummy integer vector of input IDs as an input. Such a script could look like this (in
pseudocode):
```python
model = BrandNewBertModel.load_pretrained_checkpoint("/path/to/checkpoint/")
input_ids = [0, 4, 5, 2, 3, 7, 9]  # vector of input ids
original_output = model.predict(input_ids)
```
Next, regarding the debugging strategy, there are generally a few options to choose from:
- Decompose the original model into many small testable components and run a forward pass on each of those for
  verification
- Decompose the original model only into the original *tokenizer* and the original *model*, run a forward pass on
  those, and use intermediate print statements or breakpoints for verification
Again, it is up to you which strategy to choose. Often, one or the other is advantageous depending on the original code
base.
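Whichever strategy you pick, storing the intermediate results of the original model makes the later comparison much
easier. A hedged sketch, with a plain `nn.Linear` standing in for one sub-component of the original model:

```python
import numpy as np
import torch
from torch import nn

# Stand-in for one sub-component of the original model (say, its first Transformer layer).
first_layer = nn.Linear(16, 16)
layer_input = torch.randn(1, 4, 16)

layer_output = first_layer(layer_input)
print(layer_output[0, 0, :5])  # quick print-statement verification
np.save("first_layer_output.npy", layer_output.detach().numpy())  # or keep it on disk for later comparison
```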
- at a later stage those component-by-component tests help you to ensure that no regression occurs as you continue
  changing your code
[Lysandre's](https://gist.github.com/LysandreJik/db4c948f6b4483960de5cbac598ad4ed) integration checks for ELECTRA
give a nice example of how this can be done.
However, if the original code-base is very complex or only allows intermediate components to be run in a compiled mode,
it might be too time-consuming or even impossible to separate the model into smaller testable sub-components. A good
example is [T5's MeshTensorFlow](https://github.com/tensorflow/mesh/tree/master/mesh_tensorflow) library, which is
very complex and does not offer a simple way to decompose the model into its sub-components. For such libraries, one
often relies on verifying print statements.
It is recommended that you retrieve the output, either by print statements or sub-component functions, of the following
layers in the following order:
1. Retrieve the input IDs passed to the model
2. Retrieve the word embeddings
3. Retrieve the input of the first Transformer layer
4. Retrieve the output of the first Transformer layer
5. Retrieve the output of the following n - 1 Transformer layers
6. Retrieve the output of the whole BrandNewBert Model
Input IDs should thereby consist of an array of integers, *e.g.* `input_ids = [0, 4, 4, 3, 2, 4, 1, 7, 19]`.

The outputs of the following layers often consist of multi-dimensional float arrays and can look like this:
```
[[
 [-0.1465, -0.6501,  0.1993,  ...,  0.1451,  0.3430,  0.6024],
 [-0.4417, -0.5920,  0.3450,  ..., -0.3062,  0.6182,  0.7132],
 [-0.5009, -0.7122,  0.4548,  ..., -0.3662,  0.6091,  0.7648],
 ...,
 [-0.5613, -0.6332,  0.4324,  ..., -0.3792,  0.7372,  0.9288],
 [-0.5416, -0.6345,  0.4180,  ..., -0.3564,  0.6992,  0.9191],
 [-0.5334, -0.6403,  0.4271,  ..., -0.3339,  0.6533,  0.8694]]],
```
We expect that every model added to 🤗 Transformers passes a couple of integration tests, meaning that the original
model and the reimplemented version in 🤗 Transformers have to give the exact same output up to a precision of 0.001!
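In practice this check usually boils down to a `torch.allclose` call with an absolute tolerance of `1e-3`. A minimal
sketch with made-up numbers standing in for the two implementations' outputs:

```python
import torch

# Made-up outputs standing in for the original implementation and the 🤗 Transformers port.
original_output = torch.tensor([[-0.1465, -0.6501, 0.1993]])
ported_output = torch.tensor([[-0.1462, -0.6506, 0.1991]])

assert torch.allclose(original_output, ported_output, atol=1e-3), "Outputs differ by more than 1e-3"
```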
*brand_new_bert*, in which case an **efficient** debugging environment of the original repository is absolutely
important. Here is some advice to make your debugging environment as efficient as possible.
- Find the best way of debugging intermediate results. Is the original repository written in PyTorch? Then you should
  probably take the time to write a longer script that decomposes the original model into smaller sub-components to
  retrieve intermediate values. Is the original repository written in TensorFlow 1? Then you might have to rely on
  TensorFlow print operations like [tf.print](https://www.tensorflow.org/api_docs/python/tf/print) to output
  intermediate values. Is the original repository written in Jax? Then make sure that the model is **not jitted** when
  running the forward pass, *e.g.* check out [this link](https://github.com/google/jax/issues/196).
- Use the smallest pretrained checkpoint you can find. The smaller the checkpoint, the faster your debug cycle
  becomes. It is not efficient if your pretrained model is so big that your forward pass takes more than 10 seconds.
  In case only very large checkpoints are available, it might make more sense to create a dummy model in the new
  environment with randomly initialized weights and save those weights for comparison with the 🤗 Transformers version
  of your model.
- Make sure you are using the easiest way of calling a forward pass in the original repository. Ideally, you want to
  find the function in the original repository that **only** calls a single forward pass, *i.e.* that is often called
  `predict`, `evaluate`, `forward` or `__call__`. You don't want to debug a function that calls `forward` multiple
  times, *e.g.* to generate text, like `autoregressive_sample`, `generate`.
- Try to separate the tokenization from the model's *forward* pass. If the original repository shows examples where
  you have to input a string, then try to find out where in the forward call the string input is changed to input ids
  and start from this point. This might mean that you have to write a small script yourself or change the original
  code so that you can directly input the ids instead of an input string.
- Make sure that the model in your debugging setup is **not** in training mode, which often causes the model to yield
  random outputs due to multiple dropout layers in the model. Make sure that the forward pass in your debugging
  environment is **deterministic** so that the dropout layers are not used, or use *transformers.file_utils.set_seed*
  if the old and new implementations are in the same framework (see the sketch after this list).
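The dropout point in particular is easy to see with a tiny stand-in model (a sketch, not *brand_new_bert* itself): in
training mode two identical calls give different outputs, in eval mode they don't.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))  # stand-in model with a dropout layer
inputs = torch.randn(1, 8)

model.train()
print(model(inputs))  # differs between calls because dropout is active
print(model(inputs))

model.eval()  # debugging setup: dropout disabled
with torch.no_grad():
    print(model(inputs))  # identical on every call
    print(model(inputs))
```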
The following section gives you more specific details/tips on how you can do this for *brand_new_bert*.
### 5.-14. Port BrandNewBert to 🤗 Transformers
Next, you can finally start adding new code to 🤗 Transformers. Go into the clone of your 🤗 Transformers' fork:
```bash
cd transformers
```
In the special case that you are adding a model whose architecture exactly matches the model architecture of an
existing model, you only have to add a conversion script as described in [this section](#write-a-conversion-script).
In this case, you can just re-use the whole model architecture of the already existing model.

Otherwise, let's start generating a new model with the amazing Cookiecutter!
**Use the Cookiecutter to automatically generate the model's code**
To begin with, head over to the [🤗 Transformers templates](https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model) to make use of our
`cookiecutter` implementation to automatically generate all the relevant files for your model. Again, we recommend
only adding the PyTorch version of the model at first. Make sure you follow the instructions of the `README.md` on the
[🤗 Transformers templates](https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model) page
carefully.
**Open a Pull Request on the main huggingface/transformers repo**
1. Create a branch with a descriptive name from your master branch
```bash
git checkout -b add_brand_new_bert
```
2. Commit the automatically generated code:
```bash
git add .
git commit
```
3. Fetch and rebase to current master
```bash
git fetch upstream
git rebase upstream/master
```
4. Push the changes to your account using:
```bash
git push -u origin a-descriptive-name-for-my-changes
```
5. Once you are satisfied, go to the webpage of your fork on GitHub. Click on “Pull request”. Make sure to add the
   GitHub handle of some members of the Hugging Face team as reviewers, so that the Hugging Face team gets notified for
   future changes.
In the following, whenever you have made some progress, don't forget to commit your work and push it to your account so
that it shows in the pull request. Additionally, you should make sure to update your work with the current master from
time to time by doing:
```bash
git fetch upstream
git merge upstream/master
```
In general, all questions you might have regarding the model or your implementation should be asked in your PR and
discussed/solved in the PR. This way, the Hugging Face team will always be notified when you are committing new code or
if you have a question.
**5. Adapt the generated model's code for brand_new_bert**
At first, we will focus only on the model itself and not care about the tokenizer. All the relevant code should be
found in the generated files `src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` and
`src/transformers/models/brand_new_bert/configuration_brand_new_bert.py`.
Now you can finally start coding :). The generated code in
`src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` will either have the same architecture as BERT if
it's an encoder-only model or BART if it's an encoder-decoder model. At this point, you should remind yourself what
you've learned in the beginning about the theoretical aspects of the model: *How is the model different from BERT or
BART?* Implement those changes, which often means changing the *self-attention* layer, the order of the normalization
layers, etc. Again, it is often useful to look at the similar architecture of already existing models in Transformers to
get a better feeling of how your model should be implemented.
**Note** that at this point, you don't have to be very sure that your code is fully correct or clean. Rather, it is
advised to add a first *unclean*, copy-pasted version of the original code to
`src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` until you feel like all the necessary code is
added. From our experience, it is much more efficient to quickly add a first version of the required code and
improve/correct the code iteratively with the conversion script as described in the next section. The only thing that
has to work at this point is that you can instantiate the 🤗 Transformers implementation of *brand_new_bert*, *i.e.* the
following command should work:
```python
from transformers import BrandNewBertModel, BrandNewBertConfig

model = BrandNewBertModel(BrandNewBertConfig())
```
The above command will create a model according to the default parameters as defined in `BrandNewBertConfig()` with
random weights, thus making sure that the `init()` methods of all components work.
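A quick way to convince yourself of that (again using the guide's placeholder classes, so this only runs once your
model skeleton exists) is to peek at one of the freshly initialized tensors:

```python
from transformers import BrandNewBertModel, BrandNewBertConfig  # placeholders from this guide

model = BrandNewBertModel(BrandNewBertConfig())

name, tensor = next(iter(model.state_dict().items()))
print(name, tensor.flatten()[:5])  # random values until the conversion script fills in the real weights
```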
**6. Write a conversion script**
Usually, it is enough to copy an already existing conversion script and
slightly adapt it for your use case. Don't hesitate to ask the Hugging Face team to point you to a similar already
existing conversion script for your model.
- If you are porting a model from TensorFlow to PyTorch, a good starting point might be BERT's conversion script [here](https://github.com/huggingface/transformers/blob/7acfa95afb8194f8f9c1f4d2c6028224dbed35a2/src/transformers/models/bert/modeling_bert.py#L91)
- If you are porting a model from PyTorch to PyTorch, a good starting point might be BART's conversion script [here](https://github.com/huggingface/transformers/blob/master/src/transformers/models/bart/convert_bart_original_pytorch_checkpoint_to_pytorch.py)
In the following, we'll quickly explain how PyTorch models store layer weights and define layer names. In PyTorch, the
name of a layer is defined by the name of the class attribute you give the layer. Let's define a dummy model in
PyTorch, called `SimpleModel`, as follows:
```python
from torch import nn


class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(10, 10)
        self.intermediate = nn.Linear(10, 10)
        self.layer_norm = nn.LayerNorm(10)
```

Now we can create an instance of this model definition which will fill all weights: `dense`, `intermediate`,
`layer_norm` with random weights. We can print the model to see its architecture

```python
model = SimpleModel()

print(model)
```
This will print out the following:
```
SimpleModel(
  (dense): Linear(in_features=10, out_features=10, bias=True)
  (intermediate): Linear(in_features=10, out_features=10, bias=True)
  (layer_norm): LayerNorm((10,), eps=1e-05, elementwise_affine=True)
)
```
We can see that the layer names are defined by the name of the class attribute in PyTorch. You can print out the weight
values of a specific layer:
```python
print(model.dense.weight.data)
```
to see that the weights were randomly initialized
```
tensor([[-0.0818,  0.2207, -0.0749, -0.0030,  0.0045, -0.1569, -0.1598,  0.0212,
         -0.2077,  0.2157],
        [ 0.1044,  0.0201,  0.0990,  0.2482,  0.3116,  0.2509,  0.2866, -0.2190,
          0.2166, -0.0212],
        [-0.2000,  0.1107, -0.1999, -0.3119,  0.1559,  0.0993,  0.1776, -0.1950,
         -0.1023, -0.0447],
        [-0.0888, -0.1092,  0.2281,  0.0336,  0.1817, -0.0115,  0.2096,  0.1415,
         -0.1876, -0.2467],
        [ 0.2208, -0.2352, -0.1426, -0.2636, -0.2889, -0.2061, -0.2849, -0.0465,
          0.2577,  0.0402],
        [ 0.1502,  0.2465,  0.2566,  0.0693,  0.2352, -0.0530,  0.1859, -0.0604,
          0.2132,  0.1680],
        [ 0.1733, -0.2407, -0.1721,  0.1484,  0.0358, -0.0633, -0.0721, -0.0090,
          0.2707, -0.2509],
        [-0.1173,  0.1561,  0.2945,  0.0595, -0.1996,  0.2988, -0.0802,  0.0407,
          0.1829, -0.1568],
        [-0.1164, -0.2228, -0.0403,  0.0428,  0.1339,  0.0047,  0.1967,  0.2923,
          0.0333, -0.0536],
        [-0.1492, -0.1616,  0.1057,  0.1950, -0.2807, -0.2710, -0.1586,  0.0739,
          0.2220,  0.2358]]).
```
In the conversion script, you should fill those randomly initialized weights with the exact weights of the
corresponding layer in the checkpoint. *E.g.*
```python
# retrieve matching layer weights, e.g. by
# recursive algorithm
layer_name = "dense"
pretrained_weight = array_of_dense_layer

model_pointer = getattr(model, "dense")

model_pointer.weight.data = torch.from_numpy(pretrained_weight)
```
While doing so, you must verify that each randomly initialized weight of your PyTorch model and its corresponding
pretrained checkpoint weight exactly match in both **shape and name**. To do so, it is **necessary** to add assert
statements for the shape and print out the names of the checkpoint weights. E.g. you should add statements like:
```python
assert (
    model_pointer.weight.shape == pretrained_weight.shape
), f"Pointer shape of random weight {model_pointer.weight.shape} and array shape of checkpoint weight {pretrained_weight.shape} mismatched"
```
Besides, you should also print out the names of both weights to make sure they match, *e.g.*
```python
logger.info(f"Initialize PyTorch weight {layer_name} from {pretrained_weight.name}")
```
If either the shape or the name doesn't match, you probably assigned the wrong checkpoint weight to a randomly initialized layer of the 🤗 Transformers implementation.
An incorrect shape is most likely due to an incorrect setting of the config parameters in `BrandNewBertConfig()` that do not exactly match those that were used for the checkpoint you want to convert. However, it could also be that PyTorch's implementation of a layer requires the weight to be transposed beforehand.
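If that is the case, a one-line adjustment along the following lines is usually enough (a sketch reusing the `model_pointer` and `pretrained_weight` names from above; whether a transpose is needed depends on how the original framework stores the kernel):

```python
# Sketch: some frameworks store linear kernels as (in_features, out_features), while
# torch.nn.Linear expects (out_features, in_features), so the array may need a transpose.
model_pointer.weight.data = torch.from_numpy(pretrained_weight).T
```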
Finally, you should also check that **all** required weights are initialized and print out all checkpoint weights that were not used for initialization to make sure the model is correctly converted. It is completely normal that the conversion trials fail with either a wrong shape statement or a wrong name assignment. This is most likely because you either used incorrect parameters in `BrandNewBertConfig()`, have a wrong architecture in the 🤗 Transformers implementation, have a bug in the `init()` functions of one of the components of the 🤗 Transformers implementation, or need to transpose one of the checkpoint weights.
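A simple way to catch unused weights is to keep a set of the checkpoint names you have already assigned and print whatever is left at the end, for example (a rough sketch; `checkpoint_weights` is an assumed `name -> array` dict built from the original checkpoint):

```python
# Rough sketch: `checkpoint_weights` maps checkpoint weight names to arrays (assumed structure).
used_weights = set()

# ... inside the conversion loop, after each successful assignment:
# used_weights.add(checkpoint_name)

unused_weights = set(checkpoint_weights) - used_weights
if unused_weights:
    print(f"Checkpoint weights that were NOT used for initialization: {sorted(unused_weights)}")
```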
This step should be iterated with the previous step until all weights of the checkpoint are correctly loaded in the Transformers model. Having correctly loaded the checkpoint into the 🤗 Transformers implementation, you can then save the model under a folder of your choice `/path/to/converted/checkpoint/folder` that should then contain both a `pytorch_model.bin` file and a `config.json` file:
```python
model.save_pretrained("/path/to/converted/checkpoint/folder")
```
**7. Implement the forward pass**
Having managed to correctly load the pretrained weights into the 🤗 Transformers implementation, you should now make sure that the forward pass is correctly implemented. In [Get familiar with the original repository](#run-a-pretrained-checkpoint-using-the-original-repository), you have already created a script that runs a forward pass of the model using the original repository. Now you should write an analogous script using the 🤗 Transformers implementation instead of the original one. It should look as follows:
```python
model = BrandNewBertModel.from_pretrained("/path/to/converted/checkpoint/folder")
input_ids = [0, 4, 4, 3, 2, 4, 1, 7, 19]
output = model(input_ids).last_hidden_states
```
It is very likely that the 🤗 Transformers implementation and the original model implementation don't give the exact same output the very first time or that the forward pass throws an error. Don't be disappointed - it's expected! First, you should make sure that the forward pass doesn't throw any errors. It often happens that the wrong dimensions are used, leading to a *Dimensionality mismatch* error, or that the wrong data type is used, *e.g.* `torch.long` instead of `torch.float32`. Don't hesitate to ask the Hugging Face team for help if you don't manage to solve certain errors.
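For instance, token ids usually need to be a batched `torch.long` tensor rather than a plain Python list; a hedged variant of the dummy input above would be:

```python
import torch

# Build the dummy input as a batched LongTensor to avoid the most common dtype/shape pitfalls.
input_ids = torch.tensor([[0, 4, 4, 3, 2, 4, 1, 7, 19]], dtype=torch.long)
output = model(input_ids).last_hidden_states
```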
The final part to make sure the 🤗 Transformers implementation works correctly is to ensure that the outputs are equivalent to a precision of `1e-3`. First, you should ensure that the output shapes are identical, *i.e.* `outputs.shape` should yield the same value for the script of the 🤗 Transformers implementation and the original implementation. Next, you should make sure that the output values are identical as well. This is one of the most difficult parts of adding a new model. Common mistakes why the outputs are not identical are:
- Some layers were not added, *i.e.* an *activation* layer was not added, or the residual connection was forgotten
- The word embedding matrix was not tied
- The wrong positional embeddings are used because the original implementation uses an offset
- Dropout is applied during the forward pass. To fix this make sure *model.training is False* and that no dropout layer is falsely activated during the forward pass, *i.e.* pass *self.training* to [PyTorch's functional dropout](https://pytorch.org/docs/stable/nn.functional.html?highlight=dropout#torch.nn.functional.dropout)
The best way to fix the problem is usually to look at the forward pass of the original implementation and the 🤗 Transformers implementation side-by-side and check if there are any differences. Ideally, you should debug/print out intermediate outputs of both implementations of the forward pass to find the exact position in the network where the 🤗 Transformers implementation shows a different output than the original implementation. First, make sure that the hard-coded `input_ids` in both scripts are identical. Next, verify that the outputs of the first transformation of the `input_ids` (usually the word embeddings) are identical. And then work your way up to the very last layer of the network. At some point, you will notice a difference between the two implementations, which should point you to the bug in the 🤗 Transformers implementation. From our experience, a simple and efficient way is to add many print statements in both the original implementation and the 🤗 Transformers implementation, at the same positions in the network respectively, and to successively remove print statements showing the same values for intermediate representations.
When you're confident that both implementations yield the same output, verifying the outputs with `torch.allclose(original_output, output, atol=1e-3)`, you're done with the most difficult part! Congratulations - the work left to be done should be a cakewalk 😊.
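For reference, the final equivalence check could be written as follows (a sketch; `original_output` comes from your script for the original repository and `output` from the 🤗 Transformers script):

```python
import torch

# Final equivalence check between the original and the 🤗 Transformers implementation.
assert original_output.shape == output.shape, "Output shapes do not match"
assert torch.allclose(original_output, output, atol=1e-3), "Output values differ by more than 1e-3"
```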
**8. Adding all necessary model tests**
At this point, you have successfully added a new model. However, it is very much possible that the model does not yet fully comply with the required design. To make sure the implementation is fully compatible with 🤗 Transformers, all common tests should pass. The Cookiecutter should have automatically added a test file for your model, probably under `tests/test_modeling_brand_new_bert.py`. Run this test file to verify that all common tests pass:
```bash
pytest tests/test_modeling_brand_new_bert.py
```
Having fixed all common tests, it is now crucial to ensure that all the nice work you have done is well tested, so that

- a) The community can easily understand your work by looking at specific tests of *brand_new_bert*
- b) Future changes to your model will not break any important feature of the model.
At first, integration tests should be added. Those integration tests essentially do the same as the debugging scripts you used earlier to implement the model in 🤗 Transformers. A template of those model tests, called `BrandNewBertModelIntegrationTests`, has already been added by the Cookiecutter and only has to be filled out by you. To ensure that those tests are passing, run
```bash
RUN_SLOW=1 pytest -sv tests/test_modeling_brand_new_bert.py::BrandNewBertModelIntegrationTests
```

<Tip>

In case you are using Windows, you should replace `RUN_SLOW=1` with `SET RUN_SLOW=1`

</Tip>
Second, all features that are special to *brand_new_bert* should be tested additionally in a separate test under `BrandNewBertModelTester`/`BrandNewBertModelTest`. This part is often forgotten but is extremely useful in two ways:

- It helps to transfer the knowledge you have acquired during the model addition to the community by showing how the special features of *brand_new_bert* should work.
- Future contributors can quickly test changes to the model by running those special tests.
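For example, such a model-specific test inside `BrandNewBertModelTest` could look roughly like this (a sketch; the feature flag, config values and expected shape are made up for illustration):

```python
import torch

# Rough sketch of a model-specific test; the config arguments and shapes are illustrative only.
def test_model_with_special_feature(self):
    config = BrandNewBertConfig(hidden_size=32, special_feature=True)  # hypothetical arguments
    model = BrandNewBertModel(config)
    model.eval()

    input_ids = torch.tensor([[0, 4, 4, 3, 2, 4, 1, 7, 19]], dtype=torch.long)
    with torch.no_grad():
        output = model(input_ids).last_hidden_states

    self.assertEqual(output.shape, (1, 9, config.hidden_size))
```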
**9. Implement the tokenizer**
It is very important to find/extract the original tokenizer file and to manage to load this file into the 🤗 Transformers' implementation of the tokenizer.
To ensure that the tokenizer works correctly, it is recommended to first create a script in the original repository that inputs a string and returns the `input_ids`. It could look similar to this (in pseudo-code):
```python
input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words."
model = BrandNewBertModel.load_pretrained_checkpoint("/path/to/checkpoint/")
input_ids = model.tokenize(input_str)
```
You might have to take a deeper look again into the original repository to find the correct tokenizer function, or you might even have to make changes to your clone of the original repository to only output the `input_ids`. Having written a functional tokenization script that uses the original repository, an analogous script for 🤗 Transformers should be created. It should look similar to this:
```python
from transformers import BrandNewBertTokenizer

input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words."

tokenizer = BrandNewBertTokenizer.from_pretrained("/path/to/tokenizer/folder/")

input_ids = tokenizer(input_str).input_ids
```

When both `input_ids` yield the same values, as a final step a tokenizer test file should also be added.
Analogous to the modeling test files of *brand_new_bert*, the tokenization test files of *brand_new_bert* should contain a couple of hard-coded integration tests.
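A hard-coded tokenizer integration test could be sketched as follows (the expected ids below are invented placeholders and must be replaced by the values produced by the original tokenizer):

```python
# Rough sketch of a hard-coded tokenizer integration test; expected ids are placeholders.
def test_tokenizer_integration(self):
    tokenizer = BrandNewBertTokenizer.from_pretrained("/path/to/tokenizer/folder/")

    input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words."
    expected_ids = [0, 341, 12, 7, 19, 2]  # placeholder values, to be replaced

    self.assertListEqual(tokenizer(input_str).input_ids, expected_ids)
```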
**10. Run End-to-end integration tests**
Having added the tokenizer, you should also add a couple of end-to-end integration tests using both the model and the tokenizer to `tests/test_modeling_brand_new_bert.py` in 🤗 Transformers. Such a test should show on a meaningful text-to-text sample that the 🤗 Transformers implementation works as expected. A meaningful text-to-text sample can include *e.g.* a source-to-target-translation pair, an article-to-summary pair, a question-to-answer pair, etc… If none of the ported checkpoints has been fine-tuned on a downstream task it is enough to simply rely on the model tests. In a final step to ensure that the model is fully functional, it is advised that you also run all tests on GPU. It can happen that you forgot to add some `.to(self.device)` statements to internal tensors of the model, which in such a test would show up as an error. In case you have no access to a GPU, the Hugging Face team can take care of running those tests for you.
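Such an end-to-end test could be sketched roughly as follows (the head class, checkpoint path and expected text are placeholders; `slow` and `torch_device` come from `transformers.testing_utils`):

```python
from transformers.testing_utils import slow, torch_device

# Rough sketch of an end-to-end integration test; head class and expected output are placeholders.
@slow
def test_model_and_tokenizer_end_to_end(self):
    tokenizer = BrandNewBertTokenizer.from_pretrained("/path/to/converted/checkpoint/folder")
    model = BrandNewBertForConditionalGeneration.from_pretrained("/path/to/converted/checkpoint/folder")
    model.to(torch_device)

    inputs = tokenizer("A meaningful input text.", return_tensors="pt").to(torch_device)
    generated_ids = model.generate(**inputs)

    self.assertEqual(tokenizer.batch_decode(generated_ids, skip_special_tokens=True), ["An expected output text."])
```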
Now, all the necessary functionality for *brand_new_bert* is added - you're almost done! The only thing left to add is a nice docstring and a doc page. The Cookiecutter should have added a template file called `docs/source/model_doc/brand_new_bert.rst` that you should fill out. Users of your model will usually first look at this page before using your model. Hence, the documentation must be understandable and concise. It is very useful for the community to add some *Tips* to show how the model should be used. Don't hesitate to ping the Hugging Face team regarding the docstrings.
Next, make sure that the docstring added to `src/transformers/models/brand_new_bert/modeling_brand_new_bert.py` is correct and includes all necessary inputs and outputs. It is always good to remind oneself that documentation should be treated at least as carefully as the code in 🤗 Transformers, since the documentation is usually the first contact point of the community with the model.
Great, now you have added all the necessary code for *brand_new_bert*. At this point, you should correct some potentially incorrect code style by running:
```bash
make style
```
and verify that your coding style passes the quality check:
```bash
make quality
```
There are a couple of other very strict design tests in 🤗 Transformers that might still be failing, which shows up in the tests of your pull request. This is often because of some missing information in the docstring or some incorrect
PR a nice description and eventually add comments to your code, if you want to point out certain design choices to your reviewer.
### Share your work!!
Now, it's time to get some credit from the community for your work! Having completed a model addition is a major contribution to Transformers and the whole NLP community. Your code and the ported pre-trained models will certainly be
# How to add a pipeline to 🤗 Transformers?
First and foremost, you need to decide the raw entries the pipeline will be able to take. It can be strings, raw bytes,
dictionaries or whatever seems to be the most likely desired input. Try to keep these inputs as pure Python as possible
as it makes compatibility easier (even through other languages via JSON). Those will be the `inputs` of the
pipeline (`preprocess`).
Then define the `outputs`. Same policy as the `inputs`. The simpler, the better. Those will be the outputs of the `postprocess` method.
Start by inheriting the base class `Pipeline` with the 4 methods needed to implement: `preprocess`, `_forward`, `postprocess` and `_sanitize_parameters`.
```python
from transformers import Pipeline
class MyPipeline(Pipeline):
def _sanitize_parameters(self, **kwargs):
preprocess_kwargs = {}
if "maybe_arg" in kwargs:
preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
return preprocess_kwargs, {}, {}
def preprocess(self, inputs, maybe_arg=2):
model_input = Tensor(....)
return {"model_input": model_input}
def _forward(self, model_inputs):
# model_inputs == {"model_input": model_input}
outputs = self.model(**model_inputs)
# Maybe {"logits": Tensor(...)}
return outputs
def postprocess(self, model_outputs):
best_class = model_outputs["logits"].softmax(-1)
return best_class
```
The structure of this breakdown is meant to support relatively seamless CPU/GPU handling, while allowing pre/postprocessing to be done on the CPU in different threads.
`preprocess` will take the originally defined inputs, and turn them into something feedable to the model. It might
contain more information and is usually a `Dict`.
`_forward` is the implementation detail and is not meant to be called directly. `forward` is the preferred method to call as it contains safeguards to make sure everything is working on the expected device. If anything is linked to a real model it belongs in the `_forward` method, anything else is in the preprocess/postprocess.
`postprocess` methods will take the output of `_forward` and turn it into the final output that was decided earlier.
`_sanitize_parameters` exists to allow users to pass any parameters whenever they wish, be it at initialization time `pipeline(...., maybe_arg=4)` or at call time `pipe = pipeline(...); output = pipe(...., maybe_arg=4)`.
The returns of `_sanitize_parameters` are the 3 dicts of kwargs that will be passed directly to `preprocess`, `_forward` and `postprocess`. Don't fill anything if the caller didn't call with any extra parameter. That allows keeping the default arguments in the function definition, which is always more "natural".
A classic example would be a `top_k` argument in the post processing in classification tasks.
```python
>>> pipe = pipeline("my-new-task")
>>> pipe("This is a test")
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}, {"label": "3-star", "score": 0.05}
{"label": "4-star", "score": 0.025}, {"label": "5-star", "score": 0.025}]
>>> pipe("This is a test", top_k=2)
[{"label": "1-star", "score": 0.8}, {"label": "2-star", "score": 0.1}]
```
In order to achieve that, we'll update our `postprocess` method with a default parameter of `5`, and edit `_sanitize_parameters` to allow this new parameter.
```python
def postprocess(self, model_outputs, top_k=5):
best_class = model_outputs["logits"].softmax(-1)
# Add logic to handle top_k
return best_class
def _sanitize_parameters(self, **kwargs):
preprocess_kwargs = {}
if "maybe_arg" in kwargs:
preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
postprocess_kwargs = {}
if "top_k" in kwargs:
preprocess_kwargs["top_k"] = kwargs["top_k"]
return preprocess_kwargs, {}, postprocess_kwargs
```
Try to keep the inputs/outputs very simple and ideally JSON-serializable, as it makes the pipeline usage very easy without requiring users to understand new kinds of objects. It's also relatively common to support many different types of arguments for ease of use (audio files can be filenames, URLs or pure bytes, for example).
## Adding it to the list of supported tasks
Go to `src/transformers/pipelines/__init__.py` and fill in `SUPPORTED_TASKS` with your newly created pipeline.
If possible it should provide a default model.
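The exact schema of a `SUPPORTED_TASKS` entry can change between versions, so mirror the existing entries in `src/transformers/pipelines/__init__.py`; a hedged sketch of what a registration could look like (the pipeline class, auto classes and default checkpoint name below are placeholders):

```python
# Hedged sketch only; use the real entries in src/transformers/pipelines/__init__.py as reference.
SUPPORTED_TASKS["my-new-task"] = {
    "impl": MyPipeline,
    "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
    "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
    "default": {"model": {"pt": "some-org/some-default-checkpoint"}},
    "type": "text",
}
```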
## Adding tests
Create a new file `tests/test_pipelines_MY_PIPELINE.py`, following the example of the other pipeline test files.
The `run_pipeline_test` function will be very generic and run on small random models on every possible
architecture as defined by `model_mapping` and `tf_model_mapping`.
This is very important to test future compatibility, meaning if someone adds a new model for
`XXXForQuestionAnswering` then the pipeline test will attempt to run on it. Because the models are random it's
impossible to check for actual values, that's why there is a helper `ANY` that will simply attempt to match the type of the pipeline output.
You also *need* to implement 2 (ideally 4) tests.
- `test_small_model_pt` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
and test the pipeline outputs. The results should be the same as `test_small_model_tf`.
- `test_small_model_tf` : Define 1 small model for this pipeline (doesn't matter if the results don't make sense)
and test the pipeline outputs. The results should be the same as `test_small_model_pt`.
- `test_large_model_pt` (`optional`): Tests the pipeline on a real model where the results are supposed to make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make sure there is no drift in future releases.
- `test_large_model_tf` (`optional`): Tests the pipeline on a real model where the results are supposed to make sense. These tests are slow and should be marked as such. Here the goal is to showcase the pipeline and to make sure there is no drift in future releases.
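As an illustration of the first of these, `test_small_model_pt` could be sketched like this inside the pipeline test class (the tiny checkpoint name and the checks on the output structure are placeholders):

```python
from transformers import pipeline
from transformers.testing_utils import require_torch


# Rough sketch of `test_small_model_pt`; checkpoint name and output structure are illustrative only.
@require_torch
def test_small_model_pt(self):
    pipe = pipeline("my-new-task", model="hf-internal-testing/tiny-random-bert")
    outputs = pipe("This is a test")

    self.assertIsInstance(outputs, list)
    for item in outputs:
        self.assertIn("label", item)
        self.assertIn("score", item)
```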
# BERTology
There is a growing field of study concerned with investigating the inner working of large-scale transformers like BERT (that some call "BERTology"). Some good examples of this field are:
- BERT Rediscovers the Classical NLP Pipeline by Ian Tenney, Dipanjan Das, Ellie Pavlick: https://arxiv.org/abs/1905.05950
- Are Sixteen Heads Really Better than One? by Paul Michel, Omer Levy, Graham Neubig: https://arxiv.org/abs/1905.10650
- What Does BERT Look At? An Analysis of BERT's Attention by Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning: https://arxiv.org/abs/1906.04341
In order to help this new field develop, we have included a few additional features in the BERT/GPT/GPT-2 models to help people access the inner representations, mainly adapted from the great work of Paul Michel (https://arxiv.org/abs/1905.10650):
- accessing all the hidden-states of BERT/GPT/GPT-2,
- accessing all the attention weights for each head of BERT/GPT/GPT-2,
- retrieving heads output values and gradients to be able to compute head importance scores and prune heads as explained in https://arxiv.org/abs/1905.10650.
To help you understand and use these features, we have added a specific example script, [bertology.py](https://github.com/huggingface/transformers/tree/master/examples/research_projects/bertology/run_bertology.py), which extracts information from and prunes a model pre-trained on GLUE.
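As a quick, hedged illustration of these hooks with a standard BERT checkpoint:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, BERTology!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True, output_attentions=True)

print(len(outputs.hidden_states))  # embedding output + one hidden state per layer
print(len(outputs.attentions))     # one attention tensor per layer

# Heads can also be pruned, e.g. remove heads 1 and 2 of layer 0:
model.prune_heads({0: [1, 2]})
```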
# Converting Tensorflow Checkpoints
A command-line interface is provided to convert original Bert/GPT/GPT-2/Transformer-XL/XLNet/XLM checkpoints to models
that can be loaded using the `from_pretrained` methods of the library.
<Tip>
Since 2.3.0 the conversion script is now part of the transformers CLI (**transformers-cli**) available in any
transformers >= 2.3.0 installation.
The documentation below reflects the **transformers-cli convert** command format.
</Tip>
## BERT
You can convert any TensorFlow checkpoint for BERT (in particular [the pre-trained models released by Google](https://github.com/google-research/bert#pre-trained-models)) in a PyTorch save file by using the
[convert_bert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py) script.
This CLI takes as input a TensorFlow checkpoint (three files starting with `bert_model.ckpt`) and the associated
configuration file (`bert_config.json`), and creates a PyTorch model for this configuration, loads the weights from
the TensorFlow checkpoint in the PyTorch model and saves the resulting model in a standard PyTorch save file that can
be imported using `from_pretrained()` (see example in [quicktour](quicktour), [run_glue.py](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification/run_glue.py)).
You only need to run this conversion script **once** to get a PyTorch model. You can then disregard the TensorFlow
checkpoint (the three files starting with `bert_model.ckpt`) but be sure to keep the configuration file (\
`bert_config.json`) and the vocabulary file (`vocab.txt`) as these are needed for the PyTorch model too.
To run this specific conversion script you will need to have TensorFlow and PyTorch installed (`pip install tensorflow`). The rest of the repository only requires PyTorch.
Here is an example of the conversion process for a pre-trained `BERT-Base Uncased` model:
```bash
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
transformers-cli convert --model_type bert \
--tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
--config $BERT_BASE_DIR/bert_config.json \
--pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/bert#pre-trained-models).
## ALBERT
Convert TensorFlow model checkpoints of ALBERT to PyTorch using the
[convert_albert_original_tf_checkpoint_to_pytorch.py](https://github.com/huggingface/transformers/tree/master/src/transformers/models/albert/convert_albert_original_tf_checkpoint_to_pytorch.py) script.
The CLI takes as input a TensorFlow checkpoint (three files starting with `model.ckpt-best`) and the accompanying
configuration file (`albert_config.json`), then creates and saves a PyTorch model. To run this conversion you will
need to have TensorFlow and PyTorch installed.
Here is an example of the conversion process for the pre-trained `ALBERT Base` model:
```bash
export ALBERT_BASE_DIR=/path/to/albert/albert_base
transformers-cli convert --model_type albert \
--tf_checkpoint $ALBERT_BASE_DIR/model.ckpt-best \
--config $ALBERT_BASE_DIR/albert_config.json \
--pytorch_dump_output $ALBERT_BASE_DIR/pytorch_model.bin
```
You can download Google's pre-trained models for the conversion [here](https://github.com/google-research/albert#pre-trained-models).
## OpenAI GPT
Here is an example of the conversion process for a pre-trained OpenAI GPT model, assuming that your NumPy checkpoint is saved in the same format as the OpenAI pretrained model (see [here](https://github.com/openai/finetune-transformer-lm)):
```bash
export OPENAI_GPT_CHECKPOINT_FOLDER_PATH=/path/to/openai/pretrained/numpy/weights
transformers-cli convert --model_type gpt \
--tf_checkpoint $OPENAI_GPT_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT_CONFIG] \
[--finetuning_task_name OPENAI_GPT_FINETUNED_TASK]
```
## OpenAI GPT-2
Here is an example of the conversion process for a pre-trained OpenAI GPT-2 model (see [here](https://github.com/openai/gpt-2))
```bash
export OPENAI_GPT2_CHECKPOINT_PATH=/path/to/gpt2/pretrained/weights
transformers-cli convert --model_type gpt2 \
--tf_checkpoint $OPENAI_GPT2_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config OPENAI_GPT2_CONFIG] \
[--finetuning_task_name OPENAI_GPT2_FINETUNED_TASK]
```
## Transformer-XL
Here is an example of the conversion process for a pre-trained Transformer-XL model (see [here](https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models))
```bash
export TRANSFO_XL_CHECKPOINT_FOLDER_PATH=/path/to/transfo/xl/checkpoint
transformers-cli convert --model_type transfo_xl \
--tf_checkpoint $TRANSFO_XL_CHECKPOINT_FOLDER_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config TRANSFO_XL_CONFIG] \
[--finetuning_task_name TRANSFO_XL_FINETUNED_TASK]
```
## XLNet
Here is an example of the conversion process for a pre-trained XLNet model:
```bash
export XLNET_CHECKPOINT_PATH=/path/to/xlnet/checkpoint
export XLNET_CONFIG_PATH=/path/to/xlnet/config
transformers-cli convert --model_type xlnet \
--tf_checkpoint $XLNET_CHECKPOINT_PATH \
--config $XLNET_CONFIG_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--finetuning_task_name XLNET_FINETUNED_TASK]
```
## XLM
Here is an example of the conversion process for a pre-trained XLM model:
```bash
export XLM_CHECKPOINT_PATH=/path/to/xlm/checkpoint
transformers-cli convert --model_type xlm \
--tf_checkpoint $XLM_CHECKPOINT_PATH \
--pytorch_dump_output $PYTORCH_DUMP_OUTPUT \
[--config XLM_CONFIG] \
[--finetuning_task_name XLM_FINETUNED_TASK]
```
## T5
Here is an example of the conversion process for a pre-trained T5 model:
```bash
export T5=/path/to/t5/uncased_L-12_H-768_A-12
transformers-cli convert --model_type t5 \
--tf_checkpoint $T5/t5_model.ckpt \
--config $T5/t5_config.json \
--pytorch_dump_output $T5/pytorch_model.bin
```
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Using tokenizers from 🤗 Tokenizers
The [`PreTrainedTokenizerFast`] depends on the [🤗 Tokenizers](https://huggingface.co/docs/tokenizers) library. The tokenizers obtained from the 🤗 Tokenizers library can be
loaded very simply into 🤗 Transformers.
Before getting into the specifics, let's start by creating a dummy tokenizer in a few lines:
```python
>>> from tokenizers import Tokenizer
>>> from tokenizers.models import BPE
>>> from tokenizers.trainers import BpeTrainer
>>> from tokenizers.pre_tokenizers import Whitespace
>>> tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
>>> trainer = BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
>>> tokenizer.pre_tokenizer = Whitespace()
>>> files = [...]
>>> tokenizer.train(files, trainer)
```
We now have a tokenizer trained on the files we defined. We can either continue using it in that runtime, or save it to
a JSON file for future re-use.
## Loading directly from the tokenizer object
Let's see how to leverage this tokenizer object in the 🤗 Transformers library. The
[`PreTrainedTokenizerFast`] class allows for easy instantiation, by accepting the instantiated
*tokenizer* object as an argument:
```python
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_object=tokenizer)
```
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
page](main_classes/tokenizer) for more information.
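For instance, as a quick sketch (the exact tokens and IDs depend on the files used to train the dummy tokenizer above), the object can already be called like any other tokenizer:
```python
>>> encoding = fast_tokenizer("Hello, how are you?")
>>> print(encoding.tokens())  # tokens produced by the BPE model trained above
>>> print(encoding["input_ids"])  # and their numerical IDs
```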
## Loading from a JSON file
In order to load a tokenizer from a JSON file, let's first start by saving our tokenizer:
```python
>>> tokenizer.save("tokenizer.json")
```
The path to which we saved this file can be passed to the [`PreTrainedTokenizerFast`] initialization
method using the `tokenizer_file` parameter:
```python
>>> from transformers import PreTrainedTokenizerFast
>>> fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
```
This object can now be used with all the methods shared by the 🤗 Transformers tokenizers! Head to [the tokenizer
page](main_classes/tokenizer) for more information.
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Glossary
## General terms
- autoencoding models: see MLM
- autoregressive models: see CLM
- RNN: recurrent neural network, a type of model that uses a loop over a layer to process texts.
- self-attention: each element of the input finds out which other elements of the input they should attend to.
- seq2seq or sequence-to-sequence: models that generate a new sequence from an input, like translation models, or
summarization models (such as [Bart](model_doc/bart) or [T5](model_doc/t5)).
- token: a part of a sentence, usually a word, but can also be a subword (non-common words are often split in subwords)
or a punctuation symbol.
- transformer: self-attention based deep learning model architecture.
## Model inputs
Every model is different yet bears similarities with the others. Therefore most models use the same inputs, which are
detailed here alongside usage examples.
<a id='input-ids'></a>
### Input IDs
The input ids are often the only required parameters to be passed to the model as input. *They are token indices,
numerical representations of tokens building the sequences that will be used as input by the model*.
<Youtube id="VFp38yj8h3A"/>
Each tokenizer works differently but the underlying mechanism remains the same. Here's an example using the BERT
tokenizer, which is a [WordPiece](https://arxiv.org/pdf/1609.08144.pdf) tokenizer:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"
```
The tokenizer takes care of splitting the sequence into tokens available in the tokenizer vocabulary.
```python
>>> tokenized_sequence = tokenizer.tokenize(sequence)
```
The tokens are either words or subwords. Here for instance, "VRAM" wasn't in the model vocabulary, so it's been split
in "V", "RA" and "M". To indicate those tokens are not separate words but parts of the same word, a double-hash prefix
is added for "RA" and "M":
```python
>>> print(tokenized_sequence)
['A', 'Titan', 'R', '##T', '##X', 'has', '24', '##GB', 'of', 'V', '##RA', '##M']
```
These tokens can then be converted into IDs which are understandable by the model. This can be done by directly feeding
the sentence to the tokenizer, which leverages the Rust implementation of [🤗 Tokenizers](https://github.com/huggingface/tokenizers) for peak performance.
```python
>>> inputs = tokenizer(sequence)
```
The tokenizer returns a dictionary with all the arguments necessary for its corresponding model to work properly. The
token indices are under the key "input_ids":
```python
>>> encoded_sequence = inputs["input_ids"]
>>> print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```
Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them) which are special
IDs the model sometimes uses.
If we decode the previous sequence of ids,
```python
>>> decoded_sequence = tokenizer.decode(encoded_sequence)
```
we will see
```python
>>> print(decoded_sequence)
[CLS] A Titan RTX has 24GB of VRAM [SEP]
```
because this is the way a [`BertModel`] is going to expect its inputs.
<a id='attention-mask'></a>
### Attention mask
The attention mask is an optional argument used when batching sequences together.
<Youtube id="M6adb1j2jPI"/>
This argument indicates to the model which tokens should be attended to, and which should not.
For example, consider these two sequences:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."
>>> sequence_b = "This is a rather long sequence. It is at least longer than the sequence A."
>>> encoded_sequence_a = tokenizer(sequence_a)["input_ids"]
>>> encoded_sequence_b = tokenizer(sequence_b)["input_ids"]
```
The encoded versions have different lengths:
```python
>>> len(encoded_sequence_a), len(encoded_sequence_b)
(8, 19)
```
Therefore, we can't put them together in the same tensor as-is. The first sequence needs to be padded up to the length
of the second one, or the second one needs to be truncated down to the length of the first one.
In the first case, the list of IDs will be extended by the padding indices. We can pass a list to the tokenizer and ask
it to pad like this:
```python
>>> padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
```
We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:
```python
>>> padded_sequences["input_ids"]
[[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
```
This can then be converted into a tensor in PyTorch or TensorFlow. The attention mask is a binary tensor indicating the
position of the padded indices so that the model does not attend to them. For the [`BertTokenizer`],
`1` indicates a value that should be attended to, while `0` indicates a padded value. This attention mask is
in the dictionary returned by the tokenizer under the key "attention_mask":
```python
>>> padded_sequences["attention_mask"]
[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]
```
<a id='token-type-ids'></a>
### Token Type IDs
Some models' purpose is to do classification on pairs of sentences or question answering.
<Youtube id="0u3ioSwev3s"/>
These require two different sequences to be joined in a single "input_ids" entry, which usually is performed with the
help of special tokens, such as the classifier (`[CLS]`) and separator (`[SEP]`) tokens. For example, the BERT
model builds its two sequence input as such:
```python
>>> # [CLS] SEQUENCE_A [SEP] SEQUENCE_B [SEP]
```
We can use our tokenizer to automatically generate such a sentence by passing the two sequences to `tokenizer` as two
arguments (and not a list, like before) like this:
```python
>>> from transformers import BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"
>>> encoded_dict = tokenizer(sequence_a, sequence_b)
>>> decoded = tokenizer.decode(encoded_dict["input_ids"])
```
which will return:
```python
>>> print(decoded)
[CLS] HuggingFace is based in NYC [SEP] Where is HuggingFace based? [SEP]
```
This is enough for some models to understand where one sequence ends and where another begins. However, other models,
such as BERT, also deploy token type IDs (also called segment IDs). They are represented as a binary mask identifying
the two types of sequence in the model.
The tokenizer returns this mask as the "token_type_ids" entry:
```python
>>> encoded_dict['token_type_ids']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
The first sequence, the "context" used for the question, has all its tokens represented by a `0`, whereas the
second sequence, corresponding to the "question", has all its tokens represented by a `1`.
Some models, like [`XLNetModel`], use an additional token represented by a `2`.
<a id='position-ids'></a>
### Position IDs
Contrary to RNNs that have the position of each token embedded within them, transformers are unaware of the position of
each token. Therefore, the position IDs (`position_ids`) are used by the model to identify each token's position in
the list of tokens.
They are an optional parameter. If no `position_ids` are passed to the model, the IDs are automatically created as
absolute positional embeddings.
Absolute positional embeddings are selected in the range `[0, config.max_position_embeddings - 1]`. Some models use
other types of positional embeddings, such as sinusoidal position embeddings or relative position embeddings.
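As a minimal sketch (assuming a BERT checkpoint), explicitly passing `position_ids` reproduces what the model would otherwise build internally:
```python
>>> import torch
>>> from transformers import BertModel, BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> model = BertModel.from_pretrained("bert-base-cased")
>>> inputs = tokenizer("A Titan RTX has 24GB of VRAM", return_tensors="pt")
>>> # Build one position index per token: [0, 1, ..., seq_length - 1]
>>> position_ids = torch.arange(inputs["input_ids"].shape[1]).unsqueeze(0)
>>> outputs = model(**inputs, position_ids=position_ids)
```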
<a id='labels'></a>
### Labels
The labels are an optional argument which can be passed in order for the model to compute the loss itself. These labels
should be the expected prediction of the model: it will use the standard loss in order to compute the loss between its
predictions and the expected value (the label).
These labels are different according to the model head, for example:
- For sequence classification models (e.g., [`BertForSequenceClassification`]), the model expects a
tensor of dimension `(batch_size)` with each value of the batch corresponding to the expected label of the
entire sequence.
- For token classification models (e.g., [`BertForTokenClassification`]), the model expects a tensor
of dimension `(batch_size, seq_length)` with each value corresponding to the expected label of each individual
token.
- For masked language modeling (e.g., [`BertForMaskedLM`]), the model expects a tensor of dimension
`(batch_size, seq_length)` with each value corresponding to the expected label of each individual token: the
labels being the token ID for the masked token, and values to be ignored for the rest (usually -100).
- For sequence to sequence tasks (e.g., [`BartForConditionalGeneration`],
[`MBartForConditionalGeneration`]), the model expects a tensor of dimension `(batch_size, tgt_seq_length)` with each value corresponding to the target sequences associated with each input sequence. During
training, both *BART* and *T5* will make the appropriate *decoder_input_ids* and decoder attention masks internally.
They usually do not need to be supplied. This does not apply to models leveraging the Encoder-Decoder framework. See
the documentation of each model for more information on each specific model's labels.
The base models (e.g., [`BertModel`]) do not accept labels, as these are the base transformer
models, simply outputting features.
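For instance, a minimal sketch of the sequence classification case (assuming a `bert-base-cased` checkpoint with a freshly initialized two-class head, so the loss value itself is not meaningful):
```python
>>> import torch
>>> from transformers import BertForSequenceClassification, BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> model = BertForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
>>> inputs = tokenizer("This is a short sequence.", return_tensors="pt")
>>> labels = torch.tensor([1])  # one expected label per sequence in the batch
>>> outputs = model(**inputs, labels=labels)
>>> loss = outputs.loss  # the model computed the loss from the labels itself
```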
<a id='decoder-input-ids'></a>
### Decoder input IDs
This input is specific to encoder-decoder models, and contains the input IDs that will be fed to the decoder. These
inputs should be used for sequence to sequence tasks, such as translation or summarization, and are usually built in a
way specific to each model.
Most encoder-decoder models (BART, T5) create their `decoder_input_ids` on their own from the `labels`. In
such models, passing the `labels` is the preferred way to handle training.
Please check each model's docs to see how they handle these input IDs for sequence to sequence training.
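As a minimal sketch (assuming the `t5-small` checkpoint), passing `labels` alone is enough for training; the model builds the `decoder_input_ids` internally by shifting them:
```python
>>> from transformers import T5ForConditionalGeneration, T5Tokenizer
>>> tokenizer = T5Tokenizer.from_pretrained("t5-small")
>>> model = T5ForConditionalGeneration.from_pretrained("t5-small")
>>> inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
>>> labels = tokenizer("Das Haus ist wunderbar.", return_tensors="pt").input_ids
>>> # No decoder_input_ids are passed: the model derives them from the labels
>>> outputs = model(**inputs, labels=labels)
```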
<a id='feed-forward-chunking'></a>
### Feed Forward Chunking
In each residual attention block in transformers the self-attention layer is usually followed by 2 feed forward layers.
The intermediate embedding size of the feed forward layers is often bigger than the hidden size of the model (e.g., for
`bert-base-uncased`).
For an input of size `[batch_size, sequence_length]`, the memory required to store the intermediate feed forward
embeddings `[batch_size, sequence_length, config.intermediate_size]` can account for a large fraction of the memory
use. The authors of [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) noticed that since the
computation is independent of the `sequence_length` dimension, it is mathematically equivalent to compute the output
embeddings of both feed forward layers `[batch_size, config.hidden_size]_0, ..., [batch_size, config.hidden_size]_n`
individually and concat them afterward to `[batch_size, sequence_length, config.hidden_size]` with `n = sequence_length`, which trades increased computation time against reduced memory use, but yields a mathematically
**equivalent** result.
For models employing the function [`apply_chunking_to_forward`], the `chunk_size` defines the
number of output embeddings that are computed in parallel and thus defines the trade-off between memory and time
complexity. If `chunk_size` is set to 0, no feed forward chunking is done.
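A minimal, self-contained sketch of the mechanism (using a toy feed forward layer rather than a real model):
```python
>>> import torch
>>> from transformers import apply_chunking_to_forward
>>> hidden_states = torch.randn(2, 16, 32)  # [batch_size, sequence_length, hidden_size]
>>> dense = torch.nn.Linear(32, 32)
>>> def feed_forward_chunk(hidden_states):
...     return dense(hidden_states)
>>> # Process the sequence dimension (dim 1) in chunks of 4 positions at a time
>>> output = apply_chunking_to_forward(feed_forward_chunk, 4, 1, hidden_states)
>>> output.shape
torch.Size([2, 16, 32])
```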
<!--Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# General Utilities
This page lists all of Transformers' general utility functions that are found in the file `file_utils.py`.
Most of those are only useful if you are studying the general code in the library.
## Enums and namedtuples
[[autodoc]] file_utils.ExplicitEnum
[[autodoc]] file_utils.PaddingStrategy
[[autodoc]] file_utils.TensorType
## Special Decorators
[[autodoc]] file_utils.add_start_docstrings
[[autodoc]] file_utils.add_start_docstrings_to_model_forward
[[autodoc]] file_utils.add_end_docstrings
[[autodoc]] file_utils.add_code_sample_docstrings
[[autodoc]] file_utils.replace_return_docstrings
## Special Properties
[[autodoc]] file_utils.cached_property
## Other Utilities
[[autodoc]] file_utils._LazyModule
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for Generation
This page lists all the utility functions used by [`~generation_utils.GenerationMixin.generate`],
[`~generation_utils.GenerationMixin.greedy_search`],
[`~generation_utils.GenerationMixin.sample`],
[`~generation_utils.GenerationMixin.beam_search`],
[`~generation_utils.GenerationMixin.beam_sample`], and
[`~generation_utils.GenerationMixin.group_beam_search`].
Most of those are only useful if you are studying the code of the generate methods in the library.
## Generate Outputs
The output of [`~generation_utils.GenerationMixin.generate`] is an instance of a subclass of
[`~file_utils.ModelOutput`]. This output is a data structure containing all the information returned
by [`~generation_utils.GenerationMixin.generate`], but that can also be used as tuple or dictionary.
Here's an example:
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
```
The `generation_output` object is a [`~generation_utils.GreedySearchDecoderOnlyOutput`]. As we can
see in the documentation of that class below, it has the following attributes:
- `sequences`: the generated sequences of tokens
- `scores` (optional): the prediction scores of the language modelling head, for each generation step
- `hidden_states` (optional): the hidden states of the model, for each generation step
- `attentions` (optional): the attention weights of the model, for each generation step
Here we have the `scores` since we passed along `output_scores=True`, but we don't have `hidden_states` and
`attentions` because we didn't pass `output_hidden_states=True` or `output_attentions=True`.
You can access each attribute as you would usually do, and if that attribute has not been returned by the model, you
will get `None`. Here for instance `generation_output.scores` are all the generated prediction scores of the
language modeling head, and `generation_output.attentions` is `None`.
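For the example above, that behaviour can be checked directly (a quick sketch):
```python
print(type(generation_output.scores))  # a tuple with one tensor of prediction scores per generation step
print(generation_output.attentions)  # None, since output_attentions=True was not passed
```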
When using our `generation_output` object as a tuple, it only keeps the attributes that don't have `None` values.
Here, for instance, it has two elements, `sequences` then `scores`, so
```python
generation_output[:2]
```
will return the tuple `(generation_output.sequences, generation_output.scores)` for instance.
When using our `generation_output` object as a dictionary, it only keeps the attributes that don't have `None`
values. Here, for instance, it has two keys that are `sequences` and `scores`.
We document here all output types.
### GreedySearchOutput
[[autodoc]] generation_utils.GreedySearchDecoderOnlyOutput
[[autodoc]] generation_utils.GreedySearchEncoderDecoderOutput
[[autodoc]] generation_flax_utils.FlaxGreedySearchOutput
### SampleOutput
[[autodoc]] generation_utils.SampleDecoderOnlyOutput
[[autodoc]] generation_utils.SampleEncoderDecoderOutput
[[autodoc]] generation_flax_utils.FlaxSampleOutput
### BeamSearchOutput
[[autodoc]] generation_utils.BeamSearchDecoderOnlyOutput
[[autodoc]] generation_utils.BeamSearchEncoderDecoderOutput
### BeamSampleOutput
[[autodoc]] generation_utils.BeamSampleDecoderOnlyOutput
[[autodoc]] generation_utils.BeamSampleEncoderDecoderOutput
## LogitsProcessor
A [`LogitsProcessor`] can be used to modify the prediction scores of a language model head for
generation.
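For illustration, a minimal sketch (using a toy vocabulary of 10 tokens and `eos_token_id=0`, not tied to any real model) of applying processors to a batch of scores:
```python
import torch
from transformers import LogitsProcessorList, MinLengthLogitsProcessor

processors = LogitsProcessorList([MinLengthLogitsProcessor(min_length=5, eos_token_id=0)])
input_ids = torch.tensor([[2, 3]])  # only two tokens generated so far
scores = torch.zeros(1, 10)  # dummy scores over the 10-token vocabulary
processed_scores = processors(input_ids, scores)  # EOS is masked to -inf while the sequence is too short
```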
[[autodoc]] LogitsProcessor
- __call__
[[autodoc]] LogitsProcessorList
- __call__
[[autodoc]] LogitsWarper
- __call__
[[autodoc]] MinLengthLogitsProcessor
- __call__
[[autodoc]] TemperatureLogitsWarper
- __call__
[[autodoc]] RepetitionPenaltyLogitsProcessor
- __call__
[[autodoc]] TopPLogitsWarper
- __call__
[[autodoc]] TopKLogitsWarper
- __call__
[[autodoc]] NoRepeatNGramLogitsProcessor
- __call__
[[autodoc]] NoBadWordsLogitsProcessor
- __call__
[[autodoc]] PrefixConstrainedLogitsProcessor
- __call__
[[autodoc]] HammingDiversityLogitsProcessor
- __call__
[[autodoc]] ForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] ForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] InfNanRemoveLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessor
- __call__
[[autodoc]] FlaxLogitsProcessorList
- __call__
[[autodoc]] FlaxLogitsWarper
- __call__
[[autodoc]] FlaxTemperatureLogitsWarper
- __call__
[[autodoc]] FlaxTopPLogitsWarper
- __call__
[[autodoc]] FlaxTopKLogitsWarper
- __call__
[[autodoc]] FlaxForcedBOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxForcedEOSTokenLogitsProcessor
- __call__
[[autodoc]] FlaxMinLengthLogitsProcessor
- __call__
## StoppingCriteria
A [`StoppingCriteria`] can be used to change when to stop generation (other than EOS token).
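For illustration, a minimal sketch (with dummy tensors, not tied to a real model) of evaluating a list of criteria directly:
```python
import torch
from transformers import MaxLengthCriteria, StoppingCriteriaList

criteria = StoppingCriteriaList([MaxLengthCriteria(max_length=10)])
input_ids = torch.ones(1, 10, dtype=torch.long)  # a sequence that already has 10 tokens
scores = torch.zeros(1, 10)  # dummy scores, unused by this criterion
criteria(input_ids, scores)  # returns True: generation should stop
```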
[[autodoc]] StoppingCriteria
- __call__
[[autodoc]] StoppingCriteriaList
- __call__
[[autodoc]] MaxLengthCriteria
- __call__
[[autodoc]] MaxTimeCriteria
- __call__
## BeamSearch
[[autodoc]] BeamScorer
- process
- finalize
[[autodoc]] BeamSearchScorer
- process
- finalize
## Utilities
[[autodoc]] top_k_top_p_filtering
[[autodoc]] tf_top_k_top_p_filtering
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Custom Layers and Utilities
This page lists all the custom layers used by the library, as well as the utility functions it provides for modeling.
Most of those are only useful if you are studying the code of the models in the library.
## PyTorch custom modules
[[autodoc]] modeling_utils.Conv1D
[[autodoc]] modeling_utils.PoolerStartLogits
- forward
[[autodoc]] modeling_utils.PoolerEndLogits
- forward
[[autodoc]] modeling_utils.PoolerAnswerClass
- forward
[[autodoc]] modeling_utils.SquadHeadOutput
[[autodoc]] modeling_utils.SQuADHead
- forward
[[autodoc]] modeling_utils.SequenceSummary
- forward
## PyTorch Helper Functions
[[autodoc]] apply_chunking_to_forward
[[autodoc]] modeling_utils.find_pruneable_heads_and_indices
[[autodoc]] modeling_utils.prune_layer
[[autodoc]] modeling_utils.prune_conv1d_layer
[[autodoc]] modeling_utils.prune_linear_layer
## TensorFlow custom layers
[[autodoc]] modeling_tf_utils.TFConv1D
[[autodoc]] modeling_tf_utils.TFSharedEmbeddings
- call
[[autodoc]] modeling_tf_utils.TFSequenceSummary
## TensorFlow loss functions
[[autodoc]] modeling_tf_utils.TFCausalLanguageModelingLoss
[[autodoc]] modeling_tf_utils.TFMaskedLanguageModelingLoss
[[autodoc]] modeling_tf_utils.TFMultipleChoiceLoss
[[autodoc]] modeling_tf_utils.TFQuestionAnsweringLoss
[[autodoc]] modeling_tf_utils.TFSequenceClassificationLoss
[[autodoc]] modeling_tf_utils.TFTokenClassificationLoss
## TensorFlow Helper Functions
[[autodoc]] modeling_tf_utils.get_initializer
[[autodoc]] modeling_tf_utils.keras_serializable
[[autodoc]] modeling_tf_utils.shape_list
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for pipelines
This page lists all the utility functions the library provides for pipelines.
Most of those are only useful if you are studying the code of the models in the library.
## Argument handling
[[autodoc]] pipelines.ArgumentHandler
[[autodoc]] pipelines.ZeroShotClassificationArgumentHandler
[[autodoc]] pipelines.QuestionAnsweringArgumentHandler
## Data format
[[autodoc]] pipelines.PipelineDataFormat
[[autodoc]] pipelines.CsvPipelineDataFormat
[[autodoc]] pipelines.JsonPipelineDataFormat
[[autodoc]] pipelines.PipedPipelineDataFormat
## Utilities
[[autodoc]] pipelines.PipelineException
<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Utilities for Tokenizers
This page lists all the utility functions used by the tokenizers, mainly the class
[`~tokenization_utils_base.PreTrainedTokenizerBase`] that implements the common methods between
[`PreTrainedTokenizer`] and [`PreTrainedTokenizerFast`] and the mixin
[`~tokenization_utils_base.SpecialTokensMixin`].
Most of those are only useful if you are studying the code of the tokenizers in the library.
## PreTrainedTokenizerBase
[[autodoc]] tokenization_utils_base.PreTrainedTokenizerBase
- __call__
- all
## SpecialTokensMixin
[[autodoc]] tokenization_utils_base.SpecialTokensMixin
## Enums and namedtuples
[[autodoc]] tokenization_utils_base.TruncationStrategy
[[autodoc]] tokenization_utils_base.CharSpan
[[autodoc]] tokenization_utils_base.TokenSpan