@@ -52,7 +52,7 @@ A good first starting point to better understand the library is to read the [doc
 In our opinion, the library's code is not just a means to provide a product, *e.g.* the ability to use BERT for
 inference, but also as the very product that we want to improve. Hence, when adding a model, the user is not only the
-person that will use your model, but also everybody that will read, try to understand, and possibly tweak your code.
+person who will use your model, but also everybody who will read, try to understand, and possibly tweak your code.
 With this in mind, let's go a bit deeper into the general library design.
@@ -131,9 +131,9 @@ From experience, we can tell you that the most important things to keep in mind
   friends. Note that it might very well happen that your model's tokenizer is based on one model implementation, and
   your model's modeling code on another one. *E.g.* FSMT's modeling code is based on BART, while FSMT's tokenizer code
   is based on XLM.
-- It's more of an engineering challenge than a scientific challenge. You should spend more time on creating an
-  efficient debugging environment than trying to understand all theoretical aspects of the model in the paper.
+- It's more of an engineering challenge than a scientific challenge. You should spend more time creating an
+  efficient debugging environment rather than trying to understand all theoretical aspects of the model in the paper.
-- Ask for help, when you're stuck! Models are the core component of 🤗 Transformers so that we at Hugging Face are more
+- Ask for help when you're stuck! Models are the core component of 🤗 Transformers, so we at Hugging Face are more
   than happy to help you at every step to add your model. Don't hesitate to ask if you notice you are not making
   progress.
@@ -157,9 +157,9 @@ List:
 ☐ Submitted the pull request<br>
 ☐ (Optional) Added a demo notebook

-To begin with, we usually recommend to start by getting a good theoretical understanding of `BrandNewBert`. However,
+To begin with, we usually recommend getting a good theoretical understanding of `BrandNewBert`. However,
 if you prefer to understand the theoretical aspects of the model *on-the-job*, then it is totally fine to directly dive
-into the `BrandNewBert`'s code-base. This option might suit you better, if your engineering skills are better than
+into `BrandNewBert`'s code-base. This option might suit you better if your engineering skills are better than
 your theoretical skill, if you have trouble understanding `BrandNewBert`'s paper, or if you just enjoy programming
 much more than reading scientific papers.
@@ -175,7 +175,7 @@ theoretical aspects, but rather focus on the practical ones, namely:
   encoder-decoder model? Look at the [model_summary](model_summary) if you're not familiar with the differences between those.
 - What are the applications of *brand_new_bert*? Text classification? Text generation? Seq2Seq tasks, *e.g.,*
   summarization?
-- What is the novel feature of the model making it different from BERT/GPT-2/BART?
+- What is the novel feature of the model that makes it different from BERT/GPT-2/BART?
 - Which of the already existing [🤗 Transformers models](https://huggingface.co/transformers/#contents) is most
   similar to *brand_new_bert*?
 - What type of tokenizer is used? A sentencepiece tokenizer? Word piece tokenizer? Is it the same tokenizer as used
@@ -261,7 +261,7 @@ figure out the following:
 - How can you debug the model in the original environment of the repo? Do you have to add *print* statements, can you
   work with an interactive debugger like *ipdb*, or should you use an efficient IDE to debug the model, like PyCharm?

-It is very important that before you start the porting process, that you can **efficiently** debug code in the original
+It is very important that before you start the porting process, you can **efficiently** debug code in the original
 repository! Also, remember that you are working with an open-source library, so do not hesitate to open an issue, or
 even a pull request in the original repository. The maintainers of this repository are most likely very happy about
 someone looking into their code!
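If you go the interactive route, a few lines are enough to get a debugger prompt right before the original model runs. The sketch below is only illustrative: `brand_new_bert`, `BrandNewBertModel`, and its loading function are hypothetical stand-ins for whatever the original repository actually exposes.

```python
# Illustrative sketch only — every `brand_new_bert` identifier below is a
# hypothetical placeholder for the original repository's real API.
import torch
import ipdb

from brand_new_bert import BrandNewBertModel  # hypothetical original-repo import

model = BrandNewBertModel.from_pretrained("/path/to/small/checkpoint/")  # hypothetical loader
model.eval()

input_ids = torch.tensor([[0, 4, 5, 2, 3, 7, 9]])  # fixed input so every run is comparable

with torch.no_grad():
    ipdb.set_trace()  # step into model(...) from here and inspect intermediate tensors
    output = model(input_ids)
```

If the repository is not debugger-friendly, sprinkling *print* statements at the same spots achieves the same goal, just with slower iteration.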
@@ -280,10 +280,10 @@ In general, there are two possible debugging environments for running the origin
 Jupyter notebooks have the advantage that they allow for cell-by-cell execution which can be helpful to better split
 logical components from one another and to have faster debugging cycles as intermediate results can be stored. Also,
 notebooks are often easier to share with other contributors, which might be very helpful if you want to ask the Hugging
-Face team for help. If you are familiar with Jupyter notebooks, we strongly recommend you to work with them.
+Face team for help. If you are familiar with Jupyter notebooks, we strongly recommend you work with them.

 The obvious disadvantage of Jupyter notebooks is that if you are not used to working with them you will have to spend
-some time adjusting to the new programming environment and that you might not be able to use your known debugging tools
+some time adjusting to the new programming environment and you might not be able to use your familiar debugging tools
 anymore, like `ipdb`.

 For each code-base, a good first step is always to load a **small** pretrained checkpoint and to be able to reproduce a
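The hunk is cut off mid-sentence here; the milestone it describes is reproducing a single forward pass from a small checkpoint and recording its output as the reference for the port. A minimal sketch of that milestone, again with hypothetical `brand_new_bert` names (`load_pretrained_checkpoint` and `predict` are assumptions, not a real API):

```python
# First milestone: load a *small* checkpoint and reproduce one forward pass.
# `load_pretrained_checkpoint` and `predict` are hypothetical stand-ins.
from brand_new_bert import BrandNewBertModel

model = BrandNewBertModel.load_pretrained_checkpoint("/path/to/small/checkpoint/")
input_ids = [0, 4, 5, 2, 3, 7, 9]  # short, hard-coded vector of input ids

original_output = model.predict(input_ids)
print(original_output)  # keep this output — it is the reference the 🤗 port must match
```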
@@ -329,7 +329,7 @@ example is [T5's MeshTensorFlow](https://github.com/tensorflow/mesh/tree/master/
 very complex and does not offer a simple way to decompose the model into its sub-components. For such libraries, one
 often relies on verifying print statements.

-No matter which strategy you choose, the recommended procedure is often the same in that you should start to debug the
+No matter which strategy you choose, the recommended procedure is often the same: you should debug the
 starting layers first and the ending layers last.

 It is recommended that you retrieve the output, either by print statements or sub-component functions, of the following
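If the original model is a `torch.nn.Module`, forward hooks are a non-invasive way to retrieve those per-layer outputs without editing the original code. A sketch under that assumption (the attribute paths `embeddings` and `encoder.layers` are hypothetical and must be adapted to the real model):

```python
# Capture intermediate outputs layer by layer with PyTorch forward hooks,
# so the starting layers can be verified before the ending ones.
import torch

from brand_new_bert import BrandNewBertModel  # hypothetical, as in the sketches above

model = BrandNewBertModel.from_pretrained("/path/to/small/checkpoint/")
captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # store whatever this sub-module produced
    return hook

# `embeddings` and `encoder.layers` are assumed attribute names — adapt them.
model.embeddings.register_forward_hook(make_hook("embeddings"))
for i, layer in enumerate(model.encoder.layers):
    layer.register_forward_hook(make_hook(f"encoder_layer_{i}"))

with torch.no_grad():
    model(torch.tensor([[0, 4, 5, 2, 3, 7, 9]]))

for name, out in captured.items():
    out = out[0] if isinstance(out, tuple) else out  # some layers return tuples
    print(name, tuple(out.shape))
```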
@@ -364,7 +364,7 @@ depending on the library framework, we accept an error tolerance of 1e-3 (0.001)
 nearly the same output, they have to be almost identical. Therefore, you will certainly compare the intermediate
 outputs of the 🤗 Transformers version multiple times against the intermediate outputs of the original implementation of
 *brand_new_bert* in which case an **efficient** debugging environment of the original repository is absolutely
-important. Here is some advice is to make your debugging environment as efficient as possible.
+important. Here is some advice to make your debugging environment as efficient as possible.

 - Find the best way of debugging intermediate results. Is the original repository written in PyTorch? Then you should
   probably take the time to write a longer script that decomposes the original model into smaller sub-components to
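For the comparison itself, a small helper around `torch.allclose` with the 1e-3 tolerance quoted in the hunk header keeps the feedback loop tight. A minimal sketch:

```python
# Compare an intermediate output of the 🤗 port against the original
# implementation, using the 1e-3 tolerance mentioned above.
import torch

def check_output(name, original, ported, atol=1e-3):
    if original.shape != ported.shape:
        raise ValueError(f"{name}: shape mismatch {original.shape} vs {ported.shape}")
    max_diff = (original - ported).abs().max().item()
    if not torch.allclose(original, ported, atol=atol):
        raise ValueError(f"{name}: max abs diff {max_diff:.2e} exceeds {atol}")
    print(f"{name}: OK (max abs diff {max_diff:.2e})")

# Dummy usage — in practice, pass the captured intermediate tensors.
reference = torch.randn(1, 7, 16)
check_output("embeddings", reference, reference + 5e-4)
```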
@@ -409,7 +409,7 @@ Otherwise, let's start generating a new model. You have two choices here:
 - `transformers-cli add-new-model-like` to add a new model like an existing one
 - `transformers-cli add-new-model` to add a new model from our template (will look like BERT or Bart depending on the type of model you select)

-In both cases, you will be prompted with a questionnaire to fill the basic information of your model. The second command requires to install `cookiecutter`, you can find more information on it [here](https://github.com/huggingface/transformers/tree/main/templates/adding_a_new_model).
+In both cases, you will be prompted with a questionnaire to fill in the basic information of your model. The second command requires `cookiecutter` to be installed; you can find more information on it [here](https://github.com/huggingface/transformers/tree/main/templates/adding_a_new_model).

 **Open a Pull Request on the main huggingface/transformers repo**