Unverified Commit eca77f47 authored by Lysandre Debut, committed by GitHub

Updates the default branch from master to main (#16326)



* Updates the default branch from master to main

* Links from `master` to `main`

* Typo

* Update examples/flax/README.md
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 77321481
...@@ -57,11 +57,11 @@ of the script.
## Old version of the script
You can find the old version of the PyTorch script [here](https://github.com/huggingface/transformers/blob/main/examples/legacy/token-classification/run_ner.py).
## Pytorch version, no Trainer
Based on the script [run_ner_no_trainer.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/token-classification/run_ner_no_trainer.py).
Like `run_ner.py`, this script allows you to fine-tune any of the models on the [hub](https://huggingface.co/models) on a
token classification task, either NER, POS or CHUNKS tasks or your own data in a csv or a JSON file. The main difference is that this
......
...@@ -18,8 +18,8 @@ limitations under the License.
This directory contains examples for finetuning and evaluating transformers on translation tasks.
Please tag @patil-suraj with any issues/unexpected behaviors, or send a PR!
For deprecated `bertabs` instructions, see [`bertabs/README.md`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/bertabs/README.md).
For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`](https://github.com/huggingface/transformers/blob/main/examples/legacy/seq2seq).
### Supported Architectures
...@@ -150,7 +150,7 @@ python examples/pytorch/translation/run_translation.py \
## With Accelerate
Based on the script [`run_translation_no_trainer.py`](https://github.com/huggingface/transformers/blob/main/examples/pytorch/translation/run_translation_no_trainer.py).
Like `run_translation.py`, this script allows you to fine-tune any of the models supported on a
translation task, the main difference is that this
......
...@@ -12,7 +12,7 @@ setuptools.setup(
    description="Few-shot Named Entity Recognition",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/huggingface/transformers/tree/main/examples/research_projects/fsner",
    project_urls={
        "Bug Tracker": "https://github.com/huggingface/transformers/issues",
    },
......
...@@ -45,9 +45,9 @@ Fourth, make sure that your project proposal includes the following information:
1. *A clear description of the project*
2. *In which language should the project be conducted?* English, German, Chinese, ...? It can also be a multi-lingual project
3. *Which model should be used?* If you want to adapt an existing model, you can add the link to one of the 4000 available checkpoints in JAX [here](https://huggingface.co/models?filter=jax). If you want to train a model from scratch, you can simply state the model architecture to be used, *e.g.* BERT, CLIP, etc. You can also base your project on a model that is not part of Transformers. For an overview of libraries based on JAX, you can take a look at [awesome-jax](https://github.com/n2cholas/awesome-jax#awesome-jax-). **Note** that for a project that is not based on Transformers it will be more difficult for the 🤗 team to help you. Also have a look at the section [Quickstart Flax & Jax in Transformers](https://github.com/huggingface/transformers/tree/main/examples/research_projects/jax-projects#quickstart-flax-and-jax-in-transformers) to see what model architectures are currently supported in 🤗 Transformers.
4. *What data should be used?* It is important to state at least what kind of data you would like to use. Ideally, you can already point to publicly available data or a dataset in the 🤗 Datasets library.
5. *Are similar training scripts available in Flax/JAX?* It would be important to find similar training scripts that already exist in Flax/JAX. *E.g.* if you are working on a Seq-to-Seq task, you can make use of the [`run_summarization_flax.py`](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py) script, which is very similar to any seq2seq training. Also have a look at the section [Quickstart Flax & Jax in Transformers](https://github.com/huggingface/transformers/tree/main/examples/research_projects/jax-projects#quickstart-flax-and-jax-in-transformers) to see what training scripts are currently supported in 🤗 Transformers.
6. *(Optionally) What are possible challenges?* List possible difficulties with your project. *E.g.* if you know that training convergence usually takes a lot of time, it is worth stating this here!
7. *(Optionally) What is the desired project outcome?* - How would you like to demo your project? One could *e.g.* create a Streamlit application.
8. *(Optionally) Links to read upon* - Can you provide any links that would help the reader to better understand your project idea?
......
...@@ -88,7 +88,7 @@ All officially defined projects can be seen [here](https://docs.google.com/sprea
### How to propose a project
Some default project ideas are given by the organizers. **However, we strongly encourage participants to submit their own project ideas!**
Check out the [HOW_TO_PROPOSE_PROJECT.md](https://github.com/huggingface/transformers/tree/main/examples/research_projects/jax-projects/HOW_TO_PROPOSE_PROJECT.md) for more information on how to propose a new project.
### How to form a team around a project
...@@ -161,7 +161,7 @@ To give an example, a well-defined project would be the following:
- task: summarization
- model: [t5-small](https://huggingface.co/t5-small)
- dataset: [CNN/Daily mail](https://huggingface.co/datasets/cnn_dailymail)
- training script: [run_summarization_flax.py](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
- outcome: t5 model that can summarize news
- work flow: adapt `run_summarization_flax.py` to work with `t5-small`.
...@@ -269,7 +269,7 @@ You can activate your venv by running
source ~/<your-venv-name>/bin/activate
```
We strongly recommend making use of the provided JAX/Flax example scripts in [transformers/examples/flax](https://github.com/huggingface/transformers/tree/main/examples/flax), even if you want to train a JAX/Flax model from another GitHub repository that is not integrated into 🤗 Transformers.
In all likelihood, you will need to adapt one of the example scripts, so we recommend forking and cloning the 🤗 Transformers repository as follows.
Doing so will allow you to share your fork of the Transformers library with your team members so that the team effectively works on the same code base. It will also automatically install the newest versions of `flax`, `jax` and `optax`.
...@@ -323,7 +323,7 @@ the community week, please fork the datasets repository and follow the instructi
[here](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-create-a-pull-request).
To verify that all libraries are correctly installed, you can run the following command.
It assumes that both `transformers` and `datasets` were installed from main - otherwise
datasets streaming will not work correctly.
```python
...@@ -426,7 +426,7 @@ jax.device_count()
This should display the number of TPU cores, which should be 8 on a TPUv3-8 VM.
We strongly recommend making use of the provided JAX/Flax example scripts in [transformers/examples/flax](https://github.com/huggingface/transformers/tree/main/examples/flax), even if you want to train a JAX/Flax model from another GitHub repository that is not integrated into 🤗 Transformers.
In all likelihood, you will need to adapt one of the example scripts, so we recommend forking and cloning the 🤗 Transformers repository as follows.
Doing so will allow you to share your fork of the Transformers library with your team members so that the team effectively works on the same code base. It will also automatically install the newest versions of `flax`, `jax` and `optax`.
...@@ -480,7 +480,7 @@ the community week, please fork the datasets repository and follow the instructi
[here](https://github.com/huggingface/datasets/blob/master/CONTRIBUTING.md#how-to-create-a-pull-request).
To verify that all libraries are correctly installed, you can run the following command.
It assumes that both `transformers` and `datasets` were installed from main - otherwise
datasets streaming will not work correctly.
```python
...@@ -510,31 +510,31 @@ model(input_ids)
## Quickstart flax and jax in transformers
Currently, we support the following models in Flax.
Note that some models are about to be merged to `main` and will
be available in a couple of days.
- [BART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bart/modeling_flax_bart.py)
- [BERT](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_flax_bert.py)
- [BigBird](https://github.com/huggingface/transformers/blob/main/src/transformers/models/big_bird/modeling_flax_big_bird.py)
- [CLIP](https://github.com/huggingface/transformers/blob/main/src/transformers/models/clip/modeling_flax_clip.py)
- [ELECTRA](https://github.com/huggingface/transformers/blob/main/src/transformers/models/electra/modeling_flax_electra.py)
- [GPT2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_flax_gpt2.py)
- [(TODO) MBART](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mbart/modeling_flax_mbart.py)
- [RoBERTa](https://github.com/huggingface/transformers/blob/main/src/transformers/models/roberta/modeling_flax_roberta.py)
- [T5](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/modeling_flax_t5.py)
- [ViT](https://github.com/huggingface/transformers/blob/main/src/transformers/models/vit/modeling_flax_vit.py)
- [Wav2Vec2](https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/modeling_flax_wav2vec2.py)
You can find all available training scripts for JAX/Flax under the
official [flax example folder](https://github.com/huggingface/transformers/tree/main/examples/flax). Note that a couple of training scripts will be released in the following week.
- [Causal language modeling (GPT2)](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_clm_flax.py)
- [Masked language modeling (BERT, RoBERTa, ELECTRA, BigBird)](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_mlm_flax.py)
- [Text classification (BERT, RoBERTa, ELECTRA, BigBird)](https://github.com/huggingface/transformers/blob/main/examples/flax/text-classification/run_flax_glue.py)
- [Summarization / Seq2Seq (BART, MBART, T5)](https://github.com/huggingface/transformers/blob/main/examples/flax/summarization/run_summarization_flax.py)
- [Masked Seq2Seq pre-training (T5)](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py)
- [Contrastive Loss pretraining for Wav2Vec2](https://github.com/huggingface/transformers/blob/main/examples/research_projects/jax-projects/wav2vec2)
- [Fine-tuning long-range QA for BigBird](https://github.com/huggingface/transformers/blob/main/examples/research_projects/jax-projects/big_bird)
- [(TODO) Image classification (ViT)]( )
- [(TODO) CLIP pretraining, fine-tuning (CLIP)]( )
...@@ -712,7 +712,7 @@ class FlaxMLPModel(FlaxMLPPreTrainedModel):
Now the `FlaxMLPModel` will have a similar interface as PyTorch or TensorFlow models and allows us to attach loaded or randomly initialized weights to the model instance.
So the important point to remember is that the `model` is not an instance of `nn.Module`; it's an abstract class, like a container that holds a Flax module and its parameters and provides convenient methods for initialization and the forward pass. The key take-away here is that an instance of `FlaxMLPModel` is very much stateful now since it holds all the model parameters, whereas the underlying Flax module `FlaxMLPModule` is still stateless. To make `FlaxMLPModel` fully compliant with JAX transformations, it is always possible to pass the parameters to `FlaxMLPModel` as well to make it stateless and easier to work with during training. Feel free to take a look at the code to see how exactly this is implemented, for example in [`modeling_flax_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_flax_bert.py#L536).
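As an illustration (using a real model class rather than the toy `FlaxMLPModel` above, and a checkpoint name that is just an example), the wrapper can be called either with its internally stored parameters or with explicitly passed ones:

```python
import jax.numpy as jnp
from transformers import FlaxBertModel

model = FlaxBertModel.from_pretrained("bert-base-uncased")
input_ids = jnp.ones((1, 8), dtype="i4")

# Stateful call: the wrapper uses the parameters it holds in `model.params`.
outputs = model(input_ids)

# Functional call: pass the parameters explicitly, e.g. from inside a jitted train step.
outputs = model(input_ids, params=model.params)
```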
Another significant difference between Flax and PyTorch models is that we can pass the `labels` directly to PyTorch's forward pass to compute the loss, whereas Flax models never accept `labels` as an input argument. In PyTorch, gradient backpropagation is performed by simply calling `.backward()` on the computed loss, which makes it very handy for the user to be able to pass the `labels`. In Flax, however, gradient backpropagation cannot be done by simply calling `.backward()` on the loss output; the loss function itself has to be transformed by `jax.grad` or `jax.value_and_grad` to return the gradients of all parameters. This transformation cannot happen under the hood when one passes the `labels` to Flax's forward function, so in Flax we simply don't allow `labels` to be passed by design and require the user to implement the loss function themselves. In conclusion, you will see that all training-related code is decoupled from the modeling code and always defined in the training scripts themselves.
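To make this concrete, here is a minimal, self-contained sketch with a toy Flax module (not the repository's training code) showing how the loss is defined outside the model and transformed with `jax.value_and_grad`:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn


class TinyClassifier(nn.Module):
    num_classes: int = 3

    @nn.compact
    def __call__(self, x):
        return nn.Dense(self.num_classes)(x)


model = TinyClassifier()
inputs = jnp.ones((4, 8))
labels = jnp.array([0, 1, 2, 0])
params = model.init(jax.random.PRNGKey(0), inputs)


def loss_fn(params):
    logits = model.apply(params, inputs)
    log_probs = jax.nn.log_softmax(logits)
    # Mean negative log-likelihood of the correct classes.
    return -jnp.mean(jnp.take_along_axis(log_probs, labels[:, None], axis=-1))


# jax.value_and_grad replaces PyTorch's `loss.backward()`: it returns the loss
# together with the gradients of all parameters.
loss, grads = jax.value_and_grad(loss_fn)(params)
```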
...@@ -838,7 +838,7 @@ model.save_pretrained("awesome-flax-model", params=params)
Note that, as JAX is backed by the [XLA](https://www.tensorflow.org/xla) compiler, any JAX/Flax code can run on all `XLA`-compliant devices without code changes!
That means you could use the same training script on CPUs, GPUs, and TPUs.
To know more about how to train the Flax models on different devices (GPU, multi-GPUs, TPUs) and use the example scripts, please look at the [examples README](https://github.com/huggingface/transformers/tree/main/examples/flax).
## Talks
...@@ -1025,7 +1025,7 @@ Cool! The file is now displayed on the model page under the [files tab](https://
We encourage you to upload all files except maybe the actual data files to the repository. This includes training scripts, model weights,
model configurations, training logs, etc...
Next, let's create a tokenizer and save it to the model dir by following the instructions of the [official Flax MLM README](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#train-tokenizer). We can again use a simple Python shell.
```python
from datasets import load_dataset
...@@ -1055,7 +1055,7 @@ tokenizer.save("./tokenizer.json")
```
This creates and saves our tokenizer directly in the cloned repository.
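For illustration, a minimal sketch of that tokenizer-training step using the `tokenizers` library; the dataset, vocabulary size and special tokens below are placeholders rather than the official README's exact settings:

```python
from datasets import load_dataset
from tokenizers import ByteLevelBPETokenizer

# Placeholder corpus: any text dataset with a "text" column works here.
dataset = load_dataset("oscar", "unshuffled_deduplicated_als", split="train")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    (example["text"] for example in dataset),
    vocab_size=50265,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Save the trained tokenizer into the cloned model repository.
tokenizer.save("./tokenizer.json")
```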
Finally, we can start training. For now, we'll simply use the official [`run_mlm_flax`](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_mlm_flax.py)
script, but we might make some changes later. So let's copy the script into our model repository.
```bash
......
git+https://github.com/huggingface/transformers@main
datasets
sentencepiece
wandb
......
...@@ -90,7 +90,7 @@ config.save_pretrained(model_dir)
### Train model
Next we can run the example script to pretrain the model.
Compared to the default [`run_mlm_flax`](https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_mlm_flax.py), we introduced 4 new training settings:
- `num_train_steps` - how many update steps should be run.
- `num_eval_samples` - how many training samples should be taken for evaluation.
- `logging_steps` - at what rate should the training loss be logged.
......
## MM-IMDb
Based on the script [`run_mmimdb.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/mm-imdb/run_mmimdb.py).
[MM-IMDb](http://lisi1.unal.edu.co/mmimdb/) is a Multimodal dataset with around 26,000 movies including images, plots and other metadata.
......
...@@ -23,7 +23,7 @@ You can also have a look at this fun *Explain Like I'm Five* introductory [slide
One promise of extreme pruning is to obtain extremely small models that can be easily sent (and stored) on edge devices. By setting weights to 0., we reduce the amount of information we need to store, and thus decrease the memory size. We are able to obtain extremely sparse fine-pruned models with movement pruning: ~95% of the dense performance with ~5% of total remaining weights in the BERT encoder.
In [this notebook](https://github.com/huggingface/transformers/blob/main/examples/research_projects/movement-pruning/Saving_PruneBERT.ipynb), we showcase how we can leverage standard tools that exist out-of-the-box to efficiently store an extremely sparse question answering model (only 6% of total remaining weights in the encoder). We are able to reduce the memory size of the encoder **from the 340MB (the original dense BERT) to 11MB**, without any additional training of the model (every operation is performed *post fine-pruning*). It is sufficiently small to store it on a [91' floppy disk](https://en.wikipedia.org/wiki/Floptical) 📎!
While movement pruning does not directly optimize for memory footprint (but rather the number of non-null weights), we hypothesize that further memory compression ratios can be achieved with specific quantization-aware trainings (see for instance [Q8BERT](https://arxiv.org/abs/1910.06188), [And the Bit Goes Down](https://arxiv.org/abs/1907.05686) or [Quant-Noise](https://arxiv.org/abs/2004.07320)).
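To give a feel for the storage argument, here is a small self-contained sketch (not the notebook's actual procedure, which additionally uses quantization) showing how much a ~95%-sparse weight matrix shrinks when only the non-zero entries are kept in CSR format:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
dense = rng.standard_normal((768, 768)).astype(np.float32)

# Zero out ~95% of the weights, as movement pruning would.
mask = rng.random(dense.shape) < 0.05
pruned = dense * mask

csr = sparse.csr_matrix(pruned)
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"dense : {dense.nbytes / 1e6:.2f} MB")
print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
```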
...@@ -40,9 +40,9 @@ Pre-trained `BERT-base-uncased` fine-pruned with soft movement pruning on MNLI.
### Setup
The code relies on the 🤗 Transformers library. In addition to the dependencies listed in the [`examples`](https://github.com/huggingface/transformers/tree/main/examples) folder, you should install a few additional dependencies listed in the `requirements.txt` file: `pip install -r requirements.txt`.
Note that we built our experiments on top of a stabilized version of the library (commit https://github.com/huggingface/transformers/commit/352d5472b0c1dec0f420d606d16747d851b4bda8): we do not guarantee that everything is still compatible with the latest version of the main branch.
### Fine-pruning with movement pruning
......
...@@ -8,7 +8,7 @@ The original RAG implementation is able to train the question encoder and genera
This extension enables complete end-to-end training of RAG including the context encoder in the retriever component.
Please read the [accompanying blog post](https://shamanesiri.medium.com/how-to-finetune-the-entire-rag-architecture-including-dpr-retriever-4b4385322552) for details on this implementation.
The original RAG code has also been modified to work with the latest versions of PyTorch Lightning (version 1.2.10) and Ray (version 1.3.0). All other implementation details remain the same as the [original RAG code](https://github.com/huggingface/transformers/tree/main/examples/research_projects/rag).
Read more about RAG at https://arxiv.org/abs/2005.11401.
This code can be modified to experiment with other research on retrieval-augmented models which include training of the retriever (e.g. [REALM](https://arxiv.org/abs/2002.08909) and [MARGE](https://arxiv.org/abs/2006.15020)).
......
...@@ -17,7 +17,7 @@ Read more about RAG at https://arxiv.org/abs/2005.11401.
# Finetuning
Our finetuning logic is based on scripts from [`examples/seq2seq`](https://github.com/huggingface/transformers/tree/main/examples/seq2seq). We accept training data in the same format as specified there - we expect a directory consisting of 6 text files:
```bash
train.source
train.target
......
...@@ -43,7 +43,7 @@ The section [Data and preprocessing](#data-and-preprocessing) explains
in more detail what audio data can be used, how to find suitable audio data, and
how the audio data can be processed.
For training, it is recommended to use the [official training script](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py) or a modification thereof. A step-by-step guide on how to fine-tune
an acoustic model for a speech recognition system can be found under [How to fine-tune an acoustic model](#how-to-finetune-an-acoustic-model).
If possible it is encouraged to fine-tune the acoustic models on local GPU machines, but
if those are not available, the OVH cloud team kindly provides a limited
...@@ -124,7 +124,7 @@ training the acoustic model (example shown in [How to fine-tune an acoustic mode
It is recommended that this is done by using 🤗 Datasets `.map()` function as shown
[here](https://github.com/huggingface/transformers/blob/9a2dabae7002258e41419491c73dd43ad61b5de7/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py#L444). As can be
seen, we can pass some characters that will be removed from the transcriptions, *e.g.*: `--chars_to_ignore , ? . ! - \; \: \" “ % ‘ ” � \`
on the official ["Single GPU Example"](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#single-gpu-ctc).
The participants are free to modify this preprocessing by removing more characters or even replacing characters as
it is done in the [official blog post](https://github.com/huggingface/transformers/blob/9a2dabae7002258e41419491c73dd43ad61b5de7/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py#L444).
**However**, there are some rules regarding what characters are allowed to be removed/replaced and which are not.
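For illustration, a minimal sketch of this preprocessing idea with 🤗 Datasets' `.map()`, using a toy in-memory dataset and a simplified character set rather than the official script's exact implementation:

```python
import re

from datasets import Dataset

chars_to_ignore_regex = r'[,?.!\-;:"“%‘”]'


def remove_special_characters(batch):
    # Strip the unwanted characters and lower-case the transcription.
    batch["sentence"] = re.sub(chars_to_ignore_regex, "", batch["sentence"]).lower()
    return batch


toy = Dataset.from_dict({"sentence": ["Hello, world!", "How are you?"]})
print(toy.map(remove_special_characters)["sentence"])  # ['hello world', 'how are you']
```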
...@@ -173,7 +173,7 @@ python -c "import torch; print(torch.cuda.is_available())"
If the above command doesn't print ``True``, please first follow the
instructions [here](https://pytorch.org/) to install PyTorch with CUDA.
We strongly recommend making use of the provided PyTorch example scripts in [transformers/examples/pytorch/speech-recognition](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition) to train your speech recognition
system.
In all likelihood, you will adjust one of the example scripts, so we recommend forking and cloning the 🤗 Transformers repository as follows.
...@@ -332,7 +332,7 @@ cp ~/transformers/examples/pytorch/speech-recognition/run_speech_recognition_ctc
```
Next, we'll create a bash file to define the hyper-parameters and configurations
for training. More detailed information on different settings (single-GPU vs. multi-GPU) can be found [here](https://github.com/huggingface/transformers/tree/main/examples/pytorch/speech-recognition#connectionist-temporal-classification).
For demonstration purposes, we will use a dummy XLS-R model `model_name_or_path="hf-test/xls-r-dummy"` on the very low-resource language of "Abkhaz" of [Common Voice 7](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0): `dataset_config_name="ab"` for just a single epoch.
...@@ -347,7 +347,7 @@ dummy hyper-parameters and configurations for demonstration purposes.
Note that we add the flag `--use_auth_token` so that datasets requiring access,
such as [Common Voice 7](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0), can be downloaded. In addition, we add the `--push_to_hub` flag to make use of the
[Trainer's `push_to_hub` functionality](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.push_to_hub) so that your model will be automatically uploaded to the Hub.
Let's copy the following code snippet into a file called `run.sh`
...@@ -389,7 +389,7 @@ The training should not take more than a couple of minutes.
During training, intermediate saved checkpoints are automatically uploaded to
your model repository, as can be seen [on this commit](https://huggingface.co/hf-test/xls-r-ab-test/commit/0eb19a0fca4d7d163997b59663d98cd856022aa6).
At the end of the training, the [Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer) automatically creates a nice model card and all
relevant files are uploaded.
5. **Tips for real model training**
...@@ -587,7 +587,7 @@ both the word- and character error rate.
In a few days, we will give everybody access to some real-world audio data for as many languages as possible.
If your language has real-world audio data, it will most likely have audio input
of multiple minutes. 🤗 Transformers' [ASR pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) supports audio chunking out-of-the-box. You only need to specify
how long each audio chunk should be (`chunk_length_s`) and how much audio stride
(`stride_length_s`) each chunk should use.
For more information on how the chunking works, please have a look at [this nice blog post](TODO: ).
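As a sketch of how that looks in code (the checkpoint name, audio file and chunk sizes below are placeholders):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="facebook/wav2vec2-base-960h",  # placeholder checkpoint
)

# chunk_length_s controls how long each audio chunk is; stride_length_s controls how much
# neighbouring chunks overlap so that words at the boundaries are not cut off.
prediction = asr("path/to/long_audio.wav", chunk_length_s=30, stride_length_s=5)
print(prediction["text"])
```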
......
...@@ -62,7 +62,7 @@ export DATA_DIR=${PWD}/wmt_en_de
#### FSMT datasets (wmt)
Refer to the scripts starting with `eval_` under:
https://github.com/huggingface/transformers/tree/main/scripts/fsmt
#### Pegasus (multiple datasets)
......
# VisualBERT Demo
This demo shows the usage of the VisualBERT VQA model and is adapted from the LXMERT demo present [here](https://github.com/huggingface/transformers/blob/main/examples/research_projects/lxmert/demo.ipynb).
1. make a virtualenv: ``virtualenv venv`` and activate ``source venv/bin/activate``
2. install reqs: ``pip install -r ./requirements.txt``
3. usage is as shown in demo.ipynb
...@@ -12,7 +12,7 @@
{
 "cell_type": "markdown",
 "source": [
  "**Note**: This demo is adapted from the LXMERT Demo present here: https://github.com/huggingface/transformers/tree/main/examples/research_projects/lxmert"
 ],
 "metadata": {}
},
......
**NOTE**: This example is outdated and is no longer actively maintained. Please
follow the new instructions for fine-tuning Wav2Vec2 [here](https://github.com/huggingface/transformers/blob/main/examples/pytorch/speech-recognition/README.md)
## Fine-tuning Wav2Vec2
...@@ -131,7 +131,7 @@ which helps with capping GPU memory usage.
### DeepSpeed Integration
To learn how to deploy the DeepSpeed integration, please refer to [this guide](https://huggingface.co/transformers/main/main_classes/deepspeed.html#deepspeed-trainer-integration).
But to get started quickly all you need is to install:
```
...@@ -188,7 +188,7 @@ run_asr.py \
### Pretraining Wav2Vec2
The `run_pretrain.py` script allows one to pretrain a Wav2Vec2 model from scratch using Wav2Vec2's contrastive loss objective (see the official [paper](https://arxiv.org/abs/2006.11477) for more information).
It is recommended to pre-train Wav2Vec2 with Trainer + DeepSpeed (please refer to [this guide](https://huggingface.co/transformers/main/main_classes/deepspeed.html#deepspeed-trainer-integration) for more information).
Here is an example of how you can use DeepSpeed ZeRO-2 to pretrain a small Wav2Vec2 model:
......
...@@ -28,7 +28,7 @@ Dataset: [https://huggingface.co/datasets/google/xtreme_s](https://huggingface.c
## Fine-tuning for the XTREME-S tasks
Based on the [`run_xtreme_s.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/xtreme-s/run_xtreme_s.py) script.
This script can fine-tune any of the pretrained speech models on the [hub](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition) on the [XTREME-S dataset](https://huggingface.co/datasets/google/xtreme_s) tasks.
...@@ -73,7 +73,7 @@ The corresponding training commands for each dataset are given in the sections b
### Speech Recognition with MLS
The following command shows how to fine-tune the [XLS-R](https://huggingface.co/docs/transformers/main/model_doc/xls_r) model on [XTREME-S MLS](https://huggingface.co/datasets/google/xtreme_s#multilingual-librispeech-mls) using 8 GPUs in half-precision.
```bash
python -m torch.distributed.launch \
...@@ -117,7 +117,7 @@ On 8 V100 GPUs, this script should run in ~19 hours and yield a cross-entropy lo
### Speech Classification with Minds-14
The following command shows how to fine-tune the [XLS-R](https://huggingface.co/docs/transformers/main/model_doc/xls_r) model on [XTREME-S MLS](https://huggingface.co/datasets/google/xtreme_s#intent-classification---minds-14) using 2 GPUs in half-precision.
```bash
python -m torch.distributed.launch \
......
...@@ -19,7 +19,7 @@ classification performance to the original zero-shot model
### Usage
A teacher NLI model can be distilled to a more efficient student model by running [`distill_classifier.py`](https://github.com/huggingface/transformers/blob/main/examples/research_projects/zero-shot-distillation/distill_classifier.py):
```
python distill_classifier.py \
......
...@@ -31,13 +31,13 @@ Here is the list of all our examples:
| Task | Example datasets |
|---|---|
| [**`language-modeling`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling) | WikiText-2 |
| [**`multiple-choice`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) | SWAG |
| [**`question-answering`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) | SQuAD |
| [**`summarization`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/summarization) | XSum |
| [**`text-classification`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) | GLUE |
| [**`token-classification`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) | CoNLL NER |
| [**`translation`**](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/translation) | WMT |
## Coming soon
......
...@@ -56,7 +56,7 @@ def main():
"This issue has been automatically marked as stale because it has not had "
"recent activity. If you think this still needs to be addressed "
"please comment on this thread.\n\nPlease note that issues that do not follow the "
"[contributing guidelines](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md) "
"are likely to be ignored."
)
......