Unverified Commit e1f3156b authored by Santiago Castro, committed by GitHub

Fix many typos (#8708)

parent 9c0afdaf
@@ -44,7 +44,7 @@ The documentation is organized in five parts:
 and a glossary.
 - **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
 - **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
-- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general resarch in
+- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
 transformers model
 - The three last section contain the documentation of each public class and function, grouped in:
...
@@ -5,7 +5,7 @@ Overview
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research. It was
-intorduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
+introduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
 Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
 The abstract from the paper is the following:
...
@@ -530,7 +530,7 @@ Sequence-to-sequence model with the same encoder-decoder model architecture as B
 two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization specific pre-training
 objective, called Gap Sentence Generation (GSG).
-* MLM: encoder input tokens are randomely replaced by a mask tokens and have to be predicted by the encoder (like in
+* MLM: encoder input tokens are randomly replaced by a mask tokens and have to be predicted by the encoder (like in
 BERT)
 * GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, but which has a
 causal mask to hide the future words like a regular auto-regressive transformer decoder.
...
@@ -109,7 +109,7 @@ XLM-RoBERTa
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong gains
-over previously released multi-lingual models like mBERT or XLM on downstream taks like classification, sequence
+over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification, sequence
 labeling and question answering.
 Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
...
@@ -62,7 +62,7 @@ sliding the context window so that the model has more context when making each p
 This is a closer approximation to the true decomposition of the sequence probability and will typically yield a more
 favorable score. The downside is that it requires a separate forward pass for each token in the corpus. A good
 practical compromise is to employ a strided sliding window, moving the context by larger strides rather than sliding by
-1 token a time. This allows computation to procede much faster while still giving the model a large context to make
+1 token a time. This allows computation to proceed much faster while still giving the model a large context to make
 predictions at each step.
 Example: Calculating perplexity with GPT-2 in 🤗 Transformers
...
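The strided evaluation described above can be sketched roughly as follows (a minimal illustration, not the documentation's full example; the checkpoint, stride, and dummy text are placeholders):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "My favourite food is pizza. " * 200  # stand-in for the evaluation corpus
encodings = tokenizer(text, return_tensors="pt")

max_length = model.config.n_positions  # 1024 for GPT-2
stride = 512                           # how far the window moves at each step

nlls = []
for i in range(0, encodings.input_ids.size(1), stride):
    begin_loc = max(i + stride - max_length, 0)
    end_loc = min(i + stride, encodings.input_ids.size(1))
    trg_len = end_loc - i  # only the tokens new to this window are scored
    input_ids = encodings.input_ids[:, begin_loc:end_loc].to(device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # overlapping context tokens are not predicted

    with torch.no_grad():
        outputs = model(input_ids, labels=target_ids)
        nlls.append(outputs[0] * trg_len)  # outputs[0] is the mean NLL over the scored tokens

ppl = torch.exp(torch.stack(nlls).sum() / end_loc)
```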
@@ -25,7 +25,7 @@ class PlotArguments:
 )
 plot_along_batch: bool = field(
 default=False,
-metadata={"help": "Whether to plot along batch size or sequence lengh. Defaults to sequence length."},
+metadata={"help": "Whether to plot along batch size or sequence length. Defaults to sequence length."},
 )
 is_time: bool = field(
 default=False,
...
@@ -17,7 +17,7 @@ This folder contains the original code used to train Distil* as well as examples
 ## What is Distil*
-Distil* is a class of compressed models that started with DistilBERT. DistilBERT stands for Distillated-BERT. DistilBERT is a small, fast, cheap and light Transformer model based on Bert architecture. It has 40% less parameters than `bert-base-uncased`, runs 60% faster while preserving 97% of BERT's performances as measured on the GLUE language understanding benchmark. DistilBERT is trained using knowledge distillation, a technique to compress a large model called the teacher into a smaller model called the student. By distillating Bert, we obtain a smaller Transformer model that bears a lot of similarities with the original BERT model while being lighter, smaller and faster to run. DistilBERT is thus an interesting option to put large-scaled trained Transformer model into production.
+Distil* is a class of compressed models that started with DistilBERT. DistilBERT stands for Distilled-BERT. DistilBERT is a small, fast, cheap and light Transformer model based on Bert architecture. It has 40% less parameters than `bert-base-uncased`, runs 60% faster while preserving 97% of BERT's performances as measured on the GLUE language understanding benchmark. DistilBERT is trained using knowledge distillation, a technique to compress a large model called the teacher into a smaller model called the student. By distillating Bert, we obtain a smaller Transformer model that bears a lot of similarities with the original BERT model while being lighter, smaller and faster to run. DistilBERT is thus an interesting option to put large-scaled trained Transformer model into production.
 We have applied the same method to other Transformer architectures and released the weights:
 - GPT2: on the [WikiText-103](https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/) benchmark, GPT2 reaches a perplexity on the test set of 16.3 compared to 21.1 for **DistilGPT2** (after fine-tuning on the train set).
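The distillation objective itself is only described in prose here; as a rough sketch (not the repository's actual `Distiller` code), the soft-target part of such a loss is commonly written as a temperature-scaled KL divergence between student and teacher logits:

```python
import torch
import torch.nn.functional as F

def soft_target_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions.

    Minimal sketch only: the real training script additionally mixes in the usual
    supervised MLM loss (and a cosine embedding loss) with tunable weights.
    """
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t ** 2)  # rescale so gradient magnitude stays comparable across temperatures
```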
@@ -57,7 +57,7 @@ Here are the results on the *test* sets for 6 of the languages available in XNLI
 This part of the library has only be tested with Python3.6+. There are few specific dependencies to install before launching a distillation, you can install them with the command `pip install -r requirements.txt`.
-**Important note:** The training scripts have been updated to support PyTorch v1.2.0 (there are breakings changes compared to v1.1.0).
+**Important note:** The training scripts have been updated to support PyTorch v1.2.0 (there are breaking changes compared to v1.1.0).
 ## How to use DistilBERT
@@ -111,7 +111,7 @@ python scripts/binarized_data.py \
 --dump_file data/binarized_text
 ```
-Our implementation of masked language modeling loss follows [XLM](https://github.com/facebookresearch/XLM)'s one and smoothes the probability of masking with a factor that put more emphasis on rare words. Thus we count the occurrences of each tokens in the data:
+Our implementation of masked language modeling loss follows [XLM](https://github.com/facebookresearch/XLM)'s one and smooths the probability of masking with a factor that put more emphasis on rare words. Thus we count the occurrences of each tokens in the data:
 ```bash
 python scripts/token_counts.py \
@@ -173,7 +173,7 @@ python -m torch.distributed.launch \
 --token_counts data/token_counts.bert-base-uncased.pickle
 ```
-**Tips:** Starting distillated training with good initialization of the model weights is crucial to reach decent performance. In our experiments, we initialized our model from a few layers of the teacher (Bert) itself! Please refer to `scripts/extract.py` and `scripts/extract_distilbert.py` to create a valid initialization checkpoint and use `--student_pretrained_weights` argument to use this initialization for the distilled training!
+**Tips:** Starting distilled training with good initialization of the model weights is crucial to reach decent performance. In our experiments, we initialized our model from a few layers of the teacher (Bert) itself! Please refer to `scripts/extract.py` and `scripts/extract_distilbert.py` to create a valid initialization checkpoint and use `--student_pretrained_weights` argument to use this initialization for the distilled training!
 Happy distillation!
...
@@ -188,7 +188,7 @@ class Distiller:
 def prepare_batch_mlm(self, batch):
 """
-Prepare the batch: from the token_ids and the lenghts, compute the attention mask and the masked label for MLM.
+Prepare the batch: from the token_ids and the lengths, compute the attention mask and the masked label for MLM.
 Input:
 ------
@@ -200,7 +200,7 @@ class Distiller:
 -------
 token_ids: `torch.tensor(bs, seq_length)` - The token ids after the modifications for MLM.
 attn_mask: `torch.tensor(bs, seq_length)` - The attention mask for the self-attention.
-mlm_labels: `torch.tensor(bs, seq_length)` - The masked languge modeling labels. There is a -100 where there is nothing to predict.
+mlm_labels: `torch.tensor(bs, seq_length)` - The masked language modeling labels. There is a -100 where there is nothing to predict.
 """
 token_ids, lengths = batch
 token_ids, lengths = self.round_batch(x=token_ids, lengths=lengths)
@@ -253,7 +253,7 @@ class Distiller:
 def prepare_batch_clm(self, batch):
 """
-Prepare the batch: from the token_ids and the lenghts, compute the attention mask and the labels for CLM.
+Prepare the batch: from the token_ids and the lengths, compute the attention mask and the labels for CLM.
 Input:
 ------
...
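The docstrings above only name the tensors involved; purely as an illustration (this is not the actual `Distiller.prepare_batch_mlm` implementation, which also handles 80/10/10 token replacement and other details), deriving an attention mask from sequence lengths and marking non-predicted positions with -100 can look like this:

```python
import torch

def lengths_to_attn_mask(lengths: torch.Tensor, seq_length: int) -> torch.Tensor:
    # Position j of sequence i attends (mask == 1) only while j < lengths[i].
    return (torch.arange(seq_length)[None, :] < lengths[:, None]).long()

def make_mlm_labels(token_ids: torch.Tensor, attn_mask: torch.Tensor, mask_token_id: int, mlm_probability: float = 0.15):
    # Pick ~15% of the real (non-padding) tokens to predict; every other position gets label -100.
    probs = torch.full(token_ids.shape, mlm_probability) * attn_mask
    pred_mask = torch.bernoulli(probs).bool()
    mlm_labels = token_ids.masked_fill(~pred_mask, -100)
    masked_ids = token_ids.masked_fill(pred_mask, mask_token_id)
    return masked_ids, mlm_labels

# Two sequences of true lengths 5 and 3, padded to length 6:
lengths = torch.tensor([5, 3])
token_ids = torch.randint(5, 100, (2, 6))
attn_mask = lengths_to_attn_mask(lengths, token_ids.size(1))
masked_ids, mlm_labels = make_mlm_labels(token_ids, attn_mask, mask_token_id=103)
```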
@@ -86,7 +86,7 @@ if __name__ == "__main__":
 compressed_sd[f"vocab_layer_norm.{w}"] = state_dict[f"cls.predictions.transform.LayerNorm.{w}"]
 print(f"N layers selected for distillation: {std_idx}")
-print(f"Number of params transfered for distillation: {len(compressed_sd.keys())}")
-print(f"Save transfered checkpoint to {args.dump_checkpoint}.")
+print(f"Number of params transferred for distillation: {len(compressed_sd.keys())}")
+print(f"Save transferred checkpoint to {args.dump_checkpoint}.")
 torch.save(compressed_sd, args.dump_checkpoint)
@@ -21,7 +21,7 @@ You can also have a look at this fun *Explain Like I'm Five* introductory [slide
 One promise of extreme pruning is to obtain extremely small models that can be easily sent (and stored) on edge devices. By setting weights to 0., we reduce the amount of information we need to store, and thus decreasing the memory size. We are able to obtain extremely sparse fine-pruned models with movement pruning: ~95% of the dense performance with ~5% of total remaining weights in the BERT encoder.
-In [this notebook](https://github.com/huggingface/transformers/blob/master/examples/movement-pruning/Saving_PruneBERT.ipynb), we showcase how we can leverage standard tools that exist out-of-the-box to efficiently store an extremely sparse question answering model (only 6% of total remaining weights in the encoder). We are able to reduce the memory size of the encoder **from the 340MB (the orignal dense BERT) to 11MB**, without any additional training of the model (every operation is performed *post fine-pruning*). It is sufficiently small to store it on a [91' floppy disk](https://en.wikipedia.org/wiki/Floptical) 📎!
+In [this notebook](https://github.com/huggingface/transformers/blob/master/examples/movement-pruning/Saving_PruneBERT.ipynb), we showcase how we can leverage standard tools that exist out-of-the-box to efficiently store an extremely sparse question answering model (only 6% of total remaining weights in the encoder). We are able to reduce the memory size of the encoder **from the 340MB (the original dense BERT) to 11MB**, without any additional training of the model (every operation is performed *post fine-pruning*). It is sufficiently small to store it on a [91' floppy disk](https://en.wikipedia.org/wiki/Floptical) 📎!
 While movement pruning does not directly optimize for memory footprint (but rather the number of non-null weights), we hypothetize that further memory compression ratios can be achieved with specific quantization aware trainings (see for instance [Q8BERT](https://arxiv.org/abs/1910.06188), [And the Bit Goes Down](https://arxiv.org/abs/1907.05686) or [Quant-Noise](https://arxiv.org/abs/2004.07320)).
...
@@ -14,7 +14,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 """
-Binarizers take a (real value) matrice as input and produce a binary (values in {0,1}) mask of the same shape.
+Binarizers take a (real value) matrix as input and produce a binary (values in {0,1}) mask of the same shape.
 """
 import torch
...
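For context, here is a minimal sketch in the spirit of that docstring: a plain threshold binarizer with a straight-through gradient (the file itself defines several more refined variants, so treat this as an illustration only):

```python
import torch

class SimpleThresholdBinarizer(torch.autograd.Function):
    """Turn a real-valued score matrix into a {0, 1} mask of the same shape."""

    @staticmethod
    def forward(ctx, scores: torch.Tensor, threshold: float) -> torch.Tensor:
        # 1 where the score clears the threshold, 0 elsewhere.
        return (scores > threshold).type_as(scores)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # Straight-through estimator: pass the gradient to the scores unchanged.
        return grad_output, None

scores = torch.randn(4, 4, requires_grad=True)
mask = SimpleThresholdBinarizer.apply(scores, 0.0)
mask.sum().backward()  # gradients flow back to `scores` despite the hard threshold
```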
@@ -4,7 +4,7 @@ language: sv
 # Swedish BERT Models
-The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on aproximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, swedish wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
+The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on approximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, swedish wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
 The following three models are currently available:
@@ -86,7 +86,7 @@ for token in nlp(text):
 print(l)
 ```
-Which should result in the following (though less cleanly formated):
+Which should result in the following (though less cleanly formatted):
 ```python
 [ { 'word': 'Engelbert', 'score': 0.99..., 'entity': 'PRS'},
@@ -104,7 +104,7 @@ Which should result in the following (though less cleanly formated):
 ### ALBERT base
-The easisest way to do this is, again, using Huggingface Transformers:
+The easiest way to do this is, again, using Huggingface Transformers:
 ```python
 from transformers import AutoModel,AutoTokenizer
...
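The code block above is cut off in this view; a loading sketch of the kind it begins might look as follows (the checkpoint name below is a placeholder, not taken from this excerpt):

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder identifier - substitute the ALBERT checkpoint named in the full model card.
model_name = "KB/albert-base-swedish-cased-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
```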
@@ -4,7 +4,7 @@ language: sv
 # Swedish BERT Models
-The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on aproximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, swedish wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
+The National Library of Sweden / KBLab releases three pretrained language models based on BERT and ALBERT. The models are trained on approximately 15-20GB of text (200M sentences, 3000M tokens) from various sources (books, news, government publications, swedish wikipedia and internet forums) aiming to provide a representative BERT model for Swedish text. A more complete description will be published later on.
 The following three models are currently available:
@@ -86,7 +86,7 @@ for token in nlp(text):
 print(l)
 ```
-Which should result in the following (though less cleanly formated):
+Which should result in the following (though less cleanly formatted):
 ```python
 [ { 'word': 'Engelbert', 'score': 0.99..., 'entity': 'PRS'},
@@ -104,7 +104,7 @@ Which should result in the following (though less cleanly formated):
 ### ALBERT base
-The easisest way to do this is, again, using Huggingface Transformers:
+The easiest way to do this is, again, using Huggingface Transformers:
 ```python
 from transformers import AutoModel,AutoTokenizer
...
@@ -4,7 +4,7 @@ tags:
 ---
 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
...
@@ -4,7 +4,7 @@ tags:
 ---
 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
...
 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
...
 ## CS224n SQuAD2.0 Project Dataset
-The goal of this model is to save CS224n students GPU time when establising
+The goal of this model is to save CS224n students GPU time when establishing
 baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
 The training set used to fine-tune this model is the same as
 the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Intent Prediction) - Dataset 📚
-Dataset ID: ```event2Mind``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```event2Mind``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
-Dataset ID: ```squad``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```squad``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...
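As a small usage note, the split sizes referenced in such a table can be reproduced by loading the dataset with the `nlp` library linked above (since renamed `datasets`); a minimal sketch, assuming its standard `load_dataset` entry point:

```python
from nlp import load_dataset  # the library linked above; published today as `datasets`

squad = load_dataset("squad")
print({split: ds.num_rows for split, ds in squad.items()})
```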
@@ -19,7 +19,7 @@ Transfer learning, where a model is first pre-trained on a data-rich task before
 ## Details of the downstream task (Q&A) - Dataset 📚 🧐 ❓
-Dataset ID: ```squad_v2``` from [HugginFace/NLP](https://github.com/huggingface/nlp)
+Dataset ID: ```squad_v2``` from [Huggingface/NLP](https://github.com/huggingface/nlp)
 | Dataset | Split | # samples |
 | -------- | ----- | --------- |
...