Copyright 2020 The HuggingFace Team. All rights reserved.
Copyright 2021 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
...
...
@@ -14,24 +14,26 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# SQuAD
# Question answering
Based on the script [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py).
This folder contains several scripts that showcase how to fine-tune a 🤗 Transformers model on a question answering dataset,
like SQuAD.
**Note:** This script only works with models that have a fast tokenizer (backed by the 🤗 Tokenizers library) as it
uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
[this table](https://huggingface.co/transformers/index.html#supported-frameworks), if it doesn't you can still use the old version
of the script.
## Trainer-based scripts
The old version of this script can be found [here](https://github.com/huggingface/transformers/tree/master/examples/legacy/question-answering).
The [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py),
[`run_qa_beam_search.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa_beam_search.py) and [`run_seq2seq_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_seq2seq_qa.py) leverage the 🤗 [Trainer](https://huggingface.co/transformers/main_classes/trainer.html) for fine-tuning.
`run_qa.py` allows you to fine-tune any model from our [hub](https://huggingface.co/models)(as long as its architecture as a `ForQuestionAnswering` version in the library) on the SQUAD dataset or another question-answering dataset of the `datasets` library or your own csv/jsonlines files as long as they are structured the same way as SQUAD. You might need to tweak the data processing inside the script if your data is structured differently.
### Fine-tuning BERT on SQuAD1.0
Note that if your dataset contains samples with no possible answers (like SQUAD version 2), you need to pass along the flag `--version_2_with_negative`.
The [`run_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa.py) script
allows to fine-tune any model from our [hub](https://huggingface.co/models)(as long as its architecture has a `ForQuestionAnswering` version in the library) on a question-answering dataset (such as SQuAD, or any other QA dataset available in the `datasets` library, or your own csv/jsonlines files) as long as they are structured the same way as SQuAD. You might need to tweak the data processing inside the script if your data is structured differently.
## Trainer-based scripts
**Note:** This script only works with models that have a fast tokenizer (backed by the 🤗 Tokenizers library) as it
uses special features of those tokenizers. You can check if your favorite model has a fast tokenizer in
[this table](https://huggingface.co/transformers/index.html#supported-frameworks), if it doesn't you can still use the old version of the script which can be found [here](https://github.com/huggingface/transformers/tree/master/examples/legacy/question-answering).
### Fine-tuning BERT on SQuAD1.0
Note that if your dataset contains samples with no possible answers (like SQuAD version 2), you need to pass along the flag `--version_2_with_negative`.
This example code fine-tunes BERT on the SQuAD1.0 dataset. It runs in 24 min (with BERT-base) or 68 min (with BERT-large)
on a single tesla V100 16GB.
...
...
@@ -57,62 +59,11 @@ f1 = 88.52
exact_match = 81.22
```
### Fine-tuning T5 on SQuAD2.0
This example code fine-tunes T5 on the SQuAD2.0 dataset.
### Fine-tuning XLNet with beam search on SQuAD
```bash
python run_seq2seq_qa.py \
--model_name_or_path t5-small \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
--answer_column answer \
--do_train\
--do_eval\
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_seq2seq_squad/
```
The [`run_qa_beam_search.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_qa_beam_search.py) script is only meant to fine-tune XLNet, which is a special encoder-only Transformer model. The example code below fine-tunes XLNet on the SQuAD1.0 and SQuAD2.0 datasets.
#### Distributed training
Here is an example using distributed training on 8 V100 GPUs and Bert Whole Word Masking uncased model to reach a F1 > 93 on SQuAD1.1:
The [`run_seq2seq_qa.py`](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_seq2seq_qa.py) script is meant for encoder-decoder (also called seq2seq) Transformer models, such as T5 or BART. These
models are generative, rather than discriminative. This means that they learn to generate the correct answer, rather than predicting the start and end position of the tokens of the answer.
This example code fine-tunes T5 on the SQuAD2.0 dataset.
```bash
python run_seq2seq_qa.py \
--model_name_or_path t5-small \
--dataset_name squad_v2 \
--context_column context \
--question_column question \
--answer_column answer \
--do_train\
--do_eval\
--per_device_train_batch_size 12 \
--learning_rate 3e-5 \
--num_train_epochs 2 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/debug_seq2seq_squad/
```
## Accelerate-based scripts
Based on the script `run_qa_no_trainer.py` and `run_qa_beam_search_no_trainer.py`.
Based on the scripts`run_qa_no_trainer.py` and `run_qa_beam_search_no_trainer.py`.
Like `run_qa.py` and `run_qa_beam_search.py`, these scripts allow you to fine-tune any of the models supported on a
SQUAD or a similar dataset, the main difference is that this
script exposes the bare training loop, to allow you to quickly experiment and add any customization you would like.
SQuAD or a similar dataset, the main difference is that this script exposes the bare training loop, to allow you to quickly experiment and add any customization you would like. It offers less options than the script with `Trainer` (for instance you can easily change the options for the optimizer or the dataloaders directly in the script), but still run in a distributed setup, on TPU and supports mixed precision by leveraging the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library.
It offers less options than the script with `Trainer` (for instance you can easily change the options for the optimizer
or the dataloaders directly in the script) but still run in a distributed setup, on TPU and supports mixed precision by
the mean of the [🤗 `Accelerate`](https://github.com/huggingface/accelerate) library. You can use the script normally
after installing it:
You can use the script normally after installing it:
```bash
pip install accelerate
...
...
@@ -209,103 +180,4 @@ This command is the same and will work for:
- a distributed training with several GPUs (single or multi node)
- a training on TPUs
Note that this library is in alpha release so your feedback is more than welcome if you encounter any problem using it.
## Results
Larger batch size may improve the performance while costing more memory.
##### Results for SQuAD1.0 with the previously defined hyper-parameters:
```python
{
"exact":85.45884578997162,
"f1":92.5974600601065,
"total":10570,
"HasAns_exact":85.45884578997162,
"HasAns_f1":92.59746006010651,
"HasAns_total":10570
}
```
##### Results for SQuAD2.0 with the previously defined hyper-parameters:
```python
{
"exact":80.4177545691906,
"f1":84.07154997729623,
"total":11873,
"HasAns_exact":76.73751686909581,
"HasAns_f1":84.05558584352873,
"HasAns_total":5928,
"NoAns_exact":84.0874684608915,
"NoAns_f1":84.0874684608915,
"NoAns_total":5945
}
```
#### Fine-tuning BERT on SQuAD1.0 with relative position embeddings
The following examples show how to fine-tune BERT models with different relative position embeddings. The BERT model
`bert-base-uncased` was pretrained with default absolute position embeddings. We provide the following pretrained
models which were pre-trained on the same training data (BooksCorpus and English Wikipedia) as in the BERT model
training, but with different relative position embeddings.
*`zhiheng-huang/bert-base-uncased-embedding-relative-key`, trained from scratch with relative embedding proposed by
Shaw et al., [Self-Attention with Relative Position Representations](https://arxiv.org/abs/1803.02155)
*`zhiheng-huang/bert-base-uncased-embedding-relative-key-query`, trained from scratch with relative embedding method 4
in Huang et al. [Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)
*`zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query`, fine-tuned from model
`bert-large-uncased-whole-word-masking` with 3 additional epochs with relative embedding method 4 in Huang et al.
[Improve Transformer Models with Better Relative Position Embeddings](https://arxiv.org/abs/2009.13658)