# Intro
Aimed at tackling knowledge-intensive NLP tasks (think tasks a human wouldn't be expected to solve without access to external knowledge sources), RAG models are seq2seq models with access to a retrieval mechanism that provides relevant context documents at training and evaluation time.

A RAG model encapsulates two core components: a question encoder and a generator.
During a forward pass, we encode the input with the question encoder and pass it
to the retriever to extract relevant context documents. The documents are then prepended to the input.
Such contextualized inputs are passed to the generator.
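
For illustration, here is a minimal sketch of querying a pretrained RAG model through the `transformers` API (the model name and retriever flags follow the public `facebook/rag-sequence-nq` model card; `use_dummy_dataset` loads a small stand-in index, so answer quality will be limited):
```python
from transformers import RagRetriever, RagSequenceForGeneration, RagTokenizer

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
# use_dummy_dataset swaps the full wiki index for a tiny one so the example stays lightweight
retriever = RagRetriever.from_pretrained("facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True)
model = RagSequenceForGeneration.from_pretrained("facebook/rag-sequence-nq", retriever=retriever)

inputs = tokenizer("who sings does he love me with reba", return_tensors="pt")
# the retriever fetches context documents and prepends them to the input,
# then the generator produces the answer
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```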
Read more about RAG at https://arxiv.org/abs/2005.11401.
# Finetuning
Our finetuning logic is based on scripts from [`examples/seq2seq`](https://github.com/huggingface/transformers/tree/master/examples/seq2seq). We accept training data in the same format as specified there, i.e. a directory containing six text files:
```bash
train.source
train.target
val.source
val.target
test.source
test.target
```
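Line `i` of each `.source` file holds a single input and line `i` of the matching `.target` file holds the corresponding expected output. For an open-domain QA task the files could look as follows (illustrative sample):
```bash
$ head -n 1 $DATA_DIR/train.source
who sings does he love me with reba
$ head -n 1 $DATA_DIR/train.target
Linda Davis
```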
A sample finetuning command (run `python examples/rag/finetune.py --help` to list all available options):
```bash
python examples/rag/finetune.py \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --model_type rag_sequence \
    --fp16 \
    --gpus 8
```
We publish two `base` models which can serve as a starting point for finetuning on downstream tasks (use them as `model_name_or_path`):
- [`facebook/rag-sequence-base`](https://huggingface.co/facebook/rag-sequence-base) - a base for finetuning `RagSequenceForGeneration` models,
- [`facebook/rag-token-base`](https://huggingface.co/facebook/rag-token-base) - a base for finetuning `RagTokenForGeneration` models.

The `base` models initialize the question encoder with [`facebook/dpr-question_encoder-single-nq-base`](https://huggingface.co/facebook/dpr-question_encoder-single-nq-base) and the generator with [`facebook/bart-large`](https://huggingface.co/facebook/bart-large).

If you would like to initialize finetuning from a base model using different question encoder and generator architectures, you can build one with the consolidation script, e.g.:
```bash
python examples/rag/consolidate_rag_checkpoint.py \
    --model_type rag_sequence \
    --generator_name_or_path facebook/bart-large-cnn \
    --question_encoder_name_or_path facebook/dpr-question_encoder-single-nq-base \
    --dest path/to/checkpoint
```
You will then be able to pass `path/to/checkpoint` as `model_name_or_path` to the `finetune.py` script.
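For example, reusing the finetuning flags from above:
```bash
python examples/rag/finetune.py \
    --data_dir $DATA_DIR \
    --output_dir $OUTPUT_DIR \
    --model_name_or_path path/to/checkpoint \
    --model_type rag_sequence \
    --fp16 \
    --gpus 8
```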
# Evaluation
Our evaluation script enables two modes of evaluation (controlled by the `eval_mode` argument):
- `e2e` - end-to-end evaluation, which returns EM (exact match) and F1 scores calculated for the downstream task,
- `retrieval` - retrieval evaluation, which returns precision@k of the documents retrieved for the provided inputs.

The evaluation script expects paths to two files:
- `evaluation_set` - a path to a file specifying the evaluation dataset, a single input per line.
- `gold_data_path` - a path to a file containing ground truth answers for datapoints from the `evaluation_set`, a single output per line. See below for the expected formats of the gold data files.

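For instance, for an open-domain QA task the `evaluation_set` is simply one question per line (illustrative, reusing the question from the retrieval example below):
```
who sings does he love me with reba
```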
## Retrieval evaluation
For `retrieval` evaluation, we expect a gold data file where each line consists of a tab-separated list of document titles constituting positive contexts for the respective datapoint from the `evaluation_set`. E.g. given the question `who sings does he love me with reba` in the `evaluation_set`, a respective ground truth line could look as follows:
```
Does He Love You	Does He Love You	Red Sandy Spika dress of Reba McEntire	Greatest Hits Volume Two (Reba McEntire album)	Shoot for the Moon (album)
```

We demonstrate how to evaluate retrieval against DPR evaluation data. You can download the respective files from the links listed [here](https://github.com/facebookresearch/DPR/blob/master/data/download_data.py#L39-L45).

1. Download and unzip the gold data file. We use `biencoder-nq-dev` from https://dl.fbaipublicfiles.com/dpr/data/retriever/biencoder-nq-dev.json.gz, e.g. (assuming `wget` and `gunzip` are available):
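    ```bash
    wget https://dl.fbaipublicfiles.com/dpr/data/retriever/biencoder-nq-dev.json.gz
    gunzip biencoder-nq-dev.json.gz
    ```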
2. Parse the unzipped file using `parse_dpr_relevance_data.py`:
    ```bash
    python examples/rag/parse_dpr_relevance_data.py \
        --src_path path/to/unzipped/biencoder-nq-dev.json \
        --evaluation_set path/to/output/biencoder-nq-dev.questions \
        --gold_data_path path/to/output/biencoder-nq-dev.pages
    ```
3. Run evaluation:
    ```bash
    # --model_name_or_path: the model we are evaluating
    # --model_type: RAG model type (rag_token or rag_sequence)
    # --evaluation_set: an input dataset for evaluation
    # --gold_data_path: ground truth answers for samples from the evaluation_set
    # --predictions_path: the file where predictions will be stored
    # --eval_mode: retrieval evaluation or e2e evaluation
    # --k: parameter k for the precision@k metric
    python examples/rag/eval_rag.py \
        --model_name_or_path facebook/rag-sequence-nq \
        --model_type rag_sequence \
        --evaluation_set path/to/output/biencoder-nq-dev.questions \
        --gold_data_path path/to/output/biencoder-nq-dev.pages \
        --predictions_path path/to/retrieval_preds.tsv \
        --eval_mode retrieval \
        --k 1
    ```


## End-to-end evaluation

We support two formats of the gold data file (controlled by the `gold_data_mode` parameter):
- `qa` - where a single line has the following format: `input [tab] output_list`, e.g.:
```
who is the owner of reading football club	['Xiu Li Dai', 'Dai Yongge', 'Dai Xiuli', 'Yongge Dai']
```
- `ans` - where a single line contains a single expected answer, e.g.:
```
Xiu Li Dai
```

Predictions of the model for the samples from the `evaluation_set` will be saved under the path specified by the `predictions_path` parameter. If this path already exists, the script will use the saved predictions to calculate metrics. Add the `--recalculate` parameter to force the script to perform inference from scratch.

An example e2e evaluation run could look as follows:
```bash
# --n_docs: you can experiment with retrieving different numbers of documents at evaluation time
# --recalculate: force recalculating predictions even if predictions_path already exists
python examples/rag/eval_rag.py \
    --model_name_or_path facebook/rag-sequence-nq \
    --model_type rag_sequence \
    --evaluation_set path/to/test.source \
    --gold_data_path path/to/gold_data \
    --predictions_path path/to/e2e_preds.txt \
    --eval_mode e2e \
    --gold_data_mode qa \
    --n_docs 5 \
    --print_predictions \
    --recalculate
```