updated readme

baf2e2a4 · Mostofa Patwary · 9d350c9c · baf2e2a4
Commit baf2e2a4 authored Jun 10, 2021 by Mostofa Patwary

Showing 1 changed file with 2 additions and 2 deletions

tasks/orqa/README.md +2 -2

--- a/tasks/orqa/README.md
+++ b/tasks/orqa/README.md
@@ -4,7 +4,7 @@ Below we present the steps to run unsupervised and supervised trainining and eva
 ### Retriever Training
-#### Unsupervised pretraining by ICT
+##### Unsupervised pretraining by ICT
 1. Use `tools/preprocess_data.py` to preprocess the dataset for Inverse Cloze Task (ICT), which we call unsupervised pretraining. This script takes as input a corpus in loose JSON format and creates fixed-size blocks of text as the fundamental units of data. For a corpus like Wikipedia, this will mean multiple sentences per block and multiple blocks per document. Run [`tools/preprocess_data.py`](../../tools/preprocess_data.py) to construct one or more indexed datasets with the `--split-sentences` argument to make sentences the basic unit. We construct two datasets, one with the title of every document and another with the body.
 <pre>
@@ -22,7 +22,7 @@ python tools/preprocess_data.py \
 3. Evaluate the pretrained ICT model using [`examples/evaluate_retriever_nq.sh`](../../examples/evaluate_retriever_nq.sh) for [Google's Natural Questions Open dataset](https://arxiv.org/pdf/1906.00300.pdf).
-#### Supervised finetuning
+##### Supervised finetuning
 1. Use the above pretrained ICT model to finetune using [Google's Natural Questions Open dataset](https://github.com/google-research/language/tree/master/language/orqa). The script [`examples/finetune_retriever_distributed.sh`](../../examples/finetune_retriever_distributed.sh) provides an example for how to perform the training. Our finetuning process includes retriever score scaling and longer training (80 epochs) on top [DPR training](https://arxiv.org/abs/2004.04906).
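
Step 1 of the ICT pretraining context above says that `tools/preprocess_data.py` takes a corpus in loose JSON format. As a rough illustration only, the sketch below assumes that "loose JSON" means one JSON object per line, and that each document carries `title` and `text` keys (chosen here to match the title and body datasets the step mentions; the key names are not confirmed by this diff). The helpers `write_loose_json` and `read_loose_json` are hypothetical and are not part of the repository.

```python
import json
import os
import tempfile


def write_loose_json(docs, path):
    """Write one JSON object per line (the assumed 'loose JSON' layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_loose_json(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Hypothetical two-document corpus; the "title" and "text" keys are assumptions.
    docs = [
        {"title": "Inverse Cloze Task", "text": "First sentence. Second sentence."},
        {"title": "Dense retrieval", "text": "Another short document."},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.json")
        write_loose_json(docs, path)
        # Round-trip check: the file should decode to the same documents.
        print(read_loose_json(path) == docs)
```

A file written this way could then be passed to the preprocessing script as its input corpus. The exact flags are in the README's own command, which this diff elides.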