"...text-generation-inference.git" did not exist on "a5def7c222174e03d815f890093584f3e815c5ce"
Commit 7b3baaaa authored by Neel Kant

Update README

parent 47996737
@@ -265,10 +265,9 @@ python -m torch.distributed.launch $DISTRIBUTED_ARGS ./pretrain_gpt2.py \
</pre>
<a id="realm"></a> <a id="realm"></a>
## REALM Pipeline
The following sections reflect the three stages of training a REALM system. Loosely, they are: pretraining the retriever modules, then jointly training the language model and the retriever, and finally finetuning a question-answering head on the language model with a fixed retriever.
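To make the first stage concrete, here is a minimal sketch of an ICT-style retrieval objective with in-batch negatives. The tensor names and the stand-in encoder outputs are illustrative only, not part of this repository's API:

<pre>
import torch
import torch.nn.functional as F

# Stand-ins for the outputs of two BERT-style encoders: one embeds a query
# sentence, the other embeds the block the sentence was removed from. In real
# ICT pretraining these would come from the retriever's dual encoders.
batch_size, hidden = 32, 128
sentence_emb = torch.randn(batch_size, hidden, requires_grad=True)
block_emb = torch.randn(batch_size, hidden, requires_grad=True)

# Score every sentence against every block in the batch. The diagonal holds
# the true (sentence, source block) pairs; the off-diagonal entries serve as
# in-batch negatives.
scores = sentence_emb @ block_emb.t()
labels = torch.arange(batch_size)
loss = F.cross_entropy(scores, labels)
loss.backward()
</pre>

The diagonal-label cross entropy trains the retriever to score each sentence's own block above every other block in the batch.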
### Inverse Cloze Task (ICT) Pretraining
1. Have a corpus in loose JSON format, with the intention of creating a collection of fixed-size blocks of text as the fundamental units of data. For a corpus like Wikipedia, this means multiple sentences per block, but also multiple blocks per document (an illustrative example of the format follows).
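For reference, "loose JSON" here means one JSON object per line, one document per object. The exact fields depend on the preprocessing setup; a Wikipedia-style corpus might look like the following, where field names other than `text` are illustrative:

<pre>
{"id": 0, "title": "Anarchism", "text": "Anarchism is a political philosophy ..."}
{"id": 1, "title": "Autism", "text": "Autism is a developmental disorder ..."}
</pre>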
...