Add more documentation

Commit c2a32e12 authored Jul 22, 2020 by Neel Kant

Showing 2 changed files with 5 additions and 1 deletion

README.md +3 -1

tools/create_doc_index.py +2 -0

--- a/README.md
+++ b/README.md
@@ -324,7 +324,7 @@ After having trained an ICT model, you can now embed an entire dataset of blocks
 and wrap it with a `FaissMIPSIndex` to do fast similarity search which is key in the learned information retrieval pipeline. The initial index can be built with the following script, meant to be run in an interactive session. It can leverage multiple GPUs on multiple nodes to index large datasets much more quickly. 

 <pre>
-python indexer.py \
+python tools/create_doc_index.py \
    --num-layers 12 \
    --hidden-size 768 \
    --ict-head-size 128 \
@@ -337,6 +337,8 @@ python indexer.py \
    --data-path /path/to/indexed_dataset \
    --titles-data-path /path/to/titles_indexed_dataset \
    --block-data-path embedded_blocks.pkl \
+    --indexer-log-interval 1000 \
+    --indexer-batch-size 128 \
    --vocab-file /path/to/vocab.txt \
    --num-workers 2 \
    --fp16

--- a/tools/create_doc_index.py
+++ b/tools/create_doc_index.py
@@ -13,6 +13,8 @@ def main():
        --block-data-path: path to write to
        --ict-load or --realm-load: path to checkpoint with which to embed
        --data-path and --titles-data-path: paths for dataset
+        --indexer-log-interval: reporting interval
+        --indexer-batch-size: size specific for indexer jobs

    Check README.md for example script
    """
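
The two flags documented in this commit, `--indexer-batch-size` and `--indexer-log-interval`, could plausibly be wired up as in the sketch below. This is an illustration only, not the repository's code: `build_indexer_parser` and `index_blocks` are hypothetical names, the defaults simply mirror the values in the README example, and the loop only counts batches where a real job would embed them.

```python
import argparse


def build_indexer_parser():
    """Hypothetical parser covering only the two flags documented in this commit."""
    parser = argparse.ArgumentParser(description="indexer flag sketch")
    # Defaults are assumptions copied from the README example, not verified defaults.
    parser.add_argument("--indexer-batch-size", type=int, default=128,
                        help="batch size used only by indexer jobs")
    parser.add_argument("--indexer-log-interval", type=int, default=1000,
                        help="report progress every N batches")
    return parser


def index_blocks(blocks, batch_size, log_interval):
    """Walk `blocks` in batches; return (batch count, batch numbers that were logged)."""
    logged_at = []
    num_batches = 0
    num_items = 0
    for start in range(0, len(blocks), batch_size):
        batch = blocks[start:start + batch_size]  # a real job would embed this batch
        num_items += len(batch)
        num_batches += 1
        if num_batches % log_interval == 0:
            logged_at.append(num_batches)
            print(f"indexed {num_batches} batches ({num_items} blocks)")
    return num_batches, logged_at
```

With this sketch, a small `--indexer-log-interval` gives frequent progress reports on short runs, while `--indexer-batch-size` can be tuned for the indexing job on its own.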