OpenDAS / Megatron-LM
Commit edf475ff, authored Feb 23, 2021 by Mostofa Patwary
added script for creating embeddings
parent 60c95ab6
Showing 1 changed file with 32 additions and 0 deletions
examples/create_embeddings.sh (new file, mode 0 → 100644, +32 -0)
#!/bin/bash

# Compute embeddings for each entry of a given dataset (e.g. Wikipedia)

RANK=0
WORLD_SIZE=1

# Wikipedia data can be downloaded from the following link:
# https://github.com/facebookresearch/DPR/blob/master/data/download_data.py
EVIDENCE_DATA_DIR=<Specify path of Wikipedia dataset>
EMBEDDING_PATH=<Specify path to store embeddings>
CHECKPOINT_PATH=<Specify path of pretrained ICT model>

python tools/create_doc_index.py \
    --num-layers 12 \
    --hidden-size 768 \
    --num-attention-heads 12 \
    --tensor-model-parallel-size 1 \
    --micro-batch-size 128 \
    --checkpoint-activations \
    --seq-length 512 \
    --retriever-seq-length 256 \
    --max-position-embeddings 512 \
    --load ${CHECKPOINT_PATH} \
    --evidence-data-path ${EVIDENCE_DATA_DIR} \
    --embedding-path ${EMBEDDING_PATH} \
    --indexer-log-interval 1000 \
    --indexer-batch-size 128 \
    --vocab-file bert-vocab.txt \
    --num-workers 2 \
    --fp16
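
The three path variables in the script above are left as `<Specify ...>` placeholders that must be replaced before the script is run. A minimal sketch of a pre-flight check that could sit between the assignments and the `python` call is shown below. The helper name `require_set` is hypothetical and not part of this commit; it only checks that a value is non-empty and is not a leftover quoted placeholder, and it makes no assumptions about what `tools/create_doc_index.py` validates itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of the commit): return non-zero if a variable
# is empty or still holds a "<Specify ...>"-style placeholder string.
require_set() {
    local name="$1" value="$2"
    case "$value" in
        ""|"<"*">")
            # Empty string, or text that starts with "<" and ends with ">"
            echo "error: ${name} is not set to a real path" >&2
            return 1
            ;;
    esac
    return 0
}

# Example usage (commented out so that sourcing this sketch has no side effects):
# require_set EVIDENCE_DATA_DIR "${EVIDENCE_DATA_DIR}" || exit 1
# require_set EMBEDDING_PATH    "${EMBEDDING_PATH}"    || exit 1
# require_set CHECKPOINT_PATH   "${CHECKPOINT_PATH}"   || exit 1
```

Failing before the Python process starts keeps a forgotten path from surfacing only after model setup has begun.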