Many more features added

b35e15f2 · Mostofa Patwary · c44f7622 · b35e15f2
Commit b35e15f2 authored Mar 04, 2021 by Mostofa Patwary

Showing with 2 additions and 0 deletions

tools/openwebtext/README.md +2 -0

--- a/tools/openwebtext/README.md
+++ b/tools/openwebtext/README.md
@@ -54,3 +54,5 @@ python filter_ngrams.py --tasks <name of he task, e.g. lambada, squad> --dedup-d
 We use 13-grams for the deduplication. When we find a 13-gram match in a training document, we split the document into two pieces and remove the 13-gram along with 200 characters from both sides of the 13-gram. We also remove any split piece with fewer than 200 characters, and any document that was split more than 10 times.
 Only for the lambada task, we need to provide the path, `--lambada-path <path of the lambada test data>`.
+Several other features (e.g. saving and loading the dictionary) have been added; see the script's arguments for details.
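
The splitting rule described in the README text above can be illustrated with a minimal sketch. This is not the code in `filter_ngrams.py`: the function name `split_on_ngram_matches`, its parameters, the whitespace tokenization, and the exact-match lookup are all assumptions made for illustration only.

```python
import re


def split_on_ngram_matches(text, task_ngrams, n=13, margin=200,
                           min_len=200, max_splits=10):
    """Hypothetical helper: return the surviving pieces of `text`.

    `task_ngrams` is a set of space-joined n-grams taken from the task data.
    An empty list means the whole document is dropped.
    """
    # Whitespace tokens with their character offsets (an assumption;
    # the real script may normalize or tokenize differently).
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"\S+", text)]
    pieces = []
    piece_start = 0  # character offset where the current piece begins
    splits = 0
    i = 0
    while i + n <= len(tokens):
        ngram = " ".join(tok for tok, _, _ in tokens[i:i + n])
        if ngram not in task_ngrams:
            i += 1
            continue
        splits += 1
        if splits > max_splits:
            return []  # split more than `max_splits` times: drop the document
        match_start = tokens[i][1]
        match_end = tokens[i + n - 1][2]
        # Keep the text before the match, minus `margin` characters.
        left_end = max(piece_start, match_start - margin)
        pieces.append(text[piece_start:left_end])
        # The next piece starts `margin` characters after the match.
        piece_start = min(len(text), match_end + margin)
        i += n
        while i < len(tokens) and tokens[i][1] < piece_start:
            i += 1
    pieces.append(text[piece_start:])
    # Discard pieces shorter than `min_len` characters.
    return [p for p in pieces if len(p) >= min_len]
```

Called with a document string and a set of task 13-grams, the function returns the pieces that would be kept. The defaults mirror the numbers quoted in the README (13-grams, 200 characters, 10 splits).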