Internal change

PiperOrigin-RevId: 315511975

Commit 2e6cf5d2 authored Jun 09, 2020 by A. Unique TensorFlower

Showing with 5 additions and 4 deletions

official/nlp/nhnet/README.md +5 -4

--- a/official/nlp/nhnet/README.md
+++ b/official/nlp/nhnet/README.md
@@ -39,13 +39,14 @@ First, install the `news-please` CLI (requires python 3.x)
 $ pip3 install news-please
 ```

-Next, run the crawler with our provided config and URL list
+Next, run the crawler with our provided [config and URL list](https://github.com/google-research-datasets/NewSHead/releases)

 ```shell
-# Sets to path of the downloaded data folder
+# Sets to path of the downloaded data folder.
 $ DATA_FOLDER=/path/to/downloaded_dataset

-# Uses CLI interface to crawl
+# Uses CLI interface to crawl. We assume news_please subfolder contains the
+# decompressed config.cfg and sitelist.hjson.
 $ news-please -c $DATA_FOLDER/news_please
 ```
 By default, it will store crawled
@@ -80,7 +81,7 @@ Next, we can run the following data preprocess script which may take a few hours


 ```shell
-# Recall that we use DATA_FOLDER=/path/to/downloaded_dataset
+# Recall that we use DATA_FOLDER=/path/to/downloaded_dataset.
 $ python3 raw_data_preprocess.py \
    -crawled_articles=/tmp/nhnet \
    -vocab=/path/to/bert_checkpoint/vocab.txt \