ModelZoo / ResNet50_tensorflow
Commit 2e6cf5d2 authored Jun 09, 2020 by A. Unique TensorFlower

Internal change

PiperOrigin-RevId: 315511975

Parent: 5e5e0706
Showing 1 changed file with 5 additions and 4 deletions

official/nlp/nhnet/README.md (+5, -4)
@@ -39,13 +39,14 @@ First, install the `news-please` CLI (requires python 3.x)
 $ pip3 install news-please
 ```
-Next, run the crawler with our provided config and URL list
+Next, run the crawler with our provided [config and URL list](https://github.com/google-research-datasets/NewSHead/releases)
 ```shell
-# Sets to path of the downloaded data folder
+# Sets to path of the downloaded data folder.
 $ DATA_FOLDER=/path/to/downloaded_dataset
-# Uses CLI interface to crawl
+# Uses CLI interface to crawl. We assume news_please subfolder contains the
+# decompressed config.cfg and sitelist.hjson.
 $ news-please -c $DATA_FOLDER/news_please
 ```
 By default, it will store crawled
@@ -80,7 +81,7 @@ Next, we can run the following data preprocess script which may take a few hours
 ```shell
-# Recall that we use DATA_FOLDER=/path/to/downloaded_dataset
+# Recall that we use DATA_FOLDER=/path/to/downloaded_dataset.
 $ python3 raw_data_preprocess.py \
     -crawled_articles=/tmp/nhnet \
     -vocab=/path/to/bert_checkpoint/vocab.txt \
 ...
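
The updated comment in the first hunk assumes that the `news_please` subfolder holds the decompressed `config.cfg` and `sitelist.hjson`. A minimal sketch of a pre-flight check for that assumption follows. `check_news_please_config` is a hypothetical helper introduced here for illustration; it is not part of this repository or of the `news-please` CLI.

```shell
#!/bin/sh
# Sketch only: check_news_please_config is a hypothetical helper, not part of
# the repository or of news-please. It takes the dataset folder (DATA_FOLDER in
# the README) and checks that its news_please subfolder contains the two files
# the README comment says the crawler expects.
check_news_please_config() {
  config_dir="$1/news_please"
  for name in config.cfg sitelist.hjson; do
    if [ ! -f "$config_dir/$name" ]; then
      # Report the first missing file on stderr and fail.
      echo "missing: $config_dir/$name" >&2
      return 1
    fi
  done
  echo "ok: $config_dir"
}
```

With the function defined, the crawl command from the README could be guarded as `check_news_please_config "$DATA_FOLDER" && news-please -c "$DATA_FOLDER/news_please"`, so the crawler starts only when both files are present.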