Commit ccbf74a6, authored by Aditya Soni, committed by GitHub

typos in seq2seq/readme (#5937)

parent d3227943
@@ -14,7 +14,7 @@ wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/xsum.tar.gz
 tar -xzvf xsum.tar.gz
 export XSUM_DIR=${PWD}/xsum
 ```
-this should make a directory called cnn_dm/ with files like `test.source`.
+this should make a directory called `xsum/` with files like `test.source`.
 To use your own data, copy that files format. Each article to be summarized is on its own line.
 CNN/DailyMail data
@@ -22,8 +22,8 @@ CNN/DailyMail data
 cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm.tgz
 tar -xzvf cnn_dm.tgz
 export CNN_DIR=${PWD}/cnn_dm
+this should make a directory called `cnn_dm/` with files like `test.source`.
 ```
 WMT16 English-Romanian Translation Data:
@@ -32,6 +32,7 @@ cd examples/seq2seq
 wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
 tar -xzvf wmt_en_ro.tar.gz
 export ENRO_DIR=${PWD}/wmt_en_ro
+this should make a directory called `wmt_en_ro/` with files like `test.source`.
 ```
 If you are using your own data, it must be formatted as one directory with 6 files: train.source, train.target, val.source, val.target, test.source, test.target.
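The six-file layout described above can be checked before training with a small shell helper. This is only a sketch: `check_seq2seq_dir` is a hypothetical name, not something the repository provides, and it checks only that the six files exist, not what they contain.

```shell
# Hypothetical helper (not part of the repository): list which of the six
# expected files are missing from a data directory.
check_seq2seq_dir() {
  local data_dir="$1"
  local missing=0
  local split side
  for split in train val test; do
    for side in source target; do
      if [ ! -f "${data_dir}/${split}.${side}" ]; then
        echo "missing: ${split}.${side}"
        missing=1
      fi
    done
  done
  # Exit status 0 when all six files are present, 1 otherwise.
  return "${missing}"
}
```

For example, `check_seq2seq_dir "${XSUM_DIR}"` should print nothing and return 0 after the xsum archive has been extracted as shown above.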