Commit 0e24e4c1, authored Oct 20, 2020 by Stas Bekman; committed by GitHub on Oct 20, 2020
[s2s] create doc for pegasus/fsmt replication (#7934)
Parent: 96f4828a
Showing 1 changed file with 18 additions and 4 deletions: `examples/seq2seq/README.md`
...
...
@@ -15,7 +15,8 @@ For `bertabs` instructions, see [`bertabs/README.md`](bertabs/README.md).
## Datasets
-#### XSUM:
+#### XSUM
```bash
cd examples/seq2seq
wget https://cdn-datasets.huggingface.co/summarization/xsum.tar.gz
...
...
@@ -26,6 +27,7 @@ this should make a directory called `xsum/` with files like `test.source`.
To use your own data, copy that file format. Each article to be summarized is on its own line.
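As a minimal sketch of that format, the snippet below writes a toy dataset in which each example occupies exactly one line. The directory `my_data`, the variable `TOY_DIR`, and the helper `one_line` are hypothetical names used only for illustration, not part of the repository. The sketch also assumes that line N of a `.source` file pairs with line N of the matching `.target` file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output directory for the toy dataset.
TOY_DIR="${TOY_DIR:-my_data}"
mkdir -p "$TOY_DIR"

# Replace internal newlines and tabs with spaces so that
# each example ends up on exactly one line.
one_line() {
  printf '%s' "$1" | tr '\n\t' '  '
  printf '\n'
}

# Assumption: line N of test.source pairs with line N of test.target.
{
  one_line "First article text,
which spans two lines in the raw input."
  one_line "Second article text."
} > "$TOY_DIR/test.source"

{
  one_line "Summary of the first article."
  one_line "Summary of the second article."
} > "$TOY_DIR/test.target"

# Both files should report the same number of lines.
wc -l "$TOY_DIR/test.source" "$TOY_DIR/test.target"
```

Flattening newlines before writing matters because a stray line break inside an article would shift every later example out of alignment with its summary.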
#### CNN/DailyMail
```bash
cd examples/seq2seq
wget https://cdn-datasets.huggingface.co/summarization/cnn_dm_v2.tgz
...
...
@@ -35,7 +37,8 @@ export CNN_DIR=${PWD}/cnn_dm
```
this should make a directory called `cnn_dm/` with 6 files.
-#### WMT16 English-Romanian Translation Data:
+#### WMT16 English-Romanian Translation Data
download with this command:
```bash
wget https://cdn-datasets.huggingface.co/translation/wmt_en_ro.tar.gz
...
...
@@ -44,13 +47,25 @@ export ENRO_DIR=${PWD}/wmt_en_ro
```
this should make a directory called `wmt_en_ro/` with 6 files.
-#### WMT English-German:
+#### WMT English-German
```bash
wget https://cdn-datasets.huggingface.co/translation/wmt_en_de.tgz
tar -xzvf wmt_en_de.tgz
export DATA_DIR=${PWD}/wmt_en_de
```
+#### FSMT datasets (wmt)
+Refer to the scripts starting with `eval_` under:
+https://github.com/huggingface/transformers/tree/master/scripts/fsmt
+#### Pegasus (multiple datasets)
+Multiple eval datasets are available for download from:
+https://github.com/stas00/porting/tree/master/datasets/pegasus
#### Private Data
If you are using your own data, it must be formatted as one directory with 6 files:
...
...
@@ -64,7 +79,6 @@ test.target
```
The `.source` files are the input, the `.target` files are the desired output.
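A quick sanity check of a data directory might look like the following sketch. It assumes the six files are the `train`, `val`, and `test` splits, each with a `.source` and a `.target` file, and that inputs and outputs are paired by line number. Treat the split names as an assumption and adjust them to your data. `check_seq2seq_dir` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash

# Hypothetical helper: verify that a directory follows the 6-file layout.
# Assumes the splits are named train, val, and test.
check_seq2seq_dir() {
  local dir="$1" split src tgt n_src n_tgt status=0
  for split in train val test; do
    src="$dir/$split.source"
    tgt="$dir/$split.target"
    if [ ! -f "$src" ] || [ ! -f "$tgt" ]; then
      echo "missing: $split.source and/or $split.target" >&2
      status=1
      continue
    fi
    # Inputs and desired outputs are assumed to be paired by line number,
    # so both files must have the same number of lines.
    n_src=$(wc -l < "$src")
    n_tgt=$(wc -l < "$tgt")
    if [ "$n_src" -ne "$n_tgt" ]; then
      echo "line count mismatch in $split: $n_src vs $n_tgt" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_seq2seq_dir "$ENRO_DIR" && echo "layout looks OK"` would check the extracted English-Romanian directory before starting a run.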
### Tips and Tricks
General Tips:
...
...