Commit 0e24e4c1, authored Oct 20, 2020 by Stas Bekman, committed by GitHub Oct 20, 2020
[s2s] create doc for pegasus/fsmt replication (#7934)
parent 96f4828a
Showing 1 changed file with 18 additions and 4 deletions
examples/seq2seq/README.md
@@ -15,7 +15,8 @@ For `bertabs` instructions, see [`bertabs/README.md`](bertabs/README.md).
 ## Datasets
-#### XSUM:
+#### XSUM
 ```bash
 cd examples/seq2seq
 wget https://cdn-datasets.huggingface.co/summarization/xsum.tar.gz
@@ -26,6 +27,7 @@ this should make a directory called `xsum/` with files like `test.source`.
 To use your own data, copy that files format. Each article to be summarized is on its own line.
 #### CNN/DailyMail
 ```bash
 cd examples/seq2seq
 wget https://cdn-datasets.huggingface.co/summarization/cnn_dm_v2.tgz
@@ -35,7 +37,8 @@ export CNN_DIR=${PWD}/cnn_dm
 ```
 this should make a directory called `cnn_dm/` with 6 files.
-#### WMT16 English-Romanian Translation Data:
+#### WMT16 English-Romanian Translation Data
 download with this command:
 ```bash
 wget https://cdn-datasets.huggingface.co/translation/wmt_en_ro.tar.gz
@@ -44,13 +47,25 @@ export ENRO_DIR=${PWD}/wmt_en_ro
 ```
 this should make a directory called `wmt_en_ro/` with 6 files.
-#### WMT English-German:
+#### WMT English-German
 ```bash
 wget https://cdn-datasets.huggingface.co/translation/wmt_en_de.tgz
 tar -xzvf wmt_en_de.tgz
 export DATA_DIR=${PWD}/wmt_en_de
 ```
+#### FSMT datasets (wmt)
+Refer to the scripts starting with `eval_` under:
+https://github.com/huggingface/transformers/tree/master/scripts/fsmt
+#### Pegasus (multiple datasets)
+Multiple eval datasets are available for download from:
+https://github.com/stas00/porting/tree/master/datasets/pegasus
 #### Private Data
 If you are using your own data, it must be formatted as one directory with 6 files:
@@ -64,7 +79,6 @@ test.target
 ```
 The `.source` files are the input, the `.target` files are the desired output.
 ### Tips and Tricks
 General Tips:
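A note on the `Private Data` section touched by this diff: the layout it describes (one directory, paired `.source` and `.target` files, one example per line) can be checked before training. The following is a minimal sketch and is not part of the commit or the repository. `check_s2s_dir` is a hypothetical helper, and the split names `train`, `val`, and `test` are an assumption, since only `test.source` and `test.target` are visible in the hunks shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): sanity-check a seq2seq data directory.
# Assumption: splits are named train, val, and test.
check_s2s_dir() {
  local dir="$1" split src tgt n_src n_tgt status=0
  for split in train val test; do
    src="$dir/$split.source"
    tgt="$dir/$split.target"
    # Both halves of each split must exist.
    if [ ! -f "$src" ] || [ ! -f "$tgt" ]; then
      echo "missing: $split.source or $split.target" >&2
      status=1
      continue
    fi
    # One example per line, so inputs and desired outputs must pair up line by line.
    n_src=$(wc -l < "$src" | tr -d ' ')
    n_tgt=$(wc -l < "$tgt" | tr -d ' ')
    if [ "$n_src" -ne "$n_tgt" ]; then
      echo "line count mismatch in $split: $n_src source vs $n_tgt target" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_s2s_dir "$DATA_DIR"` would check the directory exported in the WMT English-German block, assuming that archive follows the same six-file layout. The function returns non-zero if any file is missing or any split has unequal line counts.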