Commit 1e00ef68 (unverified) authored by Sam Shleifer, committed by GitHub

[s2s] dont document packing because it hurts performance (#6077)

parent 9d0d3a66
@@ -27,17 +27,7 @@ this should make a directory called `cnn_dm/` with files like `test.source`.
```
WMT16 English-Romanian Translation Data:
download with this command:
This dataset comes in two formats. The "packed" version merges short training examples into examples of <200 tokens to increase GPU utilization (and also improves validation performance).
```bash
cd examples/seq2seq
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro_packed_train_200.tgz
tar -xzvf wmt_en_ro_packed_200.tgz
export ENRO_DIR=wmt_en_ro_packed_train_200
```
The original data can also be downloaded with this command:
```bash
wget https://s3.amazonaws.com/datasets.huggingface.co/translation/wmt_en_ro.tar.gz
tar -xzvf wmt_en_ro.tar.gz
...