Both types of models require the CNN data, but each follows a different procedure for obtaining it.
#### For BART models
To reproduce the authors' results on the CNN/Daily Mail dataset, first download both the CNN and Daily Mail datasets [from Kyunghyun Cho's website](https://cs.nyu.edu/~kcho/DMQA/) (the links next to "Stories") into the same folder. Then uncompress the archives by running:
```bash
...
tar -xzvf cnn_dm.tgz
```
This should create a directory called `cnn_dm/` with files like `test.source`.
To use your own data, copy that file format: each article to be summarized is on its own line.
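As a rough illustration of the one-article-per-line format, the sketch below flattens a folder of plain-text articles into a single `test.source` file. The directory names `my_articles/` and `my_data/` and the two sample articles are hypothetical placeholders, not part of the original instructions; adapt them to wherever your own data lives.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: my_articles/ holds one plain-text article per file,
# my_data/ receives the flattened output. A temp dir keeps the demo isolated.
workdir=$(mktemp -d)
mkdir -p "$workdir/my_articles" "$workdir/my_data"

# Two sample articles (placeholders for your own data).
printf 'First article.\nIt spans two lines.\n' > "$workdir/my_articles/a.txt"
printf 'Second article on one line.\n' > "$workdir/my_articles/b.txt"

# Start from an empty output file.
: > "$workdir/my_data/test.source"

for f in "$workdir"/my_articles/*.txt; do
  # Replace every newline inside the article with a space...
  line=$(tr '\n' ' ' < "$f")
  # ...drop the single trailing space left by the final newline,
  # and write the article as exactly one line of test.source.
  printf '%s\n' "${line% }" >> "$workdir/my_data/test.source"
done

# Line N of test.source now holds article N.
cat "$workdir/my_data/test.source"
```

Articles that contain internal line breaks would otherwise be split across several lines and read as separate inputs, which is why the newlines are collapsed before writing.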
#### For T5 models
First, you need to download the CNN data. It is about 400 MB and can be downloaded by