The default batch size of 4 fits in 16GB of GPU memory, but you may need to adjust it for your system.
### Training
Run or modify `finetune.sh`.
The following command should work on a 16GB GPU:
```bash
export me=`git config user.name`
./finetune.sh \
--data_dir $XSUM_DIR \
--train_batch_size=1 \
--eval_batch_size=1 \
--output_dir="$me"_xsum_results \
--num_train_epochs 1
```
Tips:
- 1 epoch at batch size 1 for bart-large takes 24 hours and requires 13GB of GPU RAM with fp16 on an NVIDIA V100.
- Try `bart-base`, `--freeze_encoder`, or `--freeze_embeds` for faster training and/or a larger batch size (3 hr/epoch with bs=8, see below).
- `--fp16_opt_level=O1` (the default) works best.
- If you are finetuning on your own dataset, start from `bart-large-cnn` if you want long summaries and `bart-large-xsum` if you want short summaries.
(It rarely makes sense to start from `bart-large` unless you are researching finetuning methods.)
- In addition to the pytorch-lightning `.ckpt` checkpoint, a transformers checkpoint will be saved.
Load it with `BartForConditionalGeneration.from_pretrained(f'{output_dir}/best_tfmr')` (see the sketch after this list).
- At the moment, `--do_predict` does not work in a multi-gpu setting. You need to use `evaluate_checkpoint` or the `run_eval.py` code.
- If you want to run experiments on improving the summarization finetuning process, try the XSUM Shared Task (below). It's faster to train than CNNDM because the summaries are shorter.
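As a rough sketch of what loading that saved checkpoint might look like (the `output_dir` value and the article text are placeholders, the decoding arguments are only illustrative, and the tokenizer is assumed to be saved alongside the model in `best_tfmr`):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

output_dir = "me_xsum_results"  # placeholder: whatever you passed as --output_dir

# Load the transformers-format checkpoint written next to the pytorch-lightning .ckpt file.
model = BartForConditionalGeneration.from_pretrained(f"{output_dir}/best_tfmr")
tokenizer = BartTokenizer.from_pretrained(f"{output_dir}/best_tfmr")

article = "Replace this with an article you want to summarize."
batch = tokenizer([article], return_tensors="pt", truncation=True, max_length=1024)

# Illustrative decoding settings; tune num_beams/max_length for your data.
summary_ids = model.generate(batch["input_ids"], num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```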
### XSUM Shared Task
Compare XSUM results with others by using `--logger wandb_shared`. This requires `wandb` registration.
***This script evaluates the multitask pre-trained checkpoint for `t5-base` (see paper [here](https://arxiv.org/pdf/1910.10683.pdf)) on the English to German WMT dataset. Please note that the results in the paper were attained using a model fine-tuned on translation, so the results here will be worse by approximately 1.5 BLEU points.***
### Intro
This example shows how T5 (see the official [paper](https://arxiv.org/abs/1910.10683)) can be
evaluated on the WMT English-German dataset.
### Get the WMT Data
To be able to reproduce the authors' results on WMT English to German, you first need to download
the WMT14 en-de news datasets.
Go to Stanford's official NLP [website](https://nlp.stanford.edu/projects/nmt/) and find "newstest2014.en" and "newstest2014.de" under WMT'14 English-German data, or download the files directly (a sketch is shown below).
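A minimal download sketch using Python's standard library instead of a manual download; the exact file URLs are an assumption based on the layout of the Stanford NMT data directory, so verify them on the page above:

```python
import urllib.request

# Assumed URLs -- check https://nlp.stanford.edu/projects/nmt/ if the paths have changed.
BASE_URL = "https://nlp.stanford.edu/projects/nmt/data/wmt14.en-de/"

for filename in ("newstest2014.en", "newstest2014.de"):
    urllib.request.urlretrieve(BASE_URL + filename, filename)
    print(f"downloaded {filename}")
```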
You should have 2737 sentences in each file. You can verify this by running:
```bash
wc -l newstest2014.en  # should give 2737
```
### Usage
Let's check the longest and shortest sentences in our file to find reasonable decoding hyperparameters:
```bash
awk '{print NF}' newstest2014.en | sort -n | head -1  # shortest sentence has 2 words
awk '{print NF}' newstest2014.en | sort -n | tail -1  # longest sentence has 91 words
```
We will set our `max_length` to ~3 times the longest sentence and leave `min_length` to its default value of 0.
We decode with beam search (`num_beams=4`) as proposed in the paper. As is common with beam search, we also set `early_stopping=True` and `length_penalty=2.0`.
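To make these choices concrete, here is a minimal single-sentence decoding sketch with `t5-base` (the example sentence and the `max_length=300` value, roughly 3x the 91-word longest sentence, are illustrative; the actual evaluation runs over the whole file):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base")
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# T5 expects a task prefix for translation.
sentence = "Machine learning is fun."  # placeholder source sentence
input_ids = tokenizer("translate English to German: " + sentence, return_tensors="pt").input_ids

# Decoding settings discussed above: 4-beam search, early stopping,
# length penalty 2.0, and max_length of roughly 3x the longest source sentence.
outputs = model.generate(
    input_ids,
    num_beams=4,
    early_stopping=True,
    length_penalty=2.0,
    max_length=300,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```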
To create a translation for each sentence in the dataset and get a final BLEU score, run: