# Longformer: The Long-Document Transformer

## Modifications from Huggingface's Implementation

All models require a `global_attention_size` to be specified in the config,
which enables global attention for the first `global_attention_size` tokens of
every sequence. Per-sequence global attention sizes are not supported; this
restriction keeps tensor shapes static, which is required for running on TPUs.
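
For illustration, here is a minimal sketch of how this setting behaves. The config is shown as a plain dict because the exact config class and field layout in this repo are not reproduced here; only `global_attention_size` is the field named above.

```python
# Illustrative only: with global_attention_size=4, positions 0..3 of every
# sequence in the batch receive global attention, regardless of content.
config = {
    "attention_window": 512,     # local sliding-window size, as in HF Longformer
    "global_attention_size": 4,  # this repo's addition: first 4 tokens are global
}
```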

`_get_global_attn_indices` in `longformer_attention.py` shows how the new
global attention indices are computed. All `tf.cond` calls were replaced with
Python `if` conditions, since global attention is now fixed up front.
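
As a hedged sketch of why this works (names below are illustrative, not the actual repo code): `global_attention_size` is a plain Python integer known before graph construction, so a Python `if` is resolved at trace time and every tensor keeps a static shape, which is what TPU compilation requires.

```python
import tensorflow as tf

def make_global_attn_mask(global_attention_size: int, seq_len: int) -> tf.Tensor:
  """Boolean mask marking the first `global_attention_size` positions as global."""
  # A Python `if` instead of tf.cond: the branch is decided while tracing,
  # so the resulting graph contains a single, statically shaped path.
  if global_attention_size == 0:
    return tf.zeros((seq_len,), dtype=tf.bool)
  return tf.concat(
      [tf.ones((global_attention_size,), dtype=tf.bool),
       tf.zeros((seq_len - global_attention_size,), dtype=tf.bool)],
      axis=0)

print(make_global_attn_mask(2, 6))  # [ True  True False False False False]
```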

To load weights from a pre-trained Huggingface Longformer, run
`utils/convert_pretrained_pytorch_checkpoint_to_tf.py` to create a checkpoint. \
There is also `utils/longformer_tokenizer_to_tfrecord.py`, which converts
PyTorch Longformer tokenized data to TFRecords.
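
A hedged sketch of what such a conversion generally looks like: tokenize with the Huggingface Longformer tokenizer, then serialize each example as a `tf.train.Example`. The feature keys (`input_ids`, `input_mask`, `label_ids`) follow the common Model Garden convention and are assumptions here, not the script's verified schema.

```python
import tensorflow as tf
from transformers import LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

def to_tf_example(text: str, label: int, max_len: int = 4096) -> tf.train.Example:
  # Pad/truncate to a fixed length so every serialized example has the
  # same feature sizes (static shapes, as needed on TPUs).
  enc = tokenizer(text, max_length=max_len, padding="max_length", truncation=True)
  feature = {
      "input_ids": tf.train.Feature(
          int64_list=tf.train.Int64List(value=enc["input_ids"])),
      "input_mask": tf.train.Feature(
          int64_list=tf.train.Int64List(value=enc["attention_mask"])),
      "label_ids": tf.train.Feature(
          int64_list=tf.train.Int64List(value=[label])),
  }
  return tf.train.Example(features=tf.train.Features(feature=feature))

# Example: write a single MNLI-style pair to a TFRecord file.
with tf.io.TFRecordWriter("mnli_train.tf_record") as writer:
  example = to_tf_example("premise </s></s> hypothesis", label=0)
  writer.write(example.SerializeToString())
```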

## Steps to Fine-tune on MNLI
#### Prepare the pre-trained checkpoint
Option 1. Use our saved checkpoint of `allenai/longformer-base-4096` stored in cloud storage

```bash
gsutil cp -r gs://model-garden-ucsd-zihan/longformer-4096 .
```
Option 2. Create it directly

```bash
python3 utils/convert_pretrained_pytorch_checkpoint_to_tf.py
```
#### [Optional] Prepare the input file
```bash
python3 utils/longformer_tokenizer_to_tfrecord.py
```
#### Training
Here we use the MNLI training data that we uploaded to cloud storage; you can replace it with the input files you generated.

```bash
TRAIN_DATA=task.train_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_train.tf_record,task.validation_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_eval.tf_record
INIT_CHECKPOINT=longformer-4096/longformer
PYTHONPATH=/path/to/model/garden \
    python3 train.py \
    --experiment=longformer/glue \
    --config_file=experiments/glue_mnli_allenai.yaml \
    --params_override="${TRAIN_DATA},runtime.distribution_strategy=tpu,task.init_checkpoint=${INIT_CHECKPOINT}" \
    --tpu=local \
    --model_dir=/path/to/outputdir \
    --mode=train_and_eval
```
This should take ~3 hours to run and reach an accuracy of ~86 on MNLI.