# Longformer: The Long-Document Transformer

## Modifications from Huggingface's Implementation
All models require a `global_attention_size` in the config, which enables global attention
for the first `global_attention_size` tokens of every input sequence.
Different global attention sizes for individual sequences are not supported.
Fixing the size up front allows running on TPUs, where tensor shapes must be known statically.
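
For illustration, a minimal sketch (not code from this repo) of what the fixed setting means: with `global_attention_size = 2`, every sequence gets global attention on exactly its first two tokens, and the resulting mask has a static shape.
```python
import tensorflow as tf

# Hypothetical illustration: global attention on the first 2 tokens of an
# 8-token sequence. Because global_attention_size is a Python constant, the
# mask shape is known at graph-build time, as TPU compilation requires.
global_attention_size = 2
seq_len = 8
is_global = tf.range(seq_len) < global_attention_size
print(is_global.numpy())
# [ True  True False False False False False False]
```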

`_get_global_attn_indices` in `longformer_attention.py` shows how the new global attention indices are computed.
All `tf.cond` calls were replaced with plain Python `if` conditions, since the global attention size is now fixed from the start.
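
As a rough sketch (the actual implementation in `longformer_attention.py` differs), the indices can now be built with an ordinary Python branch instead of `tf.cond`:
```python
import tensorflow as tf

# Sketch only: since global_attention_size is fixed when the model is built,
# the zero-size case is an ordinary Python branch resolved at trace time
# rather than a dynamic tf.cond.
def global_attn_indices(batch_size, global_attention_size):
    if global_attention_size == 0:
        return tf.zeros((batch_size, 0), dtype=tf.int32)
    # Every sequence attends globally at positions [0, global_attention_size).
    return tf.tile(tf.range(global_attention_size)[tf.newaxis, :],
                   [batch_size, 1])
```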

`sentence_prediction_with_checkpoint_convert.py` now takes an `initial_parameters_from_pk` parameter that
specifies a pickle (`.pk`) file containing all pre-trained weights of a PyTorch Longformer, which are then loaded into the
TF model.
The `.pk` file can be generated with `utils/get_parameters_from_pretrained_pytorch_checkpoint.py`.
There is also a `longformer_tokenizer_to_tfrecord.py` script that converts PyTorch Longformer-tokenized data to TFRecords.
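
For reference, a minimal sketch of how such a `.pk` file could be produced (the real logic lives in `utils/get_parameters_from_pretrained_pytorch_checkpoint.py`; its parameter naming and any reshaping may differ):
```python
import pickle
from transformers import LongformerModel

# Sketch: dump all PyTorch Longformer weights into a pickle of numpy arrays.
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
params = {name: p.detach().numpy() for name, p in model.named_parameters()}
with open("allenai_longformer-base-4096.pk", "wb") as f:
    pickle.dump(params, f)
```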

## Steps to Fine-tune on MNLI
#### Prepare the pre-trained checkpoint
Option 1. Use our saved checkpoint of `allenai/longformer-base-4096` stored in cloud storage
```bash
gsutil cp gs://model-garden-ucsd-zihan/allenai.pk allenai_longformer-base-4096.pk
```
Option 2. Create it directly
```bash
python3 utils/get_parameters_from_pretrained_pytorch_checkpoint.py
```
#### [Optional] Prepare the input file
```bash
python3 longformer_tokenizer_to_tfrecord.py
```
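In rough outline, the script tokenizes each example with the pretrained Longformer tokenizer and serializes it as a `tf.train.Example`. The sketch below is illustrative; the actual feature names and sequence length used by `longformer_tokenizer_to_tfrecord.py` may differ.
```python
import tensorflow as tf
from transformers import LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

def to_tf_example(premise, hypothesis, label, max_len=512):
    # Tokenize a premise/hypothesis pair and pad to a fixed length.
    enc = tokenizer(premise, hypothesis, padding="max_length",
                    truncation=True, max_length=max_len)
    features = {
        "input_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=enc["input_ids"])),
        "input_mask": tf.train.Feature(
            int64_list=tf.train.Int64List(value=enc["attention_mask"])),
        "label_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=features))

with tf.io.TFRecordWriter("mnli_train.tf_record") as writer:
    example = to_tf_example("A premise sentence.", "A hypothesis.", 0)
    writer.write(example.SerializeToString())
```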
#### Training
Here we use the MNLI training data that was uploaded to cloud storage; you can replace it with the input files you generated.
```bash
TRAIN_DATA=task.train_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_train.tf_record,task.validation_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_eval.tf_record
PYTHONPATH=/path/to/model/garden \
    python3 train.py \
    --experiment=longformer/glue \
    --config_file=experiments/glue_mnli_allenai.yaml \
    --params_override="${TRAIN_DATA},runtime.distribution_strategy=tpu,task.initial_parameters_from_pk=allenai_longformer-base-4096.pk" \
    --tpu=local \
    --model_dir=/path/to/outputdir \
    --mode=train_and_eval 
```
This should take an hour or two to run and reach an accuracy of roughly 86 on MNLI.