"ts/nni_manager/core/nnimanager.ts" did not exist on "ae7a72bc66496a501824c95bc6dc10e6cd45be0a"
README.md 2.1 KB
Newer Older
Zihan Wang's avatar
Zihan Wang committed
1
2
3
4
5
6
7
8
9
10
11
# Longformer: The Long-Document Transformer

## Modifications from Huggingface's Implementation
All models require a `global_attention_size` to be specified in the config; global attention is applied to the first `global_attention_size` tokens of every sequence.
Different global attention sizes for individual sequences are not supported.
This restriction keeps tensor shapes static, which is required for running on TPUs.
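
As a minimal sketch of why this helps (the helper below is hypothetical, not the repository's code): with a fixed `global_attention_size`, the global attention mask can be built entirely from statically known shapes.
```python
import tensorflow as tf

# Hypothetical helper: mark the first `global_attention_size` tokens of
# every sequence as global. All shapes are Python integers known at
# graph-build time, so the result has a static shape, as TPUs require.
def make_global_attention_mask(batch_size, seq_len, global_attention_size):
    ones = tf.ones([batch_size, global_attention_size], dtype=tf.int32)
    zeros = tf.zeros([batch_size, seq_len - global_attention_size], dtype=tf.int32)
    return tf.concat([ones, zeros], axis=-1)  # shape: [batch_size, seq_len]
```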

`_get_global_attn_indices` in `longformer_attention.py` shows how the new global attention indices are computed.
All `tf.cond` calls were changed to Python `if` conditions, since the global attention size is now specified up front.
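
This change follows roughly the pattern below, shown as a simplified, hypothetical sketch rather than the repository's actual attention code: because `global_attention_size` is a Python integer fixed by the config, branching happens while the function is traced instead of inside the graph.
```python
import tensorflow as tf

def project(x, global_attention_size):
    # Before: the branch lived in the graph via tf.cond, because the
    # condition was only known at runtime, e.g.
    #   x = tf.cond(global_attention_size > 0, lambda: x * 2.0, lambda: x)
    # After: `global_attention_size` comes from the config, so a plain
    # Python `if` resolves the branch once, at trace time.
    if global_attention_size > 0:
        x = x * 2.0  # placeholder for the global-attention path
    return x
```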

To load weights from a pre-trained Huggingface Longformer, run `utils/convert_pretrained_pytorch_checkpoint_to_tf.py` to create a TensorFlow checkpoint.
There is also `utils/longformer_tokenizer_to_tfrecord.py`, which converts data tokenized with the PyTorch Longformer tokenizer to TFRecords.
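
As a quick sanity check after conversion, you can list a few variables from the resulting checkpoint; the sketch below assumes the checkpoint prefix `longformer-4096/longformer` used later in this README.
```python
import tensorflow as tf

# Print the first few (name, shape) pairs from the converted checkpoint
# to confirm the PyTorch weights were written out.
for name, shape in tf.train.list_variables("longformer-4096/longformer")[:5]:
    print(name, shape)
```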

## Steps to Fine-tune on MNLI
#### Prepare the pre-trained checkpoint
Option 1. Use our saved checkpoint of `allenai/longformer-base-4096` stored in cloud storage
```bash
gsutil cp -r gs://model-garden-ucsd-zihan/longformer-4096 .
```
Option 2. Create it directly
```bash
python3 utils/convert_pretrained_pytorch_checkpoint_to_tf.py
```
#### [Optional] Prepare the input file
```bash
python3 utils/longformer_tokenizer_to_tfrecord.py
```
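
If you generate your own records, the script is assumed to follow the standard pattern sketched below (a hedged illustration, not the script itself; the feature names `input_ids`, `input_mask`, and `label_ids` are assumptions following the usual Model Garden convention): tokenize each sentence pair with the Huggingface tokenizer and serialize it as a `tf.train.Example`.
```python
import tensorflow as tf
from transformers import LongformerTokenizer

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")

def to_example(premise, hypothesis, label, max_len=4096):
    # Tokenize the pair and pad to a fixed, TPU-friendly length.
    enc = tokenizer(premise, hypothesis, padding="max_length",
                    truncation=True, max_length=max_len)
    feature = {
        # Feature names are an assumption; match them to what the data
        # config in experiments/glue_mnli_allenai.yaml expects.
        "input_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=enc["input_ids"])),
        "input_mask": tf.train.Feature(
            int64_list=tf.train.Int64List(value=enc["attention_mask"])),
        "label_ids": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

with tf.io.TFRecordWriter("mnli_train.tf_record") as writer:
    writer.write(
        to_example("A soccer game.", "A sports event.", 0).SerializeToString())
```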
#### Training
Here we use MNLI training data that we uploaded to cloud storage; you can replace it with the input files you generated.
```bash
TRAIN_DATA=task.train_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_train.tf_record,task.validation_data.input_path=gs://model-garden-ucsd-zihan/longformer_allenai_mnli_eval.tf_record
INIT_CHECKPOINT=longformer-4096/longformer
PYTHONPATH=/path/to/model/garden \
    python3 train.py \
    --experiment=longformer/glue \
    --config_file=experiments/glue_mnli_allenai.yaml \
    --params_override="${TRAIN_DATA},runtime.distribution_strategy=tpu,task.init_checkpoint=${INIT_CHECKPOINT}" \
    --tpu=local \
    --model_dir=/path/to/outputdir \
    --mode=train_and_eval 
```
This should take roughly 3 hours to run and reach a performance of ~86 on MNLI.