# Neural Language Modeling

## Pre-trained models

Description | Parameters | Dataset | Model and Test set(s)
---|---:|---|---
Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 1026M | [Google Billion Words](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2)
Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 247M | [WikiText-103](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.tar.bz2)

## Example usage

Interactive generation via PyTorch Hub:
```
>>> import torch
>>> torch.hub.list('pytorch/fairseq')
[..., 'transformer_lm.gbw.adaptive_huge', 'transformer_lm.wiki103.adaptive', ...]
>>> lm = torch.hub.load(
...   'pytorch/fairseq',
...   'transformer_lm.wiki103.adaptive',
...   data_name_or_path='./data-bin',
...   tokenizer='moses',
...   no_escape=True,
...   beam=1,
...   sampling=True,
...   sampling_topk=10,
...   temperature=0.8,
... )
>>> lm.generate('Barack Obama', verbose=True)
```

Available models are listed in the ``hub_models()`` method in each model file, for example:
[transformer_lm.py](https://github.com/pytorch/fairseq/blob/master/fairseq/models/transformer_lm.py).
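The flags in the hub example above select sampled decoding: `beam=1` with `sampling=True` draws each next token from the `sampling_topk=10` most probable candidates after the model's scores are divided by `temperature=0.8`. The decoding rule itself can be sketched in a few lines of plain Python (`top_k_sample` and the toy logits are illustrative only, not part of the fairseq API):

```python
import math
import random

def top_k_sample(logits, k=10, temperature=0.8, rng=random):
    """Sample an index from the top-k logits after temperature scaling."""
    # Keep only the k highest-scoring candidates.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one candidate proportionally to its probability.
    r, acc = rng.random(), 0.0
    for idx, p in zip(top, probs):
        acc += p
        if r <= acc:
            return idx
    return top[-1]

logits = [2.0, 1.0, 0.5, -1.0, -3.0]
choice = top_k_sample(logits, k=2)  # only indices 0 and 1 are possible
```

Lower temperatures concentrate probability on the highest-scoring candidates, so `temperature=0.8` makes generations slightly more conservative than sampling from the raw distribution.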


## Training a new model with the CLI tools

These scripts provide an example of pre-processing data for the language modeling task.

### prepare-wikitext-103.sh

Provides an example of pre-processing for the [WikiText-103 language modeling task](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/):

Example usage:
```
$ cd examples/language_model/
$ bash prepare-wikitext-103.sh
$ cd ../..

# Binarize the dataset:
$ TEXT=examples/language_model/wikitext-103

$ fairseq-preprocess --only-source \
  --trainpref $TEXT/wiki.train.tokens --validpref $TEXT/wiki.valid.tokens --testpref $TEXT/wiki.test.tokens \
  --destdir data-bin/wikitext-103
```
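`fairseq-preprocess` reads the tokenized text, builds a word-level dictionary from the training split (most frequent words get the smallest ids, after a few reserved symbols), and writes each split as binary index streams under `--destdir`. A stdlib-only sketch of the dictionary step — `build_dictionary` and `encode` are hypothetical helper names, and the exact special-symbol layout of `fairseq.data.Dictionary` may differ:

```python
from collections import Counter

def build_dictionary(lines, threshold=0):
    """Map each token to an integer id, most frequent words first."""
    counts = Counter(tok for line in lines for tok in line.split())
    # fairseq reserves a few special symbols at the start of the vocabulary.
    symbols = ["<s>", "<pad>", "</s>", "<unk>"]
    for word, freq in counts.most_common():
        if freq > threshold:
            symbols.append(word)
    return {word: idx for idx, word in enumerate(symbols)}

def encode(line, vocab):
    """Replace tokens with their ids; unknown words map to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in line.split()] + [vocab["</s>"]]

corpus = ["the cat sat", "the cat ran"]
vocab = build_dictionary(corpus)
ids = encode("the dog sat", vocab)  # "dog" was never seen, so it becomes <unk>
```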

Train a transformer language model with adaptive inputs ([Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](transformer_lm/README.md)):
```
# If training runs out of GPU memory, reduce --max-tokens and --tokens-per-sample
$ mkdir -p checkpoints/transformer_wikitext-103
$ fairseq-train --task language_modeling data-bin/wikitext-103 \
  --save-dir checkpoints/transformer_wikitext-103 --arch transformer_lm_wiki103 \
  --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
  --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
  --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
  --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d

# Evaluate:
$ fairseq-eval-lm data-bin/wikitext-103 --path 'checkpoints/transformer_wikitext-103/checkpoint_best.pt' \
  --sample-break-mode complete --max-tokens 3072 --context-window 2560 --softmax-batch 1024
```
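The training command combines linear warmup (`--warmup-updates 16000`, starting at `--warmup-init-lr 1e-07`) with a cosine schedule that decays from `--max-lr 1.0` towards `--lr 0.0001` over `--lr-period-updates 270000`, doubling the period (`--t-mult 2`) and shrinking the lr band by `--lr-shrink 0.75` at each restart. A sketch of the schedule as we read those flags — illustrative only, fairseq's `cosine` scheduler implementation may differ in details:

```python
import math

def cosine_lr(step, *, max_lr=1.0, min_lr=0.0001, warmup_updates=16000,
              warmup_init_lr=1e-7, period=270000, t_mult=2, lr_shrink=0.75):
    """Learning rate at a given update: linear warmup, then cosine with restarts."""
    if step < warmup_updates:
        # Linear warmup from warmup_init_lr up to max_lr.
        return warmup_init_lr + step * (max_lr - warmup_init_lr) / warmup_updates
    t = step - warmup_updates
    # Find the current restart cycle; each cycle is t_mult times longer
    # and its lr band is shrunk by lr_shrink.
    cycle_len, shrink = period, 1.0
    while t >= cycle_len:
        t -= cycle_len
        cycle_len *= t_mult
        shrink *= lr_shrink
    lo, hi = min_lr * shrink, max_lr * shrink
    return lo + 0.5 * (hi - lo) * (1 + math.cos(math.pi * t / cycle_len))
```

Note that `--max-update 286000` is exactly the warmup (16k updates) plus one full cosine period (270k), so this recipe stops right at the first restart.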

Train a convolutional language model ([Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](conv_lm/README.md)):
```
# If training runs out of GPU memory, reduce --max-tokens and --tokens-per-sample
$ mkdir -p checkpoints/fconv_wikitext-103
$ fairseq-train --task language_modeling data-bin/wikitext-103 \
  --save-dir checkpoints/fconv_wikitext-103 \
  --max-epoch 35 --arch fconv_lm_dauphin_wikitext103 --optimizer nag \
  --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \
  --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion adaptive_loss \
  --adaptive-softmax-cutoff 10000,20000,200000 --max-tokens 1024 --tokens-per-sample 1024 \
  --ddp-backend=no_c10d

# Evaluate:
$ fairseq-eval-lm data-bin/wikitext-103 --path 'checkpoints/fconv_wikitext-103/checkpoint_best.pt'
```
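The `fconv_lm_dauphin_wikitext103` architecture stacks convolutions whose outputs pass through gated linear units (GLU), the key non-linearity in Dauphin et al. (2017): half of each feature vector gates the other half. A minimal stdlib-only sketch of the GLU on a plain list of features (the real model applies it channel-wise to convolution outputs):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(x):
    """Gated Linear Unit: split the features in half and gate the first
    half with the sigmoid of the second, halving the dimensionality."""
    assert len(x) % 2 == 0
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]

out = glu([1.0, -2.0, 0.0, 0.0])  # both gates are sigmoid(0.0) = 0.5
```

Because the gate is a sigmoid rather than a tanh, the linear path keeps gradients flowing through deep stacks, which is the argument the paper makes for GLU over LSTM-style gating.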