# Neural Language Modeling

## Pre-trained models

Model | Description | Dataset | Download
---|---|---|---
`transformer_lm.gbw.adaptive_huge` | Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) <br> 1026M params | [Google Billion Words](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2)
`transformer_lm.wiki103.adaptive` | Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) <br> 247M params | [WikiText-103](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.tar.bz2)
`transformer_lm.wmt19.en` | English LM <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.en.tar.gz)
`transformer_lm.wmt19.de` | German LM <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.de.tar.gz)
`transformer_lm.wmt19.ru` | Russian LM <br> ([Ng et al., 2019](https://arxiv.org/abs/1907.06616)) | [WMT News Crawl](http://data.statmt.org/news-crawl/) | [download (.tar.gz)](https://dl.fbaipublicfiles.com/fairseq/models/lm/wmt19.ru.tar.gz)

## Example usage

To sample from a language model using PyTorch Hub:
```python
import torch

# List available models
torch.hub.list('pytorch/fairseq')  # [..., 'transformer_lm.wmt19.en', ...]

# Load an English LM trained on WMT'19 News Crawl data
en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en', tokenizer='moses', bpe='fastbpe')

# Sample from the language model
en_lm.sample('Barack Obama', beam=1, sampling=True, sampling_topk=10, temperature=0.8)
# "Barack Obama is coming to Sydney and New Zealand (...)"

# The same interface can be used with custom models as well
from fairseq.models.transformer_lm import TransformerLanguageModel
custom_lm = TransformerLanguageModel.from_pretrained('/path/to/model/dir', 'checkpoint100.pt', tokenizer='moses', bpe='fastbpe')
custom_lm.sample('Barack Obama', beam=5)
# "Barack Obama (...)"
```
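
You can also score text with the same interface; the perplexity of a sentence follows directly from the per-token log-probabilities it returns (a minimal sketch, assuming the `score()` method and its `positional_scores` field from recent fairseq hub releases):

```python
# Perplexity = exp(average negative log-likelihood per token)
ppl = en_lm.score('Barack Obama is coming to Sydney and New Zealand')['positional_scores'].mean().neg().exp()
print(ppl)  # a scalar tensor; the exact value depends on the checkpoint
```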

## Training a transformer language model with the CLI tools

### 1) Preprocess the data

First download and prepare the [WikiText-103 dataset](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/):
```bash
cd examples/language_model/
bash prepare-wikitext-103.sh
cd ../..
```

Next, preprocess/binarize the data:
```bash
TEXT=examples/language_model/wikitext-103
fairseq-preprocess \
    --only-source \
    --trainpref $TEXT/wiki.train.tokens \
    --validpref $TEXT/wiki.valid.tokens \
    --testpref $TEXT/wiki.test.tokens \
    --destdir data-bin/wikitext-103 \
    --workers 20
```
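
As an optional sanity check (a sketch, assuming the standard `fairseq.data.Dictionary` API), you can verify that the binarized data and its vocabulary were written out:

```python
from fairseq.data import Dictionary

# fairseq-preprocess writes the vocabulary to data-bin/wikitext-103/dict.txt
d = Dictionary.load('data-bin/wikitext-103/dict.txt')
print(len(d))  # vocabulary size, including the <s>/<pad>/</s>/<unk> special symbols
```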

### 2) Train a language model

Next, we'll train a transformer language model using [adaptive inputs](transformer_lm/README.md):
```bash
fairseq-train --task language_modeling \
    data-bin/wikitext-103 \
    --save-dir checkpoints/transformer_wikitext-103 \
    --arch transformer_lm_wiki103 \
    --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
    --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
    --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
    --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d
```

If the above command runs out of memory, try reducing `--max-tokens` (max number
of tokens per batch) or `--tokens-per-sample` (max sequence length). You can
also increase `--update-freq` to accumulate gradients and simulate training on
more GPUs.
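
Once training finishes, the checkpoint can be loaded through the same `from_pretrained` interface shown earlier (a sketch; `data_name_or_path` must point at the binarized data directory so the dictionary can be found):

```python
from fairseq.models.transformer_lm import TransformerLanguageModel

# Load the best checkpoint produced by the training run above
lm = TransformerLanguageModel.from_pretrained(
    'checkpoints/transformer_wikitext-103',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data-bin/wikitext-103',
)
lm.eval()  # disable dropout before sampling or scoring
print(lm.sample('The history of', beam=1, sampling=True, sampling_topk=10))
```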

### 3) Evaluate
```bash
fairseq-eval-lm data-bin/wikitext-103 \
    --path checkpoints/transformer_wikitext-103/checkpoint_best.pt \
    --sample-break-mode complete --max-tokens 3072 \
    --context-window 2560 --softmax-batch 1024
```
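
Here `--context-window 2560` gives each scored token up to 2560 tokens of preceding context that are not themselves scored. `fairseq-eval-lm` reports the average loss in base 2, so the perplexity it prints is simply `2 ** loss`; a sketch with a hypothetical loss value:

```python
# fairseq-eval-lm reports loss in base 2; perplexity is 2 ** loss
loss_base2 = 3.92  # hypothetical value read from the eval log
print(2 ** loss_base2)  # ~15.1
```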

## Convolutional language models

Please see the [convolutional LM README](conv_lm/README.md) for instructions to
train convolutional language models.