# Neural Language Modeling

## Pre-trained models

Description | Parameters | Dataset | Model and Test set(s)
---|---:|---|---
Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 1026M | [Google Billion Words](https://github.com/ciprian-chelba/1-billion-word-language-modeling-benchmark) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_gbw_huge.tar.bz2)
Adaptive Inputs <br> ([Baevski and Auli, 2018](https://arxiv.org/abs/1809.10853)) | 247M | [WikiText-103](https://einstein.ai/research/the-wikitext-long-term-dependency-language-modeling-dataset) | [download (.tar.bz2)](https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.tar.bz2)
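
To use one of these models outside of PyTorch Hub, download and unpack the archive first. A minimal sketch for the WikiText-103 model (the exact contents of the archive, typically a model checkpoint plus a dictionary, may vary by release):

```
$ curl -O https://dl.fbaipublicfiles.com/fairseq/models/lm/adaptive_lm_wiki103.tar.bz2
$ tar xjf adaptive_lm_wiki103.tar.bz2   # should unpack a checkpoint and dictionary files
```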

## Example usage

Interactive generation via PyTorch Hub:
```
>>> import torch
>>> lm = torch.hub.load(
...   'pytorch/fairseq',
...   'transformer_lm',
...   model_name_or_path='transformer_lm.wiki103.adaptive',
...   data_name_or_path='./data-bin',
...   tokenizer='moses',
...   aggressive_dash_splits=True,
...   no_escape=True,
...   beam=1,
...   sampling=True,
...   sampling_topk=10,
...   temperature=0.8,
... )
>>> lm.generate('Barack Obama', verbose=True)
```

Available models are listed in the ``hub_models()`` method in each model file, for example:
[transformer_lm.py](https://github.com/pytorch/fairseq/blob/master/fairseq/models/transformer_lm.py).
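
You can also query the available hub entrypoints directly; a quick sketch (the exact names returned depend on the fairseq version):

```
>>> import torch
>>> torch.hub.list('pytorch/fairseq')  # lists the entrypoints defined in fairseq's hubconf.py
```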


## Training a new model with the CLI tools

These scripts provide an example of pre-processing data for the language modeling task.

### prepare-wikitext-103.sh

Provides an example of pre-processing for the [WikiText-103 language modeling task](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/):

Example usage:

Prepare data:
```
$ cd examples/language_model/
$ bash prepare-wikitext-103.sh
$ cd ../..

# Binarize the dataset:
$ TEXT=examples/language_model/wikitext-103

$ fairseq-preprocess --only-source \
  --trainpref $TEXT/wiki.train.tokens --validpref $TEXT/wiki.valid.tokens --testpref $TEXT/wiki.test.tokens \
  --destdir data-bin/wikitext-103
```
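
After binarization, `data-bin/wikitext-103` should contain a dictionary plus the binarized splits. The listing below shows the typical fairseq layout; exact file names may vary slightly between versions:

```
$ ls data-bin/wikitext-103
dict.txt
test.bin   test.idx
train.bin  train.idx
valid.bin  valid.idx
```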

Train a transformer language model with adaptive inputs ([Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](transformer_lm/README.md)):
```
# If the model runs out of GPU memory, try reducing --max-tokens and --tokens-per-sample
$ mkdir -p checkpoints/transformer_wikitext-103
$ fairseq-train --task language_modeling data-bin/wikitext-103 \
  --save-dir checkpoints/transformer_wikitext-103 --arch transformer_lm_wiki103 \
  --max-update 286000 --max-lr 1.0 --t-mult 2 --lr-period-updates 270000 --lr-scheduler cosine --lr-shrink 0.75 \
  --warmup-updates 16000 --warmup-init-lr 1e-07 --min-lr 1e-09 --optimizer nag --lr 0.0001 --clip-norm 0.1 \
  --criterion adaptive_loss --max-tokens 3072 --update-freq 3 --tokens-per-sample 3072 --seed 1 \
  --sample-break-mode none --skip-invalid-size-inputs-valid-test --ddp-backend=no_c10d

# Evaluate:
$ fairseq-eval-lm data-bin/wikitext-103 --path 'checkpoints/transformer_wikitext-103/checkpoint_best.pt' \
  --sample-break-mode complete --max-tokens 3072 --context-window 2560 --softmax-batch 1024
```
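
A trained checkpoint can also be loaded for interactive generation, similar to the PyTorch Hub example above. This is a sketch, assuming a fairseq version that provides `from_pretrained` on language models; the paths match the training command above:

```
>>> from fairseq.models.transformer_lm import TransformerLanguageModel
>>> lm = TransformerLanguageModel.from_pretrained(
...   'checkpoints/transformer_wikitext-103',
...   checkpoint_file='checkpoint_best.pt',
...   data_name_or_path='data-bin/wikitext-103',
... )
>>> lm.generate('Barack Obama', verbose=True)
```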

Train a convolutional language model ([Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](conv_lm/README.md)):
```
# If the model runs out of GPU memory, try reducing --max-tokens and --tokens-per-sample
$ mkdir -p checkpoints/fconv_wikitext-103
$ fairseq-train --task language_modeling data-bin/wikitext-103 \
  --save-dir checkpoints/fconv_wikitext-103 \
  --max-epoch 35 --arch fconv_lm_dauphin_wikitext103 --optimizer nag \
  --lr 1.0 --lr-scheduler reduce_lr_on_plateau --lr-shrink 0.5 \
  --clip-norm 0.1 --dropout 0.2 --weight-decay 5e-06 --criterion adaptive_loss \
  --adaptive-softmax-cutoff 10000,20000,200000 --max-tokens 1024 --tokens-per-sample 1024 \
  --ddp-backend=no_c10d

# Evaluate:
$ fairseq-eval-lm data-bin/wikitext-103 --path 'checkpoints/fconv_wikitext-103/checkpoint_best.pt'
```