# <img src="fairseq_logo.png" width="30"> Introduction

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks.

### What's New:

- July 2019: [RoBERTa models and code release](examples/roberta/README.md)
- June 2019: [wav2vec models and code release](examples/wav2vec/README.md)
- April 2019: [fairseq demo paper @ NAACL 2019](https://arxiv.org/abs/1904.01038)

### Features:

Fairseq provides reference implementations of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
  - [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
  - [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
  - [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
  - **_New_** [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
- **LightConv and DynamicConv models**
  - [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
  - Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation
- **Transformer (self-attention) networks**
  - Vaswani et al. (2017): Attention Is All You Need
  - [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
  - [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
  - [Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](examples/language_model/transformer_lm/README.md)
  - [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
  - **_New_** [Liu et al. (2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach](examples/roberta/README.md)

**Additionally:**
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
  - beam search
  - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
  - sampling (unconstrained and top-k)
- large mini-batch training even on a single GPU via delayed updates (see the training sketch after this list)
- mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
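
The delayed-update and mixed-precision features above map directly onto command-line flags. The following is a minimal sketch rather than a tuned recipe: the dataset directory and hyperparameters are placeholders, and flag availability can vary by version (check `fairseq-train --help`):

```
# Hedged sketch: train a Transformer with gradient accumulation and FP16.
# "data-bin/wmt16_en_de" is a placeholder for a binarized dataset directory.
#   --update-freq 16  accumulates gradients over 16 mini-batches (delayed updates)
#   --fp16            enables mixed precision training on NVIDIA tensor cores
fairseq-train data-bin/wmt16_en_de \
    --arch transformer_wmt_en_de \
    --optimizer adam --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 3584 --update-freq 16 --fp16
```

At generation time, the search strategies listed above are likewise selected with flags such as `--beam`, `--diverse-beam-groups`, and `--sampling --sampling-topk`; see `fairseq-generate --help` for the full set in your version.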

We also provide [pre-trained models](#pre-trained-models-and-examples) for several benchmark
translation and language modeling datasets.

![Model](fairseq.gif)

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.0.0
* Python version >= 3.5
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)

To install PyTorch, follow the instructions at https://github.com/pytorch/pytorch#installation.

If you use Docker make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.

After PyTorch is installed, you can install fairseq with `pip`:
```
pip install fairseq
```
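
A quick way to verify the install is to import the package from Python (this assumes the installed release exposes a version string):

```
python -c "import fairseq; print(fairseq.__version__)"
```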

**Installing from source**

To install fairseq from source and develop locally:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

**Improved training speed**

Training speed can be further improved by installing NVIDIA's
[apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option.
fairseq will automatically switch to the faster modules provided by apex.
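
For reference, a source install of apex along the lines of its README looks like the sketch below; treat the exact flags as a snapshot and defer to the [apex](https://github.com/NVIDIA/apex) repository for current instructions:

```
# Build apex from source with its C++ and CUDA extensions enabled.
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```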

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.
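
As a taste of the typical workflow (binarize data, train, generate), here is a hedged end-to-end sketch loosely following the IWSLT'14 German-English recipe in [examples/translation](examples/translation/README.md); the paths and hyperparameters are illustrative, and exact commands may differ by version:

```
# Binarize a tokenized parallel corpus (paths are placeholders).
fairseq-preprocess --source-lang de --target-lang en \
    --trainpref iwslt14.tokenized.de-en/train \
    --validpref iwslt14.tokenized.de-en/valid \
    --testpref iwslt14.tokenized.de-en/test \
    --destdir data-bin/iwslt14.tokenized.de-en

# Train a small Transformer on the binarized data.
fairseq-train data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --optimizer adam --lr 0.0005 \
    --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 4000 --save-dir checkpoints/transformer

# Translate the test set with beam search.
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/transformer/checkpoint_best.pt \
    --beam 5 --remove-bpe
```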

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for the tasks listed below,
as well as example training and evaluation commands (a generation sketch follows the list).

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional models are available
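
For example, evaluating a downloaded pre-trained translation model on one of the binarized test sets generally looks like the sketch below; the archive URLs and exact paths live in the linked READMEs, so the ones here are placeholders:

```
# After downloading and unpacking a pre-trained model and binarized test set
# (see examples/translation/README.md for the actual archives):
fairseq-generate data-bin/wmt14.en-fr.newstest2014 \
    --path wmt14.en-fr.fconv-py/model.pt \
    --beam 5 --batch-size 128 --remove-bpe
```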

We also have more detailed READMEs to reproduce results from specific papers:
- [Liu et al. (2019): RoBERTa: A Robustly Optimized BERT Pretraining Approach](examples/roberta/README.md)
- [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
- [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Citation

Please cite as:

```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```