# Introduction <img src="fairseq_logo.png" width="50"> 

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
  - [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
  - [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
  - [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
  - **_New_** [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
- **LightConv and DynamicConv models**
  - **_New_** [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
  - [Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)
  - [Wiseman and Rush (2016): Sequence-to-Sequence Learning as Beam-Search Optimization](https://arxiv.org/abs/1606.02960)
- **Transformer (self-attention) networks**
  - [Vaswani et al. (2017): Attention Is All You Need](https://arxiv.org/abs/1706.03762)
  - [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
  - [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
  - **_New_** [Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](examples/language_model/transformer_lm/README.md)
  - **_New_** [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)

Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms (see the command-line sketch after this list):
  - beam search
  - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
  - sampling (unconstrained and top-k)
- large mini-batch training even on a single GPU via delayed updates
- mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
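
Most of these features are driven by command-line flags on the `fairseq-train` and `fairseq-generate` tools. The sketch below is illustrative only: `data-bin/wmt17_en_de` and `checkpoint.pt` are placeholder paths, and the usual dataset and optimizer options are omitted for brevity:

```
# mixed precision (--fp16) and delayed updates (--update-freq accumulates
# gradients over 16 mini-batches before each optimizer step); training
# automatically parallelizes over all GPUs visible on the machine
fairseq-train data-bin/wmt17_en_de --arch transformer --fp16 --update-freq 16

# generation with different search strategies
fairseq-generate data-bin/wmt17_en_de --path checkpoint.pt --beam 5
fairseq-generate data-bin/wmt17_en_de --path checkpoint.pt --sampling --sampling-topk 10
fairseq-generate data-bin/wmt17_en_de --path checkpoint.pt --beam 5 --diverse-beam-groups 5
```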

We also provide [pre-trained models](#pre-trained-models-and-examples) for several benchmark
translation and language modeling datasets.

![Model](fairseq.gif)

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.0.0
* Python version >= 3.5
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)

Please follow the instructions here to install PyTorch: https://github.com/pytorch/pytorch#installation.

If you use Docker, make sure to increase the shared memory size, either with
`--ipc=host` or with `--shm-size` as command-line options to `nvidia-docker run`.
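
For example, to get an interactive shell with host shared memory (the image name below is just a placeholder; use whichever image you normally run PyTorch in):

```
nvidia-docker run --ipc=host -it --rm pytorch/pytorch:latest bash
```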

After PyTorch is installed, you can install fairseq with `pip`:
```
pip install fairseq
```

**Installing from source**

To install fairseq from source and develop locally:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

**Improved training speed**

Training speed can be further improved by installing NVIDIA's
[apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option.
fairseq will automatically switch to the faster modules provided by apex.
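
A sketch of that installation, assuming a CUDA toolkit that matches your PyTorch build (apex's own README is the authoritative reference):

```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
```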

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.
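
As a minimal sketch of the workflow, the commands below binarize a tokenized parallel corpus and train a small transformer on it. The `$TEXT` prefix is a placeholder and most hyperparameters are set to plausible values only; see the [Translation example](examples/translation/README.md) for complete, tested commands:

```
# binarize tokenized train/valid/test files (e.g. $TEXT/train.de, $TEXT/train.en, ...)
fairseq-preprocess --source-lang de --target-lang en \
  --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
  --destdir data-bin/iwslt14.tokenized.de-en

# train a transformer, then translate the test set with beam search
fairseq-train data-bin/iwslt14.tokenized.de-en --arch transformer_iwslt_de_en \
  --optimizer adam --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 --max-tokens 4000
fairseq-generate data-bin/iwslt14.tokenized.de-en \
  --path checkpoints/checkpoint_best.pt --beam 5 --remove-bpe
```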

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,
as well as example training and evaluation commands.

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional models are available
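
For example, once a pre-trained translation model has been downloaded and unpacked, it can be queried interactively from the command line; the directory name below assumes the WMT'14 English-French convolutional model, so substitute whichever archive you fetched:

```
MODEL_DIR=wmt14.en-fr.fconv-py
fairseq-interactive --path $MODEL_DIR/model.pt $MODEL_DIR \
  --beam 5 --source-lang en --target-lang fr
```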

We also have more detailed READMEs to reproduce results from specific papers:
- [Schneider et al. (2019): wav2vec: Unsupervised Pre-training for Speech Recognition](examples/wav2vec/README.md)
- [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Citation

Please cite as:

```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```