# Introduction <img src="fairseq_logo.png" width="50"> 

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)
  - [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
  - [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
  - [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- **LightConv and DynamicConv models**
  - **_New_** [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
  - [Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)
  - [Wiseman and Rush (2016): Sequence-to-Sequence Learning as Beam-Search Optimization](https://arxiv.org/abs/1606.02960)
- **Transformer (self-attention) networks**
  - [Vaswani et al. (2017): Attention Is All You Need](https://arxiv.org/abs/1706.03762)
  - [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
  - [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
  - **_New_** [Baevski and Auli (2018): Adaptive Input Representations for Neural Language Modeling](examples/language_model/transformer_lm/README.md)
  - **_New_** [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)

Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
  - beam search
  - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
  - sampling (unconstrained and top-k)
- large mini-batch training even on a single GPU via delayed updates (see the example command after this list)
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
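
As an illustration of the training and generation features above, here is a minimal, hypothetical command pair; the dataset in `data-bin/my-dataset` and the hyperparameters are placeholders rather than a recommended recipe:
```
# --update-freq 16 accumulates gradients over 16 mini-batches before each
# optimizer step, simulating 16x larger batches on a single GPU;
# --fp16 enables half-precision training
fairseq-train data-bin/my-dataset \
  --arch transformer --optimizer adam \
  --lr 0.0005 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
  --max-tokens 4096 --update-freq 16 --fp16

# generation defaults to beam search; --sampling / --sampling-topk and
# --diverse-beam-groups select the other search strategies listed above
fairseq-generate data-bin/my-dataset \
  --path checkpoints/checkpoint_best.pt --beam 5
```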

We also provide [pre-trained models](#pre-trained-models-and-examples) for several benchmark
translation and language modeling datasets.

![Model](fairseq.gif)

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.0.0
* Python version >= 3.6
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)

Please follow the instructions here to install PyTorch: https://github.com/pytorch/pytorch#installation.

If you use Docker, make sure to increase the shared memory size by passing either
`--ipc=host` or `--shm-size` as a command-line option to `nvidia-docker run`.

After PyTorch is installed, you can install fairseq with `pip`:
```
pip install fairseq
```

**Installing from source**

To install fairseq from source and develop locally:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

**Improved training speed**

Training speed can be further improved by installing NVIDIA's
[apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option.
fairseq will automatically switch to the faster modules provided by apex.
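
For example, at the time of writing apex could be installed as follows; the exact command is maintained in the apex README and may change:
```
# build apex with its C++ and CUDA extensions so fairseq can use them
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .
```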

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.
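
As a quick orientation, the typical workflow begins by binarizing raw (already tokenized) text with `fairseq-preprocess`; the file paths below are placeholders:
```
# builds vocabularies and writes binarized train/valid/test splits to
# data-bin/my-dataset, ready for fairseq-train and fairseq-generate
fairseq-preprocess --source-lang de --target-lang en \
  --trainpref data/train --validpref data/valid --testpref data/test \
  --destdir data-bin/my-dataset
```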

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,
as well as example training and evaluation commands (a sample evaluation command follows the list).

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional models are available
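
For example, once one of these pre-trained translation models and its binarized test set have been downloaded (the paths below are placeholders; see the translation README for the actual download links), it can be evaluated with `fairseq-generate`, which prints a BLEU score at the end:
```
# translate the binarized test set with beam search, stripping BPE
# before scoring
fairseq-generate data-bin/wmt14.en-fr.newstest2014 \
  --path wmt14.en-fr.fconv-py/model.pt \
  --beam 5 --batch-size 128 --remove-bpe
```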

We also have more detailed READMEs to reproduce results from specific papers:
- [Shen et al. (2019): Mixture Models for Diverse Machine Translation: Tricks of the Trade](examples/translation_moe/README.md)
- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Edunov et al. (2018): Understanding Back-Translation at Scale](examples/backtranslation/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/language_model/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Citation

Please cite as:

```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```