# Introduction <img src="fairseq_logo.png" width="50"> 

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling and other text generation tasks. It provides reference implementations
of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](https://arxiv.org/abs/1612.08083)
  - [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
  - [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://arxiv.org/abs/1711.04956)
  - [Fan et al. (2018): Hierarchical Neural Story Generation](https://arxiv.org/abs/1805.04833)
- **LightConv and DynamicConv models**
  - **_New_** [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](https://openreview.net/pdf?id=SkVhlh09tX)
- **Long Short-Term Memory (LSTM) networks**
  - [Luong et al. (2015): Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/abs/1508.04025)
  - [Wiseman and Rush (2016): Sequence-to-Sequence Learning as Beam-Search Optimization](https://arxiv.org/abs/1606.02960)
- **Transformer (self-attention) networks**
  - [Vaswani et al. (2017): Attention Is All You Need](https://arxiv.org/abs/1706.03762)
  - [Ott et al. (2018): Scaling Neural Machine Translation](https://arxiv.org/abs/1806.00187)
  - [Edunov et al. (2018): Understanding Back-Translation at Scale](https://arxiv.org/abs/1808.09381)

Fairseq features:
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented (see the example generation command after this list):
  - beam search
  - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
  - sampling (unconstrained and top-k)
- large mini-batch training even on a single GPU via delayed updates (see the example training command after this list)
- fast half-precision floating point (FP16) training
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers (see the Python sketch after this list)
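
For example, the delayed-update and FP16 bullets above correspond to the `--update-freq` and `--fp16` flags of `fairseq-train`. The command below is an illustrative sketch rather than a tested recipe: the data directory and the hyperparameter values are placeholders.
```
# data-bin/my-dataset is a placeholder for your binarized data directory;
# --update-freq 16 accumulates gradients over 16 mini-batches before each
# optimizer step, simulating 16x larger batches on a single GPU
fairseq-train data-bin/my-dataset \
  --arch transformer --optimizer adam --lr 0.0005 \
  --max-tokens 4000 --fp16 --update-freq 16
```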
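
The search algorithms listed above are likewise selected with generation-time flags. Another hedged sketch with placeholder paths:
```
# beam search with beam size 5
fairseq-generate data-bin/my-dataset --path checkpoints/model.pt --beam 5

# top-k sampling (here restricted to the 10 most likely candidates)
fairseq-generate data-bin/my-dataset --path checkpoints/model.pt \
  --sampling --sampling-topk 10

# Diverse Beam Search with 5 groups
fairseq-generate data-bin/my-dataset --path checkpoints/model.pt \
  --beam 5 --diverse-beam-groups 5
```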

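To make the extensibility point concrete, the Python sketch below uses fairseq's registry decorators to define a hypothetical small variant of the built-in Transformer. The name `my_transformer_tiny` is invented for this example, and the module must be imported before training so the registration runs; see the full documentation for the complete extension workflow.
```
from fairseq.models import register_model_architecture
from fairseq.models.transformer import base_architecture

# makes `--arch my_transformer_tiny` selectable on the command line
@register_model_architecture('transformer', 'my_transformer_tiny')
def my_transformer_tiny(args):
    # shrink the model unless the user overrides these flags explicitly
    args.encoder_embed_dim = getattr(args, 'encoder_embed_dim', 64)
    args.encoder_layers = getattr(args, 'encoder_layers', 2)
    args.decoder_embed_dim = getattr(args, 'decoder_embed_dim', 64)
    args.decoder_layers = getattr(args, 'decoder_layers', 2)
    # fill in all remaining hyperparameters with the stock Transformer defaults
    base_architecture(args)
```
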
We also provide [pre-trained models](#pre-trained-models-and-examples) for several benchmark
translation and language modeling datasets.

![Model](fairseq.gif)

# Requirements and Installation
* A [PyTorch installation](http://pytorch.org/)
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
* Python version 3.6

Currently fairseq requires PyTorch version >= 1.0.0.
Please follow the instructions here: https://github.com/pytorch/pytorch#installation.
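
To check which PyTorch version is currently installed:
```
python -c "import torch; print(torch.__version__)"
```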

If you use Docker make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.
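
For example (the image name is a placeholder):
```
nvidia-docker run --ipc=host -it --rm my/pytorch-image bash
```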

After PyTorch is installed, you can install fairseq with `pip`:
```
pip install fairseq
```

**Installing from source**

To install fairseq from source and develop locally:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,
as well as example training and evaluation commands.

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional models are available
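
For example, once a pre-trained language model and its binarized test set have been downloaded (both paths below are placeholders; the linked READMEs give the exact commands), perplexity can be computed with:
```
fairseq-eval-lm data-bin/my-lm-dataset --path downloaded-model.pt
```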

We also have more detailed READMEs to reproduce results from specific papers:
- [Wu et al. (2019): Pay Less Attention with Lightweight and Dynamic Convolutions](examples/pay_less_attention_paper/README.md)
- [Edunov et al. (2018): Classical Structured Prediction Losses for Sequence to Sequence Learning](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Fan et al. (2018): Hierarchical Neural Story Generation](examples/stories/README.md)
- [Ott et al. (2018): Scaling Neural Machine Translation](examples/scaling_nmt/README.md)
- [Gehring et al. (2017): Convolutional Sequence to Sequence Learning](examples/conv_seq2seq/README.md)
- [Dauphin et al. (2017): Language Modeling with Gated Convolutional Networks](examples/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is BSD-licensed.
The license applies to the pre-trained models as well.
We also provide an additional patent grant.

# Credits
This is a PyTorch version of
[fairseq](https://github.com/facebookresearch/fairseq), a sequence-to-sequence
learning toolkit from Facebook AI Research. The original authors of this
reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam
Gross.