"router/vscode:/vscode.git/clone" did not exist on "3b0c979efcccd8ca51f59f1f982bfbbc842d06c9"
README.md 7.94 KB
Newer Older
Myle Ott's avatar
Myle Ott committed
1
# <img src="fairseq_logo.png" width="30"> Introduction

Fairseq(-py) is a sequence modeling toolkit that allows researchers and
developers to train custom models for translation, summarization, language
modeling, and other text generation tasks.

### What's New:

- November 2019: [CamemBERT model and code released](examples/camembert/README.md)
- November 2019: [BART model and code released](examples/bart/README.md)
- November 2019: [XLM-R models and code released](examples/xlmr/README.md)
- September 2019: [Nonautoregressive translation code released](examples/nonautoregressive_translation/README.md)
- August 2019: [WMT'19 models released](examples/wmt19/README.md)
- July 2019: fairseq relicensed under MIT license
- July 2019: [RoBERTa models and code released](examples/roberta/README.md)
- June 2019: [wav2vec models and code released](examples/wav2vec/README.md)

### Features:

Fairseq provides reference implementations of various sequence-to-sequence models, including:
- **Convolutional Neural Networks (CNN)**
  - [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)
  - [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
  - [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
  - [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
  - [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- **LightConv and DynamicConv models**
  - [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- **Long Short-Term Memory (LSTM) networks**
  - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- **Transformer (self-attention) networks**
  - Attention Is All You Need (Vaswani et al., 2017)
  - [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
  - [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
  - [Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)](examples/language_model/transformer_lm/README.md)
  - [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
  - [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
  - [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
  - [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
- **Non-autoregressive Transformers**
  - Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
  - Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
  - Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
  - Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
  - [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)

**Additionally:**
- multi-GPU (distributed) training on one machine or across multiple machines
- fast generation on both CPU and GPU with multiple search algorithms implemented:
  - beam search
  - Diverse Beam Search ([Vijayakumar et al., 2016](https://arxiv.org/abs/1610.02424))
  - sampling (unconstrained, top-k and top-p/nucleus)
- large mini-batch training even on a single GPU via delayed updates
- mixed precision training (trains faster with less GPU memory on [NVIDIA tensor cores](https://developer.nvidia.com/tensor-cores))
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
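The top-k and top-p (nucleus) sampling options above restrict generation to a truncated, renormalized distribution before a token is drawn. A minimal sketch in plain Python (the function name and structure are illustrative, not fairseq's actual implementation):

```python
import random

def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Zero out tokens outside the top-k set and/or the smallest set
    whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order)
    if top_k > 0:
        keep &= set(order[:top_k])  # top-k: keep the k most probable tokens
    if top_p < 1.0:
        cum, nucleus = 0.0, set()
        for i in order:             # top-p: grow the nucleus until it
            nucleus.add(i)          # covers at least top_p of the mass
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

vocab_probs = [0.5, 0.25, 0.125, 0.125]
filtered = top_k_top_p_filter(vocab_probs, top_k=2)  # keeps tokens 0 and 1
token = random.choices(range(len(filtered)), weights=filtered)[0]
```

Unconstrained sampling corresponds to `top_k=0, top_p=1.0`; beam search, by contrast, keeps the `beam` highest-scoring partial hypotheses instead of drawing randomly.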

We also provide [pre-trained models for translation and language modeling](#pre-trained-models-and-examples)
with a convenient `torch.hub` interface:
```python
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
```
See the PyTorch Hub tutorials for [translation](https://pytorch.org/hub/pytorch_fairseq_translation/)
and [RoBERTa](https://pytorch.org/hub/pytorch_fairseq_roberta/) for more examples.

![Model](fairseq.gif)

# Requirements and Installation

* [PyTorch](http://pytorch.org/) version >= 1.2.0
* Python version >= 3.5
* For training new models, you'll also need an NVIDIA GPU and [NCCL](https://github.com/NVIDIA/nccl)
* **For faster training** install NVIDIA's [apex](https://github.com/NVIDIA/apex) library with the `--cuda_ext` option

To install fairseq:
```bash
pip install fairseq
```

On macOS:
```bash
CFLAGS="-stdlib=libc++" pip install fairseq
```

If you use Docker, make sure to increase the shared memory size either with
`--ipc=host` or `--shm-size` as command line options to `nvidia-docker run`.

**Installing from source**

To install fairseq from source and develop locally:
```bash
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable .
```

# Getting Started

The [full documentation](https://fairseq.readthedocs.io/) contains instructions
for getting started, training new models and extending fairseq with new model
types and tasks.

# Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below,
as well as example training and evaluation commands.

- [Translation](examples/translation/README.md): convolutional and transformer models are available
- [Language Modeling](examples/language_model/README.md): convolutional and transformer models are available
- [wav2vec](examples/wav2vec/README.md): wav2vec large model is available

We also have more detailed READMEs to reproduce results from specific papers:
- [Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)](examples/joint_alignment_translation/README.md)
- [Levenshtein Transformer (Gu et al., 2019)](examples/nonautoregressive_translation/README.md)
- [Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)](examples/wmt19/README.md)
- [RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)](examples/roberta/README.md)
- [wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)](examples/wav2vec/README.md)
- [Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)](examples/translation_moe/README.md)
- [Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)](examples/pay_less_attention_paper/README.md)
- [Understanding Back-Translation at Scale (Edunov et al., 2018)](examples/backtranslation/README.md)
- [Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)](https://github.com/pytorch/fairseq/tree/classic_seqlevel)
- [Hierarchical Neural Story Generation (Fan et al., 2018)](examples/stories/README.md)
- [Scaling Neural Machine Translation (Ott et al., 2018)](examples/scaling_nmt/README.md)
- [Convolutional Sequence to Sequence Learning (Gehring et al., 2017)](examples/conv_seq2seq/README.md)
- [Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)](examples/language_model/conv_lm/README.md)

# Join the fairseq community

* Facebook page: https://www.facebook.com/groups/fairseq.users
* Google group: https://groups.google.com/forum/#!forum/fairseq-users

# License
fairseq(-py) is MIT-licensed.
The license applies to the pre-trained models as well.

# Citation

Please cite as:

```bibtex
@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```