# 👾 PyTorch-Transformers

[![CircleCI](https://circleci.com/gh/huggingface/pytorch-pretrained-bert.svg?style=svg)](https://circleci.com/gh/huggingface/pytorch-pretrained-bert)

PyTorch-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP). The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:

- **[Google's BERT model](https://github.com/google-research/bert)** released with the paper [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
- **[OpenAI's GPT model](https://github.com/openai/finetune-transformer-lm)** released with the paper [Improving Language Understanding by Generative Pre-Training](https://blog.openai.com/language-unsupervised/) by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
- **[OpenAI's GPT-2 model](https://blog.openai.com/better-language-models/)** released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
- **[Google/CMU's Transformer-XL model](https://github.com/kimiyoung/transformer-xl)** released with the paper [Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860) by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
- **[Google/CMU's XLNet model](https://github.com/zihangdai/xlnet/)** released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
- **[Facebook's XLM model](https://github.com/facebookresearch/XLM/)** released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet).

You can find more details in the [Examples](#examples) section of the documentation.

## Readme

| Section | Description |
|-|-|
| [Installation](#installation) | How to install the package |
| [Quick tour: Usage](#quick-tour-usage) | Tokenizers & models usage: Bert and GPT-2 |
| [Quick tour: Fine-tuning/usage scripts](#quick-tour-fine-tuning-usage-scripts) | Using provided scripts: GLUE, SQuAD and Text generation |
| [Documentation](#documentation) | Full API documentation and more |

## Installation

This repo is tested on Python 2.7 and 3.5+ (examples are tested only on Python 3.5+) and PyTorch 0.4.1 to 1.1.0.

### With pip

PyTorch-Transformers can be installed with pip as follows:

```bash
pip install pytorch-transformers
```

### From source

Clone the repository and run:

```bash
pip install [--editable] .
```

### SpaCy, ftfy

If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you can install `ftfy` (version 4.4.3 if you are using Python 2) and `SpaCy`:

```bash
pip install spacy ftfy==4.4.3
python -m spacy download en
```

If you don't install `ftfy` and `SpaCy`, the `OpenAI GPT` tokenizer will default to tokenizing with BERT's `BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
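
Either way, the tokenizer API is the same. Here is a minimal sketch using the `OpenAIGPTTokenizer` class exported by the package:

```python
from pytorch_transformers import OpenAIGPTTokenizer

# The tokenizer uses SpaCy + ftfy if they are installed,
# and falls back to BasicTokenizer + Byte-Pair Encoding otherwise.
tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
tokens = tokenizer.tokenize("Who was Jim Henson? Jim Henson was a puppeteer")
token_ids = tokenizer.convert_tokens_to_ids(tokens)
```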

### Tests

A series of tests is included for the library and the example scripts. Library tests can be found in the [tests folder](https://github.com/huggingface/pytorch-transformers/tree/master/pytorch_transformers/tests) and examples tests in the [examples folder](https://github.com/huggingface/pytorch-transformers/tree/master/examples).

These tests can be run using `pytest` (install pytest if needed with `pip install pytest`).

You can run the tests from the root of the cloned repository with the commands:

```bash
python -m pytest -sv ./pytorch_transformers/tests/
python -m pytest -sv ./examples/
```

## Quick tour: Usage

Here are two quick-start examples using `Bert` and `GPT2` with pre-trained models.

See the [documentation](#documentation) for the details of all the models and classes.

### BERT example

First let's prepare a tokenized input with `BertTokenizer`

```python
import torch
from pytorch_transformers import BertTokenizer, BertModel, BertForMaskedLM

# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Mask a token that we will try to predict back with `BertForMaskedLM`
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']

# Convert tokens to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated with the 1st and 2nd sentences (see the paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
```

Let's see how to use `BertModel` to get encoded inputs:

```python
# Load pre-trained model (weights)
model = BertModel.from_pretrained('bert-base-uncased')

# Set the model in evaluation mode to deactivate the DropOut modules
# This is IMPORTANT to have reproducible results during evaluation!
model.eval()

# If you have a GPU, put everything on cuda
tokens_tensor = tokens_tensor.to('cuda')
segments_tensors = segments_tensors.to('cuda')
model.to('cuda')

# Run the model to encode the input
with torch.no_grad():
    # See the models docstrings for the detail of the inputs
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    # PyTorch-Transformers models always output tuples.
    # See the models docstrings for the detail of all the outputs
    # In our case, the first element is the hidden state of the last layer of the Bert model
    encoded_layers = outputs[0]
# We have encoded our input sequence in a FloatTensor of shape (batch size, sequence length, model hidden dimension)
assert tuple(encoded_layers.shape) == (1, len(indexed_tokens), model.config.hidden_size)
```

And how to use `BertForMaskedLM` to predict a masked token:

```python
# Load pre-trained model (weights)
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.eval()

# If you have a GPU, put everything on cuda
tokens_tensor = tokens_tensor.to('cuda')
segments_tensors = segments_tensors.to('cuda')
model.to('cuda')

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]

# confirm we were able to predict 'henson'
predicted_index = torch.argmax(predictions[0, masked_index]).item()
predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
assert predicted_token == 'henson'
```

### OpenAI GPT-2

Here is a quick-start example using the `GPT2Tokenizer` and `GPT2LMHeadModel` classes with OpenAI's pre-trained model.

First let's prepare a tokenized input with `GPT2Tokenizer`

```python
import torch
from pytorch_transformers import GPT2Tokenizer, GPT2LMHeadModel

# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
import logging
logging.basicConfig(level=logging.INFO)

# Load pre-trained model tokenizer (vocabulary)
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Encode some inputs
text = "Who was Jim Henson ? Jim Henson was a"
indexed_tokens = tokenizer.encode(text)

# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
```

Let's see how to use `GPT2LMHeadModel` to predict the next token following our prompt:

```python
# Load pre-trained model (weights)
model = GPT2LMHeadModel.from_pretrained('gpt2')

# Set the model in evaluation mode to deactivate the DropOut modules
# This is IMPORTANT to have reproducible results during evaluation!
model.eval()

# If you have a GPU, put everything on cuda
tokens_tensor = tokens_tensor.to('cuda')
model.to('cuda')

# Predict all tokens
with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]

# get the predicted next sub-word (in our case, the word 'man')
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
assert predicted_text == 'Who was Jim Henson? Jim Henson was a man'
```

Examples for each model class of each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the documentation.
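
All architectures share the same API. As a rough illustration (not taken from the documentation), the XLNet classes follow the same pattern as the BERT and GPT-2 examples above:

```python
import torch
from pytorch_transformers import XLNetTokenizer, XLNetModel

# Same from_pretrained / tuple-output pattern as in the examples above
tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
model = XLNetModel.from_pretrained('xlnet-base-cased')
model.eval()

input_ids = torch.tensor([tokenizer.encode("Who was Jim Henson ? Jim Henson was a puppeteer")])
with torch.no_grad():
    last_hidden_state = model(input_ids)[0]  # first element of the output tuple
```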

## Quick tour: Fine-tuning/usage scripts

We include several example scripts with SOTA performance on NLU and NLG tasks:

- fine-tuning Bert/XLNet/XLM with a *sequence-level classifier* on nine different GLUE tasks,
- fine-tuning Bert/XLNet/XLM with a *token-level classifier* on the question answering dataset SQuAD 2.0, and
- using GPT/GPT-2/Transformer-XL and XLNet for conditional language generation.

Here are three quick examples:

### Fine-tuning for sequence classification: GLUE tasks examples

The [General Language Understanding Evaluation (GLUE) benchmark](https://gluebenchmark.com/) is a collection of nine sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.

Before running any one of these GLUE tasks you should download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

```shell
export GLUE_DIR=/path/to/glue
export TASK_NAME=MRPC

python run_bert_classifier.py \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --bert_model bert-base-uncased \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
```

where the task name can be one of CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE, WNLI.

The dev set results will be available in the text file `eval_results.txt` in the specified `output_dir`. In the case of MNLI, since there are two separate dev sets (matched and mismatched), there will be a separate output folder called `/tmp/MNLI-MM/` in addition to `/tmp/MNLI/`.

#### Fine-tuning the XLNet model on the STS-B regression task

This example code fine-tunes XLNet on the STS-B corpus using parallel training on a server with 4 V100 GPUs.
Parallel training is a simple way to use several GPUs (but it is slower and less flexible than distributed training, see below).
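
For reference, parallel training here simply means wrapping the model in `torch.nn.DataParallel`; the toy model below only illustrates the mechanism and is not code from the example script:

```python
import torch

# Stand-in model: DataParallel replicates it on every visible GPU and
# splits each input batch across the replicas.
model = torch.nn.Linear(10, 2)
model = torch.nn.DataParallel(model).to('cuda')

batch = torch.randn(32, 10).to('cuda')  # one batch, split across the GPUs
outputs = model(batch)
```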

```shell
export GLUE_DIR=/path/to/glue

python ./examples/run_glue.py \
    --model_type xlnet \
    --model_name_or_path xlnet-large-cased \
    --do_train  \
    --task_name=sts-b     \
    --data_dir=${GLUE_DIR}/STS-B  \
    --output_dir=./proc_data/sts-b-110   \
    --max_seq_length=128   \
    --per_gpu_eval_batch_size=8   \
    --per_gpu_train_batch_size=8   \
    --gradient_accumulation_steps=1 \
    --max_steps=1200  \
    --model_name=xlnet-large-cased   \
    --overwrite_output_dir   \
    --overwrite_cache \
    --warmup_steps=120
```

On this machine we thus have an effective batch size of 32 (4 GPUs × 8 examples per GPU × 1 gradient accumulation step); please increase `gradient_accumulation_steps` to reach the same effective batch size if you have a smaller machine.
These hyper-parameters give an evaluation Pearson correlation coefficient of `0.918`.

#### Fine-tuning the Bert model on the MRPC classification task

This example code fine-tunes the Bert Whole Word Masking model on the Microsoft Research Paraphrase Corpus (MRPC) using distributed training on 8 V100 GPUs to reach an F1 > 92.

```bash
python -m torch.distributed.launch --nproc_per_node 8 ./examples/run_glue.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name MRPC \
    --do_train   \
    --do_eval   \
    --do_lower_case   \
    --data_dir $GLUE_DIR/MRPC/   \
    --max_seq_length 128   \
    --per_gpu_eval_batch_size=8   \
    --per_gpu_train_batch_size=8   \
    --learning_rate 2e-5   \
    --num_train_epochs 3.0  \
    --output_dir /tmp/mrpc_output/ \
    --overwrite_output_dir   \
    --overwrite_cache
```

Training with these hyper-parameters gave us the following results:

```bash
  acc = 0.8823529411764706
  acc_and_f1 = 0.901702786377709
  eval_loss = 0.3418912578906332
  f1 = 0.9210526315789473
  global_step = 174
  loss = 0.07231863956341798
```

### Fine-tuning for question-answering: SQuAD example

This example code fine-tunes the Bert Whole Word Masking uncased model on the SQuAD dataset using distributed training on 8 V100 GPUs to reach an F1 > 93 on SQuAD:

```bash
python -m torch.distributed.launch --nproc_per_node=8 run_squad.py \
    --model_type bert \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --do_train \
    --do_predict \
    --do_lower_case \
    --train_file $SQUAD_DIR/train-v1.1.json \
    --predict_file $SQUAD_DIR/dev-v1.1.json \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir ../models/wwm_uncased_finetuned_squad/ \
    --per_gpu_eval_batch_size=3   \
    --per_gpu_train_batch_size=3
```

Training with these hyper-parameters gave us the following results:

```bash
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ../models/wwm_uncased_finetuned_squad/predictions.json
{"exact_match": 86.91579943235573, "f1": 93.1532499015869}
```

This is the model provided as `bert-large-uncased-whole-word-masking-finetuned-squad`.
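
As a minimal sketch of how this checkpoint can be loaded (using the `BertForQuestionAnswering` class from the library; the question/context strings below are just illustrative):

```python
import torch
from pytorch_transformers import BertTokenizer, BertForQuestionAnswering

model_name = 'bert-large-uncased-whole-word-masking-finetuned-squad'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForQuestionAnswering.from_pretrained(model_name)
model.eval()

# Segment ids are omitted for brevity in this sketch
question, context = "Who was Jim Henson ?", "Jim Henson was a puppeteer ."
input_ids = tokenizer.encode("[CLS] " + question + " [SEP] " + context + " [SEP]")
with torch.no_grad():
    start_scores, end_scores = model(torch.tensor([input_ids]))[:2]
```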

### Conditional generation: Text generation with GPT, GPT-2, Transformer-XL and XLNet

A conditional generation script is also included to generate text from a prompt.
The generation script includes the [tricks](https://github.com/rusiaaman/XLNet-gen#methodology) proposed by Aman Rusia to get high-quality generation with memory models like Transformer-XL and XLNet (a predefined text is prepended to make short inputs longer). A rough sketch of the idea is shown below.
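
This small illustrative sketch shows the idea; the padding text and variable names below are made up, not taken from the script:

```python
from pytorch_transformers import XLNetTokenizer

# Illustrative only: prepend a long neutral paragraph so that short prompts
# still give memory models (Transformer-XL, XLNet) enough context.
PADDING_TEXT = ("In a shocking finding, scientists discovered a herd of unicorns "
                "living in a remote, previously unexplored valley. ")
prompt = "Who was Jim Henson ?"

tokenizer = XLNetTokenizer.from_pretrained('xlnet-base-cased')
context_ids = tokenizer.encode(PADDING_TEXT + prompt)
# After generation, only the tokens produced beyond len(context_ids) are kept,
# so the padding text never appears in the final output.
```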

Here is how to run the script with the small version of the OpenAI GPT-2 model:

```shell
python ./examples/run_generation.py \
    --model_type=gpt2 \
    --length=20 \
    --model_name_or_path=gpt2
```

## Documentation

The full documentation is available at https://huggingface.co/pytorch-transformers/.

## Citation

At the moment, there is no paper to cite for PyTorch-Transformers but we are working on preparing one.
In the meantime, please include a mention of the library and a link to the present repository if you use this work in a published or open-source project.