<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
[task summary](../task_summary) for examples of use.

There are two categories of pipeline abstractions to be aware of:

- The [`pipeline`] abstraction, the most powerful object, which encapsulates all other pipelines.
- Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks.

## The pipeline abstraction

The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but provides additional quality-of-life features.

Simple call on one item:

```python
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```

If you want to use a specific model from the [hub](https://huggingface.co), you can omit the task if the model on
the hub already defines it:

```python
>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```

To call a pipeline on many items, you can call it with a *list*.

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```

To iterate over full datasets, it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (PyTorch only) simply returns the item stored under the given key in each dataset example,
# as we're not interested in the *target* part of the dataset. For sentence pairs, use KeyPairDataset
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
```
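To make the role of `KeyDataset` in the example above more concrete, here is a minimal sketch of the idea, assuming the
dataset behaves like a sequence of dicts. `ToyKeyDataset` and the in-memory `examples` list are hypothetical stand-ins
for illustration only, not the library's implementation.

```python
class ToyKeyDataset:
    """Illustrative stand-in (not the library class): exposes one field of each example."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        # Return only the requested field and drop everything else (e.g. the target).
        return self.dataset[i][self.key]


# Hypothetical in-memory dataset; a real one would come from `datasets.load_dataset`.
examples = [
    {"file": "a.flac", "text": "HELLO"},
    {"file": "b.flac", "text": "WORLD"},
]
files = ToyKeyDataset(examples, "file")
print([files[i] for i in range(len(files))])
```

The wrapper keeps the length and indexing of the underlying dataset, so the pipeline can stream over it without the
unused columns.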

For ease of use, a generator is also possible:


```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or HTTP request
        # in a server
        # Caveat: because this is iterative, you cannot use `num_workers > 1` variable
        # to use multiple threads to preprocess data. You can still have 1 thread that
        # does the preprocessing while the main runs the big inference
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {"label": ..., "score": ...}
    # {"label": ..., "score": ...}
    # ....
```

[[autodoc]] pipeline

## Pipeline batching

All pipelines can use batching. This works whenever the pipeline uses its streaming ability (that is, when it is
passed a list, a `Dataset`, or a `generator`).

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
```

<Tip warning={true}>

However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending
on hardware, data and the actual model being used.

Example where it's mostly a speedup:

</Tip>

```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```

```
# On GTX 970
------------------------------
Streaming no batching
100%|████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```

Example where it's mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

This dataset contains an occasional very long sentence compared to the others. In that case, the **whole** batch needs
to be 400 tokens long, so the batch shape will be [64, 400] instead of [64, 4], leading to a large slowdown. Even
worse, on bigger batches, the program simply crashes.
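The effect of one long item can be estimated with a rough back-of-the-envelope sketch. It assumes every batch is padded
to its longest sequence and reuses the token counts mentioned above (4 for the short sentence, 400 for the long one).
`padded_token_count` is a hypothetical helper written for this illustration, not a library function.

```python
def padded_token_count(lengths, batch_size):
    """Total tokens processed when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        # Every item in the batch is padded up to the longest item.
        total += max(batch) * len(batch)
    return total


# Assumed token lengths mirroring the dataset above:
# one 400-token item every 64 items, 4 tokens otherwise.
lengths = [400 if i % 64 == 0 else 4 for i in range(5000)]

for batch_size in [1, 8, 64]:
    print(batch_size, padded_token_count(lengths, batch_size))
```

Under these assumptions, the amount of padded work grows sharply with the batch size, because every batch that contains
a long item pays the cost of the long item for all of its rows.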


```
------------------------------
Streaming no batching
100%|████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
  0%|                                                                                 | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```

There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases.

For users, a rule of thumb is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data), on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), don't batch by default. Measure, then
    tentatively try adding batching, and add OOM checks to recover when it fails (and it will at some point if you
    don't control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting. Measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.
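One possible way to handle OOMs nicely is to retry with a smaller batch size when a batch fails. The sketch below is
only an illustration of that idea: `run_with_backoff` and `fake_run_batch` are hypothetical names, and `MemoryError`
stands in for whatever exception your backend raises on OOM (a `RuntimeError` in the traceback above).

```python
def run_with_backoff(run_batch, items, batch_size, oom_errors=(MemoryError,)):
    """Process items in batches, halving the batch size whenever an OOM-like error occurs."""
    results = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(run_batch(batch))
        except oom_errors:
            if batch_size == 1:
                raise  # Cannot shrink further; surface the error.
            batch_size = max(1, batch_size // 2)
            continue  # Retry the same position with a smaller batch.
        start += len(batch)
    return results, batch_size


# Hypothetical stand-in for a pipeline call: fails when a batch has more than 4 items.
def fake_run_batch(batch):
    if len(batch) > 4:
        raise MemoryError("simulated OOM")
    return [text.upper() for text in batch]


results, final_batch_size = run_with_backoff(fake_run_batch, list("abcdefghi"), batch_size=16)
print(results, final_batch_size)
```

The function keeps the reduced batch size for the rest of the run, which avoids hitting the same failure repeatedly.
Whether that is the right policy depends on your workload, so measure before adopting it.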

## Pipeline chunk batching

`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

To circumvent this issue, both of these pipelines are `ChunkPipeline` instead of regular `Pipeline`. In short:


```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```

Now becomes:


```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```

This should be very transparent to your code because the pipelines are used in
the same way.

This is a simplified view, since the pipeline can handle the batching automatically too! This means you don't have to
care about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
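The chunked flow described above can be illustrated with a toy class. This is a minimal sketch under loud assumptions:
`ToyChunkPipeline` is a hypothetical stand-in that splits text into word chunks and counts words. It only mirrors the
`preprocess` / `forward` / `postprocess` structure and is not the library's `ChunkPipeline`.

```python
class ToyChunkPipeline:
    """Hypothetical stand-in: one input yields several chunks, hence several forward passes."""

    def preprocess(self, text, chunk_size=3):
        # A generator: each yielded chunk triggers its own forward pass.
        words = text.split()
        for start in range(0, len(words), chunk_size):
            yield words[start:start + chunk_size]

    def forward(self, chunk):
        # Pretend "model": counts the words in the chunk.
        return len(chunk)

    def postprocess(self, all_model_outputs):
        # Combine the outputs of every chunk into a single result for the input.
        return sum(all_model_outputs)

    def __call__(self, text):
        all_model_outputs = [self.forward(chunk) for chunk in self.preprocess(text)]
        return self.postprocess(all_model_outputs)


pipe = ToyChunkPipeline()
n_chunks = len(list(pipe.preprocess("one two three four five six seven")))
total = pipe("one two three four five six seven")
print(n_chunks, total)
```

From the caller's point of view, the toy pipeline is still called with a single input and returns a single result, even
though several forward passes happened in between.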

## Pipeline FP16 inference
Models can be run in FP16, which can be significantly faster on GPU while saving memory. Most models will not suffer a noticeable loss in output quality from this. The larger the model, the less likely it is to be affected.

To enable FP16 inference, you can simply pass `torch_dtype=torch.float16` or `torch_dtype='float16'` to the pipeline constructor. Note that this only works for models with a PyTorch backend. Your inputs will be converted to FP16 internally.
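As a minimal sketch of the option described above, the helper below builds a pipeline with `torch_dtype=torch.float16`.
`build_fp16_pipeline` is a hypothetical name, and the default task and `device=0` are assumptions borrowed from the
earlier examples. It presumes a CUDA GPU and a PyTorch installation.

```python
def build_fp16_pipeline(task="text-classification", device=0):
    """Hypothetical helper: create a pipeline whose model weights are loaded in FP16."""
    # Imports are local so that defining this helper needs neither torch nor a GPU.
    import torch
    from transformers import pipeline

    # torch_dtype=torch.float16 requests half precision (PyTorch backend only).
    return pipeline(task, torch_dtype=torch.float16, device=device)
```

You would then call `pipe = build_fp16_pipeline()` and use `pipe` exactly like any other pipeline. As with batching,
compare speed and outputs against the default precision on your own data before relying on it.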

## Pipeline custom code

If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand. The goal of the
pipelines is to be easy to use and to support most cases, so `transformers` could maybe support your use case.

If you simply want to try it yourself, you can:

- Subclass your pipeline of choice

```python
from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Start from the default postprocessing
        results = super().postprocess(model_outputs, **kwargs)
        # Your code goes here, e.g. express scores as percentages.
        # The default result may be a dict or a list of dicts.
        for item in results if isinstance(results, list) else [results]:
            item["score"] = item["score"] * 100
        # And here
        return results


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```

That should enable you to do all the custom code you want.


## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)

## Audio

Pipelines available for audio tasks include the following.

### AudioClassificationPipeline

[[autodoc]] AudioClassificationPipeline
    - __call__
    - all

### AutomaticSpeechRecognitionPipeline

[[autodoc]] AutomaticSpeechRecognitionPipeline
    - __call__
    - all

### TextToAudioPipeline

[[autodoc]] TextToAudioPipeline
    - __call__
    - all


### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline
    - __call__
    - all

## Computer vision

Pipelines available for computer vision tasks include the following.

### DepthEstimationPipeline

[[autodoc]] DepthEstimationPipeline
    - __call__
    - all

### ImageClassificationPipeline

[[autodoc]] ImageClassificationPipeline
    - __call__
    - all

### ImageSegmentationPipeline

[[autodoc]] ImageSegmentationPipeline
    - __call__
    - all

### ImageToImagePipeline

[[autodoc]] ImageToImagePipeline
    - __call__
    - all

### ObjectDetectionPipeline

[[autodoc]] ObjectDetectionPipeline
    - __call__
    - all

### VideoClassificationPipeline

[[autodoc]] VideoClassificationPipeline
    - __call__
    - all

### ZeroShotImageClassificationPipeline

[[autodoc]] ZeroShotImageClassificationPipeline
    - __call__
    - all

### ZeroShotObjectDetectionPipeline

[[autodoc]] ZeroShotObjectDetectionPipeline
    - __call__
    - all

## Natural Language Processing

Pipelines available for natural language processing tasks include the following.

### FillMaskPipeline

[[autodoc]] FillMaskPipeline
    - __call__
    - all

### QuestionAnsweringPipeline

[[autodoc]] QuestionAnsweringPipeline
    - __call__
    - all

### SummarizationPipeline

[[autodoc]] SummarizationPipeline
    - __call__
    - all

### TableQuestionAnsweringPipeline

[[autodoc]] TableQuestionAnsweringPipeline
    - __call__

### TextClassificationPipeline

[[autodoc]] TextClassificationPipeline
    - __call__
    - all

### TextGenerationPipeline

[[autodoc]] TextGenerationPipeline
    - __call__
    - all

### Text2TextGenerationPipeline

[[autodoc]] Text2TextGenerationPipeline
    - __call__
    - all

### TokenClassificationPipeline

[[autodoc]] TokenClassificationPipeline
    - __call__
    - all

### TranslationPipeline

[[autodoc]] TranslationPipeline
    - __call__
    - all

### ZeroShotClassificationPipeline

[[autodoc]] ZeroShotClassificationPipeline
    - __call__
    - all

## Multimodal

Pipelines available for multimodal tasks include the following.

### DocumentQuestionAnsweringPipeline

[[autodoc]] DocumentQuestionAnsweringPipeline
    - __call__
    - all

### FeatureExtractionPipeline

[[autodoc]] FeatureExtractionPipeline
    - __call__
    - all

### ImageFeatureExtractionPipeline

[[autodoc]] ImageFeatureExtractionPipeline
    - __call__
    - all

### ImageToTextPipeline

[[autodoc]] ImageToTextPipeline
    - __call__
    - all

### MaskGenerationPipeline

[[autodoc]] MaskGenerationPipeline
    - __call__
    - all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
    - __call__
    - all

## Parent class: `Pipeline`

[[autodoc]] Pipeline