torch-vision
============

.. image:: https://travis-ci.org/pytorch/vision.svg?branch=master
    :target: https://travis-ci.org/pytorch/vision

This repository consists of:

-  `vision.datasets <#datasets>`__ : Data loaders for popular vision
   datasets
-  `vision.models <#models>`__ : Definitions for popular model
   architectures, such as AlexNet, VGG, and ResNet, and pre-trained
   models
-  `vision.transforms <#transforms>`__ : Common image transformations
   such as random crops and rotations
-  `vision.utils <#utils>`__ : Utilities such as saving a tensor (3 x H
   x W) as an image to disk and making a grid of images from a
   mini-batch

Installation
============

Anaconda:

.. code:: bash

    conda install torchvision -c soumith

pip:

.. code:: bash

    pip install torchvision

From source:

.. code:: bash

    python setup.py install

Datasets
========

The following dataset loaders are available:

-  `MNIST and FashionMNIST <#mnist>`__
-  `COCO (Captioning and Detection) <#coco>`__
-  `LSUN Classification <#lsun>`__
-  `ImageFolder <#imagefolder>`__
-  `Imagenet-12 <#imagenet-12>`__
-  `CIFAR10 and CIFAR100 <#cifar>`__
-  `STL10 <#stl10>`__
-  `SVHN <#svhn>`__
-  `PhotoTour <#phototour>`__

All datasets implement the API ``__getitem__`` and ``__len__``, and they all
subclass ``torch.utils.data.Dataset``. Hence, they can all be loaded in
parallel worker processes (Python multiprocessing) using a standard
``torch.utils.data.DataLoader``.

For example:

``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``

In the constructor, each dataset has a slightly different API as needed,
but they all take the keyword arguments:

-  ``transform`` - a function that takes in an image and returns a
   transformed version. Common transforms such as ``ToTensor`` and
   ``RandomCrop`` can be composed together with ``transforms.Compose``
   (see the transforms section below).
-  ``target_transform`` - a function that takes in the target and
   transforms it. For example, take in the caption string and return a
   tensor of word indices.
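The contract above is small enough to sketch in plain Python. The class below is a hypothetical stand-in (not a real torchvision dataset) that mirrors the ``__getitem__``/``__len__`` and ``transform``/``target_transform`` conventions:

```python
class SquaresDataset:
    """Toy dataset (hypothetical, for illustration) following the
    torchvision convention: __getitem__ returns an (input, target) pair,
    and optional transform / target_transform callables are applied."""

    def __init__(self, n, transform=None, target_transform=None):
        self.n = n
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        img, target = index, index * index  # stand-ins for (image, label)
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        return img, target

ds = SquaresDataset(5, transform=lambda x: x + 1)
print(len(ds))  # 5
print(ds[3])    # (4, 9)
```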

MNIST
~~~~~
``dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)``

``dset.FashionMNIST(root, train=True, transform=None, target_transform=None, download=False)``

``root``: root directory of dataset where ``processed/training.pt`` and ``processed/test.pt`` exist

``train``: ``True`` - use training set, ``False`` - use test set.

``transform``: transform to apply to input images

``target_transform``: transform to apply to targets (class labels)

``download``: whether to download the MNIST data


COCO
~~~~

This requires the `COCO API to be
installed <https://github.com/pdollar/coco/tree/master/PythonAPI>`__

Captions:
^^^^^^^^^

``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``

Example:

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    cap = dset.CocoCaptions(root = 'dir where images are',
                            annFile = 'json annotation file',
                            transform=transforms.ToTensor())

    print('Number of samples: ', len(cap))
    img, target = cap[3] # load 4th sample

    print("Image Size: ", img.size())
    print(target)

Output:

::

    Number of samples: 82783
    Image Size: (3L, 427L, 640L)
    [u'A plane emitting smoke stream flying over a mountain.',
    u'A plane darts across a bright blue sky behind a mountain covered in snow',
    u'A plane leaves a contrail above the snowy mountain top.',
    u'A mountain that has a plane flying overheard in the distance.',
    u'A mountain view with a plume of smoke in the background']

Detection:
^^^^^^^^^^

``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``

LSUN
~~~~

``dset.LSUN(db_path, classes='train', [transform, target_transform])``

-  ``db_path`` - root directory for the database files
-  ``classes`` - one of:

   -  ``'train'`` - all categories, training set
   -  ``'val'`` - all categories, validation set
   -  ``'test'`` - all categories, test set
   -  [``'bedroom_train'``, ``'church_train'``, ...] - a list of
      categories to load

CIFAR
~~~~~

``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``

``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``

-  ``root`` : root directory of dataset where there is folder
   ``cifar-10-batches-py``
-  ``train`` : ``True`` = Training set, ``False`` = Test set
-  ``download`` : ``True`` = downloads the dataset from the internet and
   puts it in the root directory. If the dataset is already downloaded, it
   does nothing.

STL10
~~~~~

``dset.STL10(root, split='train', transform=None, target_transform=None, download=False)``

-  ``root`` : root directory of dataset where there is folder ``stl10_binary``
-  ``split`` : ``'train'`` = Training set, ``'test'`` = Test set, ``'unlabeled'`` = Unlabeled set,
   ``'train+unlabeled'`` = Training + Unlabeled set (missing labels marked as ``-1``)
-  ``download`` : ``True`` = downloads the dataset from the internet and
   puts it in the root directory. If the dataset is already downloaded, it
   does nothing.

SVHN
~~~~

Note: The SVHN dataset assigns the label ``10`` to the digit ``0``. However, in this Dataset,
we assign the label ``0`` to the digit ``0`` to be compatible with PyTorch loss functions, which
expect the class labels to be in the range ``[0, C-1]``.
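Since the raw labels run 1-10 with 10 standing for the digit 0, the remapping amounts to taking the label modulo 10. A minimal sketch (illustrative only, not the loader's actual code):

```python
def remap_svhn_label(raw_label):
    # Raw SVHN labels run 1..10, with 10 meaning the digit 0;
    # taking the label modulo 10 maps them onto 0..9.
    return raw_label % 10

print([remap_svhn_label(l) for l in [10, 1, 2, 9]])  # [0, 1, 2, 9]
```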

``dset.SVHN(root, split='train', transform=None, target_transform=None, download=False)``

-  ``root`` : root directory of dataset where there is folder ``SVHN``
-  ``split`` : ``'train'`` = Training set, ``'test'`` = Test set, ``'extra'`` = Extra training set
-  ``download`` : ``True`` = downloads the dataset from the internet and
   puts it in the root directory. If the dataset is already downloaded, it
   does nothing.

ImageFolder
~~~~~~~~~~~

A generic data loader where the images are arranged in this way:

::

    root/dog/xxx.png
    root/dog/xxy.png
    root/dog/xxz.png

    root/cat/123.png
    root/cat/nsdf3.png
    root/cat/asd932_.png

``dset.ImageFolder(root="root folder path", [transform, target_transform])``

It has the members:

-  ``self.classes`` - The class names as a list
-  ``self.class_to_idx`` - Corresponding class indices
-  ``self.imgs`` - The list of (image path, class-index) tuples
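A rough sketch of how those members can be derived from the folder layout (an illustration using only the standard library, not ImageFolder's actual implementation):

```python
import os

def scan_image_folder(root):
    """Illustrative only: derive classes, class_to_idx and imgs from a
    root/<class>/<image> layout, mirroring ImageFolder's members."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    class_to_idx = {c: i for i, c in enumerate(classes)}
    imgs = [(os.path.join(root, c, fname), class_to_idx[c])
            for c in classes
            for fname in sorted(os.listdir(os.path.join(root, c)))]
    return classes, class_to_idx, imgs
```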

Imagenet-12
~~~~~~~~~~~

This is simply implemented with an ImageFolder dataset.

The data is preprocessed `as described
here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__

`Here is an
example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62>`__.

PhotoTour
~~~~~~~~~

**Learning Local Image Descriptors Data**
http://phototour.cs.washington.edu/patches/default.htm

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    dataset = dset.PhotoTour(root = 'dir where images are',
                             name = 'name of the dataset to load',
                             transform=transforms.ToTensor())

    print('Loaded PhotoTour: {} with {} images.'
          .format(dataset.name, len(dataset.data)))

Models
======

The models subpackage contains definitions for the following model
architectures:

-  `AlexNet <https://arxiv.org/abs/1404.5997>`__: AlexNet variant from
   the "One weird trick" paper.
-  `VGG <https://arxiv.org/abs/1409.1556>`__: VGG-11, VGG-13, VGG-16,
   VGG-19 (with and without batch normalization)
-  `ResNet <https://arxiv.org/abs/1512.03385>`__: ResNet-18, ResNet-34,
   ResNet-50, ResNet-101, ResNet-152
-  `SqueezeNet <https://arxiv.org/abs/1602.07360>`__: SqueezeNet 1.0, and
   SqueezeNet 1.1
-  `DenseNet <https://arxiv.org/pdf/1608.06993.pdf>`__: DenseNet-121, DenseNet-169, DenseNet-201 and DenseNet-161
-  `Inception v3 <https://arxiv.org/abs/1512.00567>`__ : Inception v3

You can construct a model with random weights by calling its
constructor:

.. code:: python

    import torchvision.models as models
    resnet18 = models.resnet18()
    alexnet = models.alexnet()
    vgg16 = models.vgg16()
    squeezenet = models.squeezenet1_0()
    densenet = models.densenet161()
    inception = models.inception_v3()

We provide pre-trained models for the ResNet variants, SqueezeNet 1.0 and 1.1,
AlexNet, VGG, Inception v3 and DenseNet using the PyTorch `model zoo <http://pytorch.org/docs/model_zoo.html>`__.
These can be constructed by passing ``pretrained=True``:

.. code:: python

    import torchvision.models as models
    resnet18 = models.resnet18(pretrained=True)
    alexnet = models.alexnet(pretrained=True)
    squeezenet = models.squeezenet1_0(pretrained=True)
    vgg16 = models.vgg16(pretrained=True)
    densenet = models.densenet161(pretrained=True)
    inception = models.inception_v3(pretrained=True)

All pre-trained models expect input images normalized in the same way, i.e.
mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are
expected to be at least 224.

The images have to be loaded into a range of [0, 1] and then
normalized using ``mean = [0.485, 0.456, 0.406]`` and ``std = [0.229, 0.224, 0.225]``.

An example of such normalization can be found in the imagenet example `here <https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101>`__.

Transforms
==========

Transforms are common image transforms. They can be chained together
using ``transforms.Compose``

``transforms.Compose``
~~~~~~~~~~~~~~~~~~~~~~

One can compose several transforms together. For example:

.. code:: python

    transform = transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                              std = [ 0.229, 0.224, 0.225 ]),
    ])

Transforms on PIL.Image
~~~~~~~~~~~~~~~~~~~~~~~

``Scale(size, interpolation=Image.BILINEAR)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Rescales the input PIL.Image to the given ``size``.

If ``size`` is a 2-element tuple or list in the order (width, height), the
image will be scaled to exactly that size.

If ``size`` is a number, it indicates the size of the smaller edge. For
example, if height > width, then the image will be rescaled to
(size \* height / width, size).

-  size: size of the smaller edge
-  interpolation: Default: PIL.Image.BILINEAR
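The smaller-edge arithmetic can be checked with a tiny helper (a sketch of the size computation only; ``scale_size`` is a hypothetical name, not part of the library):

```python
def scale_size(width, height, size):
    # `size` pins the smaller edge; the other edge keeps the aspect
    # ratio. Returns the new (width, height). Hypothetical helper,
    # not the transform's internals.
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

print(scale_size(400, 600, 200))  # (200, 300)
print(scale_size(600, 400, 200))  # (300, 200)
```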

``CenterCrop(size)`` - center-crops the image to the given size
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Crops the given PIL.Image at the center to a region of the given size.
``size`` can be a tuple (target\_height, target\_width) or an integer, in
which case the target will be a square of shape (size, size).

``RandomCrop(size, padding=0)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Crops the given PIL.Image at a random location to a region of the given
size. ``size`` can be a tuple (target\_height, target\_width) or an
integer, in which case the target will be a square of shape (size, size).
If ``padding`` is non-zero, then the image is first zero-padded on each
side with ``padding`` pixels.

``RandomHorizontalFlip()``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Randomly horizontally flips the given PIL.Image with a probability of 0.5.

``RandomSizedCrop(size, interpolation=Image.BILINEAR)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of
the original size and a random aspect ratio of 3/4 to 4/3 of the original
aspect ratio.

This is popularly used to train the Inception networks.

-  size: size of the smaller edge
-  interpolation: Default: PIL.Image.BILINEAR

``Pad(padding, fill=0)``
^^^^^^^^^^^^^^^^^^^^^^^^

Pads the given image on each side with ``padding`` pixels, and the padding
pixels are filled with pixel value ``fill``. For example, a ``5x5`` image
padded with ``padding=1`` becomes ``7x7``.

Transforms on torch.\*Tensor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``Normalize(mean, std)``
^^^^^^^^^^^^^^^^^^^^^^^^

Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of
the torch.\*Tensor, i.e. channel = (channel - mean) / std
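The per-channel arithmetic is plain enough to sketch without torch (illustrative only; ``normalize`` here is a hypothetical helper, not the transform itself):

```python
def normalize(channels, mean, std):
    # channels: one list of pixel values per channel;
    # each channel becomes (value - mean) / std
    return [[(x - m) / s for x in ch]
            for ch, m, s in zip(channels, mean, std)]

print(normalize([[0.5, 1.0]], mean=[0.5], std=[0.25]))  # [[0.0, 2.0]]
```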

``LinearTransformation(transformation_matrix)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given ``transformation_matrix`` (D x D), where D = (C x H x W), will compute its
dot product with the flattened torch.\*Tensor and then reshape it to its
original dimensions.

Applications:

-  whitening: zero-center the data, compute the data covariance matrix
   (D x D) with ``np.dot(X.T, X)``, perform SVD on this matrix, and pass
   the principal components as ``transformation_matrix``.
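As a sketch of that whitening recipe with numpy (an illustration of how one might build ``transformation_matrix``, not code from the library):

```python
import numpy as np

def whitening_matrix(X, eps=1e-5):
    """Build a PCA-whitening matrix from an N x D data matrix X.
    The returned D x D matrix is what one would pass as
    transformation_matrix (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                       # zero-center
    cov = Xc.T @ Xc / Xc.shape[0]                 # D x D covariance
    U, s, _ = np.linalg.svd(cov)                  # principal components
    return U @ np.diag(1.0 / np.sqrt(s + eps)) @ U.T

rng = np.random.RandomState(0)
X = rng.randn(500, 4) * np.array([1.0, 2.0, 0.5, 3.0])
W = whitening_matrix(X)
Xw = (X - X.mean(axis=0)) @ W.T
# after whitening, the covariance of Xw is close to the identity
print(np.round(np.cov(Xw.T, bias=True), 3))
```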

Conversion Transforms
~~~~~~~~~~~~~~~~~~~~~

-  ``ToTensor()`` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x
   C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W)
   in the range [0.0, 1.0]
-  ``ToPILImage()`` - Converts a torch.\*Tensor of range [0, 1] and
   shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and
   shape H x W x C to a PIL.Image of range [0, 255]
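The range and layout change performed by ``ToTensor`` can be mimicked with numpy (a sketch assuming a uint8 H x W x C input; ``to_tensor_like`` is a hypothetical name):

```python
import numpy as np

def to_tensor_like(arr):
    # uint8 H x W x C in [0, 255] -> float32 C x H x W in [0.0, 1.0]
    return arr.astype(np.float32).transpose(2, 0, 1) / 255.0

img = np.array([[[0, 128, 255]]], dtype=np.uint8)  # 1 x 1 x 3
t = to_tensor_like(img)
print(t.shape)  # (3, 1, 1)
print(t.max())  # 1.0
```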

Generic Transforms
~~~~~~~~~~~~~~~~~~

``Lambda(lambda)``
^^^^^^^^^^^^^^^^^^

Given a Python lambda, applies it to the input ``img`` and returns it.
For example:

.. code:: python

    transforms.Lambda(lambda x: x.add(10))

Utils
=====

``make_grid(tensor, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a 4D mini-batch Tensor of shape (B x C x H x W), or a list of images
all of the same size, makes a grid of images.

``normalize=True`` will shift the image to the range (0, 1)
by subtracting the minimum and dividing by the maximum pixel value.

If ``range=(min, max)`` where ``min`` and ``max`` are numbers, then these
numbers are used to normalize the image.

``scale_each=True`` will scale each image in the batch of images separately
rather than computing the ``(min, max)`` over all images.

``pad_value=<float>`` sets the value for the padded pixels.

Example usage is given in this `notebook <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>`__ 
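Assuming the grid packs images into rows of ``nrow`` tiles with ``padding`` pixels between and around them, the output size can be estimated as follows (a sketch, not the function's actual code; ``grid_shape`` is a hypothetical helper):

```python
import math

def grid_shape(b, c, h, w, nrow=8, padding=2):
    # b images packed into ceil(b / nrow) rows of up to nrow columns,
    # with `padding` pixels between and around the tiles
    ncol = min(b, nrow)
    rows = math.ceil(b / nrow)
    height = rows * (h + padding) + padding
    width = ncol * (w + padding) + padding
    return c, height, width

print(grid_shape(16, 3, 32, 32))  # (3, 70, 274)
```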

``save_image(tensor, filename, nrow=8, padding=2, normalize=False, range=None, scale_each=False, pad_value=0)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves a given Tensor into an image file.

If given a mini-batch tensor, will save the tensor as a grid of images.

All options after ``filename`` are passed through to ``make_grid``. Refer to
its documentation for more details.