torch-vision
============

This repository consists of:

-  `vision.datasets <#datasets>`__ : Data loaders for popular vision
   datasets
-  `vision.models <#models>`__ : Definitions for popular model
   architectures, such as AlexNet, VGG, and ResNet, along with
   pre-trained models
-  `vision.transforms <#transforms>`__ : Common image transformations
   such as random crops, rotations, etc.
-  `vision.utils <#utils>`__ : Useful utilities, such as saving a
   tensor (3 x H x W) as an image to disk, or creating a grid of
   images from a mini-batch

Installation
============

Anaconda:

.. code:: bash

    conda install torchvision -c soumith

pip:

.. code:: bash

    pip install torchvision

From source:

.. code:: bash

    python setup.py install

Datasets
========

The following dataset loaders are available:

-  `MNIST <#mnist>`__
-  `COCO (Captioning and Detection) <#coco>`__
-  `LSUN Classification <#lsun>`__
-  `ImageFolder <#imagefolder>`__
-  `Imagenet-12 <#imagenet-12>`__
-  `CIFAR10 and CIFAR100 <#cifar>`__

All datasets implement ``__getitem__`` and ``__len__``, and they all
subclass ``torch.utils.data.Dataset``. Hence, they can all be passed to
a standard ``torch.utils.data.DataLoader``, which can load multiple
samples in parallel using Python multiprocessing workers.

For example:

``torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)``

In the constructor, each dataset has a slightly different API as needed,
but they all take the keyword arguments:

-  ``transform`` - a function that takes in an image and returns a
   transformed version. Common transforms such as ``ToTensor`` and
   ``RandomCrop`` can be composed together with ``transforms.Compose``
   (see the transforms section below)
-  ``target_transform`` - a function that takes in the target and
   transforms it. For example, it could take in the caption string and
   return a tensor of word indices.
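
As a sketch of how these keyword arguments behave, here is a toy dataset
used with a ``DataLoader`` (the ``SquaresDataset`` class is hypothetical,
purely for illustration, and stands in for any torchvision dataset):

.. code:: python

    import torch
    from torch.utils.data import Dataset, DataLoader

    # Hypothetical toy dataset, just to illustrate the common keyword args.
    class SquaresDataset(Dataset):
        def __init__(self, transform=None, target_transform=None):
            self.transform = transform
            self.target_transform = target_transform

        def __len__(self):
            return 8

        def __getitem__(self, index):
            img = torch.ones(3, 4, 4) * index   # stand-in for a real image
            target = index                      # stand-in for a real label
            if self.transform is not None:
                img = self.transform(img)
            if self.target_transform is not None:
                target = self.target_transform(target)
            return img, target

    dataset = SquaresDataset(transform=lambda x: x * 2,
                             target_transform=lambda y: y + 100)
    loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

    imgs, targets = next(iter(loader))
    print(imgs.size())    # the loader batches samples to (4, 3, 4, 4)
    print(targets)        # targets transformed: 100, 101, 102, 103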

MNIST
~~~~~
``dset.MNIST(root, train=True, transform=None, target_transform=None, download=False)``

``root``: root directory of dataset where ``processed/training.pt`` and ``processed/test.pt`` exist

``train``: ``True`` - use training set, ``False`` - use test set.

``transform``: transform to apply to input images

``target_transform``: transform to apply to targets (class labels)

``download``: if ``True``, downloads the MNIST data from the internet and puts it in the root directory. If the dataset is already downloaded, it is not downloaded again.


COCO
~~~~

This requires the `COCO API to be
installed <https://github.com/pdollar/coco/tree/master/PythonAPI>`__

Captions:
^^^^^^^^^

``dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])``

Example:

.. code:: python

    import torchvision.datasets as dset
    import torchvision.transforms as transforms
    cap = dset.CocoCaptions(root = 'dir where images are',
                            annFile = 'json annotation file',
                            transform=transforms.ToTensor())

    print('Number of samples: ', len(cap))
    img, target = cap[3] # load 4th sample

    print("Image Size: ", img.size())
    print(target)

Output:

::

    Number of samples: 82783
    Image Size: (3L, 427L, 640L)
    [u'A plane emitting smoke stream flying over a mountain.',
    u'A plane darts across a bright blue sky behind a mountain covered in snow',
    u'A plane leaves a contrail above the snowy mountain top.',
    u'A mountain that has a plane flying overheard in the distance.',
    u'A mountain view with a plume of smoke in the background']

Detection:
^^^^^^^^^^

``dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])``

LSUN
~~~~

``dset.LSUN(db_path, classes='train', [transform, target_transform])``

-  ``db_path`` : root directory for the database files
-  ``classes`` : one of

   -  ``'train'`` - all categories, training set
   -  ``'val'`` - all categories, validation set
   -  ``'test'`` - all categories, test set
   -  [``'bedroom_train'``, ``'church_train'``, ...] - a list of
      categories to load

CIFAR
~~~~~

``dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)``

``dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)``

-  ``root`` : root directory of dataset where there is folder
   ``cifar-10-batches-py``
-  ``train`` : ``True`` = Training set, ``False`` = Test set
-  ``download`` : ``True`` = downloads the dataset from the internet and
   puts it in root directory. If dataset already downloaded, does not do
   anything.

ImageFolder
~~~~~~~~~~~

A generic data loader where the images are arranged in this way:

::

    root/dog/xxx.png
    root/dog/xxy.png
    root/dog/xxz.png

    root/cat/123.png
    root/cat/nsdf3.png
    root/cat/asd932_.png

``dset.ImageFolder(root="root folder path", [transform, target_transform])``

It has the members:

-  ``self.classes`` - The class names as a list
-  ``self.class_to_idx`` - Corresponding class indices
-  ``self.imgs`` - The list of (image path, class-index) tuples
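
A self-contained sketch that builds the folder layout above with tiny
synthetic images in a scratch directory (purely for illustration; in
practice ``root`` points at real data):

.. code:: python

    import os, tempfile
    from PIL import Image
    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Build a tiny root/<class>/<image>.png tree with 8x8 placeholder images.
    root = tempfile.mkdtemp()
    for cls in ['cat', 'dog']:
        os.makedirs(os.path.join(root, cls))
        for name in ['a.png', 'b.png']:
            Image.new('RGB', (8, 8)).save(os.path.join(root, cls, name))

    data = dset.ImageFolder(root=root, transform=transforms.ToTensor())

    print(data.classes)        # class names from the folder names, sorted
    print(data.class_to_idx)   # mapping from class name to index
    print(len(data.imgs))      # 4 (path, class-index) tuples
    img, label = data[0]       # first 'cat' image, label 0
    print(img.size(), label)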

Imagenet-12
~~~~~~~~~~~

This is simply implemented with an ImageFolder dataset.

The data is preprocessed `as described
here <https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset>`__

`Here is an
example <https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62>`__.

Models
======

The models subpackage contains definitions for the following model
architectures:

-  `AlexNet <https://arxiv.org/abs/1404.5997>`__: AlexNet variant from
   the "One weird trick" paper.
-  `VGG <https://arxiv.org/abs/1409.1556>`__: VGG-11, VGG-13, VGG-16,
   VGG-19 (with and without batch normalization)
-  `ResNet <https://arxiv.org/abs/1512.03385>`__: ResNet-18, ResNet-34,
   ResNet-50, ResNet-101, ResNet-152

You can construct a model with random weights by calling its
constructor:

.. code:: python

    import torchvision.models as models
    resnet18 = models.resnet18()
    alexnet = models.alexnet()

We provide pre-trained models for the ResNet variants and AlexNet, using
the PyTorch `model zoo <http://pytorch.org/docs/model_zoo.html>`__.
These can be constructed by passing ``pretrained=True``:

.. code:: python

    import torchvision.models as models
    resnet18 = models.resnet18(pretrained=True)
    alexnet = models.alexnet(pretrained=True)


Transforms
==========

Transforms are common image transforms. They can be chained together
using ``transforms.Compose``

``transforms.Compose``
~~~~~~~~~~~~~~~~~~~~~~

One can compose several transforms together. For example:

.. code:: python

    transform = transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
                             std = [ 0.229, 0.224, 0.225 ]),
    ])

Transforms on PIL.Image
~~~~~~~~~~~~~~~~~~~~~~~

``Scale(size, interpolation=Image.BILINEAR)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Rescales the input PIL.Image to the given ``size``, where ``size`` is
the size of the smaller edge. For example, if height > width, then the
image will be rescaled to (size \* height / width, size).

-  ``size`` - size of the smaller edge
-  ``interpolation`` - Default: PIL.Image.BILINEAR

``CenterCrop(size)`` - center-crops the image to the given size
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Crops the given PIL.Image at the center to have a region of the given
size. size can be a tuple (target\_height, target\_width) or an integer,
in which case the target will be of a square shape (size, size)

``RandomCrop(size, padding=0)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Crops the given PIL.Image at a random location to have a region of the
given size. size can be a tuple (target\_height, target\_width) or an
integer, in which case the target will be of a square shape (size, size)
If ``padding`` is non-zero, then the image is first zero-padded on each
side with ``padding`` pixels.

``RandomHorizontalFlip()``
^^^^^^^^^^^^^^^^^^^^^^^^^^

Randomly horizontally flips the given PIL.Image with a probability of
0.5

``RandomSizedCrop(size, interpolation=Image.BILINEAR)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Randomly crops the given PIL.Image to a size of 0.08 to 1.0 of the
original size, with a random aspect ratio of 3/4 to 4/3 of the original
aspect ratio, and then rescales the crop to the given ``size``.

This is popularly used to train the Inception networks.

-  ``size`` - size of the smaller edge
-  ``interpolation`` - Default: PIL.Image.BILINEAR

``Pad(padding, fill=0)``
^^^^^^^^^^^^^^^^^^^^^^^^

Pads the given image on each side with ``padding`` number of pixels, and
the padding pixels are filled with pixel value ``fill``. If a ``5x5``
image is padded with ``padding=1`` then it becomes ``7x7``
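
A minimal sketch of the crop and pad size arithmetic described above,
using synthetic PIL images instead of real data:

.. code:: python

    from PIL import Image
    import torchvision.transforms as transforms

    img = Image.new('RGB', (32, 32))  # synthetic 32x32 image

    # Both crops return a region of the requested square size.
    print(transforms.CenterCrop(24)(img).size)             # (24, 24)
    print(transforms.RandomCrop(24, padding=0)(img).size)  # (24, 24)

    # Pad(1) grows each side by 1 pixel: 5x5 -> 7x7, as noted above.
    small = Image.new('RGB', (5, 5))
    print(transforms.Pad(1)(small).size)                   # (7, 7)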

Transforms on torch.\*Tensor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``Normalize(mean, std)``
^^^^^^^^^^^^^^^^^^^^^^^^

Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of
the torch.\*Tensor, i.e. channel = (channel - mean) / std
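
A small sketch with a synthetic tensor (constant 0.5 everywhere) and
hand-picked per-channel statistics, so the per-channel arithmetic is
easy to check by hand:

.. code:: python

    import torch
    import torchvision.transforms as transforms

    # A fake 3-channel "image" tensor where every value is 0.5.
    t = torch.ones(3, 4, 4) * 0.5

    norm = transforms.Normalize(mean=[0.25, 0.5, 0.75],
                                std=[0.5, 0.5, 0.5])
    out = norm(t)

    # Each channel becomes (0.5 - mean) / std: 0.5, 0.0, -0.5.
    print(float(out[0, 0, 0]), float(out[1, 0, 0]), float(out[2, 0, 0]))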

Conversion Transforms
~~~~~~~~~~~~~~~~~~~~~

-  ``ToTensor()`` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x
   C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W)
   in the range [0.0, 1.0]
-  ``ToPILImage()`` - Converts a torch.\*Tensor of range [0, 1] and
   shape C x H x W or numpy ndarray of dtype=uint8, range[0, 255] and
   shape H x W x C to a PIL.Image of range [0, 255]
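
A round-trip sketch on a small synthetic image, showing the shape and
range conversions in both directions:

.. code:: python

    from PIL import Image
    import torchvision.transforms as transforms

    pil = Image.new('RGB', (4, 2), color=(255, 0, 0))  # red 4x2 image

    t = transforms.ToTensor()(pil)
    print(t.size())          # (C x H x W): 3 x 2 x 4
    print(float(t.max()))    # pixel value 255 maps to 1.0

    back = transforms.ToPILImage()(t)
    print(back.size)         # back to PIL (width, height) = (4, 2)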

Generic Transforms
~~~~~~~~~~~~~~~~~~

``Lambda(lambda)``
^^^^^^^^^^^^^^^^^^

Given a Python lambda, applies it to the input ``img`` and returns it.
For example:

.. code:: python

    transforms.Lambda(lambda x: x.add(10))

Utils
=====

``make_grid(tensor, nrow=8, padding=2)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of
images

``save_image(tensor, filename, nrow=8, padding=2)``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Saves a given Tensor into an image file.

If given a mini-batch tensor, will save the tensor as a grid of images.
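
A self-contained sketch of both utilities, using random tensors in place
of real images (the output path is a temporary file, purely for
illustration):

.. code:: python

    import os, tempfile
    import torch
    import torchvision.utils as vutils

    batch = torch.rand(4, 3, 8, 8)      # mini-batch of 4 random RGB "images"

    # make_grid tiles the batch into a single 3-channel image tensor.
    grid = vutils.make_grid(batch, nrow=2, padding=2)
    print(grid.size())

    # save_image lays out the same grid and writes it to disk.
    path = os.path.join(tempfile.mkdtemp(), 'grid.png')
    vutils.save_image(batch, path, nrow=2, padding=2)
    print(os.path.exists(path))         # True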