Each dataset's constructor has a slightly different API as needed, but they all take these keyword arguments:
- `transform` - a function that takes in an image and returns a transformed version
  - common stuff like `ToTensor`, `RandomCrop`, etc. These can be composed together with `transforms.Compose` (see the transforms section below)
- `target_transform` - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices (see the sketch below).
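As a minimal sketch of how the two hooks are wired up, here is the COCO captions dataset (described below) with a composed image transform and a caption `target_transform`. The `tokenize` helper and its toy vocabulary are hypothetical, not part of torchvision; a COCO captions target is a *list* of caption strings, so the helper picks the first one:

```python
import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

# hypothetical helper: map the first caption string of a sample to word indices
def tokenize(captions):
    vocab = {'a': 0, 'plane': 1, 'mountain': 2}  # toy vocabulary for illustration
    words = captions[0].lower().split()
    return torch.LongTensor([vocab.get(w, len(vocab)) for w in words])

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.Compose([
                            transforms.Scale(256),      # resize the smaller edge to 256
                            transforms.CenterCrop(224), # crop the center 224x224 patch
                            transforms.ToTensor(),      # PIL.Image -> FloatTensor in [0, 1]
                        ]),
                        target_transform=tokenize)
```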
### COCO
This requires the [COCO API to be installed](https://github.com/pdollar/coco/tree/master/PythonAPI)
#### Captions:
`dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
Example:
```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample

print("Image Size: ", img.size())
print(target)
```
Output:
```
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
```
#### Detection:
`dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
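A minimal usage sketch, analogous to the captions example above (the directory and annotation-file paths are placeholders). Each sample's target is the list of annotation dicts that the COCO API returns for that image:

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

det = dset.CocoDetection(root='dir where images are',
                         annFile='json annotation file',
                         transform=transforms.ToTensor())

img, target = det[0]     # first sample
print(img.size())        # e.g. torch.Size([3, H, W])
print(len(target))       # number of annotated objects in this image
print(target[0].keys())  # annotation fields, e.g. 'bbox', 'category_id', 'segmentation'
```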
## Transforms on PIL.Image
### `RandomSizedCrop(size, interpolation=PIL.Image.BILINEAR)`
Randomly crops the given PIL.Image to a random size of (0.08 to 1.0) of the original size
and a random aspect ratio of 3/4 to 4/3 of the original aspect ratio.
This is popularly used to train the Inception networks.
- size: size of the smaller edge
- interpolation: Default: PIL.Image.BILINEAR
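A minimal usage sketch, assuming this is torchvision's `RandomSizedCrop` (the image path is a placeholder); whatever region is picked, the output is resized to `size` x `size`:

```python
from PIL import Image
import torchvision.transforms as transforms

crop = transforms.RandomSizedCrop(224)  # crop a random region, then resize to 224x224
img = Image.open('some_image.jpg')      # placeholder path
out = crop(img)
print(out.size)                         # (224, 224)
```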
### `Pad(padding, fill=0)`
Pads the given image on each side with `padding` pixels, filling the new pixels with value `fill`.
If a `5x5` image is padded with `padding=1`, it becomes `7x7`.
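For example, a small sketch of the `5x5` to `7x7` case described above (the blank image is just for illustration):

```python
from PIL import Image
import torchvision.transforms as transforms

pad = transforms.Pad(padding=1, fill=0)  # add a 1-pixel black border on every side
img = Image.new('RGB', (5, 5))           # blank 5x5 image for illustration
padded = pad(img)
print(padded.size)                       # (7, 7)
```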
## Transforms on torch.*Tensor
### `Normalize(mean, std)`
Given mean: (R, G, B) and std: (R, G, B), will normalize each channel of the torch.*Tensor, i.e. channel = (channel - mean) / std
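Since `Normalize` operates on tensors, it is typically applied after `ToTensor`. A short sketch with a fake image tensor; the mean/std values are illustrative ImageNet-style statistics, not something the library fixes for you:

```python
import torch
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=(0.485, 0.456, 0.406),  # illustrative per-channel mean
                                 std=(0.229, 0.224, 0.225))   # illustrative per-channel std

t = torch.rand(3, 224, 224)  # fake image tensor with values in [0, 1]
out = normalize(t)           # each channel becomes (channel - mean) / std
```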
## Conversion Transforms
- `ToTensor()` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
- `ToPILImage()` - Converts a torch.*Tensor of range [0, 1] and shape C x H x W or numpy ndarray of dtype=uint8, range [0, 255] and shape H x W x C to a PIL.Image of range [0, 255]
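A small round-trip sketch of the two conversions (the blank image is just for illustration); note how the PIL (W x H) size maps to the (C x H x W) tensor shape:

```python
import torchvision.transforms as transforms
from PIL import Image

to_tensor = transforms.ToTensor()
to_pil = transforms.ToPILImage()

img = Image.new('RGB', (64, 48))  # blank image for illustration (width 64, height 48)
t = to_tensor(img)                # FloatTensor of shape (3, 48, 64), values in [0.0, 1.0]
back = to_pil(t)                  # PIL.Image of size (64, 48) with values in [0, 255]
```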
## Generic Transforms
### `Lambda(lambda)`
Given a Python lambda, applies it to the input `img` and returns it.
For example:
```python
transforms.Lambda(lambda x: x.add(10))
```
# Utils
### `make_grid(tensor, nrow=8, padding=2)`
Given a 4D mini-batch Tensor of shape (B x C x H x W), makes a grid of images.
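A small sketch of how the grid is typically built and inspected, with random data standing in for a real mini-batch:

```python
import torch
import torchvision.utils as vutils

batch = torch.randn(16, 3, 32, 32)                # fake mini-batch of 16 RGB images
grid = vutils.make_grid(batch, nrow=4, padding=2) # tile the batch, 4 images per row
print(grid.size())                                # a single 3 x H x W image tensor
```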