# torch-vision

This repository consists of:

- [vision.datasets](#datasets) : Data loaders for popular vision datasets
- [vision.transforms](#transforms) : Common image transformations such as random crops, rotations, etc.
- `[WIP] vision.models` : Model definitions and pre-trained weights for popular architectures such as AlexNet, VGG, ResNet, etc.

# Installation

Binaries:

```bash
conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
```

From source:

```bash
pip install -r requirements.txt
pip install .
```

# Datasets

The following dataset loaders are available:

- [COCO (Captioning and Detection)](#coco)
- [LSUN Classification](#lsun)
- [ImageFolder](#imagefolder)
- [Imagenet-12](#imagenet-12)
- [CIFAR10 and CIFAR100](#cifar)

All datasets subclass `torch.utils.data.Dataset` and therefore implement:

- `__getitem__`
- `__len__`

Hence, they can all be loaded in parallel (using Python multiprocessing worker processes) with a standard `torch.utils.data.DataLoader`. For example:

`torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)`

Each dataset's constructor has a slightly different API as needed, but they all take the keyword arguments:

- `transform` - a function that takes in an image and returns a transformed version. Common transforms such as `ToTensor` and `RandomCrop` can be composed together with `transforms.Compose` (see the transforms section below).
- `target_transform` - a function that takes in the target and transforms it. For example, it might take in a caption string and return a tensor of word indices.

### COCO

This requires the [COCO API to be installed](https://github.com/pdollar/coco/tree/master/PythonAPI).

#### Captions:

`dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])`

Example:

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples: ', len(cap))
img, target = cap[3]  # load 4th sample

print("Image Size: ", img.size())
print(target)
```

Output:

```
Number of samples:  82783
Image Size:  (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
```

#### Detection:

`dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])`

### LSUN

`dset.LSUN(db_path, classes='train', [transform, target_transform])`

- `db_path` : root directory for the database files
- `classes` : one of
  - `'train'` : all categories, training set
  - `'val'` : all categories, validation set
  - `'test'` : all categories, test set
  - `['bedroom_train', 'church_train', ...]` : a list of categories to load

### CIFAR

`dset.CIFAR10(root, train=True, transform=None, target_transform=None, download=False)`

`dset.CIFAR100(root, train=True, transform=None, target_transform=None, download=False)`

- `root` : root directory of the dataset where the folder `cifar-10-batches-py` is located
- `train` : `True` = training set, `False` = test set
- `download` : `True` = downloads the dataset from the internet and puts it in the root directory. If the dataset is already downloaded, it does nothing.
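For instance, here is a minimal sketch of loading CIFAR-10 and iterating over it with a `DataLoader`; the `./data` path, batch size, and worker count are illustrative choices, not requirements:

```python
import torch
import torchvision.datasets as dset
import torchvision.transforms as transforms

# Download CIFAR-10 if needed, and convert each image to a FloatTensor
train_set = dset.CIFAR10(root='./data', train=True,
                         transform=transforms.ToTensor(), download=True)

# Iterate in shuffled mini-batches using two worker processes
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64,
                                           shuffle=True, num_workers=2)

for images, labels in train_loader:
    print(images.size())  # e.g. torch.Size([64, 3, 32, 32])
    break
```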
### ImageFolder

A generic data loader where the images are arranged in this way:

```
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png

root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
```

`dset.ImageFolder(root="root folder path", [transform, target_transform])`

It has the members:

- `self.classes` - the class names, as a list
- `self.class_to_idx` - a mapping from class names to class indices
- `self.imgs` - the list of (image path, class index) tuples

### Imagenet-12

This is simply implemented with an `ImageFolder` dataset. The data is preprocessed [as described here](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset). [Here is an example](https://github.com/pytorch/examples/blob/27e2a46c1d1505324032b1d94fc6ce24d5b67e97/imagenet/main.py#L48-L62).

# Transforms

Transforms are common image transformations. They can be chained together using `transforms.Compose`.

### `transforms.Compose`

Composes several transforms together and applies them in sequence. For example:

```python
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```

## Transforms on PIL.Image

### `Scale(size, interpolation=Image.BILINEAR)`

Rescales the input PIL.Image so that its smaller edge matches the given `size`, preserving the aspect ratio. For example, if height > width, the image will be rescaled to (size * height / width, size).

- `size` : size of the smaller edge
- `interpolation` : default is `PIL.Image.BILINEAR`

### `CenterCrop(size)`

Crops the given PIL.Image at the center to have a region of the given size. `size` can be a tuple `(target_height, target_width)` or an integer, in which case the target is a square of shape `(size, size)`.

### `RandomCrop(size)`

Crops the given PIL.Image at a random location to have a region of the given size. `size` can be a tuple `(target_height, target_width)` or an integer, in which case the target is a square of shape `(size, size)`.

### `RandomHorizontalFlip()`

Randomly flips the given PIL.Image horizontally with a probability of 0.5.

### `RandomSizedCrop(size, interpolation=Image.BILINEAR)`

Crops the given PIL.Image to a random size (between 0.08 and 1.0 of the original area) with a random aspect ratio (between 3/4 and 4/3 of the original), then rescales the crop to a square of the given size. This is popularly used to train the Inception networks.

- `size` : size of the output square
- `interpolation` : default is `PIL.Image.BILINEAR`

## Transforms on torch.*Tensor

### `Normalize(mean, std)`

Given mean `(R, G, B)` and std `(R, G, B)`, normalizes each channel of the torch.*Tensor, i.e. `channel = (channel - mean) / std`.

## Conversion Transforms

- `ToTensor()` - Converts a PIL.Image (RGB) or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
- `ToPILImage()` - Converts a torch.*Tensor of shape C x H x W in the range [0, 1], or a numpy.ndarray of dtype uint8, shape H x W x C, and range [0, 255], to a PIL.Image in the range [0, 255].
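To tie the pieces together, here is a minimal end-to-end sketch of a transform pipeline; the synthetic 500x375 input image is a stand-in for a real one, and the mean/std values are the ImageNet statistics from the `Compose` example above:

```python
from PIL import Image
import torchvision.transforms as transforms

# A stand-in RGB image; in practice this would come from a dataset loader
img = Image.new('RGB', (500, 375))

transform = transforms.Compose([
    transforms.Scale(256),        # smaller edge -> 256, aspect ratio preserved
    transforms.CenterCrop(224),   # 224 x 224 region from the center
    transforms.ToTensor(),        # PIL.Image [0, 255] -> FloatTensor (C x H x W) in [0.0, 1.0]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # channel = (channel - mean) / std
                         std=[0.229, 0.224, 0.225]),
])

tensor = transform(img)
print(tensor.size())  # torch.Size([3, 224, 224])

# ToPILImage inverts ToTensor: a [0, 1] C x H x W tensor back to a PIL.Image
pil_img = transforms.ToPILImage()(transforms.ToTensor()(img))
```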