In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
- `transform` - a function that takes in an image and returns a transformed version
  - common stuff like `ToTensor`, `RandomCrop`, etc. These can be composed together with `transforms.Compose` (see the transforms section below, and the sketch after this list)
- `target_transform` - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
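As a quick sketch of how transforms compose (the crop size here is an arbitrary illustrative choice):

```python
import torchvision.transforms as transforms

# Compose chains several transforms and applies them in order:
# first a random 224x224 crop of the PIL image, then conversion to a tensor.
transform = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
```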
The following datasets are available:
- COCO (Captioning and Detection)
- LSUN Classification
- Imagenet-12
- ImageFolder
### COCO
This requires the [COCO API to be installed](https://github.com/pdollar/coco/tree/master/PythonAPI).
#### Captions:
`dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
Example:
```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())

print('Number of samples:', len(cap))
img, target = cap[3]  # load 4th sample

print(img.size())
print(target)
```
#### Detection:
`dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])`
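A minimal usage sketch, mirroring the captions example above (the paths are placeholders; the target for each image is the list of COCO annotation dicts returned by the COCO API):

```python
import torchvision.datasets as dset
import torchvision.transforms as transforms

det = dset.CocoDetection(root='dir where images are',
                         annFile='json annotation file',
                         transform=transforms.ToTensor())

img, target = det[0]  # target: list of annotation dicts for this image
print(img.size())
print(len(target))
```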
### ImageFolder

A generic data loader where the images are arranged in one sub-directory per class:

`dset.ImageFolder(root="root folder path", [transform, target_transform])`

It has the members:

- `self.classes` - The class names as a list
- `self.class_to_idx` - Corresponding class indices
- `self.imgs` - The list of (image path, class-index) tuples
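A short sketch under an assumed layout of `root/dog/xxx.png`, `root/cat/yyy.png`, one sub-directory per class (the path is a placeholder):

```python
import torchvision.datasets as dset

# Assumed layout: root/<class name>/<image>, one sub-directory per class.
data = dset.ImageFolder(root='path/to/images')

print(data.classes)        # list of class names (the sub-directory names)
print(data.class_to_idx)   # mapping from class name to class index
img, class_idx = data[0]   # PIL image and its class index (no transform given)
```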
### Imagenet-12
This is simply implemented with an ImageFolder dataset, after the data is preprocessed [as described here](https://github.com/facebook/fb.resnet.torch/blob/master/INSTALL.md#download-the-imagenet-dataset).