# Training Instruction

## VOC 2012 Dataset from Scratch

Full instructions on how to train from scratch using the VOC 2012 dataset.

Requirements:
  1. Ability to detect images using the pretrained darknet model
  2. Many gigabytes of disk space
  3. High-speed internet connection preferred
  4. GPU preferred


### 1. Download Dataset

You can read the full description of dataset [here](http://host.robots.ox.ac.uk/pascal/VOC/)
```bash
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar -O ./data/voc2012_raw.tar
mkdir -p ./data/voc2012_raw
tar -xf ./data/voc2012_raw.tar -C ./data/voc2012_raw
ls ./data/voc2012_raw/VOCdevkit/VOC2012 # Explore the dataset
```
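If the archive extracted correctly, the directory layout can be sanity-checked with a short script. The folder names below follow the standard VOCdevkit layout (`Annotations`, `ImageSets`, `JPEGImages`); this is only a quick check, not part of the training pipeline.

```python
import os

def check_voc_layout(root):
    """Return a dict mapping each expected VOC2012 sub-folder
    to whether it actually exists under root."""
    expected = ["Annotations", "ImageSets", "JPEGImages"]
    return {name: os.path.isdir(os.path.join(root, name)) for name in expected}

# Path matches the extraction step above:
# check_voc_layout("./data/voc2012_raw/VOCdevkit/VOC2012")
```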

### 2. Transform Dataset

See `tools/voc2012.py` for the implementation. The output format is based on the [TensorFlow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection). Many fields
are not required; they are kept for compatibility with the official API.

```bash
python tools/voc2012.py \
  --data_dir './data/voc2012_raw/VOCdevkit/VOC2012' \
  --split train \
  --output_file ./data/voc2012_train.tfrecord

python tools/voc2012.py \
  --data_dir './data/voc2012_raw/VOCdevkit/VOC2012' \
  --split val \
  --output_file ./data/voc2012_val.tfrecord
```
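Under the hood, the conversion reads each Pascal VOC annotation XML and extracts the class name and bounding box per object. A minimal sketch of that parsing step, using only the standard library (the `object`/`name`/`bndbox` element names are the standard VOC annotation format):

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_string):
    """Extract (class_name, xmin, ymin, xmax, ymax) tuples from one
    Pascal VOC annotation file -- the fields the tfrecord
    conversion ultimately needs."""
    root = ET.fromstring(xml_string)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bndbox = obj.find("bndbox")
        coords = tuple(int(float(bndbox.find(k).text))
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name, *coords))
    return boxes
```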

You can visualize the dataset using this tool:
```bash
python tools/visualize_dataset.py --classes=./data/voc2012.names
```

It will output one random image with its labels to `output.jpg`.

### 3. Training

You can adjust the parameters below based on your setup.

#### With Transfer Learning

This step requires loading the pretrained darknet (feature extractor) weights.
```bash
wget https://pjreddie.com/media/files/yolov3.weights -O data/yolov3.weights
python convert.py
python detect.py --image ./data/meme.jpg # Sanity check

python train.py \
	--dataset ./data/voc2012_train.tfrecord \
	--val_dataset ./data/voc2012_val.tfrecord \
	--classes ./data/voc2012.names \
	--num_classes 20 \
	--mode fit --transfer darknet \
	--batch_size 16 \
	--epochs 10 \
	--weights ./checkpoints/yolov3.tf \
	--weights_num_classes 80 
```

The original pretrained yolov3 model has 80 classes; here we demonstrate how to
do transfer learning onto 20 classes.
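The idea behind `--transfer darknet` with `--weights_num_classes 80` can be sketched without TensorFlow: only the backbone (feature extractor) weights are copied, while the detection-head layers, whose shapes depend on the class count, keep their random initialization. The layer names and dict representation below are made up for the sketch and do not match the repository's actual variable names:

```python
def transfer_backbone(pretrained, target, backbone_prefix="darknet"):
    """Copy backbone weights (layer name starts with backbone_prefix)
    from the 80-class pretrained model into the 20-class target model.
    Head layers are skipped: their shapes depend on the class count."""
    copied = []
    for name, weights in pretrained.items():
        if name.startswith(backbone_prefix) and name in target:
            target[name] = weights
            copied.append(name)
    return copied
```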

#### Training from random weights (NOT RECOMMENDED)
Training from scratch is very difficult to get to converge.
The original paper also pretrained darknet
on ImageNet before training the whole network.

```bash
python train.py \
	--dataset ./data/voc2012_train.tfrecord \
	--val_dataset ./data/voc2012_val.tfrecord \
	--classes ./data/voc2012.names \
	--num_classes 20 \
	--mode fit --transfer none \
	--batch_size 16 \
	--epochs 10
```

I have verified that this setup trains with a correct, decreasing loss that converges over time.
Each epoch takes around 10 minutes on a single AWS p2.xlarge (NVIDIA K80 GPU) instance.

You might see warnings or error messages during training; they are not critical, so don't worry too much about them.
There might be a long wait between epochs because the validation loss is being calculated.

### 4. Inference

```bash
# detect from images
python detect.py \
	--classes ./data/voc2012.names \
	--num_classes 20 \
	--weights ./checkpoints/yolov3_train_5.tf \
	--image ./data/street.jpg

# detect from validation set
python detect.py \
	--classes ./data/voc2012.names \
	--num_classes 20 \
	--weights ./checkpoints/yolov3_train_5.tf \
	--tfrecord ./data/voc2012_val.tfrecord
```

You should see detected objects in the standard output and a visualization at `output.jpg`.
This is just a proof of concept, so it won't be as good as the pretrained models.
In my experience, you might need to lower the score threshold if you didn't train for long enough.
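Lowering the score threshold simply means keeping lower-confidence boxes in the output; a minimal post-filter looks like this (the detection tuple format here is illustrative, not the script's actual output format):

```python
def filter_detections(detections, score_threshold=0.5):
    """Keep only detections whose confidence score meets the threshold.
    Each detection is (class_name, score, box)."""
    return [d for d in detections if d[1] >= score_threshold]
```

With an undertrained model, dropping `score_threshold` from 0.5 to, say, 0.2 often reveals boxes the model found but was not yet confident about.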