"# Generate SSD anchor box aspect ratios using k-means clustering\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KD164da8WQ0U"
},
"source": [
"Many object detection models use anchor boxes as a region-sampling strategy, so that during training, the model learns to match one of several pre-defined anchor boxes to the ground truth bounding boxes. To optimize the accuracy and efficiency of your object detection model, it's helpful if you tune these anchor boxes to fit your model dataset, because the configuration files that comes with TensorFlow's trained checkpoints include aspect ratios that are intended to cover a very broad set of objects.\n",
"\n",
"So in this notebook tutorial, you'll learn how to discover a set of aspect ratios that are custom-fit for your dataset, as discovered through k-means clustering of all the ground-truth bounding-box ratios.\n",
"\n",
"For demonstration purpsoses, we're using a subset of the [PETS dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) (cats and dogs), which matches some other model training tutorials out there (such as [this one for the Edge TPU](https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_ssdlite_mobiledet_qat_tf1.ipynb#scrollTo=LvEMJSafnyEC)), but you can use this script with a different dataset, and we'll show how to tune it to meet your model's goals, including how to optimize speed over accuracy or accuracy over speed.\n",
"\n",
"The result of this notebook is a new [pipeline `.config` file](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md) that you can copy into your model training script. With the new customized anchor box configuration, you should observe a faster training pipeline and slightly improved model accuracy.\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cNBjMwIvCrhf"
},
"source": [
"## Get the required libraries"
]
},
{
"cell_type": "code",
"metadata": {
"id": "hCQlBGJkZTR2"
},
"source": [
"import tensorflow as tf"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "aw-Ba-5RUhMs"
},
"source": [
"# Install the tensorflow Object Detection API...\n",
"# If you're running this offline, you also might need to install the protobuf-compiler:\n",
"Although this notebook does not perform model training, you need to use the same dataset here that you'll use when training the model.\n",
"\n",
"To find the best anchor box ratios, you should use all of your training dataset (or as much of it as is reasonable). That's because, as mentioned in the introduction, you want to measure the precise variety of images that you expect your model to encounter—anything less and the anchor boxes might not cover the variety of objects you model encounters, so it might have weak accuracy. (Whereas the alternative, in which the ratios are based on data that is beyond the scope of your model's application, usually creates an inefficient model that can also have weaker accuracy.)"
"In this case, we want to reduce the PETS dataset to match the collection of cats and dogs used to train the model (in [this training notebook](https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_ssdlite_mobiledet_qat_tf1.ipynb)):\n",
"We are trying to find a group of aspect ratios that overlap the majority of object shapes in the dataset. We do that by finding common clusters of bounding boxes of the dataset, using the k-means clustering algorithm to find centroids of these clusters.\n",
"\n",
"To help with this, we need to calculate following:\n",
"\n",
"+ The k-means cluster centroids of the given bounding boxes\n",
"(see the `kmeans_aspect_ratios()` function below).\n",
"\n",
"+ The average intersection of bounding boxes with given aspect ratios.\n",
"(see the `average_iou()` function below).\n",
"This does not affect the outcome of the final box ratios, but serves as a useful metric for you to decide whether the selected boxes are effective and whether you want to try with more/fewer aspect ratios. (We'll discuss this score more below.)\n",
"\n",
"**NOTE:**\n",
"The term \"centroid\" used here refers to the center of the k-means cluster (the boxes (height,width) vector)."
" sys.exit(\"Failed to get aspect ratios due to numerical errors in k-means\")\n",
"\n",
" aspect_ratios = [w/h for w,h in ar]\n",
"\n",
" return aspect_ratios, avg_iou_perc"
],
"execution_count": null,
"outputs": []
},
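{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `kmeans_aspect_ratios()` calls `average_iou()`, here is a minimal sketch of that metric. It assumes each box and anchor is a (height, width) pair centered at a common point, so the pairwise intersection is simply `min(h) * min(w)`:"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"def average_iou(boxes, anchors):\n",
"  \"\"\"Mean of each box's best IOU against the anchor set, as a percentage.\n",
"\n",
"  Boxes and anchors are arrays of (height, width) pairs, assumed to share\n",
"  a common center point.\n",
"  \"\"\"\n",
"  # Pairwise intersection areas, shape (num_boxes, num_anchors).\n",
"  inter_h = np.minimum(boxes[:, [0]], anchors[:, 0])\n",
"  inter_w = np.minimum(boxes[:, [1]], anchors[:, 1])\n",
"  intersection = inter_h * inter_w\n",
"\n",
"  boxes_area = boxes.prod(axis=1, keepdims=True)  # (num_boxes, 1)\n",
"  anchors_area = anchors.prod(axis=1)             # (num_anchors,)\n",
"  union = boxes_area + anchors_area - intersection\n",
"\n",
"  # For each box, keep its best-matching anchor, then average.\n",
"  return np.mean(np.max(intersection / union, axis=1)) * 100"
],
"execution_count": null,
"outputs": []
},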
{
"cell_type": "markdown",
"metadata": {
"id": "eU2SuLvu55Ds"
},
"source": [
"In the next code block, we'll call the above functions to discover the ideal anchor box aspect ratios.\n",
"\n",
"You can tune the parameters below to suit your performance objectives.\n",
"\n",
"Most importantly, you should consider the number of aspect ratios you want to generate. At opposite ends of the decision spectrum, there are two objectives you might seek:\n",
"\n",
"1. **Low accuracy and fast inference**: Try 2-3 aspect ratios. \n",
" * This is if your application is okay with accuracy or confidence scores around/below 80%.\n",
" * The average IOU score (from `avg_iou_perc`) will be around 70-85.\n",
" * This reduces the model's overall computations during inference, which makes inference faster.\n",
"\n",
"2. **High accuracy and slow inference**: Try 5-6 aspect ratios.\n",
" * This is if your application requires accuracy or confidence scores around 95%.\n",
" * The average IOU score (from `avg_iou_perc`) should be over 95.\n",
" * This increases the model's overall computations during inference, which makes inference slower.\n",
"\n",
"The initial configuration below aims somewhere in between: it searches for 4 aspect ratios.\n"
"If you look at the new `.config` file printed above, you'll find the `anchor_generator` specification, which includes the new `aspect_ratio` values that we generated with the k-means code above.\n",
"\n",
"The original config file ([`ssdlite_mobiledet_edgetpu_320x320_coco_sync_4x4.config`](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v1_pets.config)) did have some default anchor box aspect ratios already, but we've replaced those with values that are optimized for our dataset. These new anchor boxes should improve the model accuracy (compared to the default anchors) and speed up the training process.\n",
"\n",
"If you want to use this configuration to train a model, then check out this tutorial to [retrain MobileDet for the Coral Edge TPU](https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_ssdlite_mobiledet_qat_tf1.ipynb), which uses this exact cats/dogs dataset. Just copy the `.config` file printed above and add it to that training notebook. (Or download the file from the **Files** panel on the left side of the Colab UI: it's called `ssdlite_mobiledet_edgetpu_320x320_custom_aspect_ratios.config`.)\n",
"\n",
"For more information about the pipeline configuration file, read [Configuring the Object Detection Training Pipeline](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md).\n",
"\n",
"### About anchor scales...\n",
"\n",
"This notebook is focused on anchor box aspect ratios because that's often the most difficult to tune for each dataset. But you should also consider different configurations for the anchor box scales, which specify the number of different anchor box sizes and their min/max sizes—which affects how well your model detects objects of varying sizes.\n",
"\n",
"Tuning the anchor scales is much easier to do by hand, by estimating the min/max sizes you expect the model to encounter in your application environment. Just like when choosing the number of aspect ratios above, the number of different box sizes also affects your model accuracy and speed (using more box scales is more accurate, but also slower).\n",
"\n",
"You can also read more about anchor scales in [Configuring the Object Detection Training Pipeline](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/configuring_jobs.md).\n",