{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TF-Slim Walkthrough\n",
    "\n",
    "This notebook will walk you through the basics of using TF-Slim to define, train and evaluate neural networks on various tasks. It assumes a basic knowledge of neural networks. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of contents\n",
    "\n",
    "<a href=\"#Install\">Installation and setup</a><br>\n",
    "<a href='#MLP'>Creating your first neural network with TF-Slim</a><br>\n",
    "<a href='#ReadingTFSlimDatasets'>Reading Data with TF-Slim</a><br>\n",
    "<a href='#CNN'>Training a convolutional neural network (CNN)</a><br>\n",
    "<a href='#Pretained'>Using pre-trained models</a><br>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation and setup\n",
    "<a id='Install'></a>\n",
    "\n",
    "Since the stable release of TF 1.0, the latest version of slim has been available as `tf.contrib.slim`.\n",
    "To test that your installation is working, execute the following command; it should run without raising any errors.\n",
34
    "\n",
35
36
37
38
39
    "```\n",
    "python -c \"import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once\"\n",
    "```\n",
    "\n",
    "Although, to use TF-Slim for image classification (as we do in this notebook), you also have to install the TF-Slim image models library from [here](https://github.com/tensorflow/models/tree/master/slim). Let's suppose you install this into a directory called TF_MODELS. Then you should change directory to  TF_MODELS/slim **before** running this notebook, so that these files are in your python path.\n",
40
    "\n",
41
    "To check you've got these two steps to work, just execute the cell below. If it complains about unknown modules, restart the notebook after moving to the TF-Slim models directory.\n"
42
43
44
45
46
47
   ]
  },
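  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you'd rather not change directories, a minimal alternative sketch (the path below is hypothetical; point it at your own checkout) is to put the models directory on `sys.path` before importing:\n",
    "\n",
    "```\n",
    "import sys\n",
    "sys.path.append('/path/to/TF_MODELS/slim')  # hypothetical path; adjust to your checkout\n",
    "```"
   ]
  },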
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import matplotlib\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import math\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "import time\n",
    "\n",
    "from datasets import dataset_utils\n",
    "\n",
    "# Main slim library\n",
    "from tensorflow.contrib import slim"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating your first neural network with TF-Slim\n",
    "<a id='MLP'></a>\n",
    "\n",
    "Below we give some code to create a simple multilayer perceptron (MLP)  which can be used\n",
    "for regression problems. The model has 2 hidden layers.\n",
    "The output is a single node. \n",
    "When this function is called, it will create various nodes, and silently add them to whichever global TF graph is currently in scope. When a node which corresponds to a layer with adjustable parameters (eg., a fully connected layer) is created, additional parameter variable nodes are silently created, and added to the graph. (We will discuss how to train the parameters later.)\n",
    "\n",
    "We use variable scope to put all the nodes under a common name,\n",
    "so that the graph has some hierarchical structure.\n",
    "This is useful when we want to visualize the TF graph in tensorboard, or if we want to query related\n",
    "variables. \n",
    "The fully connected layers all use the same L2 weight decay and ReLu activations, as specified by **arg_scope**. (However, the final layer overrides these defaults, and uses an identity activation function.)\n",
    "\n",
    "We also illustrate how to add a dropout layer after the first fully connected layer (FC1). Note that at test time, \n",
    "we do not drop out nodes, but instead use the average activations; hence we need to know whether the model is being\n",
    "constructed for training or testing, since the computational graph will be different in the two cases\n",
    "(although the variables, storing the model parameters, will be shared, since they have the same name/scope)."
   ]
  },
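  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick illustration of what **arg_scope** buys us, the two snippets below build the same layer; the scope simply supplies default arguments, which any individual call can still override:\n",
    "\n",
    "```\n",
    "# With arg_scope: defaults are applied implicitly to every matching call.\n",
    "with slim.arg_scope([slim.fully_connected], activation_fn=tf.nn.relu,\n",
    "                    weights_regularizer=slim.l2_regularizer(0.01)):\n",
    "    net = slim.fully_connected(inputs, 32, scope='fc1')\n",
    "\n",
    "# Without arg_scope: the same arguments passed explicitly.\n",
    "net = slim.fully_connected(inputs, 32, activation_fn=tf.nn.relu,\n",
    "                           weights_regularizer=slim.l2_regularizer(0.01), scope='fc1')\n",
    "```"
   ]
  },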
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def regression_model(inputs, is_training=True, scope=\"deep_regression\"):\n",
    "    \"\"\"Creates the regression model.\n",
    "\n",
    "    Args:\n",
    "        inputs: A node that yields a `Tensor` of size [batch_size, dimensions].\n",
    "        is_training: Whether or not we're currently training the model.\n",
    "        scope: An optional variable_op scope for the model.\n",
    "\n",
    "    Returns:\n",
    "        predictions: 1-D `Tensor` of shape [batch_size] of responses.\n",
    "        end_points: A dict of end points representing the hidden layers.\n",
    "    \"\"\"\n",
    "    with tf.variable_scope(scope, 'deep_regression', [inputs]):\n",
    "        end_points = {}\n",
    "        # Set the default weight _regularizer and acvitation for each fully_connected layer.\n",
    "        with slim.arg_scope([slim.fully_connected],\n",
    "                            activation_fn=tf.nn.relu,\n",
    "                            weights_regularizer=slim.l2_regularizer(0.01)):\n",
    "\n",
    "            # Creates a fully connected layer from the inputs with 32 hidden units.\n",
    "            net = slim.fully_connected(inputs, 32, scope='fc1')\n",
    "            end_points['fc1'] = net\n",
    "\n",
    "            # Adds a dropout layer to prevent over-fitting.\n",
    "            net = slim.dropout(net, 0.8, is_training=is_training)\n",
    "\n",
    "            # Adds another fully connected layer with 16 hidden units.\n",
    "            net = slim.fully_connected(net, 16, scope='fc2')\n",
    "            end_points['fc2'] = net\n",
    "\n",
    "            # Creates a fully-connected layer with a single hidden unit. Note that the\n",
    "            # layer is made linear by setting activation_fn=None.\n",
    "            predictions = slim.fully_connected(net, 1, activation_fn=None, scope='prediction')\n",
    "            end_points['out'] = predictions\n",
    "\n",
    "            return predictions, end_points"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's create the model and examine its structure.\n",
    "\n",
    "We create a TF graph and call regression_model(), which adds nodes (tensors) to the graph. We then examine their shape, and print the names of all the model variables which have been implicitly created inside of each layer. We see that the names of the variables follow the scopes that we specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with tf.Graph().as_default():\n",
    "    # Dummy placeholders for arbitrary number of 1d inputs and outputs\n",
    "    inputs = tf.placeholder(tf.float32, shape=(None, 1))\n",
    "    outputs = tf.placeholder(tf.float32, shape=(None, 1))\n",
    "\n",
    "    # Build model\n",
    "    predictions, end_points = regression_model(inputs)\n",
    "\n",
    "    # Print name and shape of each tensor.\n",
    "    print(\"Layers\")\n",
    "    for k, v in end_points.items():\n",
    "        print('name = {}, shape = {}'.format(v.name, v.get_shape()))\n",
    "\n",
    "    # Print name and shape of parameter nodes  (values not yet initialized)\n",
    "    print(\"\\n\")\n",
    "    print(\"Parameters\")\n",
    "    for v in slim.get_model_variables():\n",
    "        print('name = {}, shape = {}'.format(v.name, v.get_shape()))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's create some 1d regression data .\n",
    "\n",
    "We will train and test the model on some noisy observations of a nonlinear function.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def produce_batch(batch_size, noise=0.3):\n",
    "    xs = np.random.random(size=[batch_size, 1]) * 10\n",
    "    ys = np.sin(xs) + 5 + np.random.normal(size=[batch_size, 1], scale=noise)\n",
    "    return [xs.astype(np.float32), ys.astype(np.float32)]\n",
    "\n",
    "x_train, y_train = produce_batch(200)\n",
    "x_test, y_test = produce_batch(200)\n",
    "plt.scatter(x_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's fit the model to the data\n",
    "\n",
    "The user has to specify the loss function and the optimizer, and slim does the rest.\n",
    "In particular,  the slim.learning.train function does the following:\n",
    "\n",
    "- For each iteration, evaluate the train_op, which updates the parameters using the optimizer applied to the current minibatch. Also, update the global_step.\n",
    "- Occasionally store the model checkpoint in the specified directory. This is useful in case your machine crashes  - then you can simply restart from the specified checkpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def convert_data_to_tensors(x, y):\n",
    "    inputs = tf.constant(x)\n",
    "    inputs.set_shape([None, 1])\n",
    "    \n",
    "    outputs = tf.constant(y)\n",
    "    outputs.set_shape([None, 1])\n",
    "    return inputs, outputs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# The following snippet trains the regression model using a mean_squared_error loss.\n",
    "ckpt_dir = '/tmp/regression_model/'\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    tf.logging.set_verbosity(tf.logging.INFO)\n",
    "    \n",
    "    inputs, targets = convert_data_to_tensors(x_train, y_train)\n",
    "\n",
    "    # Make the model.\n",
    "    predictions, nodes = regression_model(inputs, is_training=True)\n",
    "\n",
    "    # Add the loss function to the graph.\n",
    "    loss = tf.losses.mean_squared_error(labels=targets, predictions=predictions)\n",
    "    \n",
    "    # The total loss is the uers's loss plus any regularization losses.\n",
    "    total_loss = slim.losses.get_total_loss()\n",
    "\n",
    "    # Specify the optimizer and create the train op:\n",
    "    optimizer = tf.train.AdamOptimizer(learning_rate=0.005)\n",
    "    train_op = slim.learning.create_train_op(total_loss, optimizer) \n",
    "\n",
    "    # Run the training inside a session.\n",
    "    final_loss = slim.learning.train(\n",
    "        train_op,\n",
    "        logdir=ckpt_dir,\n",
    "        number_of_steps=5000,\n",
    "        save_summaries_secs=5,\n",
    "        log_every_n_steps=500)\n",
    "  \n",
    "print(\"Finished training. Last batch loss:\", final_loss)\n",
    "print(\"Checkpoint saved in %s\" % ckpt_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training with multiple loss functions.\n",
    "\n",
    "Sometimes we have multiple objectives we want to simultaneously optimize.\n",
    "In slim, it is easy to add more losses, as we show below. (We do not optimize the total loss in this example,\n",
    "but we show how to compute it.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with tf.Graph().as_default():\n",
    "    inputs, targets = convert_data_to_tensors(x_train, y_train)\n",
    "    predictions, end_points = regression_model(inputs, is_training=True)\n",
    "\n",
    "    # Add multiple loss nodes.\n",
    "    mean_squared_error_loss = tf.losses.mean_squared_error(labels=targets, predictions=predictions)\n",
    "    absolute_difference_loss = slim.losses.absolute_difference(predictions, targets)\n",
    "\n",
    "    # The following two ways to compute the total loss are equivalent\n",
    "    regularization_loss = tf.add_n(slim.losses.get_regularization_losses())\n",
    "    total_loss1 = mean_squared_error_loss + absolute_difference_loss + regularization_loss\n",
    "\n",
    "    # Regularization Loss is included in the total loss by default.\n",
    "    # This is good for training, but not for testing.\n",
    "    total_loss2 = slim.losses.get_total_loss(add_regularization_losses=True)\n",
    "    \n",
    "    init_op = tf.global_variables_initializer()\n",
    "    \n",
    "    with tf.Session() as sess:\n",
    "        sess.run(init_op) # Will initialize the parameters with random weights.\n",
    "        \n",
    "        total_loss1, total_loss2 = sess.run([total_loss1, total_loss2])\n",
    "        \n",
    "        print('Total Loss1: %f' % total_loss1)\n",
    "        print('Total Loss2: %f' % total_loss2)\n",
    "\n",
    "        print('Regularization Losses:')\n",
    "        for loss in slim.losses.get_regularization_losses():\n",
    "            print(loss)\n",
    "\n",
    "        print('Loss Functions:')\n",
    "        for loss in slim.losses.get_losses():\n",
    "            print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's load the saved model and use it for prediction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with tf.Graph().as_default():\n",
    "    inputs, targets = convert_data_to_tensors(x_test, y_test)\n",
    "  \n",
    "    # Create the model structure. (Parameters will be loaded below.)\n",
    "    predictions, end_points = regression_model(inputs, is_training=False)\n",
    "\n",
    "    # Make a session which restores the old parameters from a checkpoint.\n",
    "    sv = tf.train.Supervisor(logdir=ckpt_dir)\n",
    "    with sv.managed_session() as sess:\n",
    "        inputs, predictions, targets = sess.run([inputs, predictions, targets])\n",
    "\n",
    "plt.scatter(inputs, targets, c='r');\n",
    "plt.scatter(inputs, predictions, c='b');\n",
    "plt.title('red=true, blue=predicted')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Let's compute various evaluation metrics on the test set.\n",
    "\n",
    "In TF-Slim termiology, losses are optimized, but metrics (which may not be differentiable, e.g., precision and recall) are just measured. As an illustration, the code below computes mean squared error and mean absolute error metrics on the test set.\n",
    "\n",
    "Each metric declaration creates several local variables (which must be initialized via tf.initialize_local_variables()) and returns both a value_op and an update_op. When evaluated, the value_op returns the current value of the metric. The update_op loads a new batch of data, runs the model, obtains the predictions and accumulates the metric statistics appropriately before returning the current value of the metric. We store these value nodes and update nodes in 2 dictionaries.\n",
    "\n",
    "After creating the metric nodes, we can pass them to slim.evaluation.evaluation, which repeatedly evaluates these nodes the specified number of times. (This allows us to compute the evaluation in a streaming fashion across minibatches, which is usefulf for large datasets.) Finally, we print the final value of each metric.\n"
   ]
  },
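  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the value_op/update_op split concrete, here is a minimal sketch of driving a single streaming metric by hand (predictions, targets, and num_batches stand in for tensors and a count defined elsewhere):\n",
    "\n",
    "```\n",
    "value_op, update_op = slim.metrics.streaming_mean_squared_error(predictions, targets)\n",
    "with tf.Session() as sess:\n",
    "    sess.run(tf.local_variables_initializer())  # Metrics accumulate in local variables.\n",
    "    for _ in range(num_batches):\n",
    "        sess.run(update_op)    # Accumulates statistics over one minibatch.\n",
    "    print(sess.run(value_op))  # Current (final) value of the metric.\n",
    "```"
   ]
  },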
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "with tf.Graph().as_default():\n",
    "    inputs, targets = convert_data_to_tensors(x_test, y_test)\n",
    "    predictions, end_points = regression_model(inputs, is_training=False)\n",
    "\n",
    "    # Specify metrics to evaluate:\n",
    "    names_to_value_nodes, names_to_update_nodes = slim.metrics.aggregate_metric_map({\n",
    "      'Mean Squared Error': slim.metrics.streaming_mean_squared_error(predictions, targets),\n",
    "      'Mean Absolute Error': slim.metrics.streaming_mean_absolute_error(predictions, targets)\n",
    "    })\n",
    "\n",
    "    # Make a session which restores the old graph parameters, and then run eval.\n",
    "    sv = tf.train.Supervisor(logdir=ckpt_dir)\n",
    "    with sv.managed_session() as sess:\n",
    "        metric_values = slim.evaluation.evaluation(\n",
    "            sess,\n",
    "            num_evals=1, # Single pass over data\n",
    "            eval_op=names_to_update_nodes.values(),\n",
    "            final_op=names_to_value_nodes.values())\n",
    "\n",
    "    names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))\n",
    "    for key, value in names_to_values.items():\n",
    "      print('%s: %f' % (key, value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading Data with TF-Slim\n",
    "<a id='ReadingTFSlimDatasets'></a>\n",
    "\n",
    "Reading data with TF-Slim has two main components: A\n",
    "[Dataset](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/dataset.py) and a \n",
    "[DatasetDataProvider](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/dataset_data_provider.py). The former is a descriptor of a dataset, while the latter performs the actions necessary for actually reading the data. Lets look at each one in detail:\n",
    "\n",
    "\n",
    "## Dataset\n",
    "A TF-Slim\n",
    "[Dataset](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/dataset.py)\n",
    "contains descriptive information about a dataset necessary for reading it, such as the list of data files and how to decode them. It also contains metadata including class labels, the size of the train/test splits and descriptions of the tensors that the dataset provides. For example, some datasets contain images with labels. Others augment this data with bounding box annotations, etc. The Dataset object allows us to write generic code using the same API, regardless of the data content and encoding type.\n",
    "\n",
    "TF-Slim's Dataset works especially well when the data is stored as a (possibly sharded)\n",
    "[TFRecords file](https://www.tensorflow.org/versions/r0.10/how_tos/reading_data/index.html#file-formats), where each record contains a [tf.train.Example protocol buffer](https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/core/example/example.proto).\n",
    "TF-Slim uses a consistent convention for naming the keys and values inside each Example record. \n",
    "\n",
    "## DatasetDataProvider\n",
    "\n",
    "A\n",
    "[DatasetDataProvider](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/data/dataset_data_provider.py) is a class which actually reads the data from a dataset. It is highly configurable to read the data in various ways that may make a big impact on the efficiency of your training process. For example, it can be single or multi-threaded. If your data is sharded across many files, it can read each files serially, or from every file simultaneously.\n",
    "\n",
    "## Demo: The Flowers Dataset\n",
    "\n",
    "For convenience, we've include scripts to convert several common image datasets into TFRecord format and have provided\n",
    "the Dataset descriptor files necessary for reading them. We demonstrate how easy it is to use these dataset via the Flowers dataset below."
   ]
  },
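  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a sense of what a Dataset descriptor contains, here is a minimal sketch (the shard pattern and sample count below are illustrative only; the real values live in the dataset's descriptor file, e.g. datasets/flowers.py):\n",
    "\n",
    "```\n",
    "keys_to_features = {\n",
    "    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),\n",
    "    'image/format': tf.FixedLenFeature((), tf.string, default_value='jpg'),\n",
    "    'image/class/label': tf.FixedLenFeature([], tf.int64),\n",
    "}\n",
    "items_to_handlers = {\n",
    "    'image': slim.tfexample_decoder.Image(),\n",
    "    'label': slim.tfexample_decoder.Tensor('image/class/label'),\n",
    "}\n",
    "decoder = slim.tfexample_decoder.TFExampleDecoder(keys_to_features, items_to_handlers)\n",
    "dataset = slim.dataset.Dataset(\n",
    "    data_sources='/tmp/flowers/flowers_train_*.tfrecord',  # illustrative shard pattern\n",
    "    reader=tf.TFRecordReader,\n",
    "    decoder=decoder,\n",
    "    num_samples=3320,  # illustrative count\n",
    "    items_to_descriptions={'image': 'A color image.', 'label': 'An integer label.'})\n",
    "```"
   ]
  },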
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the Flowers Dataset\n",
    "<a id='DownloadFlowers'></a>\n",
    "\n",
    "We've made available a tarball of the Flowers dataset which has already been converted to TFRecord format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "from datasets import dataset_utils\n",
    "\n",
    "url = \"http://download.tensorflow.org/data/flowers.tar.gz\"\n",
    "flowers_data_dir = '/tmp/flowers'\n",
    "\n",
    "if not tf.gfile.Exists(flowers_data_dir):\n",
    "    tf.gfile.MakeDirs(flowers_data_dir)\n",
    "\n",
    "dataset_utils.download_and_uncompress_tarball(url, flowers_data_dir) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Display some of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from datasets import flowers\n",
    "import tensorflow as tf\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "\n",
    "with tf.Graph().as_default(): \n",
    "    dataset = flowers.get_split('train', flowers_data_dir)\n",
    "    data_provider = slim.dataset_data_provider.DatasetDataProvider(\n",
    "        dataset, common_queue_capacity=32, common_queue_min=1)\n",
    "    image, label = data_provider.get(['image', 'label'])\n",
    "    \n",
    "    with tf.Session() as sess:    \n",
    "        with slim.queues.QueueRunners(sess):\n",
    "            for i in range(4):\n",
    "                np_image, np_label = sess.run([image, label])\n",
    "                height, width, _ = np_image.shape\n",
    "                class_name = name = dataset.labels_to_names[np_label]\n",
    "                \n",
    "                plt.figure()\n",
    "                plt.imshow(np_image)\n",
    "                plt.title('%s, %d x %d' % (name, height, width))\n",
    "                plt.axis('off')\n",
    "                plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convolutional neural nets (CNNs).\n",
    "<a id='CNN'></a>\n",
    "\n",
    "In this section, we show how to train an image classifier using a simple CNN.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define the model.\n",
    "\n",
    "Below we define a simple CNN. Note that the output layer is linear function - we will apply softmax transformation externally to the model, either in the loss function (for training), or in the prediction function (during testing)."
   ]
  },
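  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In other words (a sketch, with logits and one_hot_labels standing in for the tensors built later in this section):\n",
    "\n",
    "```\n",
    "# Training: the softmax is folded into the cross-entropy loss.\n",
    "loss = tf.losses.softmax_cross_entropy(onehot_labels=one_hot_labels, logits=logits)\n",
    "\n",
    "# Testing: apply the softmax explicitly to get class probabilities.\n",
    "probabilities = tf.nn.softmax(logits)\n",
    "```"
   ]
  },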
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def my_cnn(images, num_classes, is_training):  # is_training is not used...\n",
    "    with slim.arg_scope([slim.max_pool2d], kernel_size=[3, 3], stride=2):\n",
    "        net = slim.conv2d(images, 64, [5, 5])\n",
    "        net = slim.max_pool2d(net)\n",
    "        net = slim.conv2d(net, 64, [5, 5])\n",
    "        net = slim.max_pool2d(net)\n",
    "        net = slim.flatten(net)\n",
    "        net = slim.fully_connected(net, 192)\n",
    "        net = slim.fully_connected(net, num_classes, activation_fn=None)       \n",
    "        return net"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply the model to some randomly generated images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    # The model can handle any input size because the first layer is convolutional.\n",
    "    # The size of the model is determined when image_node is first passed into the my_cnn function.\n",
    "    # Once the variables are initialized, the size of all the weight matrices is fixed.\n",
    "    # Because of the fully connected layers, this means that all subsequent images must have the same\n",
    "    # input size as the first image.\n",
    "    batch_size, height, width, channels = 3, 28, 28, 3\n",
    "    images = tf.random_uniform([batch_size, height, width, channels], maxval=1)\n",
    "    \n",
    "    # Create the model.\n",
    "    num_classes = 10\n",
    "    logits = my_cnn(images, num_classes, is_training=True)\n",
    "    probabilities = tf.nn.softmax(logits)\n",
    "  \n",
    "    # Initialize all the variables (including parameters) randomly.\n",
    "    init_op = tf.global_variables_initializer()\n",
    "  \n",
    "    with tf.Session() as sess:\n",
    "        # Run the init_op, evaluate the model outputs and print the results:\n",
    "        sess.run(init_op)\n",
    "        probabilities = sess.run(probabilities)\n",
    "        \n",
    "print('Probabilities Shape:')\n",
    "print(probabilities.shape)  # batch_size x num_classes \n",
    "\n",
    "print('\\nProbabilities:')\n",
    "print(probabilities)\n",
    "\n",
    "print('\\nSumming across all classes (Should equal 1):')\n",
    "print(np.sum(probabilities, 1)) # Each row sums to 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Train the model on the Flowers dataset.\n",
    "\n",
    "Before starting, make sure you've run the code to <a href=\"#DownloadFlowers\">Download the Flowers</a> dataset. Now, we'll get a sense of what it looks like to use TF-Slim's training functions found in\n",
    "[learning.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/slim/python/slim/learning.py). First, we'll create a function, `load_batch`, that loads batches of dataset from a dataset. Next, we'll train a model for a single step (just to demonstrate the API), and evaluate the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from preprocessing import inception_preprocessing\n",
    "import tensorflow as tf\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "\n",
    "\n",
    "def load_batch(dataset, batch_size=32, height=299, width=299, is_training=False):\n",
    "    \"\"\"Loads a single batch of data.\n",
    "    \n",
    "    Args:\n",
    "      dataset: The dataset to load.\n",
    "      batch_size: The number of images in the batch.\n",
    "      height: The size of each image after preprocessing.\n",
    "      width: The size of each image after preprocessing.\n",
    "      is_training: Whether or not we're currently training or evaluating.\n",
    "    \n",
    "    Returns:\n",
    "      images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.\n",
    "      images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.\n",
    "      labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes.\n",
    "    \"\"\"\n",
    "    data_provider = slim.dataset_data_provider.DatasetDataProvider(\n",
    "        dataset, common_queue_capacity=32,\n",
    "        common_queue_min=8)\n",
    "    image_raw, label = data_provider.get(['image', 'label'])\n",
    "    \n",
    "    # Preprocess image for usage by Inception.\n",
    "    image = inception_preprocessing.preprocess_image(image_raw, height, width, is_training=is_training)\n",
    "    \n",
    "    # Preprocess the image for display purposes.\n",
    "    image_raw = tf.expand_dims(image_raw, 0)\n",
    "    image_raw = tf.image.resize_images(image_raw, [height, width])\n",
    "    image_raw = tf.squeeze(image_raw)\n",
    "\n",
    "    # Batch it up.\n",
    "    images, images_raw, labels = tf.train.batch(\n",
    "          [image, image_raw, label],\n",
    "          batch_size=batch_size,\n",
    "          num_threads=1,\n",
    "          capacity=2 * batch_size)\n",
    "    \n",
    "    return images, images_raw, labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from datasets import flowers\n",
    "\n",
    "# This might take a few minutes.\n",
    "train_dir = '/tmp/tfslim_model/'\n",
    "print('Will save model to %s' % train_dir)\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    tf.logging.set_verbosity(tf.logging.INFO)\n",
    "\n",
    "    dataset = flowers.get_split('train', flowers_data_dir)\n",
    "    images, _, labels = load_batch(dataset)\n",
    "  \n",
    "    # Create the model:\n",
    "    logits = my_cnn(images, num_classes=dataset.num_classes, is_training=True)\n",
    " \n",
    "    # Specify the loss function:\n",
    "    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)\n",
    "    slim.losses.softmax_cross_entropy(logits, one_hot_labels)\n",
    "    total_loss = slim.losses.get_total_loss()\n",
    "\n",
    "    # Create some summaries to visualize the training process:\n",
    "    tf.summary.scalar('losses/Total Loss', total_loss)\n",
    "  \n",
    "    # Specify the optimizer and create the train op:\n",
    "    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)\n",
    "    train_op = slim.learning.create_train_op(total_loss, optimizer)\n",
    "\n",
    "    # Run the training:\n",
    "    final_loss = slim.learning.train(\n",
    "      train_op,\n",
    "      logdir=train_dir,\n",
    "      number_of_steps=1, # For speed, we just do 1 epoch\n",
    "      save_summaries_secs=1)\n",
    "  \n",
    "    print('Finished training. Final batch loss %d' % final_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Evaluate some metrics.\n",
    "\n",
    "As we discussed above, we can compute various metrics besides the loss.\n",
    "Below we show how to compute prediction accuracy of the trained model, as well as top-5 classification accuracy. (The difference between evaluation and evaluation_loop is that the latter writes the results to a log directory, so they can be viewed in tensorboard.)"
   ]
  },
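  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, a minimal sketch of the evaluation_loop variant (eval_dir is a hypothetical log directory; the loop polls for new checkpoints every eval_interval_secs):\n",
    "\n",
    "```\n",
    "slim.evaluation.evaluation_loop(\n",
    "    master='',\n",
    "    checkpoint_dir=train_dir,\n",
    "    logdir=eval_dir,  # summaries written here can be viewed in tensorboard\n",
    "    num_evals=10,\n",
    "    eval_op=list(names_to_updates.values()),\n",
    "    eval_interval_secs=60)\n",
    "```"
   ]
  },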
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from datasets import flowers\n",
    "\n",
    "# This might take a few minutes.\n",
    "with tf.Graph().as_default():\n",
    "    tf.logging.set_verbosity(tf.logging.DEBUG)\n",
    "    \n",
    "    dataset = flowers.get_split('train', flowers_data_dir)\n",
    "    images, _, labels = load_batch(dataset)\n",
    "    \n",
    "    logits = my_cnn(images, num_classes=dataset.num_classes, is_training=False)\n",
    "    predictions = tf.argmax(logits, 1)\n",
    "    \n",
    "    # Define the metrics:\n",
    "    names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({\n",
    "        'eval/Accuracy': slim.metrics.streaming_accuracy(predictions, labels),\n",
    "        'eval/Recall@5': slim.metrics.streaming_recall_at_k(logits, labels, 5),\n",
    "    })\n",
    "\n",
    "    print('Running evaluation Loop...')\n",
    "    checkpoint_path = tf.train.latest_checkpoint(train_dir)\n",
    "    metric_values = slim.evaluation.evaluate_once(\n",
    "        master='',\n",
    "        checkpoint_path=checkpoint_path,\n",
    "        logdir=train_dir,\n",
    "        eval_op=names_to_updates.values(),\n",
    "        final_op=names_to_values.values())\n",
    "\n",
    "    names_to_values = dict(zip(names_to_values.keys(), metric_values))\n",
    "    for name in names_to_values:\n",
    "        print('%s: %f' % (name, names_to_values[name]))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using pre-trained models\n",
    "<a id='Pretrained'></a>\n",
    "\n",
    "Neural nets work best when they have many parameters, making them very flexible function approximators.\n",
    "However, this  means they must be trained on big datasets. Since this process is slow, we provide various pre-trained models - see the list [here](https://github.com/tensorflow/models/tree/master/slim#pre-trained-models).\n",
    "\n",
    "\n",
    "You can either use these models as-is, or you can perform \"surgery\" on them, to modify them for some other task. For example, it is common to \"chop off\" the final pre-softmax layer, and replace it with a new set of weights corresponding to some new set of labels. You can then quickly fine tune the new model on a small new dataset. We illustrate this below, using inception-v1 as the base model. While models like Inception V3 are more powerful, Inception V1 is used for speed purposes.\n",
    "\n",
    "Take into account that VGG and ResNet final layers have only 1000 outputs rather than 1001. The ImageNet dataset provied has an empty background class which can be used to fine-tune the model to other tasks. VGG and ResNet models provided here don't use that class. We provide two examples of using pretrained models: Inception V1 and VGG-19 models to highlight this difference.\n"
   ]
  },
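  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the \"surgery\" idea: restore every variable from the checkpoint except the final logits layers, which will be re-initialized for the new label set. (The fine-tuning example later in this notebook builds the same exclusion list by hand.)\n",
    "\n",
    "```\n",
    "variables_to_restore = slim.get_variables_to_restore(\n",
    "    exclude=['InceptionV1/Logits', 'InceptionV1/AuxLogits'])\n",
    "init_fn = slim.assign_from_checkpoint_fn(\n",
    "    '/tmp/checkpoints/inception_v1.ckpt', variables_to_restore)  # call init_fn(sess) once\n",
    "```"
   ]
  },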
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the Inception V1 checkpoint\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from datasets import dataset_utils\n",
    "\n",
    "url = \"http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz\"\n",
    "checkpoints_dir = '/tmp/checkpoints'\n",
    "\n",
    "if not tf.gfile.Exists(checkpoints_dir):\n",
    "    tf.gfile.MakeDirs(checkpoints_dir)\n",
    "\n",
    "dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Apply Pre-trained Inception V1 model to Images.\n",
    "\n",
    "We have to convert each image to the size expected by the model checkpoint.\n",
    "There is no easy way to determine this size from the checkpoint itself.\n",
    "So we use a preprocessor to enforce this."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "import tensorflow as tf\n",
    "\n",
    "try:\n",
    "    import urllib2\n",
    "except ImportError:\n",
    "    import urllib.request as urllib\n",
    "\n",
    "from datasets import imagenet\n",
    "from nets import inception\n",
    "from preprocessing import inception_preprocessing\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "\n",
    "image_size = inception.inception_v1.default_image_size\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    url = 'https://upload.wikimedia.org/wikipedia/commons/7/70/EnglishCockerSpaniel_simon.jpg'\n",
    "    image_string = urllib.urlopen(url).read()\n",
    "    image = tf.image.decode_jpeg(image_string, channels=3)\n",
    "    processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)\n",
    "    processed_images  = tf.expand_dims(processed_image, 0)\n",
    "    \n",
    "    # Create the model, use the default arg scope to configure the batch norm parameters.\n",
    "    with slim.arg_scope(inception.inception_v1_arg_scope()):\n",
    "        logits, _ = inception.inception_v1(processed_images, num_classes=1001, is_training=False)\n",
    "    probabilities = tf.nn.softmax(logits)\n",
    "    \n",
    "    init_fn = slim.assign_from_checkpoint_fn(\n",
    "        os.path.join(checkpoints_dir, 'inception_v1.ckpt'),\n",
    "        slim.get_model_variables('InceptionV1'))\n",
    "    \n",
    "    with tf.Session() as sess:\n",
    "        init_fn(sess)\n",
    "        np_image, probabilities = sess.run([image, probabilities])\n",
    "        probabilities = probabilities[0, 0:]\n",
    "        sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x:x[1])]\n",
    "        \n",
    "    plt.figure()\n",
    "    plt.imshow(np_image.astype(np.uint8))\n",
    "    plt.axis('off')\n",
    "    plt.show()\n",
    "\n",
    "    names = imagenet.create_readable_names_for_imagenet_labels()\n",
    "    for i in range(5):\n",
    "        index = sorted_inds[i]\n",
    "        print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download the VGG-16 checkpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from datasets import dataset_utils\n",
    "import tensorflow as tf\n",
    "\n",
    "url = \"http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz\"\n",
    "checkpoints_dir = '/tmp/checkpoints'\n",
    "\n",
    "if not tf.gfile.Exists(checkpoints_dir):\n",
    "    tf.gfile.MakeDirs(checkpoints_dir)\n",
    "\n",
    "dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Apply Pre-trained VGG-16 model to Images.\n",
    "\n",
    "We have to convert each image to the size expected by the model checkpoint.\n",
    "There is no easy way to determine this size from the checkpoint itself.\n",
    "So we use a preprocessor to enforce this. Pay attention to the difference caused by 1000 classes instead of 1001."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import os\n",
    "import tensorflow as tf\n",
    "\n",
    "try:\n",
    "    import urllib2\n",
    "except ImportError:\n",
    "    import urllib.request as urllib\n",
    "\n",
    "from datasets import imagenet\n",
    "from nets import vgg\n",
    "from preprocessing import vgg_preprocessing\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "\n",
    "image_size = vgg.vgg_16.default_image_size\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    url = 'https://upload.wikimedia.org/wikipedia/commons/d/d9/First_Student_IC_school_bus_202076.jpg'\n",
    "    image_string = urllib.urlopen(url).read()\n",
    "    image = tf.image.decode_jpeg(image_string, channels=3)\n",
    "    processed_image = vgg_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)\n",
    "    processed_images  = tf.expand_dims(processed_image, 0)\n",
    "    \n",
    "    # Create the model, use the default arg scope to configure the batch norm parameters.\n",
    "    with slim.arg_scope(vgg.vgg_arg_scope()):\n",
    "        # 1000 classes instead of 1001.\n",
    "        logits, _ = vgg.vgg_16(processed_images, num_classes=1000, is_training=False)\n",
    "    probabilities = tf.nn.softmax(logits)\n",
    "    \n",
    "    init_fn = slim.assign_from_checkpoint_fn(\n",
    "        os.path.join(checkpoints_dir, 'vgg_16.ckpt'),\n",
    "        slim.get_model_variables('vgg_16'))\n",
    "    \n",
    "    with tf.Session() as sess:\n",
    "        init_fn(sess)\n",
    "        np_image, probabilities = sess.run([image, probabilities])\n",
    "        probabilities = probabilities[0, 0:]\n",
    "        sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x:x[1])]\n",
    "        \n",
    "    plt.figure()\n",
    "    plt.imshow(np_image.astype(np.uint8))\n",
    "    plt.axis('off')\n",
    "    plt.show()\n",
    "    \n",
    "    names = imagenet.create_readable_names_for_imagenet_labels()\n",
    "    for i in range(5):\n",
    "        index = sorted_inds[i]\n",
    "        # Shift the index of a class name by one. \n",
    "        print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index+1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fine-tune the model on a different set of labels.\n",
    "\n",
    "We will fine tune the inception model on the Flowers dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Note that this may take several minutes.\n",
    "\n",
    "import os\n",
    "\n",
    "from datasets import flowers\n",
    "from nets import inception\n",
    "from preprocessing import inception_preprocessing\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "image_size = inception.inception_v1.default_image_size\n",
    "\n",
    "\n",
    "def get_init_fn():\n",
    "    \"\"\"Returns a function run by the chief worker to warm-start the training.\"\"\"\n",
    "    checkpoint_exclude_scopes=[\"InceptionV1/Logits\", \"InceptionV1/AuxLogits\"]\n",
    "    \n",
    "    exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]\n",
    "\n",
    "    variables_to_restore = []\n",
    "    for var in slim.get_model_variables():\n",
    "        excluded = False\n",
    "        for exclusion in exclusions:\n",
    "            if var.op.name.startswith(exclusion):\n",
    "                excluded = True\n",
    "                break\n",
    "        if not excluded:\n",
    "            variables_to_restore.append(var)\n",
    "\n",
    "    return slim.assign_from_checkpoint_fn(\n",
    "      os.path.join(checkpoints_dir, 'inception_v1.ckpt'),\n",
    "      variables_to_restore)\n",
    "\n",
    "\n",
    "train_dir = '/tmp/inception_finetuned/'\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    tf.logging.set_verbosity(tf.logging.INFO)\n",
    "    \n",
    "    dataset = flowers.get_split('train', flowers_data_dir)\n",
    "    images, _, labels = load_batch(dataset, height=image_size, width=image_size)\n",
    "    \n",
    "    # Create the model, use the default arg scope to configure the batch norm parameters.\n",
    "    with slim.arg_scope(inception.inception_v1_arg_scope()):\n",
    "        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)\n",
    "        \n",
    "    # Specify the loss function:\n",
    "    one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)\n",
    "    slim.losses.softmax_cross_entropy(logits, one_hot_labels)\n",
    "    total_loss = slim.losses.get_total_loss()\n",
    "\n",
    "    # Create some summaries to visualize the training process:\n",
    "    tf.summary.scalar('losses/Total Loss', total_loss)\n",
    "  \n",
    "    # Specify the optimizer and create the train op:\n",
    "    optimizer = tf.train.AdamOptimizer(learning_rate=0.01)\n",
    "    train_op = slim.learning.create_train_op(total_loss, optimizer)\n",
    "    \n",
    "    # Run the training:\n",
    "    final_loss = slim.learning.train(\n",
    "        train_op,\n",
    "        logdir=train_dir,\n",
    "        init_fn=get_init_fn(),\n",
    "        number_of_steps=2)\n",
    "        \n",
    "  \n",
    "print('Finished training. Last batch loss %f' % final_loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apply fine tuned model to some images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "from datasets import flowers\n",
    "from nets import inception\n",
    "\n",
    "from tensorflow.contrib import slim\n",
    "\n",
    "image_size = inception.inception_v1.default_image_size\n",
    "batch_size = 3\n",
    "\n",
    "with tf.Graph().as_default():\n",
    "    tf.logging.set_verbosity(tf.logging.INFO)\n",
    "    \n",
    "    dataset = flowers.get_split('train', flowers_data_dir)\n",
    "    images, images_raw, labels = load_batch(dataset, height=image_size, width=image_size)\n",
    "    \n",
    "    # Create the model, use the default arg scope to configure the batch norm parameters.\n",
    "    with slim.arg_scope(inception.inception_v1_arg_scope()):\n",
    "        logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)\n",
    "\n",
    "    probabilities = tf.nn.softmax(logits)\n",
    "    \n",
    "    checkpoint_path = tf.train.latest_checkpoint(train_dir)\n",
    "    init_fn = slim.assign_from_checkpoint_fn(\n",
    "      checkpoint_path,\n",
    "      slim.get_variables_to_restore())\n",
    "    \n",
    "    with tf.Session() as sess:\n",
    "        with slim.queues.QueueRunners(sess):\n",
    "            sess.run(tf.initialize_local_variables())\n",
    "            init_fn(sess)\n",
    "            np_probabilities, np_images_raw, np_labels = sess.run([probabilities, images_raw, labels])\n",
    "    \n",
    "            for i in range(batch_size): \n",
    "                image = np_images_raw[i, :, :, :]\n",
    "                true_label = np_labels[i]\n",
    "                predicted_label = np.argmax(np_probabilities[i, :])\n",
    "                predicted_name = dataset.labels_to_names[predicted_label]\n",
    "                true_name = dataset.labels_to_names[true_label]\n",
    "                \n",
    "                plt.figure()\n",
    "                plt.imshow(image.astype(np.uint8))\n",
    "                plt.title('Ground Truth: [%s], Prediction [%s]' % (true_name, predicted_name))\n",
    "                plt.axis('off')\n",
    "                plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}