Commit c35271ec authored by Raymond Yuan

minor updates

parent c4cbe63b
# Image Segmentation with `tf.keras`

<table class="tfo-notebook-buttons" align="left"><td>
<a target="_blank" href="http://colab.research.google.com/github/tensorflow/models/blob/segmentation_blogpost/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb">
  <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
</td><td>
<a target="_blank" href="https://github.com/tensorflow/models/blob/segmentation_blogpost/samples/outreach/blogs/segmentation_blogpost/image_segmentation.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a></td></table>

In this tutorial we will learn how to segment images. **Segmentation** is the process of generating a pixel-wise mask that gives the class of the object visible at each pixel. For example, we could be identifying the location and boundaries of people within an image or identifying cell nuclei from an image. Formally, image segmentation refers to the process of partitioning an image into a set of pixels that we desire to identify (our target) and the background.

Specifically, in this tutorial we will be using the [Kaggle Carvana Image Masking Challenge Dataset](https://www.kaggle.com/c/carvana-image-masking-challenge).

This dataset contains a large number of car images, with each car taken from different angles. In addition, for each car image, we have an associated manually cut-out mask; our task will be to automatically create these cutout masks for unseen data.

## Specific concepts that will be covered:
In the process, we will build practical experience and develop intuition around the following concepts:
* **[Functional API](https://keras.io/getting-started/functional-api-guide/)** - we will be implementing U-Net, a convolutional network model classically used for biomedical image segmentation, with the Functional API.
  * This model has layers that require multiple inputs/outputs, which calls for the functional API.
  * Check out the original [paper](https://arxiv.org/abs/1505.04597), U-Net: Convolutional Networks for Biomedical Image Segmentation by Olaf Ronneberger et al.!
* **Custom Loss Functions and Metrics** - We'll implement a custom loss function using binary [**cross entropy**](https://developers.google.com/machine-learning/glossary/#cross-entropy) and **dice loss**. We'll also implement the **dice coefficient** (which is used for our loss) and **mean intersection over union**, which will help us monitor our training process and judge how well we are performing.
* **Saving and loading Keras models** - We'll save our best model to disk. When we want to perform inference/evaluate our model, we'll load it back in from disk.

### We will follow the general workflow:
1. Visualize data/perform some exploratory data analysis
2. Set up data pipeline and preprocessing
3. Build model
4. Train model
5. Evaluate model
6. Repeat

**Audience:** This post is geared towards intermediate users who are comfortable with basic machine learning concepts.
Note that if you wish to run this notebook, it is highly recommended that you do so with a GPU.

**Time Estimated**: 60 min

By: Raymond Yuan, Software Engineering Intern
"### We will follow the general workflow:\n", {
"1. Visualize data/perform some exploratory data analysis\n", "cell_type": "code",
"2. Set up data pipeline and preprocessing\n", "execution_count": 0,
"3. Build model\n", "metadata": {
"4. Train model\n", "colab": {},
"5. Evaluate model\n", "colab_type": "code",
"6. Repeat\n", "id": "bJcQiA3OdCY6"
"\n", },
"**Audience:** This post is geared towards intermediate users who are comfortable with basic machine learning concepts.\n", "outputs": [],
"Note that if you wish to run this notebook, it is highly recommended that you do so with a GPU. \n", "source": [
"\n", "!pip install kaggle"
"**Time Estimated**: 60 min\n", ]
"\n", },
"By: Raymond Yuan, Software Engineering Intern" {
] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "bJcQiA3OdCY6", "colab_type": "code",
"colab_type": "code", "id": "ODNLPGHKKgr-"
"colab": {} },
}, "outputs": [],
"cell_type": "code", "source": [
"source": [ "import os\n",
"!pip install kaggle" "import glob\n",
], "import zipfile\n",
"execution_count": 0, "import functools\n",
"outputs": [] "\n",
}, "import numpy as np\n",
{ "import matplotlib.pyplot as plt\n",
"metadata": { "import matplotlib as mpl\n",
"id": "ODNLPGHKKgr-", "mpl.rcParams['axes.grid'] = False\n",
"colab_type": "code", "mpl.rcParams['figure.figsize'] = (12,12)\n",
"colab": {} "\n",
}, "from sklearn.model_selection import train_test_split\n",
"cell_type": "code", "import matplotlib.image as mpimg\n",
"source": [ "import pandas as pd\n",
"import os\n", "from PIL import Image\n"
"import glob\n", ]
"import zipfile\n", },
"import functools\n", {
"\n", "cell_type": "code",
"import numpy as np\n", "execution_count": 0,
"import matplotlib.pyplot as plt\n", "metadata": {
"import matplotlib as mpl\n", "colab": {},
"mpl.rcParams['axes.grid'] = False\n", "colab_type": "code",
"mpl.rcParams['figure.figsize'] = (12,12)\n", "id": "YQ9VRReUQxXi"
"\n", },
"from sklearn.model_selection import train_test_split\n", "outputs": [],
"import matplotlib.image as mpimg\n", "source": [
"import pandas as pd\n", "import tensorflow as tf\n",
"from PIL import Image\n" "import tensorflow.contrib as tfcontrib\n",
], "from tensorflow.python.keras import layers\n",
"execution_count": 0, "from tensorflow.python.keras import losses\n",
"outputs": [] "from tensorflow.python.keras import models\n",
}, "from tensorflow.python.keras import backend as K "
{ ]
"metadata": { },
"id": "YQ9VRReUQxXi", {
"colab_type": "code", "cell_type": "markdown",
"colab": {} "metadata": {
}, "colab_type": "text",
"cell_type": "code", "id": "RW9gk331S0KA"
"source": [ },
"import tensorflow as tf\n", "source": [
"import tensorflow.contrib as tfcontrib\n", "# Get all the files \n",
"from tensorflow.python.keras import layers\n", "Since this tutorial will be using a dataset from Kaggle, it requires [creating an API Token](https://github.com/Kaggle/kaggle-api#api-credentials) for your Kaggle acccount, and uploading it. "
"from tensorflow.python.keras import losses\n", ]
"from tensorflow.python.keras import models\n", },
"from tensorflow.python.keras import backend as K " {
], "cell_type": "code",
"execution_count": 0, "execution_count": 0,
"outputs": [] "metadata": {
}, "colab": {},
{ "colab_type": "code",
"metadata": { "id": "sAVM1ZTmdAMR"
"id": "RW9gk331S0KA", },
"colab_type": "text" "outputs": [],
}, "source": [
"cell_type": "markdown", "import os\n",
"source": [ "\n",
"# Get all the files \n", "# Upload the API token.\n",
"Since this tutorial will be using a dataset from Kaggle, it requires [creating an API Token](https://github.com/Kaggle/kaggle-api#api-credentials) for your Kaggle acccount, and uploading it. " "def get_kaggle_credentials():\n",
] " token_dir = os.path.join(os.path.expanduser(\"~\"),\".kaggle\")\n",
}, " token_file = os.path.join(token_dir, \"kaggle.json\")\n",
{ " if not os.path.isdir(token_dir):\n",
"metadata": { " os.mkdir(token_dir)\n",
"id": "sAVM1ZTmdAMR", " try:\n",
"colab_type": "code", " with open(token_file,'r') as f:\n",
"colab": {} " pass\n",
}, " except IOError as no_file:\n",
"cell_type": "code", " try:\n",
"source": [ " from google.colab import files\n",
"import os\n", " except ImportError:\n",
"\n", " raise no_file\n",
"# Upload the API token.\n", " \n",
"def get_kaggle_credentials():\n", " uploaded = files.upload()\n",
" token_dir = os.path.join(os.path.expanduser(\"~\"),\".kaggle\")\n", " \n",
" token_file = os.path.join(token_dir, \"kaggle.json\")\n", " if \"kaggle.json\" not in uploaded:\n",
" if not os.path.isdir(token_dir):\n", " raise ValueError(\"You need an API key! see: \"\n",
" os.mkdir(token_dir)\n", " \"https://github.com/Kaggle/kaggle-api#api-credentials\")\n",
" try:\n", " with open(token_file, \"wb\") as f:\n",
" with open(token_file,'r') as f:\n", " f.write(uploaded[\"kaggle.json\"])\n",
" pass\n", " os.chmod(token_file, 600)\n",
" except IOError as no_file:\n", "\n",
" try:\n", "get_kaggle_credentials()\n"
" from google.colab import files\n", ]
" except ImportError:\n", },
" raise no_file\n", {
" \n", "cell_type": "markdown",
" uploaded = files.upload()\n", "metadata": {
" \n", "colab_type": "text",
" if \"kaggle.json\" not in uploaded:\n", "id": "gh6jkMp8dN5B"
" raise ValueError(\"You need an API key! see: \"\n", },
" \"https://github.com/Kaggle/kaggle-api#api-credentials\")\n", "source": [
" with open(token_file, \"wb\") as f:\n", "Only import kaggle after adding the credentials."
" f.write(uploaded[\"kaggle.json\"])\n", ]
" os.chmod(token_file, 600)\n", },
"\n", {
"get_kaggle_credentials()\n" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "EoWJ1hb9dOV_"
"metadata": { },
"id": "gh6jkMp8dN5B", "outputs": [],
"colab_type": "text" "source": [
}, "import kaggle"
"cell_type": "markdown", ]
"source": [ },
"Only import kaggle after adding the credentials." {
] "cell_type": "markdown",
}, "metadata": {
{ "colab_type": "text",
"metadata": { "id": "wC-byMdadAMT"
"id": "EoWJ1hb9dOV_", },
"colab_type": "code", "source": [
"colab": {} "### We'll download the data from Kaggle\n",
}, "Caution, large download ahead - downloading all files will require 14GB of diskspace. "
"cell_type": "code", ]
"source": [ },
"import kaggle" {
], "cell_type": "code",
"execution_count": 0, "execution_count": 0,
"outputs": [] "metadata": {
}, "colab": {},
{ "colab_type": "code",
"metadata": { "id": "6MOTOyU3dAMU"
"id": "wC-byMdadAMT", },
"colab_type": "text" "outputs": [],
}, "source": [
"cell_type": "markdown", "competition_name = 'carvana-image-masking-challenge'"
"source": [ ]
"### We'll download the data from Kaggle\n", },
"Caution, large download ahead - downloading all files will require 14GB of diskspace. " {
] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "6MOTOyU3dAMU", "colab_type": "code",
"colab_type": "code", "id": "3gJSCmWjdAMW"
"colab": {} },
}, "outputs": [],
"cell_type": "code", "source": [
"source": [ "# Download data from Kaggle and create a DataFrame.\n",
"competition_name = 'carvana-image-masking-challenge'" "def load_data_from_zip(competition, file):\n",
], " with zipfile.ZipFile(os.path.join(competition, file), \"r\") as zip_ref:\n",
"execution_count": 0, " unzipped_file = zip_ref.namelist()[0]\n",
"outputs": [] " zip_ref.extractall(competition)\n",
}, "\n",
{ "def get_data(competition):\n",
"metadata": { " kaggle.api.competition_download_files(competition, competition)\n",
"id": "3gJSCmWjdAMW", " load_data_from_zip(competition, 'train.zip')\n",
"colab_type": "code", " load_data_from_zip(competition, 'train_masks.zip')\n",
"colab": {} " load_data_from_zip(competition, 'train_masks.csv.zip')\n",
}, " \n"
"cell_type": "code", ]
"source": [ },
"# Download data from Kaggle and create a DataFrame.\n", {
"def load_data_from_zip(competition, file):\n", "cell_type": "markdown",
" with zipfile.ZipFile(os.path.join(competition, file), \"r\") as zip_ref:\n", "metadata": {
" unzipped_file = zip_ref.namelist()[0]\n", "colab_type": "text",
" zip_ref.extractall(competition)\n", "id": "l5SZJKPRdXNX"
"\n", },
"def get_data(competition):\n", "source": [
" kaggle.api.competition_download_files(competition, competition)\n", "You must [accept the competition rules](https://www.kaggle.com/c/carvana-image-masking-challenge/rules) before downloading the data."
" load_data_from_zip(competition, 'train.zip')\n", ]
" load_data_from_zip(competition, 'train_masks.zip')\n", },
" load_data_from_zip(competition, 'train_masks.csv.zip')\n", {
" \n" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "_SsQjuN2dWmU"
"metadata": { },
"id": "l5SZJKPRdXNX", "outputs": [],
"colab_type": "text" "source": [
}, "get_data(competition_name)"
"cell_type": "markdown", ]
"source": [ },
"You must [accept the competition rules](https://www.kaggle.com/c/carvana-image-masking-challenge/rules) before downloading the data." {
] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "_SsQjuN2dWmU", "colab_type": "code",
"colab_type": "code", "id": "wT1kb3q0ghhi"
"colab": {} },
}, "outputs": [],
"cell_type": "code", "source": [
"source": [ "img_dir = os.path.join(competition_name, \"train\")\n",
"get_data(competition_name)" "label_dir = os.path.join(competition_name, \"train_masks\")"
], ]
"execution_count": 0, },
"outputs": [] {
}, "cell_type": "code",
{ "execution_count": 0,
"metadata": { "metadata": {
"id": "wT1kb3q0ghhi", "colab": {},
"colab_type": "code", "colab_type": "code",
"colab": {} "id": "9ej-e6cqmRgd"
}, },
"cell_type": "code", "outputs": [],
"source": [ "source": [
"img_dir = os.path.join(competition_name, \"train\")\n", "df_train = pd.read_csv(os.path.join(competition_name, 'train_masks.csv'))\n",
"label_dir = os.path.join(competition_name, \"train_masks\")" "ids_train = df_train['img'].map(lambda s: s.split('.')[0])"
], ]
"execution_count": 0, },
"outputs": [] {
}, "cell_type": "code",
{ "execution_count": 0,
"metadata": { "metadata": {
"id": "9ej-e6cqmRgd", "colab": {},
"colab_type": "code", "colab_type": "code",
"colab": {} "id": "33i4xFXweztH"
}, },
"cell_type": "code", "outputs": [],
"source": [ "source": [
"df_train = pd.read_csv(os.path.join(competition_name, 'train_masks.csv'))\n", "x_train_filenames = []\n",
"ids_train = df_train['img'].map(lambda s: s.split('.')[0])" "y_train_filenames = []\n",
], "for img_id in ids_train:\n",
"execution_count": 0, " x_train_filenames.append(os.path.join(img_dir, \"{}.jpg\".format(img_id)))\n",
"outputs": [] " y_train_filenames.append(os.path.join(label_dir, \"{}_mask.gif\".format(img_id)))"
}, ]
{ },
"metadata": { {
"id": "33i4xFXweztH", "cell_type": "code",
"colab_type": "code", "execution_count": 0,
"colab": {} "metadata": {
}, "colab": {},
"cell_type": "code", "colab_type": "code",
"source": [ "id": "DtutNudKbf70"
"x_train_filenames = []\n", },
"y_train_filenames = []\n", "outputs": [],
"for img_id in ids_train:\n", "source": [
" x_train_filenames.append(os.path.join(img_dir, \"{}.jpg\".format(img_id)))\n", "x_train_filenames, x_val_filenames, y_train_filenames, y_val_filenames = \\\n",
" y_train_filenames.append(os.path.join(label_dir, \"{}_mask.gif\".format(img_id)))" " train_test_split(x_train_filenames, y_train_filenames, test_size=0.2, random_state=42)"
], ]
"execution_count": 0, },
"outputs": [] {
}, "cell_type": "code",
{ "execution_count": 0,
"metadata": { "metadata": {
"id": "DtutNudKbf70", "colab": {},
"colab_type": "code", "colab_type": "code",
"colab": {} "id": "zDycQekHaMqq"
}, },
"cell_type": "code", "outputs": [],
"source": [ "source": [
"x_train_filenames, x_val_filenames, y_train_filenames, y_val_filenames = \\\n", "num_train_examples = len(x_train_filenames)\n",
" train_test_split(x_train_filenames, y_train_filenames, test_size=0.2, random_state=42)" "num_val_examples = len(x_val_filenames)\n",
], "\n",
"execution_count": 0, "print(\"Number of training examples: {}\".format(num_train_examples))\n",
"outputs": [] "print(\"Number of validation examples: {}\".format(num_val_examples))"
}, ]
{ },
"metadata": { {
"id": "zDycQekHaMqq", "cell_type": "markdown",
"colab_type": "code", "metadata": {
"colab": {} "colab_type": "text",
}, "id": "Nhda5fkPS3JD"
"cell_type": "code", },
"source": [ "source": [
"num_train_examples = len(x_train_filenames)\n", "### Here's what the paths look like "
"num_val_examples = len(x_val_filenames)\n", ]
"\n", },
"print(\"Number of training examples: {}\".format(num_train_examples))\n", {
"print(\"Number of validation examples: {}\".format(num_val_examples))" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "Di1N83ArilzR"
"metadata": { },
"id": "Nhda5fkPS3JD", "outputs": [],
"colab_type": "text" "source": [
}, "x_train_filenames[:10]"
"cell_type": "markdown", ]
"source": [ },
"### Here's what the paths look like " {
] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "Di1N83ArilzR", "colab_type": "code",
"colab_type": "code", "id": "Gc-BDv1Zio1z"
"colab": {} },
}, "outputs": [],
"cell_type": "code", "source": [
"source": [ "y_train_filenames[:10]"
"x_train_filenames[:10]" ]
], },
"execution_count": 0, {
"outputs": [] "cell_type": "markdown",
}, "metadata": {
{ "colab_type": "text",
"metadata": { "id": "mhvDoZkbcUa1"
"id": "Gc-BDv1Zio1z", },
"colab_type": "code", "source": [
"colab": {} "# Visualize\n",
}, "Let's take a look at some of the examples of different images in our dataset. "
"cell_type": "code", ]
"source": [ },
"y_train_filenames[:10]" {
], "cell_type": "code",
"execution_count": 0, "execution_count": 0,
"outputs": [] "metadata": {
}, "colab": {},
{ "colab_type": "code",
"metadata": { "id": "qUA6SDLhozjj"
"id": "mhvDoZkbcUa1", },
"colab_type": "text" "outputs": [],
}, "source": [
"cell_type": "markdown", "display_num = 5\n",
"source": [ "\n",
"# Visualize\n", "r_choices = np.random.choice(num_train_examples, display_num)\n",
"Let's take a look at some of the examples of different images in our dataset. " "\n",
] "plt.figure(figsize=(10, 15))\n",
}, "for i in range(0, display_num * 2, 2):\n",
{ " img_num = r_choices[i // 2]\n",
"metadata": { " x_pathname = x_train_filenames[img_num]\n",
"id": "qUA6SDLhozjj", " y_pathname = y_train_filenames[img_num]\n",
"colab_type": "code", " \n",
"colab": {} " plt.subplot(display_num, 2, i + 1)\n",
}, " plt.imshow(mpimg.imread(x_pathname))\n",
"cell_type": "code", " plt.title(\"Original Image\")\n",
"source": [ " \n",
"display_num = 5\n", " example_labels = Image.open(y_pathname)\n",
"\n", " label_vals = np.unique(example_labels)\n",
"r_choices = np.random.choice(num_train_examples, display_num)\n", " \n",
"\n", " plt.subplot(display_num, 2, i + 2)\n",
"plt.figure(figsize=(10, 15))\n", " plt.imshow(example_labels)\n",
"for i in range(0, display_num * 2, 2):\n", " plt.title(\"Masked Image\") \n",
" img_num = r_choices[i // 2]\n", " \n",
" x_pathname = x_train_filenames[img_num]\n", "plt.suptitle(\"Examples of Images and their Masks\")\n",
" y_pathname = y_train_filenames[img_num]\n", "plt.show()"
" \n", ]
" plt.subplot(display_num, 2, i + 1)\n", },
" plt.imshow(mpimg.imread(x_pathname))\n", {
" plt.title(\"Original Image\")\n", "cell_type": "markdown",
" \n", "metadata": {
" example_labels = Image.open(y_pathname)\n", "colab_type": "text",
" label_vals = np.unique(example_labels)\n", "id": "d4CPgvPiToB_"
" \n", },
" plt.subplot(display_num, 2, i + 2)\n", "source": [
" plt.imshow(example_labels)\n", "# Set up "
" plt.title(\"Masked Image\") \n", ]
" \n", },
"plt.suptitle(\"Examples of Images and their Masks\")\n", {
"plt.show()" "cell_type": "markdown",
], "metadata": {
"execution_count": 0, "colab_type": "text",
"outputs": [] "id": "HfeMRgyoa2n6"
}, },
{ "source": [
"metadata": { "Let’s begin by setting up some parameters. We’ll standardize and resize all the shapes of the images. We’ll also set up some training parameters: "
"id": "d4CPgvPiToB_", ]
"colab_type": "text" },
}, {
"cell_type": "markdown", "cell_type": "code",
"source": [ "execution_count": 0,
"# Set up " "metadata": {
] "colab": {},
}, "colab_type": "code",
{ "id": "oeDoiSFlothe"
"metadata": { },
"id": "HfeMRgyoa2n6", "outputs": [],
"colab_type": "text" "source": [
}, "img_shape = (256, 256, 3)\n",
"cell_type": "markdown", "batch_size = 3\n",
"source": [ "epochs = 5"
"Let’s begin by setting up some parameters. We’ll standardize and resize all the shapes of the images. We’ll also set up some training parameters: " ]
] },
Using these exact same parameters may be too computationally intensive for your hardware, so tweak the parameters accordingly. Also, it is important to note that, due to the architecture of our U-Net version, the size of the image must be evenly divisible by 32, as we downsample the spatial resolution by a factor of 2 with each `MaxPooling2D` layer.

If your machine can support it, you will achieve better performance using a higher resolution input image (e.g. 512 by 512), as this will allow more precise localization and less loss of information during encoding. In addition, you can also make the model deeper.

Alternatively, if your machine cannot support it, lower the image resolution and/or batch size. Note that lowering the image resolution will decrease performance and lowering the batch size will increase training time.
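As a quick sanity check on the chosen parameters (a small sketch, not part of the original notebook), you can verify the divisibility-by-32 requirement directly:

```python
# With five 2x2 max-pooling steps in the encoder, height and width must be
# divisible by 2**5 = 32 for the decoder feature maps to line up again.
assert img_shape[0] % 32 == 0 and img_shape[1] % 32 == 0, (
    "Image height and width must be divisible by 32")
```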
# Build our input pipeline with `tf.data`
Since we begin with filenames, we will need to build a robust and scalable data pipeline that will play nicely with our model. If you are unfamiliar with **tf.data** you should check out my other tutorial introducing the concept!

### Our input pipeline will consist of the following steps:
1. Read the bytes of the file in from the filename - for both the image and the label. Recall that our labels are actually images with each pixel annotated as car or background (1, 0).
2. Decode the bytes into an image format
3. Apply image transformations: (optional, according to input parameters)
   * `resize` - Resize our images to a standard size (as determined by EDA or computation/memory restrictions)
     * The reason why this is optional is that U-Net is a fully convolutional network (e.g. with no fully connected units) and is thus not dependent on the input size. However, if you choose not to resize the images, you must use a batch size of 1, since you cannot batch images of different sizes together.
     * Alternatively, you could also bucket your images together and resize them per mini-batch to avoid resizing images as much, as resizing may affect your performance through interpolation, etc.
   * `hue_delta` - Adjusts the hue of an RGB image by a random factor. This is only applied to the actual image (not our label image). The `hue_delta` must be in the interval `[0, 0.5]`
   * `horizontal_flip` - flip the image horizontally along the central axis with a 0.5 probability. This transformation must be applied to both the label and the actual image.
   * `width_shift_range` and `height_shift_range` are ranges (as a fraction of total width or height) within which to randomly translate the image either horizontally or vertically. This transformation must be applied to both the label and the actual image.
   * `rescale` - rescale the image by a certain factor, e.g. 1/255.
4. Shuffle the data, repeat the data (so we can iterate over it multiple times across epochs), batch the data, then prefetch a batch (for efficiency).

It is important to note that these transformations that occur in your data pipeline must be symbolic transformations, as illustrated in the short sketch below.
"### Our input pipeline will consist of the following steps:\n", {
"1. Read the bytes of the file in from the filename - for both the image and the label. Recall that our labels are actually images with each pixel annotated as car or background (1, 0). \n", "cell_type": "markdown",
"2. Decode the bytes into an image format\n", "metadata": {
"3. Apply image transformations: (optional, according to input parameters)\n", "colab_type": "text",
" * `resize` - Resize our images to a standard size (as determined by eda or computation/memory restrictions)\n", "id": "EtRA8vILbx2_"
" * The reason why this is optional is that U-Net is a fully convolutional network (e.g. with no fully connected units) and is thus not dependent on the input size. However, if you choose to not resize the images, you must use a batch size of 1, since you cannot batch variable image size together\n", },
" * Alternatively, you could also bucket your images together and resize them per mini-batch to avoid resizing images as much, as resizing may affect your performance through interpolation, etc.\n", "source": [
" * `hue_delta` - Adjusts the hue of an RGB image by a random factor. This is only applied to the actual image (not our label image). The `hue_delta` must be in the interval `[0, 0.5]` \n", "#### Why do we do these image transformations?\n",
" * `horizontal_flip` - flip the image horizontally along the central axis with a 0.5 probability. This transformation must be applied to both the label and the actual image. \n", "This is known as **data augmentation**. Data augmentation \"increases\" the amount of training data by augmenting them via a number of random transformations. During training time, our model would never see twice the exact same picture. This helps prevent [overfitting](https://developers.google.com/machine-learning/glossary/#overfitting) and helps the model generalize better to unseen data."
" * `width_shift_range` and `height_shift_range` are ranges (as a fraction of total width or height) within which to randomly translate the image either horizontally or vertically. This transformation must be applied to both the label and the actual image. \n", ]
" * `rescale` - rescale the image by a certain factor, e.g. 1/ 255.\n", },
"4. Shuffle the data, repeat the data (so we can iterate over it multiple times across epochs), batch the data, then prefetch a batch (for efficiency).\n", {
"\n", "cell_type": "markdown",
"It is important to note that these transformations that occur in your data pipeline must be symbolic transformations. " "metadata": {
] "colab_type": "text",
}, "id": "3aGi28u8Cq9M"
{ },
"metadata": { "source": [
"id": "EtRA8vILbx2_", "## Processing each pathname"
"colab_type": "text" ]
}, },
"cell_type": "markdown", {
"source": [ "cell_type": "code",
"#### Why do we do these image transformations?\n", "execution_count": 0,
"This is known as **data augmentation**. Data augmentation \"increases\" the amount of training data by augmenting them via a number of random transformations. During training time, our model would never see twice the exact same picture. This helps prevent [overfitting](https://developers.google.com/machine-learning/glossary/#overfitting) and helps the model generalize better to unseen data." "metadata": {
] "colab": {},
}, "colab_type": "code",
{ "id": "Fb_psznAggwr"
"metadata": { },
"id": "3aGi28u8Cq9M", "outputs": [],
"colab_type": "text" "source": [
}, "def _process_pathnames(fname, label_path):\n",
"cell_type": "markdown", " # We map this function onto each pathname pair \n",
"source": [ " img_str = tf.read_file(fname)\n",
"## Processing each pathname" " img = tf.image.decode_jpeg(img_str, channels=3)\n",
] "\n",
}, " label_img_str = tf.read_file(label_path)\n",
{ " # These are gif images so they return as (num_frames, h, w, c)\n",
"metadata": { " label_img = tf.image.decode_gif(label_img_str)[0]\n",
"id": "Fb_psznAggwr", " # The label image should only have values of 1 or 0, indicating pixel wise\n",
"colab_type": "code", " # object (car) or not (background). We take the first channel only. \n",
"colab": {} " label_img = label_img[:, :, 0]\n",
}, " label_img = tf.expand_dims(label_img, axis=-1)\n",
"cell_type": "code", " return img, label_img"
"source": [ ]
"def _process_pathnames(fname, label_path):\n", },
" # We map this function onto each pathname pair \n", {
" img_str = tf.read_file(fname)\n", "cell_type": "markdown",
" img = tf.image.decode_jpeg(img_str, channels=3)\n", "metadata": {
"\n", "colab_type": "text",
" label_img_str = tf.read_file(label_path)\n", "id": "Y4UE28JiCuOk"
" # These are gif images so they return as (num_frames, h, w, c)\n", },
" label_img = tf.image.decode_gif(label_img_str)[0]\n", "source": [
" # The label image should only have values of 1 or 0, indicating pixel wise\n", "## Shifting the image"
" # object (car) or not (background). We take the first channel only. \n", ]
" label_img = label_img[:, :, 0]\n", },
" label_img = tf.expand_dims(label_img, axis=-1)\n", {
" return img, label_img" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "xdY046OqtGVH"
"metadata": { },
"id": "Y4UE28JiCuOk", "outputs": [],
"colab_type": "text" "source": [
}, "def shift_img(output_img, label_img, width_shift_range, height_shift_range):\n",
"cell_type": "markdown", " \"\"\"This fn will perform the horizontal or vertical shift\"\"\"\n",
"source": [ " if width_shift_range or height_shift_range:\n",
"## Shifting the image" " if width_shift_range:\n",
] " width_shift_range = tf.random_uniform([], \n",
}, " -width_shift_range * img_shape[1],\n",
{ " width_shift_range * img_shape[1])\n",
"metadata": { " if height_shift_range:\n",
"id": "xdY046OqtGVH", " height_shift_range = tf.random_uniform([],\n",
"colab_type": "code", " -height_shift_range * img_shape[0],\n",
"colab": {} " height_shift_range * img_shape[0])\n",
}, " # Translate both \n",
"cell_type": "code", " output_img = tfcontrib.image.translate(output_img,\n",
"source": [ " [width_shift_range, height_shift_range])\n",
"def shift_img(output_img, label_img, width_shift_range, height_shift_range):\n", " label_img = tfcontrib.image.translate(label_img,\n",
" \"\"\"This fn will perform the horizontal or vertical shift\"\"\"\n", " [width_shift_range, height_shift_range])\n",
" if width_shift_range or height_shift_range:\n", " return output_img, label_img"
" if width_shift_range:\n", ]
" width_shift_range = tf.random_uniform([], \n", },
" -width_shift_range * img_shape[1],\n", {
" width_shift_range * img_shape[1])\n", "cell_type": "markdown",
" if height_shift_range:\n", "metadata": {
" height_shift_range = tf.random_uniform([],\n", "colab_type": "text",
" -height_shift_range * img_shape[0],\n", "id": "qY253aZfCwd2"
" height_shift_range * img_shape[0])\n", },
" # Translate both \n", "source": [
" output_img = tfcontrib.image.translate(output_img,\n", "## Flipping the image randomly "
" [width_shift_range, height_shift_range])\n", ]
" label_img = tfcontrib.image.translate(label_img,\n", },
" [width_shift_range, height_shift_range])\n", {
" return output_img, label_img" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "OogLSplstur9"
"metadata": { },
"id": "qY253aZfCwd2", "outputs": [],
"colab_type": "text" "source": [
}, "def flip_img(horizontal_flip, tr_img, label_img):\n",
"cell_type": "markdown", " if horizontal_flip:\n",
"source": [ " flip_prob = tf.random_uniform([], 0.0, 1.0)\n",
"## Flipping the image randomly " " tr_img, label_img = tf.cond(tf.less(flip_prob, 0.5),\n",
] " lambda: (tf.image.flip_left_right(tr_img), tf.image.flip_left_right(label_img)),\n",
}, " lambda: (tr_img, label_img))\n",
{ " return tr_img, label_img"
"metadata": { ]
"id": "OogLSplstur9", },
"colab_type": "code", {
"colab": {} "cell_type": "markdown",
}, "metadata": {
"cell_type": "code", "colab_type": "text",
"source": [ "id": "_YIJLIr5Cyyr"
"def flip_img(horizontal_flip, tr_img, label_img):\n", },
" if horizontal_flip:\n", "source": [
" flip_prob = tf.random_uniform([], 0.0, 1.0)\n", "## Assembling our transformations into our augment function"
" tr_img, label_img = tf.cond(tf.less(flip_prob, 0.5),\n", ]
" lambda: (tf.image.flip_left_right(tr_img), tf.image.flip_left_right(label_img)),\n", },
" lambda: (tr_img, label_img))\n", {
" return tr_img, label_img" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "18WA0Sl3olyn"
"metadata": { },
"id": "_YIJLIr5Cyyr", "outputs": [],
"colab_type": "text" "source": [
}, "def _augment(img,\n",
"cell_type": "markdown", " label_img,\n",
"source": [ " resize=None, # Resize the image to some size e.g. [256, 256]\n",
"## Assembling our transformations into our augment function" " scale=1, # Scale image e.g. 1 / 255.\n",
] " hue_delta=0, # Adjust the hue of an RGB image by random factor\n",
}, " horizontal_flip=False, # Random left right flip,\n",
{ " width_shift_range=0, # Randomly translate the image horizontally\n",
"metadata": { " height_shift_range=0): # Randomly translate the image vertically \n",
"id": "18WA0Sl3olyn", " if resize is not None:\n",
"colab_type": "code", " # Resize both images\n",
"colab": {} " label_img = tf.image.resize_images(label_img, resize)\n",
}, " img = tf.image.resize_images(img, resize)\n",
"cell_type": "code", " \n",
"source": [ " if hue_delta:\n",
"def _augment(img,\n", " img = tf.image.random_hue(img, hue_delta)\n",
" label_img,\n", " \n",
" resize=None, # Resize the image to some size e.g. [256, 256]\n", " img, label_img = flip_img(horizontal_flip, img, label_img)\n",
" scale=1, # Scale image e.g. 1 / 255.\n", " img, label_img = shift_img(img, label_img, width_shift_range, height_shift_range)\n",
" hue_delta=0, # Adjust the hue of an RGB image by random factor\n", " label_img = tf.to_float(label_img) * scale\n",
" horizontal_flip=False, # Random left right flip,\n", " img = tf.to_float(img) * scale \n",
" width_shift_range=0, # Randomly translate the image horizontally\n", " return img, label_img"
" height_shift_range=0): # Randomly translate the image vertically \n", ]
" if resize is not None:\n", },
" # Resize both images\n", {
" label_img = tf.image.resize_images(label_img, resize)\n", "cell_type": "code",
" img = tf.image.resize_images(img, resize)\n", "execution_count": 0,
" \n", "metadata": {
" if hue_delta:\n", "colab": {},
" img = tf.image.random_hue(img, hue_delta)\n", "colab_type": "code",
" \n", "id": "tkNqQaR2HQbd"
" img, label_img = flip_img(horizontal_flip, img, label_img)\n", },
" img, label_img = shift_img(img, label_img, width_shift_range, height_shift_range)\n", "outputs": [],
" label_img = tf.to_float(label_img) * scale\n", "source": [
" img = tf.to_float(img) * scale \n", "def get_baseline_dataset(filenames, \n",
" return img, label_img" " labels,\n",
], " preproc_fn=functools.partial(_augment),\n",
"execution_count": 0, " threads=5, \n",
"outputs": [] " batch_size=batch_size,\n",
}, " shuffle=True): \n",
{ " num_x = len(filenames)\n",
"metadata": { " # Create a dataset from the filenames and labels\n",
"id": "tkNqQaR2HQbd", " dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))\n",
"colab_type": "code", " # Map our preprocessing function to every element in our dataset, taking\n",
"colab": {} " # advantage of multithreading\n",
}, " dataset = dataset.map(_process_pathnames, num_parallel_calls=threads)\n",
"cell_type": "code", " if preproc_fn.keywords is not None and 'resize' not in preproc_fn.keywords:\n",
"source": [ " assert batch_size == 1, \"Batching images must be of the same size\"\n",
"def get_baseline_dataset(filenames, \n", "\n",
" labels,\n", " dataset = dataset.map(preproc_fn, num_parallel_calls=threads)\n",
" preproc_fn=functools.partial(_augment),\n", " \n",
" threads=5, \n", " if shuffle:\n",
" batch_size=batch_size,\n", " dataset = dataset.shuffle(num_x)\n",
" shuffle=True): \n", " \n",
" num_x = len(filenames)\n", " \n",
" # Create a dataset from the filenames and labels\n", " # It's necessary to repeat our data for all epochs \n",
" dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))\n", " dataset = dataset.repeat().batch(batch_size)\n",
" # Map our preprocessing function to every element in our dataset, taking\n", " return dataset"
" # advantage of multithreading\n", ]
" dataset = dataset.map(_process_pathnames, num_parallel_calls=threads)\n", },
" if preproc_fn.keywords is not None and 'resize' not in preproc_fn.keywords:\n", {
" assert batch_size == 1, \"Batching images must be of the same size\"\n", "cell_type": "markdown",
"\n", "metadata": {
" dataset = dataset.map(preproc_fn, num_parallel_calls=threads)\n", "colab_type": "text",
" \n", "id": "zwtgius5CRKc"
" if shuffle:\n", },
" dataset = dataset.shuffle(num_x)\n", "source": [
" \n", "## Set up train and validation datasets\n",
" \n", "Note that we apply image augmentation to our training dataset but not our validation dataset. "
" # It's necessary to repeat our data for all epochs \n", ]
" dataset = dataset.repeat().batch(batch_size)\n", },
" return dataset" {
], "cell_type": "code",
"execution_count": 0, "execution_count": 0,
"outputs": [] "metadata": {
}, "colab": {},
{ "colab_type": "code",
"metadata": { "id": "iu5WmYmOwKrV"
"id": "zwtgius5CRKc", },
"colab_type": "text" "outputs": [],
}, "source": [
"cell_type": "markdown", "tr_cfg = {\n",
"source": [ " 'resize': [img_shape[0], img_shape[1]],\n",
"## Set up train and validation datasets\n", " 'scale': 1 / 255.,\n",
"Note that we apply image augmentation to our training dataset but not our validation dataset. " " 'hue_delta': 0.1,\n",
] " 'horizontal_flip': True,\n",
}, " 'width_shift_range': 0.1,\n",
{ " 'height_shift_range': 0.1\n",
"metadata": { "}\n",
"id": "iu5WmYmOwKrV", "tr_preprocessing_fn = functools.partial(_augment, **tr_cfg)"
"colab_type": "code", ]
"colab": {} },
```python
val_cfg = {
    'resize': [img_shape[0], img_shape[1]],
    'scale': 1 / 255.,
}
val_preprocessing_fn = functools.partial(_augment, **val_cfg)
```

```python
train_ds = get_baseline_dataset(x_train_filenames,
                                y_train_filenames,
                                preproc_fn=tr_preprocessing_fn,
                                batch_size=batch_size)
val_ds = get_baseline_dataset(x_val_filenames,
                              y_val_filenames,
                              preproc_fn=val_preprocessing_fn,
                              batch_size=batch_size)
```
"colab": {} {
}, "cell_type": "markdown",
"cell_type": "code", "metadata": {
"source": [ "colab_type": "text",
"train_ds = get_baseline_dataset(x_train_filenames,\n", "id": "Yasuvr5IbFlM"
" y_train_filenames,\n", },
" preproc_fn=tr_preprocessing_fn,\n", "source": [
" batch_size=batch_size)\n", "## Let's see if our image augmentor data pipeline is producing expected results"
"val_ds = get_baseline_dataset(x_val_filenames,\n", ]
" y_val_filenames, \n", },
" preproc_fn=val_preprocessing_fn,\n", {
" batch_size=batch_size)" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "hjoUqbPdHQej"
"metadata": { },
"id": "Yasuvr5IbFlM", "outputs": [],
"colab_type": "text" "source": [
}, "temp_ds = get_baseline_dataset(x_train_filenames, \n",
"cell_type": "markdown", " y_train_filenames,\n",
"source": [ " preproc_fn=tr_preprocessing_fn,\n",
"## Let's see if our image augmentor data pipeline is producing expected results" " batch_size=1,\n",
] " shuffle=False)\n",
}, "# Let's examine some of these augmented images\n",
{ "data_aug_iter = temp_ds.make_one_shot_iterator()\n",
"metadata": { "next_element = data_aug_iter.get_next()\n",
"id": "hjoUqbPdHQej", "with tf.Session() as sess: \n",
"colab_type": "code", " batch_of_imgs, label = sess.run(next_element)\n",
"colab": {} "\n",
}, " # Running next element in our graph will produce a batch of images\n",
"cell_type": "code", " plt.figure(figsize=(10, 10))\n",
"source": [ " img = batch_of_imgs[0]\n",
"temp_ds = get_baseline_dataset(x_train_filenames, \n", "\n",
" y_train_filenames,\n", " plt.subplot(1, 2, 1)\n",
" preproc_fn=tr_preprocessing_fn,\n", " plt.imshow(img)\n",
" batch_size=1,\n", "\n",
" shuffle=False)\n", " plt.subplot(1, 2, 2)\n",
"# Let's examine some of these augmented images\n", " plt.imshow(label[0, :, :, 0])\n",
"data_aug_iter = temp_ds.make_one_shot_iterator()\n", " plt.show()"
"next_element = data_aug_iter.get_next()\n", ]
"with tf.Session() as sess: \n", },
" batch_of_imgs, label = sess.run(next_element)\n", {
"\n", "cell_type": "markdown",
" # Running next element in our graph will produce a batch of images\n", "metadata": {
" plt.figure(figsize=(10, 10))\n", "colab_type": "text",
" img = batch_of_imgs[0]\n", "id": "fvtxCncKsoRd"
"\n", },
" plt.subplot(1, 2, 1)\n", "source": [
" plt.imshow(img)\n", "# Build the model\n",
"\n", "We'll build the U-Net model. U-Net is especially good with segmentation tasks because it can localize well to provide high resolution segmentation masks. In addition, it works well with small datasets and is relatively robust against overfitting as the training data is in terms of the number of patches within an image, which is much larger than the number of training images itself. Unlike the original model, we will add batch normalization to each of our blocks. \n",
" plt.subplot(1, 2, 2)\n", "\n",
" plt.imshow(label[0, :, :, 0])\n", "The Unet is built with an encoder portion and a decoder portion. The encoder portion is composed of a linear stack of [`Conv`](https://developers.google.com/machine-learning/glossary/#convolution), `BatchNorm`, and [`Relu`](https://developers.google.com/machine-learning/glossary/#ReLU) operations followed by a [`MaxPool`](https://developers.google.com/machine-learning/glossary/#pooling). Each `MaxPool` will reduce the spatial resolution of our feature map by a factor of 2. We keep track of the outputs of each block as we feed these high resolution feature maps with the decoder portion. The Decoder portion is comprised of UpSampling2D, Conv, BatchNorm, and Relus. Note that we concatenate the feature map of the same size on the decoder side. Finally, we add a final Conv operation that performs a convolution along the channels for each individual pixel (kernel size of (1, 1)) that outputs our final segmentation mask in grayscale. \n",
" plt.show()" "## The Keras Functional API\n",
], "The Keras functional API is used when you have multi-input/output models, shared layers, etc. It's a powerful API that allows you to manipulate tensors and build complex graphs with intertwined datastreams easily. In addition it makes **layers** and **models** both callable on tensors. \n",
"execution_count": 0, " * To see more examples check out the [get started guide](https://keras.io/getting-started/functional-api-guide/). \n",
"outputs": [] " \n",
}, " \n",
{ " We'll build these helper functions that will allow us to ensemble our model block operations easily and simply. "
"metadata": { ]
"id": "xszBW-gL1Cyq", },
"colab_type": "code", {
"colab": {} "cell_type": "code",
}, "execution_count": 0,
"cell_type": "code", "metadata": {
"source": [ "colab": {},
"label.shape" "colab_type": "code",
], "id": "zfew1i1F6bK-"
"execution_count": 0, },
"outputs": [] "outputs": [],
}, "source": [
{ "def conv_block(input_tensor, num_filters):\n",
"metadata": { " encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)\n",
"id": "x1LfMEWjkluS", " encoder = layers.BatchNormalization()(encoder)\n",
"colab_type": "code", " encoder = layers.Activation('relu')(encoder)\n",
"colab": {} " encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)\n",
}, " encoder = layers.BatchNormalization()(encoder)\n",
"cell_type": "code", " encoder = layers.Activation('relu')(encoder)\n",
"source": [ " return encoder\n",
"train_ds" "\n",
], "def encoder_block(input_tensor, num_filters):\n",
"execution_count": 0, " encoder = conv_block(input_tensor, num_filters)\n",
"outputs": [] " encoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)\n",
}, " \n",
{ " return encoder_pool, encoder\n",
"metadata": { "\n",
"id": "fvtxCncKsoRd", "def decoder_block(input_tensor, concat_tensor, num_filters):\n",
"colab_type": "text" " decoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)\n",
}, " decoder = layers.concatenate([concat_tensor, decoder], axis=-1)\n",
"cell_type": "markdown", " decoder = layers.BatchNormalization()(decoder)\n",
"source": [ " decoder = layers.Activation('relu')(decoder)\n",
"# Build the model\n", " decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n",
"We'll build the U-Net model. U-Net is especially good with segmentation tasks because it can localize well to provide high resolution segmentation masks. In addition, it works well with small datasets and is relatively robust against overfitting as the training data is in terms of the number of patches within an image, which is much larger than the number of training images itself. Unlike the original model, we will add batch normalization to each of our blocks. \n", " decoder = layers.BatchNormalization()(decoder)\n",
"\n", " decoder = layers.Activation('relu')(decoder)\n",
"The Unet is built with an encoder portion and a decoder portion. The encoder portion is composed of a linear stack of [`Conv`](https://developers.google.com/machine-learning/glossary/#convolution), `BatchNorm`, and [`Relu`](https://developers.google.com/machine-learning/glossary/#ReLU) operations followed by a [`MaxPool`](https://developers.google.com/machine-learning/glossary/#pooling). Each `MaxPool` will reduce the spatial resolution of our feature map by a factor of 2. We keep track of the outputs of each block as we feed these high resolution feature maps with the decoder portion. The Decoder portion is comprised of UpSampling2D, Conv, BatchNorm, and Relus. Note that we concatenate the feature map of the same size on the decoder side. Finally, we add a final Conv operation that performs a convolution along the channels for each individual pixel (kernel size of (1, 1)) that outputs our final segmentation mask in grayscale. \n", " decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n",
"## The Keras Functional API\n", " decoder = layers.BatchNormalization()(decoder)\n",
"The Keras functional API is used when you have multi-input/output models, shared layers, etc. It's a powerful API that allows you to manipulate tensors and build complex graphs with intertwined datastreams easily. In addition it makes **layers** and **models** both callable on tensors. \n", " decoder = layers.Activation('relu')(decoder)\n",
" * To see more examples check out the [get started guide](https://keras.io/getting-started/functional-api-guide/). \n", " return decoder"
" \n", ]
" \n", },
" We'll build these helper functions that will allow us to ensemble our model block operations easily and simply. " {
] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "zfew1i1F6bK-", "colab_type": "code",
"colab_type": "code", "id": "xRLp21S_hpTn"
"colab": {} },
}, "outputs": [],
"cell_type": "code", "source": [
"source": [ "inputs = layers.Input(shape=img_shape)\n",
"def conv_block(input_tensor, num_filters):\n", "# 256\n",
" encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(input_tensor)\n", "\n",
" encoder = layers.BatchNormalization()(encoder)\n", "encoder0_pool, encoder0 = encoder_block(inputs, 32)\n",
" encoder = layers.Activation('relu')(encoder)\n", "# 128\n",
" encoder = layers.Conv2D(num_filters, (3, 3), padding='same')(encoder)\n", "\n",
" encoder = layers.BatchNormalization()(encoder)\n", "encoder1_pool, encoder1 = encoder_block(encoder0_pool, 64)\n",
" encoder = layers.Activation('relu')(encoder)\n", "# 64\n",
" return encoder\n", "\n",
"\n", "encoder2_pool, encoder2 = encoder_block(encoder1_pool, 128)\n",
"def encoder_block(input_tensor, num_filters):\n", "# 32\n",
" encoder = conv_block(input_tensor, num_filters)\n", "\n",
" encoder_pool = layers.MaxPooling2D((2, 2), strides=(2, 2))(encoder)\n", "encoder3_pool, encoder3 = encoder_block(encoder2_pool, 256)\n",
" \n", "# 16\n",
" return encoder_pool, encoder\n", "\n",
"\n", "encoder4_pool, encoder4 = encoder_block(encoder3_pool, 512)\n",
"def decoder_block(input_tensor, concat_tensor, num_filters):\n", "# 8\n",
" decoder = layers.Conv2DTranspose(num_filters, (2, 2), strides=(2, 2), padding='same')(input_tensor)\n", "\n",
" decoder = layers.concatenate([concat_tensor, decoder], axis=-1)\n", "center = conv_block(encoder4_pool, 1024)\n",
" decoder = layers.BatchNormalization()(decoder)\n", "# center\n",
" decoder = layers.Activation('relu')(decoder)\n", "\n",
" decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n", "decoder4 = decoder_block(center, encoder4, 512)\n",
" decoder = layers.BatchNormalization()(decoder)\n", "# 16\n",
" decoder = layers.Activation('relu')(decoder)\n", "\n",
" decoder = layers.Conv2D(num_filters, (3, 3), padding='same')(decoder)\n", "decoder3 = decoder_block(decoder4, encoder3, 256)\n",
" decoder = layers.BatchNormalization()(decoder)\n", "# 32\n",
" decoder = layers.Activation('relu')(decoder)\n", "\n",
" return decoder" "decoder2 = decoder_block(decoder3, encoder2, 128)\n",
], "# 64\n",
"execution_count": 0, "\n",
"outputs": [] "decoder1 = decoder_block(decoder2, encoder1, 64)\n",
}, "# 128\n",
{ "\n",
"metadata": { "decoder0 = decoder_block(decoder1, encoder0, 32)\n",
"id": "xRLp21S_hpTn", "# 256\n",
"colab_type": "code", "\n",
"colab": {} "outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(decoder0)"
}, ]
"cell_type": "code", },
"source": [ {
"inputs = layers.Input(shape=img_shape)\n", "cell_type": "markdown",
"# 256\n", "metadata": {
"\n", "colab_type": "text",
"encoder0_pool, encoder0 = encoder_block(inputs, 32)\n", "id": "luDqDqu8c1AX"
"# 128\n", },
"\n", "source": [
"encoder1_pool, encoder1 = encoder_block(encoder0_pool, 64)\n", "## Define your model\n",
"# 64\n", "Using functional API, you must define your model by specifying the inputs and outputs associated with the model. "
"\n", ]
"encoder2_pool, encoder2 = encoder_block(encoder1_pool, 128)\n", },
"# 32\n", {
"\n", "cell_type": "code",
"encoder3_pool, encoder3 = encoder_block(encoder2_pool, 256)\n", "execution_count": 0,
"# 16\n", "metadata": {
"\n", "colab": {},
"encoder4_pool, encoder4 = encoder_block(encoder3_pool, 512)\n", "colab_type": "code",
"# 8\n", "id": "76QkTzXVczgc"
"\n", },
"center = conv_block(encoder4_pool, 1024)\n", "outputs": [],
"# center\n", "source": [
"\n", "model = models.Model(inputs=[inputs], outputs=[outputs])"
"decoder4 = decoder_block(center, encoder4, 512)\n", ]
"# 16\n", },
"\n", {
"decoder3 = decoder_block(decoder4, encoder3, 256)\n", "cell_type": "markdown",
"# 32\n", "metadata": {
"\n", "colab_type": "text",
"decoder2 = decoder_block(decoder3, encoder2, 128)\n", "id": "p0tNnmyOdtyr"
"# 64\n", },
"\n", "source": [
"decoder1 = decoder_block(decoder2, encoder1, 64)\n", "# Defining custom metrics and loss functions\n",
"# 128\n", "Defining loss and metric functions are simple with Keras. Simply define a function that takes both the True labels for a given example and the Predicted labels for the same given example. "
"\n", ]
"decoder0 = decoder_block(decoder1, encoder0, 32)\n", },
"# 256\n", {
"\n", "cell_type": "markdown",
"outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(decoder0)" "metadata": {
], "colab_type": "text",
"execution_count": 0, "id": "sfuBVut0fogM"
"outputs": [] },
}, "source": [
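The `# 256`, `# 128`, ... comments above track the spatial resolution of the feature maps as the image flows through the network. The following minimal, illustrative check is not part of the original notebook; it assumes the `layers` import, the `encoder_block`/`decoder_block` definitions above, and a 256x256x3 input. Each `encoder_block` halves the spatial size while keeping the pre-pool tensor as a skip connection, and each `decoder_block` doubles it again before concatenating the matching skip:

```python
# Illustrative shape check only (not from the original notebook).
x = layers.Input(shape=(256, 256, 3))
pooled, skip = encoder_block(x, 32)
print(skip.shape)    # 256 x 256 x 32: full-resolution tensor kept for the skip connection
print(pooled.shape)  # 128 x 128 x 32: halved by MaxPooling2D
up = decoder_block(pooled, skip, 32)
print(up.shape)      # 256 x 256 x 32: upsampled by Conv2DTranspose and fused with the skip
```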
{ "Dice loss is a metric that measures overlap. More info on optimizing for Dice coefficient (our dice loss) can be found in the [paper](http://campar.in.tum.de/pub/milletari2016Vnet/milletari2016Vnet.pdf), where it was introduced. \n",
"metadata": { "\n",
"id": "luDqDqu8c1AX", "We use dice loss here because it performs better at class imbalanced problems by design. In addition, maximizing the dice coefficient and IoU metrics are the actual objectives and goals of our segmentation task. Using cross entropy is more of a proxy which is easier to maximize. Instead, we maximize our objective directly. "
"colab_type": "text" ]
}, },
"cell_type": "markdown", {
"source": [ "cell_type": "code",
"## Define your model\n", "execution_count": 0,
"Using functional API, you must define your model by specifying the inputs and outputs associated with the model. " "metadata": {
] "colab": {},
}, "colab_type": "code",
{ "id": "t_8_hbHECUAW"
"metadata": { },
"id": "76QkTzXVczgc", "outputs": [],
"colab_type": "code", "source": [
"colab": {} "def dice_coeff(y_true, y_pred):\n",
}, " smooth = 1.\n",
"cell_type": "code", " # Flatten\n",
"source": [ " y_true_f = tf.reshape(y_true, [-1])\n",
"model = models.Model(inputs=[inputs], outputs=[outputs])" " y_pred_f = tf.reshape(y_pred, [-1])\n",
], " intersection = tf.reduce_sum(y_true_f * y_pred_f)\n",
"execution_count": 0, " score = (2. * intersection + smooth) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)\n",
"outputs": [] " return score"
}, ]
{ },
"metadata": { {
"id": "p0tNnmyOdtyr", "cell_type": "code",
"colab_type": "text" "execution_count": 0,
}, "metadata": {
"cell_type": "markdown", "colab": {},
"source": [ "colab_type": "code",
"# Defining custom metrics and loss functions\n", "id": "4DgINhlpNaxP"
"Defining loss and metric functions are simple with Keras. Simply define a function that takes both the True labels for a given example and the Predicted labels for the same given example. " },
] "outputs": [],
}, "source": [
{ "def dice_loss(y_true, y_pred):\n",
"metadata": { " loss = 1 - dice_coeff(y_true, y_pred)\n",
"id": "sfuBVut0fogM", " return loss"
"colab_type": "text" ]
}, },
"cell_type": "markdown", {
"source": [ "cell_type": "markdown",
"Dice loss is a metric that measures overlap. More info on optimizing for Dice coefficient (our dice loss) can be found in the [paper](http://campar.in.tum.de/pub/milletari2016Vnet/milletari2016Vnet.pdf), where it was introduced. \n", "metadata": {
"\n", "colab_type": "text",
"We use dice loss here because it performs better at class imbalanced problems by design. In addition, maximizing the dice coefficient and IoU metrics are the actual objectives and goals of our segmentation task. Using cross entropy is more of a proxy which is easier to maximize. Instead, we maximize our objective directly. " "id": "qqClGNFJdANU"
] },
}, "source": [
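To make "measures overlap" concrete, here is a tiny worked example (illustrative only, not from the notebook; it re-implements the same formula in NumPy with the same `smooth = 1` term used in the cell below):

```python
import numpy as np

def dice_np(a, b, smooth=1.0):
    # Dice = (2 * |A intersect B| + smooth) / (|A| + |B| + smooth)
    a, b = a.flatten(), b.flatten()
    intersection = np.sum(a * b)
    return (2.0 * intersection + smooth) / (np.sum(a) + np.sum(b) + smooth)

target = np.array([[1, 1], [0, 0]], dtype=np.float32)
print(dice_np(target, target))                                        # 1.0: perfect overlap
print(dice_np(target, np.array([[1, 0], [1, 0]], dtype=np.float32)))  # 0.6: partial overlap
print(dice_np(target, np.zeros_like(target)))                         # ~0.33: smoothing keeps an empty prediction finite
```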
{ "Here, we'll use a specialized loss function that combines binary cross entropy and our dice loss. This is based on [individuals who competed within this competition and obtained better results empirically](https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40199). Try out your own custom losses to measure performance (e.g. bce + log(dice_loss), only bce, etc.)!"
"metadata": { ]
"id": "t_8_hbHECUAW", },
"colab_type": "code", {
"colab": {} "cell_type": "code",
}, "execution_count": 0,
"cell_type": "code", "metadata": {
"source": [ "colab": {},
"def dice_coeff(y_true, y_pred):\n", "colab_type": "code",
" smooth = 1.\n", "id": "udrfi9JGB-bL"
" # Flatten\n", },
" y_true_f = tf.reshape(y_true, [-1])\n", "outputs": [],
" y_pred_f = tf.reshape(y_pred, [-1])\n", "source": [
" intersection = tf.reduce_sum(y_true_f * y_pred_f)\n", "def bce_dice_loss(y_true, y_pred):\n",
" score = (2. * intersection + smooth) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)\n", " loss = losses.binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)\n",
" return score" " return loss"
], ]
"execution_count": 0, },
"outputs": [] {
}, "cell_type": "markdown",
{ "metadata": {
"metadata": { "colab_type": "text",
"id": "4DgINhlpNaxP", "id": "LifmpjXNc9Gz"
"colab_type": "code", },
"colab": {} "source": [
}, "## Compile your model\n",
"cell_type": "code", "We use our custom loss function to minimize. In addition, we specify what metrics we want to keep track of as we train. Note that metrics are not actually used during the training process to tune the parameters, but are instead used to measure performance of the training process. "
"source": [ ]
"def dice_loss(y_true, y_pred):\n", },
" loss = 1 - dice_coeff(y_true, y_pred)\n", {
" return loss" "cell_type": "code",
], "execution_count": 0,
"execution_count": 0, "metadata": {
"outputs": [] "colab": {},
}, "colab_type": "code",
{ "id": "gflcWk2Cc8Bi"
"metadata": { },
"id": "qqClGNFJdANU", "outputs": [],
"colab_type": "text" "source": [
}, "model.compile(optimizer='adam', loss=bce_dice_loss, metrics=[dice_loss])\n",
"cell_type": "markdown", "\n",
"source": [ "model.summary()"
"Here, we'll use a specialized loss function that combines binary cross entropy and our dice loss. This is based on [individuals who competed within this competition obtaining better results empirically](https://www.kaggle.com/c/carvana-image-masking-challenge/discussion/40199). " ]
] },
}, {
{ "cell_type": "markdown",
"metadata": { "metadata": {
"id": "udrfi9JGB-bL", "colab_type": "text",
"colab_type": "code", "id": "8WG_8iZ_dMbK"
"colab": {} },
}, "source": [
"cell_type": "code", "## Train your model\n",
"source": [ "Training your model with `tf.data` involves simply providing the model's `fit` function with your training/validation dataset, the number of steps, and epochs. \n",
"def bce_dice_loss(y_true, y_pred):\n", "\n",
" loss = losses.binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)\n", "We also include a Model callback, [`ModelCheckpoint`](https://keras.io/callbacks/#modelcheckpoint) that will save the model to disk after each epoch. We configure it such that it only saves our highest performing model. Note that saving the model capture more than just the weights of the model: by default, it saves the model architecture, weights, as well as information about the training process such as the state of the optimizer, etc."
" return loss" ]
], },
"execution_count": 0, {
"outputs": [] "cell_type": "code",
}, "execution_count": 0,
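The updated text above also suggests experimenting with your own custom losses (e.g. bce + log(dice_loss), or bce alone). Since a Keras loss is just a function of `(y_true, y_pred)`, such variants are one-liners. The sketch below is purely illustrative and not part of the original notebook; it assumes the `losses` import and the `dice_loss` defined above, and `weighted_bce_dice_loss` is a hypothetical name:

```python
# Illustrative variant only: re-weight the two terms of bce_dice_loss.
def weighted_bce_dice_loss(y_true, y_pred, bce_weight=0.5):
    return (bce_weight * losses.binary_crossentropy(y_true, y_pred)
            + (1.0 - bce_weight) * dice_loss(y_true, y_pred))

# Usage would mirror the compile call below, e.g.:
# model.compile(optimizer='adam', loss=weighted_bce_dice_loss, metrics=[dice_loss])
```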
{ "metadata": {
"metadata": { "colab": {},
"id": "LifmpjXNc9Gz", "colab_type": "code",
"colab_type": "text" "id": "1nHnj6199elZ"
}, },
"cell_type": "markdown", "outputs": [],
"source": [ "source": [
"## Compile your model\n", "save_model_path = '/tmp/weights.hdf5'\n",
"We use our custom loss function to minimize. In addition, we specify what metrics we want to keep track of as we train. Note that metrics are not actually used during the training process to tune the parameters, but are instead used to measure performance of the training process. " "cp = tf.keras.callbacks.ModelCheckpoint(filepath=save_model_path, monitor='val_dice_loss', mode='max', save_best_only=True)"
] ]
}, },
{ {
"metadata": { "cell_type": "markdown",
"id": "gflcWk2Cc8Bi", "metadata": {
"colab_type": "code", "colab_type": "text",
"colab": {} "id": "vJP_EvuTb4hH"
}, },
"cell_type": "code", "source": [
"source": [ "Don't forget to specify our model callback in the `fit` function call. "
"model.compile(optimizer='adam', loss=bce_dice_loss, metrics=[dice_loss])\n", ]
"\n", },
"model.summary()" {
], "cell_type": "code",
"execution_count": 0, "execution_count": 0,
"outputs": [] "metadata": {
}, "colab": {},
{ "colab_type": "code",
"metadata": { "id": "UMZcOrq5aaj1"
"id": "8WG_8iZ_dMbK", },
"colab_type": "text" "outputs": [],
}, "source": [
"cell_type": "markdown", "history = model.fit(train_ds, \n",
"source": [ " steps_per_epoch=int(np.ceil(num_train_examples / float(batch_size))),\n",
"## Train your model\n", " epochs=epochs,\n",
"Training your model with `tf.data` involves simply providing the model's `fit` function with your training/validation dataset, the number of steps, and epochs. \n", " validation_data=val_ds,\n",
"\n", " validation_steps=int(np.ceil(num_val_examples / float(batch_size))),\n",
"We also include a Model callback, [`ModelCheckpoint`](https://keras.io/callbacks/#modelcheckpoint) that will save the model to disk after each epoch. We configure it such that it only saves our highest performing model. Note that saving the model capture more than just the weights of the model: by default, it saves the model architecture, weights, as well as information about the training process such as the state of the optimizer, etc." " callbacks=[cp])"
] ]
}, },
{ {
"metadata": { "cell_type": "markdown",
"id": "1nHnj6199elZ", "metadata": {
"colab_type": "code", "colab_type": "text",
"colab": {} "id": "gCAUsoxfTTrh"
}, },
"cell_type": "code", "source": [
"source": [ "# Visualize training process"
"save_model_path = '/tmp/weights.hdf5'\n", ]
"cp = tf.keras.callbacks.ModelCheckpoint(filepath=save_model_path, monitor='val_dice_loss', mode='max', save_best_only=True)" },
], {
"execution_count": 0, "cell_type": "code",
"outputs": [] "execution_count": 0,
}, "metadata": {
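The `callbacks` list passed to `fit` is not limited to the checkpoint. As an optional, illustrative addition (not part of the original notebook), early stopping on the same monitored quantity takes one extra line; `mode='min'` matches the fact that `val_dice_loss` should decrease as the model improves:

```python
# Illustrative only: stop training if val_dice_loss has not improved for 3 epochs.
es = tf.keras.callbacks.EarlyStopping(monitor='val_dice_loss', mode='min', patience=3)
# Then pass it alongside the checkpoint, e.g. model.fit(..., callbacks=[cp, es])
```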
{ "colab": {},
"metadata": { "colab_type": "code",
"id": "vJP_EvuTb4hH", "id": "AvntxymYn8rM"
"colab_type": "text" },
}, "outputs": [],
"cell_type": "markdown", "source": [
"source": [ "dice = = history.history['dice_loss']\n",
"Don't forget to specify our model callback in the `fit` function call. " "val_dice = history.history['val_dice_loss']\n",
] "\n",
}, "loss = history.history['loss']\n",
{ "val_loss = history.history['val_loss']\n",
"metadata": { "\n",
"id": "UMZcOrq5aaj1", "epochs_range = range(epochs)\n",
"colab_type": "code", "\n",
"colab": {} "plt.figure(figsize=(16, 8))\n",
}, "plt.subplot(1, 2, 1)\n",
"cell_type": "code", "plt.plot(epochs_range, dice, label='Training Dice Loss')\n",
"source": [ "plt.plot(epochs_range, val_dice, label='Validation Dice Loss')\n",
"history = model.fit(train_ds, \n", "plt.legend(loc='upper right')\n",
" steps_per_epoch=int(np.ceil(num_train_examples / float(batch_size))),\n", "plt.title('Training and Validation Dice Loss')\n",
" epochs=epochs,\n", "\n",
" validation_data=val_ds,\n", "plt.subplot(1, 2, 2)\n",
" validation_steps=int(np.ceil(num_val_examples / float(batch_size))),\n", "plt.plot(epochs_range, loss, label='Training Loss')\n",
" callbacks=[cp])" "plt.plot(epochs_range, val_loss, label='Validation Loss')\n",
], "plt.legend(loc='upper right')\n",
"execution_count": 0, "plt.title('Training and Validation Loss')\n",
"outputs": [] "\n",
}, "plt.show()"
{ ]
"metadata": { },
"id": "gCAUsoxfTTrh", {
"colab_type": "text" "cell_type": "markdown",
}, "metadata": {
"cell_type": "markdown", "colab_type": "text",
"source": [ "id": "dWPhb87GdhkG"
"# Visualize training process" },
] "source": [
}, "Even with only 5 epochs, we see strong performance."
{ ]
"metadata": { },
"id": "AvntxymYn8rM", {
"colab_type": "code", "cell_type": "markdown",
"colab": {} "metadata": {
}, "colab_type": "text",
"cell_type": "code", "id": "MGFKf8yCTYbw"
"source": [ },
"dice = = history.history['dice_loss']\n", "source": [
"val_dice = history.history['val_dice_loss']\n", "# Visualize actual performance \n",
"\n", "We'll visualize our performance on the validation set.\n",
"loss = history.history['loss']\n", "\n",
"val_loss = history.history['val_loss']\n", "Note that in an actual setting (competition, deployment, etc.) we'd evaluate on the test set with the full image resolution. "
"\n", ]
"epochs_range = range(epochs)\n", },
"\n", {
"plt.figure(figsize=(16, 8))\n", "cell_type": "markdown",
"plt.subplot(1, 2, 1)\n", "metadata": {
"plt.plot(epochs_range, dice, label='Training Dice Loss')\n", "colab_type": "text",
"plt.plot(epochs_range, val_dice, label='Validation Dice Loss')\n", "id": "oIddsUcM_KeI"
"plt.legend(loc='upper right')\n", },
"plt.title('Training and Validation Dice Loss')\n", "source": [
"\n", "To load our model we have two options:\n",
"plt.subplot(1, 2, 2)\n", "1. Since our model architecture is already in memory, we can simply call `load_weights(save_model_path)`\n",
"plt.plot(epochs_range, loss, label='Training Loss')\n", "2. If you wanted to load the model from scratch (in a different setting without already having the model architecture in memory) we simply call \n",
"plt.plot(epochs_range, val_loss, label='Validation Loss')\n", "\n",
"plt.legend(loc='upper right')\n", "```model = models.load_model(save_model_path, custom_objects={'bce_dice_loss': bce_dice_loss, 'mean_iou': mean_iou,'dice_coeff': dice_coeff})```, specificing the necessary custom objects, loss and metrics, that we used to train our model. \n",
"plt.title('Training and Validation Loss')\n", "\n",
"\n", "If you want to see more examples, check our the [keras guide](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)!"
"plt.show()" ]
], },
"execution_count": 0, {
"outputs": [] "cell_type": "code",
}, "execution_count": 0,
{ "metadata": {
"metadata": { "colab": {},
"id": "dWPhb87GdhkG", "colab_type": "code",
"colab_type": "text" "id": "5Ph7acmrCXm6"
}, },
"cell_type": "markdown", "outputs": [],
"source": [ "source": [
"Even with only 5 epochs, we see strong performance." "# Alternatively, load the weights directly: model.load_weights(save_model_path)\n",
] "model = models.load_model(save_model_path, custom_objects={'bce_dice_loss': bce_dice_loss,\n",
}, " 'dice_coeff': dice_coeff})"
{ ]
"metadata": { },
"id": "MGFKf8yCTYbw", {
"colab_type": "text" "cell_type": "code",
}, "execution_count": 0,
"cell_type": "markdown", "metadata": {
"source": [ "colab": {},
"# Visualize actual performance \n", "colab_type": "code",
"We'll visualize our performance on the validation set.\n", "id": "0GnwZ7CPaamI"
"\n", },
"Note that in an actual setting (competition, deployment, etc.) we'd evaluate on the test set with the full image resolution. " "outputs": [],
] "source": [
}, "# Let's visualize some of the outputs \n",
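Before plotting individual examples, a quick numeric check over the whole validation set is also possible. This is a sketch, not part of the original notebook; it assumes `model`, `val_ds`, `num_val_examples`, and `batch_size` from earlier cells, and relies on `model.evaluate` accepting the same `tf.data` dataset that `fit` does:

```python
# Illustrative only: overall loss and dice_loss of the restored model on the validation set.
val_steps = int(np.ceil(num_val_examples / float(batch_size)))
results = model.evaluate(val_ds, steps=val_steps)
print(dict(zip(model.metrics_names, results)))
```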
{ "data_aug_iter = val_ds.make_one_shot_iterator()\n",
"metadata": { "next_element = data_aug_iter.get_next()\n",
"id": "oIddsUcM_KeI", "\n",
"colab_type": "text" "# Running next element in our graph will produce a batch of images\n",
}, "plt.figure(figsize=(10, 20))\n",
"cell_type": "markdown", "for i in range(5):\n",
"source": [ " batch_of_imgs, label = tf.keras.backend.get_session().run(next_element)\n",
"To load our model we have two options:\n", " img = batch_of_imgs[0]\n",
"1. Since our model architecture is already in memory, we can simply call `load_weights(save_model_path)`\n", " predicted_label = model.predict(batch_of_imgs)[0]\n",
"2. If you wanted to load the model from scratch (in a different setting without already having the model architecture in memory) we simply call \n", "\n",
"\n", " plt.subplot(5, 3, 3 * i + 1)\n",
"```model = models.load_model(save_model_path, custom_objects={'bce_dice_loss': bce_dice_loss, 'mean_iou': mean_iou,'dice_coeff': dice_coeff})```, specificing the necessary custom objects, loss and metrics, that we used to train our model. \n", " plt.imshow(img)\n",
"\n", " plt.title(\"Input image\")\n",
"If you want to see more examples, check our the [keras guide](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)!" " \n",
] " plt.subplot(5, 3, 3 * i + 2)\n",
}, " plt.imshow(label[0, :, :, 0])\n",
{ " plt.title(\"Actual Mask\")\n",
"metadata": { " plt.subplot(5, 3, 3 * i + 3)\n",
"id": "5Ph7acmrCXm6", " plt.imshow(predicted_label[:, :, 0])\n",
"colab_type": "code", " plt.title(\"Predicted Mask\")\n",
"colab": {} "plt.suptitle(\"Examples of Input Image, Label, and Prediction\")\n",
}, "plt.show()"
"cell_type": "code", ]
"source": [ },
"# Alternatively, load the weights directly: model.load_weights(save_model_path)\n", {
"model = models.load_model(save_model_path, custom_objects={'bce_dice_loss': bce_dice_loss,\n", "cell_type": "markdown",
" 'dice_coeff': dice_coeff})" "metadata": {
], "colab_type": "text",
"execution_count": 0, "id": "iPV7RMA9TjPC"
"outputs": [] },
}, "source": [
{ "# Key Takeaways\n",
"metadata": { "In this tutorial we learned how to train a network to automatically detect and create cutouts of cars from images! \n",
"id": "0GnwZ7CPaamI", "\n",
"colab_type": "code", "## Specific concepts that will we covered:\n",
"colab": {} "In the process, we hopefully built some practical experience and developed intuition around the following concepts\n",
}, "* [**Functional API**](https://keras.io/getting-started/functional-api-guide/) - we implemented UNet with the Functional API. The Functional API gives us a lego-like interface that allows us to build pretty much any network. \n",
"cell_type": "code", "* **Custom Losses and Metrics** - We implemented custom metrics that allow us to see exactly what we need during training time. In addition, we wrote a custom loss function that is specifically suited to our task. \n",
"source": [ "* **Save and load our model** - We saved our best model that we encountered according to our specified metric. When we wanted to perform inference with out best model, we loaded it from disk. Note that saving the model capture more than just the weights of the model: by default, it saves the model architecture, weights, as well as information about the training process such as the state of the optimizer, etc. "
"# Let's visualize some of the outputs \n", ]
"data_aug_iter = val_ds.make_one_shot_iterator()\n", }
"next_element = data_aug_iter.get_next()\n", ],
"\n", "metadata": {
"# Running next element in our graph will produce a batch of images\n", "accelerator": "GPU",
"plt.figure(figsize=(10, 20))\n", "colab": {
"for i in range(5):\n", "collapsed_sections": [],
" batch_of_imgs, label = tf.keras.backend.get_session().run(next_element)\n", "name": "Image Segmentation",
" img = batch_of_imgs[0]\n", "private_outputs": true,
" predicted_label = model.predict(batch_of_imgs)[0]\n", "provenance": [],
"\n", "version": "0.3.2"
" plt.subplot(5, 3, 3 * i + 1)\n", },
" plt.imshow(img)\n", "kernelspec": {
" plt.title(\"Input image\")\n", "display_name": "Python [default]",
" \n", "language": "python",
" plt.subplot(5, 3, 3 * i + 2)\n", "name": "python3"
" plt.imshow(label[0, :, :, 0])\n", },
" plt.title(\"Actual Mask\")\n", "language_info": {
" plt.subplot(5, 3, 3 * i + 3)\n", "codemirror_mode": {
" plt.imshow(predicted_label[:, :, 0])\n", "name": "ipython",
" plt.title(\"Predicted Mask\")\n", "version": 3
"plt.save\n", },
"plt.suptitle(\"Examples of Input Image, Label, and Prediction\")\n", "file_extension": ".py",
"plt.show()" "mimetype": "text/x-python",
], "name": "python",
"execution_count": 0, "nbconvert_exporter": "python",
"outputs": [] "pygments_lexer": "ipython3",
}, "version": "3.6.4"
{ }
"metadata": { },
"id": "iPV7RMA9TjPC", "nbformat": 4,
"colab_type": "text" "nbformat_minor": 1
}, }
"cell_type": "markdown",
"source": [
"# Key Takeaways\n",
"In this tutorial we learned how to train a network to automatically detect and create cutouts of cars from images! \n",
"\n",
"## Specific concepts that will we covered:\n",
"In the process, we hopefully built some practical experience and developed intuition around the following concepts\n",
"* [**Functional API**](https://keras.io/getting-started/functional-api-guide/) - we implemented UNet with the Functional API. Functional API gives a lego-like API that allows us to build pretty much any network. \n",
"* **Custom Losses and Metrics** - We implemented custom metrics that allow us to see exactly what we need during training time. In addition, we wrote a custom loss function that is specifically suited to our task. \n",
"* **Save and load our model** - We saved our best model that we encountered according to our specified metric. When we wanted to perform inference with out best model, we loaded it from disk. Note that saving the model capture more than just the weights of the model: by default, it saves the model architecture, weights, as well as information about the training process such as the state of the optimizer, etc. "
]
}
]
}
\ No newline at end of file