Commit 26290cab authored by Mark Daoust

Stub moved notebooks.

These have all moved to https://github.com/tensorflow/docs/tree/master/site/en
parent b4cd5f5c
...@@ -5,8 +5,6 @@
"colab": {
"name": "_index.ipynb",
"version": "0.3.2",
"views": {},
"default_view": {},
"provenance": []
}
},
...@@ -27,12 +25,7 @@
"metadata": {
"id": "BZSlp3DAjdYf",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab": {},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -64,159 +57,26 @@ ...@@ -64,159 +57,26 @@
}, },
{ {
"metadata": { "metadata": {
"id": "DUNzJc4jTj6G", "id": "AMrQVn--Aj1j",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\"><td>\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/get_started/_index.ipynb\">\n",
" <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /><span>Run in Google Colab</span></a> \n",
"</td><td>\n",
"<a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/get_started/_index.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /><span>View source on GitHub</span></a></td></table>"
]
},
{
"metadata": {
"id": "hiH7AC-NTniF",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This is a [Google Colaboratory](https://colab.research.google.com/notebooks/welcome.ipynb) notebook file. Python programs are run directly in the browser—a great way to learn and use TensorFlow. To run the Colab notebook:\n",
"\n",
"1. Connect to a Python runtime: At the top-right of the menu bar, select *CONNECT*.\n",
"2. Run all the notebook code cells: Select *Runtime* > *Run all*.\n",
"\n",
"For more examples and guides (including details for this program), see [Get Started with TensorFlow](https://www.tensorflow.org/get_started/).\n",
"\n",
"Let's get started, import the TensorFlow library into your program:"
]
},
{
"metadata": {
"id": "0trJmd6DjqBZ",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"import tensorflow as tf"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "7NAbSZiaoJ4z",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Load and prepare the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset. Convert the samples from integers to floating-point numbers:"
]
},
{
"metadata": {
"id": "7FP5258xjs-v",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"mnist = tf.keras.datasets.mnist\n",
"\n",
"(x_train, y_train), (x_test, y_test) = mnist.load_data()\n",
"x_train, x_test = x_train / 255.0, x_test / 255.0"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "BPZ68wASog_I",
"colab_type": "text" "colab_type": "text"
}, },
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"Build the `tf.keras` model by stacking layers. Select an optimizer and loss function used for training:" "This file has moved."
] ]
}, },
{ {
"metadata": { "metadata": {
"id": "h3IKyzTCDNGo", "id": "DUNzJc4jTj6G",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"model = tf.keras.models.Sequential([\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(512, activation=tf.nn.relu),\n",
" tf.keras.layers.Dropout(0.2),\n",
" tf.keras.layers.Dense(10, activation=tf.nn.softmax)\n",
"])\n",
"\n",
"model.compile(optimizer='adam',\n",
" loss='sparse_categorical_crossentropy',\n",
" metrics=['accuracy'])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ix4mEL65on-w",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Train and evaluate model:"
]
},
{
"metadata": {
"id": "F7dTAzgHDUh7",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"model.fit(x_train, y_train, epochs=5)\n",
"\n",
"model.evaluate(x_test, y_test)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "T4JfEh7kvx6m",
"colab_type": "text" "colab_type": "text"
}, },
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"You’ve now trained an image classifier with ~98% accuracy on this dataset. See [Get Started with TensorFlow](https://www.tensorflow.org/get_started/) to learn more." "<table class=\"tfo-notebook-buttons\" align=\"left\"><td>\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/_index.ipynb\">\n",
" <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /><span>Run in Google Colab</span></a> \n",
"</td><td>\n",
"<a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/_index.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /><span>View source on GitHub</span></a></td></table>"
]
}
]
...
...@@ -5,8 +5,6 @@
"colab": {
"name": "Custom training: walkthrough",
"version": "0.3.2",
"views": {},
"default_view": {},
"provenance": []
}
},
...@@ -27,12 +25,7 @@
"metadata": {
"id": "BZSlp3DAjdYf",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"colab": {},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -70,10 +63,10 @@ ...@@ -70,10 +63,10 @@
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\"><td>\n", "<table class=\"tfo-notebook-buttons\" align=\"left\"><td>\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/eager/custom_training_walkthrough.ipynb\">\n", "<a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/eager/custom_training_walkthrough.ipynb\">\n",
" <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /><span>Run in Google Colab</span></a> \n", " <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /><span>Run in Google Colab</span></a> \n",
"</td><td>\n", "</td><td>\n",
"<a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/eager/custom_training_walkthrough.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /><span>View source on GitHub</span></a></td></table>" "<a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/eager/custom_training_walkthrough.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /><span>View source on GitHub</span></a></td></table>"
] ]
}, },
{ {
...@@ -87,4 +80,4 @@ ...@@ -87,4 +80,4 @@
] ]
} }
] ]
} }
\ No newline at end of file
...@@ -72,1016 +72,16 @@
"source": [
"<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://www.tensorflow.org/versions/master/guide/autograph\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" <a target=\"_blank\" href=\"https://www.tensorflow.org/guide/autograph\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/guide/autograph.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/autograph.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/guide/autograph.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/guide/autograph.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
},
{
"metadata": {
"id": "CydFK2CL7ZHA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"[AutoGraph](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/autograph/) helps you write complicated graph code using normal Python. Behind the scenes, AutoGraph automatically transforms your code into the equivalent [TensorFlow graph code](https://www.tensorflow.org/guide/graphs). AutoGraph already supports much of the Python language, and that coverage continues to grow. For a list of supported Python language features, see the [Autograph capabilities and limitations](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/autograph/LIMITATIONS.md)."
]
},
{
"metadata": {
"id": "n4EKOpw9mObL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Setup\n",
"\n",
"To use AutoGraph, install the latest version of TensorFlow:"
]
},
{
"metadata": {
"id": "RSez0n7Ptcvb",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"! pip install -U tf-nightly"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "qLp9VZfit9oR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Import TensorFlow, AutoGraph, and any supporting modules:"
]
},
{
"metadata": {
"id": "mT7meGqrZTz9",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"from __future__ import division, print_function, absolute_import\n",
"\n",
"import tensorflow as tf\n",
"import tensorflow.keras.layers as layers\n",
"from tensorflow.contrib import autograph\n",
"\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Hh1PajmUJMNp",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We'll enable [eager execution](https://www.tensorflow.org/guide/eager) for demonstration purposes, but AutoGraph works in both eager and [graph execution](https://www.tensorflow.org/guide/graphs) environments:"
]
},
{
"metadata": {
"id": "ks_hiqcSJNvg",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"tf.enable_eager_execution()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "WR4lG3hsuWQT",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Note: AutoGraph converted code is designed to run during graph execution. When eager exectuon is enabled, use explicit graphs (as this example shows) or `tf.contrib.eager.defun`."
]
},
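{
"metadata": {
"id": "defun-sketch-md",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a minimal sketch of that note (an added illustration with hypothetical names, not part of the original guide), a converted function can be wrapped with `tf.contrib.eager.defun` so it executes as a graph while eager execution is enabled:"
]
},
{
"metadata": {
"id": "defun-sketch-code",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Illustrative sketch: `add_to_ten` is a hypothetical helper, not from the guide.\n",
"def add_to_ten(x):\n",
"  while x < 10:\n",
"    x += 1\n",
"  return x\n",
"\n",
"# Convert the Python control flow to graph code, then wrap it with `defun`\n",
"# so the resulting graph runs under eager execution.\n",
"tf_add_to_ten = autograph.to_graph(add_to_ten)\n",
"fast_add_to_ten = tf.contrib.eager.defun(tf_add_to_ten)\n",
"print(fast_add_to_ten(tf.constant(0)))"
],
"execution_count": 0,
"outputs": []
},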
{
"metadata": {
"id": "ohbSnA79mcJV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Automatically convert Python control flow\n",
"\n",
"AutoGraph will convert much of the Python language into the equivalent TensorFlow graph building code. \n",
"\n",
"Note: In real applications batching is essential for performance. The best code to convert to AutoGraph is code where the control flow is decided at the _batch_ level. If making decisions at the individual _example_ level, you must index and batch the examples to maintain performance while applying the control flow logic. \n",
"\n",
"AutoGraph converts a function like:"
]
},
{
"metadata": {
"id": "aA3gOodCBkOw",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def square_if_positive(x):\n",
" if x > 0:\n",
" x = x * x\n",
" else:\n",
" x = 0.0\n",
" return x"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "LICw4XQFZrhH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To a function that uses graph building:"
]
},
{
"metadata": {
"id": "_EMhGUjRZoKQ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print(autograph.to_code(square_if_positive))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "xpK0m4TCvkJq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Code written for eager execution can run in a `tf.Graph` with the same results, but with the benfits of graph execution:"
]
},
{
"metadata": {
"id": "I1RtBvoKBxq5",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print('Eager results: %2.2f, %2.2f' % (square_if_positive(tf.constant(9.0)), \n",
" square_if_positive(tf.constant(-9.0))))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Fpk3MxVVv5gn",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Generate a graph-version and call it:"
]
},
{
"metadata": {
"id": "SGjSq0WQvwGs",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"tf_square_if_positive = autograph.to_graph(square_if_positive)\n",
"\n",
"with tf.Graph().as_default(): \n",
" # The result works like a regular op: takes tensors in, returns tensors.\n",
" # You can inspect the graph using tf.get_default_graph().as_graph_def()\n",
" g_out1 = tf_square_if_positive(tf.constant( 9.0))\n",
" g_out2 = tf_square_if_positive(tf.constant(-9.0))\n",
" with tf.Session() as sess:\n",
" print('Graph results: %2.2f, %2.2f\\n' % (sess.run(g_out1), sess.run(g_out2)))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "m-jWmsCmByyw",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"AutoGraph supports common Python statements like `while`, `for`, `if`, `break`, and `return`, with support for nesting. Compare this function with the complicated graph verson displayed in the following code blocks:"
]
},
{
"metadata": {
"id": "toxKBOXbB1ro",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Continue in a loop\n",
"def sum_even(items):\n",
" s = 0\n",
" for c in items:\n",
" if c % 2 > 0:\n",
" continue\n",
" s += c\n",
" return s\n",
"\n",
"print('Eager result: %d' % sum_even(tf.constant([10,12,15,20])))\n",
"\n",
"tf_sum_even = autograph.to_graph(sum_even)\n",
"\n",
"with tf.Graph().as_default(), tf.Session() as sess:\n",
" print('Graph result: %d\\n\\n' % sess.run(tf_sum_even(tf.constant([10,12,15,20]))))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "jlyQgxYsYSXr",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print(autograph.to_code(sum_even))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "FUJJ-WTdCGeq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Decorator\n",
"\n",
"If you don't need easy access to the original Python function, use the `convert` decorator:"
]
},
{
"metadata": {
"id": "BKhFNXDic4Mw",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def fizzbuzz(i, n):\n",
" while i < n:\n",
" msg = ''\n",
" if i % 3 == 0:\n",
" msg += 'Fizz'\n",
" if i % 5 == 0:\n",
" msg += 'Buzz'\n",
" if msg == '':\n",
" msg = tf.as_string(i)\n",
" print(msg)\n",
" i += 1\n",
" return i\n",
"\n",
"with tf.Graph().as_default():\n",
" final_i = fizzbuzz(tf.constant(10), tf.constant(16))\n",
" # The result works like a regular op: takes tensors in, returns tensors.\n",
" # You can inspect the graph using tf.get_default_graph().as_graph_def()\n",
" with tf.Session() as sess:\n",
" sess.run(final_i)\n",
"\n"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "-pkEH6OecW7h",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Examples\n",
"\n",
"Let's demonstrate some useful Python language features.\n"
]
},
{
"metadata": {
"id": "axoRAkWi0CQG",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Assert\n",
"\n",
"AutoGraph automatically converts the Python `assert` statement into the equivalent `tf.Assert` code:"
]
},
{
"metadata": {
"id": "IAOgh62zCPZ4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def inverse(x):\n",
" assert x != 0.0, 'Do not pass zero!'\n",
" return 1.0 / x\n",
"\n",
"with tf.Graph().as_default(), tf.Session() as sess:\n",
" try:\n",
" print(sess.run(inverse(tf.constant(0.0))))\n",
" except tf.errors.InvalidArgumentError as e:\n",
" print('Got error message:\\n %s' % e.message)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "KRu8iIPBCQr5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Print\n",
"\n",
"Use the Python `print` function in-graph:"
]
},
{
"metadata": {
"id": "ehBac9rUR6nh",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def count(n):\n",
" i=0\n",
" while i < n:\n",
" print(i)\n",
" i += 1\n",
" return n\n",
" \n",
"with tf.Graph().as_default(), tf.Session() as sess:\n",
" sess.run(count(tf.constant(5)))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "mtpegD_YR6HK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Lists\n",
"\n",
"Append to lists in loops (tensor list ops are automatically created):"
]
},
{
"metadata": {
"id": "ABX070KwCczR",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def arange(n):\n",
" z = []\n",
" # We ask you to tell us the element dtype of the list\n",
" autograph.set_element_type(z, tf.int32)\n",
" \n",
" for i in range(n):\n",
" z.append(i)\n",
" # when you're done with the list, stack it\n",
" # (this is just like np.stack)\n",
" return autograph.stack(z) \n",
"\n",
"\n",
"with tf.Graph().as_default(), tf.Session() as sess:\n",
" sess.run(arange(tf.constant(10)))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "qj7am2I_xvTJ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Nested control flow"
]
},
{
"metadata": {
"id": "4yyNOf-Twr6s",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def nearest_odd_square(x):\n",
" if x > 0:\n",
" x = x * x\n",
" if x % 2 == 0:\n",
" x = x + 1\n",
" return x\n",
"\n",
"with tf.Graph().as_default(): \n",
" with tf.Session() as sess:\n",
" print(sess.run(nearest_odd_square(tf.constant(4))))\n",
" print(sess.run(nearest_odd_square(tf.constant(5))))\n",
" print(sess.run(nearest_odd_square(tf.constant(6))))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "jXAxjeBr1qWK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### While loop"
]
},
{
"metadata": {
"id": "ucmZyQVL03bF",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def square_until_stop(x, y):\n",
" while x < y:\n",
" x = x * x\n",
" return x\n",
" \n",
"with tf.Graph().as_default(): \n",
" with tf.Session() as sess:\n",
" print(sess.run(square_until_stop(tf.constant(4), tf.constant(100))))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "3N1mz7sNY87N",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### For loop"
]
},
{
"metadata": {
"id": "CFk2fszrY8af",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def squares(nums):\n",
"\n",
" result = []\n",
" autograph.set_element_type(result, tf.int64)\n",
"\n",
" for num in nums: \n",
" result.append(num * num)\n",
" \n",
" return autograph.stack(result)\n",
" \n",
"with tf.Graph().as_default(): \n",
" with tf.Session() as sess:\n",
" print(sess.run(squares(tf.constant(np.arange(10)))))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "FXB0Zbwl13PY",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Break"
]
},
{
"metadata": {
"id": "1sjaFcL717Ig",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"@autograph.convert()\n",
"def argwhere_cumsum(x, threshold):\n",
" current_sum = 0.0\n",
" idx = 0\n",
" for i in range(len(x)):\n",
" idx = i\n",
" if current_sum >= threshold:\n",
" break\n",
" current_sum += x[i]\n",
" return idx\n",
"\n",
"N = 10\n",
"with tf.Graph().as_default(): \n",
" with tf.Session() as sess:\n",
" idx = argwhere_cumsum(tf.ones(N), tf.constant(float(N/2)))\n",
" print(sess.run(idx))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "XY4UspHmZNdL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Interoperation with `tf.Keras`\n",
"\n",
"Now that you've seen the basics, let's build some model components with autograph.\n",
"\n",
"It's relatively simple to integrate `autograph` with `tf.keras`. \n",
"\n",
"\n",
"### Stateless functions\n",
"\n",
"For stateless functions, like `collatz` shown below, the easiest way to include them in a keras model is to wrap them up as a layer uisng `tf.keras.layers.Lambda`."
]
},
{
"metadata": {
"id": "ChZh3q-zcF6C",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"@autograph.convert()\n",
"def collatz(x):\n",
" x = tf.reshape(x,())\n",
" assert x > 0\n",
" n = tf.convert_to_tensor((0,)) \n",
" while not tf.equal(x, 1):\n",
" n += 1\n",
" if tf.equal(x%2, 0):\n",
" x = x // 2\n",
" else:\n",
" x = 3 * x + 1\n",
" \n",
" return n\n",
"\n",
"with tf.Graph().as_default():\n",
" model = tf.keras.Sequential([\n",
" tf.keras.layers.Lambda(collatz, input_shape=(1,), output_shape=())\n",
" ])\n",
" \n",
"result = model.predict(np.array([6171]))\n",
"result"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "k9LEoa3ud9hA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Custom Layers and Models\n",
"\n",
"<!--TODO(markdaoust) link to full examples or these referenced models.-->\n",
"\n",
"The easiest way to use AutoGraph with Keras layers and models is to `@autograph.convert()` the `call` method. See the [TensorFlow Keras guide](https://tensorflow.org/guide/keras#build_advanced_models) for details on how to build on these classes. \n",
"\n",
"Here is a simple example of the [stocastic network depth](https://arxiv.org/abs/1603.09382) technique :"
]
},
{
"metadata": {
"id": "DJi_RJkeeOju",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# `K` is used to check if we're in train or test mode.\n",
"import tensorflow.keras.backend as K\n",
"\n",
"class StocasticNetworkDepth(tf.keras.Sequential):\n",
" def __init__(self, pfirst=1.0, plast=0.5, *args,**kwargs):\n",
" self.pfirst = pfirst\n",
" self.plast = plast\n",
" super().__init__(*args,**kwargs)\n",
" \n",
" def build(self,input_shape):\n",
" super().build(input_shape.as_list())\n",
" self.depth = len(self.layers)\n",
" self.plims = np.linspace(self.pfirst, self.plast, self.depth + 1)[:-1]\n",
" \n",
" @autograph.convert()\n",
" def call(self, inputs):\n",
" training = tf.cast(K.learning_phase(), dtype=bool) \n",
" if not training: \n",
" count = self.depth\n",
" return super(StocasticNetworkDepth, self).call(inputs), count\n",
" \n",
" p = tf.random_uniform((self.depth,))\n",
" \n",
" keeps = (p <= self.plims)\n",
" x = inputs\n",
" \n",
" count = tf.reduce_sum(tf.cast(keeps, tf.int32))\n",
" for i in range(self.depth):\n",
" if keeps[i]:\n",
" x = self.layers[i](x)\n",
" \n",
" # return both the final-layer output and the number of layers executed.\n",
" return x, count"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "NIEzuNL6vMVl",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's try it on mnist-shaped data:"
]
},
{
"metadata": {
"id": "FiqyFySkWbeN",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_batch = np.random.randn(64, 28, 28, 1).astype(np.float32)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Vz1JTpLOvT4u",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Build a simple stack of `conv` layers, in the stocastic depth model:"
]
},
{
"metadata": {
"id": "XwwtlQAjvUph",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"with tf.Graph().as_default() as g:\n",
" model = StocasticNetworkDepth(\n",
" pfirst=1.0, plast=0.5)\n",
"\n",
" for n in range(20):\n",
" model.add(\n",
" layers.Conv2D(filters=16, activation=tf.nn.relu,\n",
" kernel_size=(3, 3), padding='same'))\n",
"\n",
" model.build(tf.TensorShape((None, None, None, 1)))\n",
" \n",
" init = tf.global_variables_initializer()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "uM3g_v7mvrkg",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now test it to ensure it behaves as expected in train and test modes:"
]
},
{
"metadata": {
"id": "7tdmuh5Zvm3D",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Use an explicit session here so we can set the train/test switch, and\n",
"# inspect the layer count returned by `call`\n",
"with tf.Session(graph=g) as sess:\n",
" init.run()\n",
" \n",
" for phase, name in enumerate(['test','train']):\n",
" K.set_learning_phase(phase)\n",
" result, count = model(tf.convert_to_tensor(train_batch, dtype=tf.float32))\n",
"\n",
" result1, count1 = sess.run((result, count))\n",
" result2, count2 = sess.run((result, count))\n",
"\n",
" delta = (result1 - result2)\n",
" print(name, \"sum abs delta: \", abs(delta).mean())\n",
" print(\" layers 1st call: \", count1)\n",
" print(\" layers 2nd call: \", count2)\n",
" print()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "4LfnJjm0Bm0B",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Advanced example: An in-graph training loop\n",
"\n",
"The previous section showed that AutoGraph can be used inside Keras layers and models. Keras models can also be used in AutoGraph code.\n",
"\n",
"Since writing control flow in AutoGraph is easy, running a training loop in a TensorFlow graph should also be easy. \n",
"\n",
"This example shows how to train a simple Keras model on MNIST with the entire training process—loading batches, calculating gradients, updating parameters, calculating validation accuracy, and repeating until convergence—is performed in-graph."
]
},
{
"metadata": {
"id": "Em5dzSUOtLRP",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Download data"
]
},
{
"metadata": {
"id": "xqoxumv0ssQW",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "znmy4l8ntMvW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Define the model"
]
},
{
"metadata": {
"id": "Pe-erWQdBoC5",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def mlp_model(input_shape):\n",
" model = tf.keras.Sequential((\n",
" tf.keras.layers.Dense(100, activation='relu', input_shape=input_shape),\n",
" tf.keras.layers.Dense(100, activation='relu'),\n",
" tf.keras.layers.Dense(10, activation='softmax')))\n",
" model.build()\n",
" return model\n",
"\n",
"\n",
"def predict(m, x, y):\n",
" y_p = m(tf.reshape(x, (-1, 28 * 28)))\n",
" losses = tf.keras.losses.categorical_crossentropy(y, y_p)\n",
" l = tf.reduce_mean(losses)\n",
" accuracies = tf.keras.metrics.categorical_accuracy(y, y_p)\n",
" accuracy = tf.reduce_mean(accuracies)\n",
" return l, accuracy\n",
"\n",
"\n",
"def fit(m, x, y, opt):\n",
" l, accuracy = predict(m, x, y)\n",
" # Autograph automatically adds the necessary `tf.control_dependencies` here.\n",
" # (Without them nothing depends on `opt.minimize`, so it doesn't run.)\n",
" # This makes it much more like eager-code.\n",
" opt.minimize(l)\n",
" return l, accuracy\n",
"\n",
"\n",
"def setup_mnist_data(is_training, batch_size):\n",
" if is_training:\n",
" ds = tf.data.Dataset.from_tensor_slices((train_images, train_labels))\n",
" ds = ds.shuffle(batch_size * 10)\n",
" else:\n",
" ds = tf.data.Dataset.from_tensor_slices((test_images, test_labels))\n",
"\n",
" ds = ds.repeat()\n",
" ds = ds.batch(batch_size)\n",
" return ds\n",
"\n",
"\n",
"def get_next_batch(ds):\n",
" itr = ds.make_one_shot_iterator()\n",
" image, label = itr.get_next()\n",
" x = tf.to_float(image) / 255.0\n",
" y = tf.one_hot(tf.squeeze(label), 10)\n",
" return x, y "
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "oeYV6mKnJGMr",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Define the training loop"
]
},
{
"metadata": {
"id": "3xtg_MMhJETd",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Use `recursive = True` to recursively convert functions called by this one.\n",
"@autograph.convert(recursive=True)\n",
"def train(train_ds, test_ds, hp):\n",
" m = mlp_model((28 * 28,))\n",
" opt = tf.train.AdamOptimizer(hp.learning_rate)\n",
" \n",
" # We'd like to save our losses to a list. In order for AutoGraph\n",
" # to convert these lists into their graph equivalent,\n",
" # we need to specify the element type of the lists.\n",
" train_losses = []\n",
" autograph.set_element_type(train_losses, tf.float32)\n",
" test_losses = []\n",
" autograph.set_element_type(test_losses, tf.float32)\n",
" train_accuracies = []\n",
" autograph.set_element_type(train_accuracies, tf.float32)\n",
" test_accuracies = []\n",
" autograph.set_element_type(test_accuracies, tf.float32)\n",
" \n",
" # This entire training loop will be run in-graph.\n",
" i = tf.constant(0)\n",
" while i < hp.max_steps:\n",
" train_x, train_y = get_next_batch(train_ds)\n",
" test_x, test_y = get_next_batch(test_ds)\n",
"\n",
" step_train_loss, step_train_accuracy = fit(m, train_x, train_y, opt)\n",
" step_test_loss, step_test_accuracy = predict(m, test_x, test_y)\n",
" if i % (hp.max_steps // 10) == 0:\n",
" print('Step', i, 'train loss:', step_train_loss, 'test loss:',\n",
" step_test_loss, 'train accuracy:', step_train_accuracy,\n",
" 'test accuracy:', step_test_accuracy)\n",
" train_losses.append(step_train_loss)\n",
" test_losses.append(step_test_loss)\n",
" train_accuracies.append(step_train_accuracy)\n",
" test_accuracies.append(step_test_accuracy)\n",
" i += 1\n",
" \n",
" # We've recorded our loss values and accuracies \n",
" # to a list in a graph with AutoGraph's help.\n",
" # In order to return the values as a Tensor, \n",
" # we need to stack them before returning them.\n",
" return (autograph.stack(train_losses), autograph.stack(test_losses), \n",
" autograph.stack(train_accuracies), autograph.stack(test_accuracies))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "IsHLDZniauLV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now build the graph and run the training loop:"
]
},
{
"metadata": {
"id": "HYh6MSZyJOag",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"with tf.Graph().as_default() as g:\n",
" hp = tf.contrib.training.HParams(\n",
" learning_rate=0.005,\n",
" max_steps=500,\n",
" )\n",
" train_ds = setup_mnist_data(True, 50)\n",
" test_ds = setup_mnist_data(False, 1000)\n",
" (train_losses, test_losses, train_accuracies,\n",
" test_accuracies) = train(train_ds, test_ds, hp)\n",
"\n",
" init = tf.global_variables_initializer()\n",
" \n",
"with tf.Session(graph=g) as sess:\n",
" sess.run(init)\n",
" (train_losses, test_losses, train_accuracies,\n",
" test_accuracies) = sess.run([train_losses, test_losses, train_accuracies,\n",
" test_accuracies])\n",
" \n",
"plt.title('MNIST train/test losses')\n",
"plt.plot(train_losses, label='train loss')\n",
"plt.plot(test_losses, label='test loss')\n",
"plt.legend()\n",
"plt.xlabel('Training step')\n",
"plt.ylabel('Loss')\n",
"plt.show()\n",
"plt.title('MNIST train/test accuracies')\n",
"plt.plot(train_accuracies, label='train accuracy')\n",
"plt.plot(test_accuracies, label='test accuracy')\n",
"plt.legend(loc='lower right')\n",
"plt.xlabel('Training step')\n",
"plt.ylabel('Accuracy')\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
} }
] ]
} }
\ No newline at end of file
...@@ -5,8 +5,6 @@
"colab": {
"name": "Custom training: walkthrough",
"version": "0.3.2",
"views": {},
"default_view": {},
"provenance": [], "provenance": [],
"private_outputs": true, "private_outputs": true,
"collapsed_sections": [], "collapsed_sections": [],
...@@ -32,12 +30,7 @@ ...@@ -32,12 +30,7 @@
"metadata": { "metadata": {
"id": "CPII1rGR2rF9", "id": "CPII1rGR2rF9",
"colab_type": "code", "colab_type": "code",
"colab": { "colab": {},
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -64,7 +57,9 @@ ...@@ -64,7 +57,9 @@
}, },
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"# Custom training: walkthrough" "# Custom training: walkthrough\n",
"\n",
"This file has moved."
]
},
{
...@@ -79,1150 +74,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/eager/custom_training_walkthrough\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/eager/custom_training_walkthrough.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/eager/custom_training_walkthrough.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n",
" <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/eager/custom_training_walkthrough.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/eager/custom_training_walkthrough.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n",
"</table>"
]
},
{
"metadata": {
"id": "LDrzLFXE8T1l",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This guide uses machine learning to *categorize* Iris flowers by species. It uses TensorFlow's [eager execution](https://www.tensorflow.org/guide/eager) to:\n",
"1. Build a model,\n",
"2. Train this model on example data, and\n",
"3. Use the model to make predictions about unknown data.\n",
"\n",
"## TensorFlow programming\n",
"\n",
"This guide uses these high-level TensorFlow concepts:\n",
"\n",
"* Enable an [eager execution](https://www.tensorflow.org/guide/eager) development environment,\n",
"* Import data with the [Datasets API](https://www.tensorflow.org/guide/datasets),\n",
"* Build models and layers with TensorFlow's [Keras API](https://keras.io/getting-started/sequential-model-guide/).\n",
"\n",
"This tutorial is structured like many TensorFlow programs:\n",
"\n",
"1. Import and parse the data sets.\n",
"2. Select the type of model.\n",
"3. Train the model.\n",
"4. Evaluate the model's effectiveness.\n",
"5. Use the trained model to make predictions."
]
},
{
"metadata": {
"id": "yNr7H-AIoLOR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Setup program"
]
},
{
"metadata": {
"id": "1J3AuPBT9gyR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Configure imports and eager execution\n",
"\n",
"Import the required Python modules—including TensorFlow—and enable eager execution for this program. Eager execution makes TensorFlow evaluate operations immediately, returning concrete values instead of creating a [computational graph](https://www.tensorflow.org/guide/graphs) that is executed later. If you are used to a REPL or the `python` interactive console, this feels familiar. Eager execution is available in [Tensorlow >=1.8](https://www.tensorflow.org/install/).\n",
"\n",
"Once eager execution is enabled, it *cannot* be disabled within the same program. See the [eager execution guide](https://www.tensorflow.org/guide/eager) for more details."
]
},
{
"metadata": {
"id": "g4Wzg69bnwK2",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"from __future__ import absolute_import, division, print_function\n",
"\n",
"import os\n",
"import matplotlib.pyplot as plt\n",
"\n",
"import tensorflow as tf\n",
"import tensorflow.contrib.eager as tfe\n",
"\n",
"tf.enable_eager_execution()\n",
"\n",
"print(\"TensorFlow version: {}\".format(tf.VERSION))\n",
"print(\"Eager execution: {}\".format(tf.executing_eagerly()))"
],
"execution_count": 0,
"outputs": []
},
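{
"metadata": {
"id": "eager-demo-md",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a quick illustration of that point (an added sketch, not part of the original walkthrough), operations now evaluate immediately and return concrete values rather than graph handles:"
]
},
{
"metadata": {
"id": "eager-demo-code",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# With eager execution enabled, this runs right away and prints\n",
"# a concrete tensor containing 3; no session or graph is needed.\n",
"print(tf.add(1, 2))"
],
"execution_count": 0,
"outputs": []
},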
{
"metadata": {
"id": "Zx7wc0LuuxaJ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## The Iris classification problem\n",
"\n",
"Imagine you are a botanist seeking an automated way to categorize each Iris flower you find. Machine learning provides many algorithms to classify flowers statistically. For instance, a sophisticated machine learning program could classify flowers based on photographs. Our ambitions are more modest—we're going to classify Iris flowers based on the length and width measurements of their [sepals](https://en.wikipedia.org/wiki/Sepal) and [petals](https://en.wikipedia.org/wiki/Petal).\n",
"\n",
"The Iris genus entails about 300 species, but our program will only classify the following three:\n",
"\n",
"* Iris setosa\n",
"* Iris virginica\n",
"* Iris versicolor\n",
"\n",
"<table>\n",
" <tr><td>\n",
" <img src=\"https://www.tensorflow.org/images/iris_three_species.jpg\"\n",
" alt=\"Petal geometry compared for three iris species: Iris setosa, Iris virginica, and Iris versicolor\">\n",
" </td></tr>\n",
" <tr><td align=\"center\">\n",
" <b>Figure 1.</b> <a href=\"https://commons.wikimedia.org/w/index.php?curid=170298\">Iris setosa</a> (by <a href=\"https://commons.wikimedia.org/wiki/User:Radomil\">Radomil</a>, CC BY-SA 3.0), <a href=\"https://commons.wikimedia.org/w/index.php?curid=248095\">Iris versicolor</a>, (by <a href=\"https://commons.wikimedia.org/wiki/User:Dlanglois\">Dlanglois</a>, CC BY-SA 3.0), and <a href=\"https://www.flickr.com/photos/33397993@N05/3352169862\">Iris virginica</a> (by <a href=\"https://www.flickr.com/photos/33397993@N05\">Frank Mayfield</a>, CC BY-SA 2.0).<br/>&nbsp;\n",
" </td></tr>\n",
"</table>\n",
"\n",
"Fortunately, someone has already created a [data set of 120 Iris flowers](https://en.wikipedia.org/wiki/Iris_flower_data_set) with the sepal and petal measurements. This is a classic dataset that is popular for beginner machine learning classification problems."
]
},
{
"metadata": {
"id": "3Px6KAg0Jowz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Import and parse the training dataset\n",
"\n",
"Download the dataset file and convert it into a structure that can be used by this Python program.\n",
"\n",
"### Download the dataset\n",
"\n",
"Download the training dataset file using the [tf.keras.utils.get_file](https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file) function. This returns the file path of the downloaded file."
]
},
{
"metadata": {
"id": "J6c7uEU9rjRM",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"train_dataset_url = \"http://download.tensorflow.org/data/iris_training.csv\"\n",
"\n",
"train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url),\n",
" origin=train_dataset_url)\n",
"\n",
"print(\"Local copy of the dataset file: {}\".format(train_dataset_fp))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "qnX1-aLors4S",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Inspect the data\n",
"\n",
"This dataset, `iris_training.csv`, is a plain text file that stores tabular data formatted as comma-separated values (CSV). Use the `head -n5` command to take a peak at the first five entries:"
]
},
{
"metadata": {
"id": "FQvb_JYdrpPm",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"!head -n5 {train_dataset_fp}"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "kQhzD6P-uBoq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"From this view of the dataset, notice the following:\n",
"\n",
"1. The first line is a header containing information about the dataset:\n",
" * There are 120 total examples. Each example has four features and one of three possible label names. \n",
"2. Subsequent rows are data records, one *[example](https://developers.google.com/machine-learning/glossary/#example)* per line, where:\n",
" * The first four fields are *[features](https://developers.google.com/machine-learning/glossary/#feature)*: these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.\n",
" * The last column is the *[label](https://developers.google.com/machine-learning/glossary/#label)*: this is the value we want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a flower name.\n",
"\n",
"Let's write that out in code:"
]
},
{
"metadata": {
"id": "9Edhevw7exl6",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"# column order in CSV file\n",
"column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']\n",
"\n",
"feature_names = column_names[:-1]\n",
"label_name = column_names[-1]\n",
"\n",
"print(\"Features: {}\".format(feature_names))\n",
"print(\"Label: {}\".format(label_name))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "CCtwLoJhhDNc",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Each label is associated with string name (for example, \"setosa\"), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:\n",
"\n",
"* `0`: Iris setosa\n",
"* `1`: Iris versicolor\n",
"* `2`: Iris virginica\n",
"\n",
"For more information about features and labels, see the [ML Terminology section of the Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/framing/ml-terminology)."
]
},
{
"metadata": {
"id": "sVNlJlUOhkoX",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"class_names = ['Iris setosa', 'Iris versicolor', 'Iris virginica']"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "dqPkQExM2Pwt",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create a `tf.data.Dataset`\n",
"\n",
"TensorFlow's [Dataset API](https://www.tensorflow.org/guide/datasets) handles many common cases for loading data into a model. This is a high-level API for reading data and transforming it into a form used for training. See the [Datasets Quick Start guide](https://www.tensorflow.org/get_started/datasets_quickstart) for more information.\n",
"\n",
"\n",
"Since the dataset is a CSV-formatted text file, use the [make_csv_dataset](https://www.tensorflow.org/api_docs/python/tf/contrib/data/make_csv_dataset) function to parse the data into a suitable format. Since this function generates data for training models, the default behavior is to shuffle the data (`shuffle=True, shuffle_buffer_size=10000`), and repeat the dataset forever (`num_epochs=None`). We also set the [batch_size](https://developers.google.com/machine-learning/glossary/#batch_size) parameter."
]
},
{
"metadata": {
"id": "WsxHnz1ebJ2S",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"batch_size = 32\n",
"\n",
"train_dataset = tf.contrib.data.make_csv_dataset(\n",
" train_dataset_fp,\n",
" batch_size, \n",
" column_names=column_names,\n",
" label_name=label_name,\n",
" num_epochs=1)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "gB_RSn62c-3G",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The `make_csv_dataset` function returns a `tf.data.Dataset` of `(features, label)` pairs, where `features` is a dictionary: `{'feature_name': value}`\n",
"\n",
"With eager execution enabled, these `Dataset` objects are iterable. Let's look at a batch of features:"
]
},
{
"metadata": {
"id": "iDuG94H-C122",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"features, labels = next(iter(train_dataset))\n",
"\n",
"features"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "E63mArnQaAGz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Notice that like-features are grouped together, or *batched*. Each example row's fields are appended to the corresponding feature array. Change the `batch_size` to set the number of examples stored in these feature arrays.\n",
"\n",
"You can start to see some clusters by plotting a few features from the batch:"
]
},
{
"metadata": {
"id": "me5Wn-9FcyyO",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"plt.scatter(features['petal_length'],\n",
" features['sepal_length'],\n",
" c=labels,\n",
" cmap='viridis')\n",
"\n",
"plt.xlabel(\"Petal length\")\n",
"plt.ylabel(\"Sepal length\");"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "YlxpSyHlhT6M",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To simplify the model building step, create a function to repackage the features dictionary into a single array with shape: `(batch_size, num_features)`.\n",
"\n",
"This function uses the [tf.stack](https://www.tensorflow.org/api_docs/python/tf/stack) method which takes values from a list of tensors and creates a combined tensor at the specified dimension."
]
},
{
"metadata": {
"id": "jm932WINcaGU",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"def pack_features_vector(features, labels):\n",
" \"\"\"Pack the features into a single array.\"\"\"\n",
" features = tf.stack(list(features.values()), axis=1)\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "V1Vuph_eDl8x",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Then use the [tf.data.Dataset.map](https://www.tensorflow.org/api_docs/python/tf/data/dataset/map) method to pack the `features` of each `(features,label)` pair into the training dataset:"
]
},
{
"metadata": {
"id": "ZbDkzGZIkpXf",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"train_dataset = train_dataset.map(pack_features_vector)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "NLy0Q1xCldVO",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The features element of the `Dataset` are now arrays with shape `(batch_size, num_features)`. Let's look at the first few examples:"
]
},
{
"metadata": {
"id": "kex9ibEek6Tr",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"features, labels = next(iter(train_dataset))\n",
"\n",
"print(features[:5])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "LsaVrtNM3Tx5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Select the type of model\n",
"\n",
"### Why model?\n",
"\n",
"A *[model](https://developers.google.com/machine-learning/crash-course/glossary#model)* is a relationship between features and the label. For the Iris classification problem, the model defines the relationship between the sepal and petal measurements and the predicted Iris species. Some simple models can be described with a few lines of algebra, but complex machine learning models have a large number of parameters that are difficult to summarize.\n",
"\n",
"Could you determine the relationship between the four features and the Iris species *without* using machine learning? That is, could you use traditional programming techniques (for example, a lot of conditional statements) to create a model? Perhaps—if you analyzed the dataset long enough to determine the relationships between petal and sepal measurements to a particular species. And this becomes difficult—maybe impossible—on more complicated datasets. A good machine learning approach *determines the model for you*. If you feed enough representative examples into the right machine learning model type, the program will figure out the relationships for you.\n",
"\n",
"### Select the model\n",
"\n",
"We need to select the kind of model to train. There are many types of models and picking a good one takes experience. This tutorial uses a neural network to solve the Iris classification problem. *[Neural networks](https://developers.google.com/machine-learning/glossary/#neural_network)* can find complex relationships between features and the label. It is a highly-structured graph, organized into one or more *[hidden layers](https://developers.google.com/machine-learning/glossary/#hidden_layer)*. Each hidden layer consists of one or more *[neurons](https://developers.google.com/machine-learning/glossary/#neuron)*. There are several categories of neural networks and this program uses a dense, or *[fully-connected neural network](https://developers.google.com/machine-learning/glossary/#fully_connected_layer)*: the neurons in one layer receive input connections from *every* neuron in the previous layer. For example, Figure 2 illustrates a dense neural network consisting of an input layer, two hidden layers, and an output layer:\n",
"\n",
"<table>\n",
" <tr><td>\n",
" <img src=\"https://www.tensorflow.org/images/custom_estimators/full_network.png\"\n",
" alt=\"A diagram of the network architecture: Inputs, 2 hidden layers, and outputs\">\n",
" </td></tr>\n",
" <tr><td align=\"center\">\n",
" <b>Figure 2.</b> A neural network with features, hidden layers, and predictions.<br/>&nbsp;\n",
" </td></tr>\n",
"</table>\n",
"\n",
"When the model from Figure 2 is trained and fed an unlabeled example, it yields three predictions: the likelihood that this flower is the given Iris species. This prediction is called *[inference](https://developers.google.com/machine-learning/crash-course/glossary#inference)*. For this example, the sum of the output predictions is 1.0. In Figure 2, this prediction breaks down as: `0.02` for *Iris setosa*, `0.95` for *Iris versicolor*, and `0.03` for *Iris virginica*. This means that the model predicts—with 95% probability—that an unlabeled example flower is an *Iris versicolor*."
]
},
{
"metadata": {
"id": "W23DIMVPQEBt",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create a model using Keras\n",
"\n",
"The TensorFlow [tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras) API is the preferred way to create models and layers. This makes it easy to build models and experiment while Keras handles the complexity of connecting everything together.\n",
"\n",
"The [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model is a linear stack of layers. Its constructor takes a list of layer instances, in this case, two [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the number of features from the dataset, and is required."
]
},
{
"metadata": {
"id": "2fZ6oL2ig3ZK",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"model = tf.keras.Sequential([\n",
" tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)), # input shape required\n",
" tf.keras.layers.Dense(10, activation=tf.nn.relu),\n",
" tf.keras.layers.Dense(3)\n",
"])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "FHcbEzMpxbHL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The *[activation function](https://developers.google.com/machine-learning/crash-course/glossary#activation_function)* determines the output shape of each node in the layer. These non-linearities are important—without them the model would be equivalent to a single layer. There are many [available activations](https://www.tensorflow.org/api_docs/python/tf/keras/activations), but [ReLU](https://developers.google.com/machine-learning/crash-course/glossary#ReLU) is common for hidden layers.\n",
"\n",
"The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively."
]
},
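{
"metadata": {
"id": "linear-collapse-md",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To spell out why the non-linearity matters (a short worked step added here for clarity): stacking two purely linear layers collapses into one, since\n",
"\n",
"$$W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2),$$\n",
"\n",
"which is again a single affine map. The `relu` between layers is what prevents this collapse and lets the stack model non-linear relationships."
]
},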
{
"metadata": {
"id": "2wFKnhWCpDSS",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Using the model\n",
"\n",
"Let's have a quick look at what this model does to a batch of features:"
]
},
{
"metadata": {
"id": "xe6SQ5NrpB-I",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"predictions = model(features)\n",
"predictions[:5]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "wxyXOhwVr5S3",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Here, each example returns a [logit](https://developers.google.com/machine-learning/crash-course/glossary#logit) for each class. \n",
"\n",
"To convert these logits to a probability for each class, use the [softmax](https://developers.google.com/machine-learning/crash-course/glossary#softmax) function:"
]
},
{
"metadata": {
"id": "_tRwHZmTNTX2",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"tf.nn.softmax(predictions[:5])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "uRZmchElo481",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Taking the `tf.argmax` across classes gives us the predicted class index. But, the model hasn't been trained yet, so these aren't good predictions."
]
},
{
"metadata": {
"id": "-Jzm_GoErz8B",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"print(\"Prediction: {}\".format(tf.argmax(predictions, axis=1)))\n",
"print(\" Labels: {}\".format(labels))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Vzq2E5J2QMtw",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Train the model\n",
"\n",
"*[Training](https://developers.google.com/machine-learning/crash-course/glossary#training)* is the stage of machine learning when the model is gradually optimized, or the model *learns* the dataset. The goal is to learn enough about the structure of the training dataset to make predictions about unseen data. If you learn *too much* about the training dataset, then the predictions only work for the data it has seen and will not be generalizable. This problem is called *[overfitting](https://developers.google.com/machine-learning/crash-course/glossary#overfitting)*—it's like memorizing the answers instead of understanding how to solve a problem.\n",
"\n",
"The Iris classification problem is an example of *[supervised machine learning](https://developers.google.com/machine-learning/glossary/#supervised_machine_learning)*: the model is trained from examples that contain labels. In *[unsupervised machine learning](https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning)*, the examples don't contain labels. Instead, the model typically finds patterns among the features."
]
},
{
"metadata": {
"id": "RaKp8aEjKX6B",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Define the loss and gradient function\n",
"\n",
"Both training and evaluation stages need to calculate the model's *[loss](https://developers.google.com/machine-learning/crash-course/glossary#loss)*. This measures how off a model's predictions are from the desired label, in other words, how bad the model is performing. We want to minimize, or optimize, this value.\n",
"\n",
"Our model will calculate its loss using the [tf.keras.losses.categorical_crossentropy](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy) function which takes the model's class probability predictions and the desired label, and returns the average loss across the examples."
]
},
{
"metadata": {
"id": "tMAT4DcMPwI-",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"def loss(model, x, y):\n",
" y_ = model(x)\n",
" return tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)\n",
"\n",
"\n",
"l = loss(model, features, labels)\n",
"print(\"Loss test: {}\".format(l))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "3IcPqA24QM6B",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Use the [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) context to calculate the *[gradients](https://developers.google.com/machine-learning/crash-course/glossary#gradient)* used to optimize our model. For more examples of this, see the [eager execution guide](https://www.tensorflow.org/guide/eager)."
]
},
{
"metadata": {
"id": "x57HcKWhKkei",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"def grad(model, inputs, targets):\n",
" with tf.GradientTape() as tape:\n",
" loss_value = loss(model, inputs, targets)\n",
" return loss_value, tape.gradient(loss_value, model.trainable_variables)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "lOxFimtlKruu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create an optimizer\n",
"\n",
"An *[optimizer](https://developers.google.com/machine-learning/crash-course/glossary#optimizer)* applies the computed gradients to the model's variables to minimize the `loss` function. You can think of the loss function as a curved surface (see Figure 3) and we want to find its lowest point by walking around. The gradients point in the direction of steepest ascent—so we'll travel the opposite way and move down the hill. By iteratively calculating the loss and gradient for each batch, we'll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model's predictions.\n",
"\n",
"<table>\n",
" <tr><td>\n",
" <img src=\"https://cs231n.github.io/assets/nn3/opt1.gif\" width=\"70%\"\n",
" alt=\"Optimization algorithms visualized over time in 3D space.\">\n",
" </td></tr>\n",
" <tr><td align=\"center\">\n",
" <b>Figure 3.</b> Optimization algorithms visualized over time in 3D space. (Source: <a href=\"http://cs231n.github.io/neural-networks-3/\">Stanford class CS231n</a>, MIT License)<br/>&nbsp;\n",
" </td></tr>\n",
"</table>\n",
"\n",
"TensorFlow has many [optimization algorithms](https://www.tensorflow.org/api_guides/python/train) available for training. This model uses the [tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) that implements the *[stochastic gradient descent](https://developers.google.com/machine-learning/crash-course/glossary#gradient_descent)* (SGD) algorithm. The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results."
]
},
{
"metadata": {
"id": "XkUd6UiZa_dF",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's setup the optimizer and the `global_step` counter:"
]
},
{
"metadata": {
"id": "8xxi2NNGKwG_",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n",
"\n",
"global_step = tf.train.get_or_create_global_step()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "pJVRZ0hP52ZB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We'll use this to calculate a single optimization step:"
]
},
{
"metadata": {
"id": "rxRNTFVe56RG",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"loss_value, grads = grad(model, features, labels)\n",
"\n",
"print(\"Step: {}, Initial Loss: {}\".format(global_step.numpy(),\n",
" loss_value.numpy()))\n",
"\n",
"optimizer.apply_gradients(zip(grads, model.variables), global_step)\n",
"\n",
"print(\"Step: {}, Loss: {}\".format(global_step.numpy(),\n",
" loss(model, features, labels).numpy()))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "7Y2VSELvwAvW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Training loop\n",
"\n",
"With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:\n",
"\n",
"1. Iterate each *epoch*. An epoch is one pass through the dataset.\n",
"2. Within an epoch, iterate over each example in the training `Dataset` grabbing its *features* (`x`) and *label* (`y`).\n",
"3. Using the example's features, make a prediction and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients.\n",
"4. Use an `optimizer` to update the model's variables.\n",
"5. Keep track of some stats for visualization.\n",
"6. Repeat for each epoch.\n",
"\n",
"The `num_epochs` variable is the number of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. `num_epochs` is a *[hyperparameter](https://developers.google.com/machine-learning/glossary/#hyperparameter)* that you can tune. Choosing the right number usually requires both experience and experimentation."
]
},
{
"metadata": {
"id": "AIgulGRUhpto",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"## Note: Rerunning this cell uses the same model variables\n",
"\n",
"# keep results for plotting\n",
"train_loss_results = []\n",
"train_accuracy_results = []\n",
"\n",
"num_epochs = 201\n",
"\n",
"for epoch in range(num_epochs):\n",
" epoch_loss_avg = tfe.metrics.Mean()\n",
" epoch_accuracy = tfe.metrics.Accuracy()\n",
"\n",
" # Training loop - using batches of 32\n",
" for x, y in train_dataset:\n",
" # Optimize the model\n",
" loss_value, grads = grad(model, x, y)\n",
" optimizer.apply_gradients(zip(grads, model.variables),\n",
" global_step)\n",
"\n",
" # Track progress\n",
" epoch_loss_avg(loss_value) # add current batch loss\n",
" # compare predicted label to actual label\n",
" epoch_accuracy(tf.argmax(model(x), axis=1, output_type=tf.int32), y)\n",
"\n",
" # end epoch\n",
" train_loss_results.append(epoch_loss_avg.result())\n",
" train_accuracy_results.append(epoch_accuracy.result())\n",
" \n",
" if epoch % 50 == 0:\n",
" print(\"Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}\".format(epoch,\n",
" epoch_loss_avg.result(),\n",
" epoch_accuracy.result()))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "2FQHVUnm_rjw",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Visualize the loss function over time"
]
},
{
"metadata": {
"id": "j3wdbmtLVTyr",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"While it's helpful to print out the model's training progress, it's often *more* helpful to see this progress. [TensorBoard](https://www.tensorflow.org/guide/summaries_and_tensorboard) is a nice visualization tool that is packaged with TensorFlow, but we can create basic charts using the `matplotlib` module.\n",
"\n",
"Interpreting these charts takes some experience, but you really want to see the *loss* go down and the *accuracy* go up."
]
},
{
"metadata": {
"id": "agjvNd2iUGFn",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"fig, axes = plt.subplots(2, sharex=True, figsize=(12, 8))\n",
"fig.suptitle('Training Metrics')\n",
"\n",
"axes[0].set_ylabel(\"Loss\", fontsize=14)\n",
"axes[0].plot(train_loss_results)\n",
"\n",
"axes[1].set_ylabel(\"Accuracy\", fontsize=14)\n",
"axes[1].set_xlabel(\"Epoch\", fontsize=14)\n",
"axes[1].plot(train_accuracy_results);"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Zg8GoMZhLpGH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Evaluate the model's effectiveness\n",
"\n",
"Now that the model is trained, we can get some statistics on its performance.\n",
"\n",
"*Evaluating* means determining how effectively the model makes predictions. To determine the model's effectiveness at Iris classification, pass some sepal and petal measurements to the model and ask the model to predict what Iris species they represent. Then compare the model's prediction against the actual label. For example, a model that picked the correct species on half the input examples has an *[accuracy](https://developers.google.com/machine-learning/glossary/#accuracy)* of `0.5`. Figure 4 shows a slightly more effective model, getting 4 out of 5 predictions correct at 80% accuracy:\n",
"\n",
"<table cellpadding=\"8\" border=\"0\">\n",
" <colgroup>\n",
" <col span=\"4\" >\n",
" <col span=\"1\" bgcolor=\"lightblue\">\n",
" <col span=\"1\" bgcolor=\"lightgreen\">\n",
" </colgroup>\n",
" <tr bgcolor=\"lightgray\">\n",
" <th colspan=\"4\">Example features</th>\n",
" <th colspan=\"1\">Label</th>\n",
" <th colspan=\"1\" >Model prediction</th>\n",
" </tr>\n",
" <tr>\n",
" <td>5.9</td><td>3.0</td><td>4.3</td><td>1.5</td><td align=\"center\">1</td><td align=\"center\">1</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6.9</td><td>3.1</td><td>5.4</td><td>2.1</td><td align=\"center\">2</td><td align=\"center\">2</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5.1</td><td>3.3</td><td>1.7</td><td>0.5</td><td align=\"center\">0</td><td align=\"center\">0</td>\n",
" </tr>\n",
" <tr>\n",
" <td>6.0</td> <td>3.4</td> <td>4.5</td> <td>1.6</td> <td align=\"center\">1</td><td align=\"center\" bgcolor=\"red\">2</td>\n",
" </tr>\n",
" <tr>\n",
" <td>5.5</td><td>2.5</td><td>4.0</td><td>1.3</td><td align=\"center\">1</td><td align=\"center\">1</td>\n",
" </tr>\n",
" <tr><td align=\"center\" colspan=\"6\">\n",
" <b>Figure 4.</b> An Iris classifier that is 80% accurate.<br/>&nbsp;\n",
" </td></tr>\n",
"</table>"
]
},
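{
"metadata": {
"id": "wQvUUdPRaD03",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a quick, optional sketch, we can reproduce Figure 4's 80% figure with the same `tfe.metrics.Accuracy` helper used below, feeding it the labels and predictions copied from the table:"
]
},
{
"metadata": {
"id": "wQvUUdPRaD04",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"# Labels and model predictions copied from the Figure 4 table.\n",
"figure_labels = tf.constant([1, 2, 0, 1, 1])\n",
"figure_predictions = tf.constant([1, 2, 0, 2, 1])\n",
"\n",
"figure_accuracy = tfe.metrics.Accuracy()\n",
"figure_accuracy(figure_labels, figure_predictions)\n",
"print(figure_accuracy.result())  # 4 of 5 correct => 0.8"
],
"execution_count": 0,
"outputs": []
},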
{
"metadata": {
"id": "z-EvK7hGL0d8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Setup the test dataset\n",
"\n",
"Evaluating the model is similar to training the model. The biggest difference is the examples come from a separate *[test set](https://developers.google.com/machine-learning/crash-course/glossary#test_set)* rather than the training set. To fairly assess a model's effectiveness, the examples used to evaluate a model must be different from the examples used to train the model.\n",
"\n",
"The setup for the test `Dataset` is similar to the setup for training `Dataset`. Download the CSV text file and parse that values, then give it a little shuffle:"
]
},
{
"metadata": {
"id": "Ps3_9dJ3Lodk",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"test_url = \"http://download.tensorflow.org/data/iris_test.csv\"\n",
"\n",
"test_fp = tf.keras.utils.get_file(fname=os.path.basename(test_url),\n",
" origin=test_url)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "SRMWCu30bnxH",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"test_dataset = tf.contrib.data.make_csv_dataset(\n",
" train_dataset_fp,\n",
" batch_size, \n",
" column_names=column_names,\n",
" label_name='species',\n",
" num_epochs=1,\n",
" shuffle=False)\n",
"\n",
"test_dataset = test_dataset.map(pack_features_vector)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "HFuOKXJdMAdm",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Evaluate the model on the test dataset\n",
"\n",
"Unlike the training stage, the model only evaluates a single [epoch](https://developers.google.com/machine-learning/glossary/#epoch) of the test data. In the following code cell, we iterate over each example in the test set and compare the model's prediction against the actual label. This is used to measure the model's accuracy across the entire test set."
]
},
{
"metadata": {
"id": "Tw03-MK1cYId",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"test_accuracy = tfe.metrics.Accuracy()\n",
"\n",
"for (x, y) in test_dataset:\n",
" logits = model(x)\n",
" prediction = tf.argmax(logits, axis=1, output_type=tf.int32)\n",
" test_accuracy(prediction, y)\n",
"\n",
"print(\"Test set accuracy: {:.3%}\".format(test_accuracy.result()))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "HcKEZMtCOeK-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We can see on the last batch, for example, the model is usually correct:"
]
},
{
"metadata": {
"id": "uNwt2eMeOane",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"tf.stack([y,prediction],axis=1)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "7Li2r1tYvW7S",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Use the trained model to make predictions\n",
"\n",
"We've trained a model and \"proven\" that it's good—but not perfect—at classifying Iris species. Now let's use the trained model to make some predictions on [unlabeled examples](https://developers.google.com/machine-learning/glossary/#unlabeled_example); that is, on examples that contain features but not a label.\n",
"\n",
"In real-life, the unlabeled examples could come from lots of different sources including apps, CSV files, and data feeds. For now, we're going to manually provide three unlabeled examples to predict their labels. Recall, the label numbers are mapped to a named representation as:\n",
"\n",
"* `0`: Iris setosa\n",
"* `1`: Iris versicolor\n",
"* `2`: Iris virginica"
]
},
{
"metadata": {
"id": "kesTS5Lzv-M2",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"predict_dataset = tf.convert_to_tensor([\n",
" [5.1, 3.3, 1.7, 0.5,],\n",
" [5.9, 3.0, 4.2, 1.5,],\n",
" [6.9, 3.1, 5.4, 2.1]\n",
"])\n",
"\n",
"predictions = model(predict_dataset)\n",
"\n",
"for i, logits in enumerate(predictions):\n",
" class_idx = tf.argmax(logits).numpy()\n",
" p = tf.nn.softmax(logits)[class_idx]\n",
" name = class_names[class_idx]\n",
" print(\"Example {} prediction: {} ({:4.1f}%)\".format(i, name, 100*p))"
],
"execution_count": 0,
"outputs": []
} }
] ]
} }
\ No newline at end of file
...@@ -62,6 +62,16 @@ ...@@ -62,6 +62,16 @@
"# Build a linear model with Estimators" "# Build a linear model with Estimators"
] ]
}, },
{
"metadata": {
"id": "gkkpAk4sEvQR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This file has moved."
]
},
{ {
"metadata": { "metadata": {
"id": "uJl4gaPFzxQz", "id": "uJl4gaPFzxQz",
...@@ -74,1327 +84,13 @@ ...@@ -74,1327 +84,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/estimators/linear\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n", " <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/estimators/linear\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/estimators/linear.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/estimators/linear.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/estimators/linear.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/estimators/linear.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n", " </td>\n",
"</table>" "</table>"
] ]
},
{
"metadata": {
"id": "77aETSYDcdoK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This tutorial uses the `tf.estimator` API in TensorFlow to solve a benchmark binary classification problem. Estimators are TensorFlow's most scalable and production-oriented model type. For more information see the [Estimator guide](https://www.tensorflow.org/guide/estimators).\n",
"\n",
"## Overview\n",
"\n",
"Using census data which contains data a person's age, education, marital status, and occupation (the *features*), we will try to predict whether or not the person earns more than 50,000 dollars a year (the target *label*). We will train a *logistic regression* model that, given an individual's information, outputs a number between 0 and 1—this can be interpreted as the probability that the individual has an annual income of over 50,000 dollars.\n",
"\n",
"Key Point: As a modeler and developer, think about how this data is used and the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and disparities. Is each feature relevant to the problem you want to solve or will it introduce bias? For more information, read about [ML fairness](https://developers.google.com/machine-learning/fairness-overview/).\n",
"\n",
"## Setup\n",
"\n",
"Import TensorFlow, feature column support, and supporting modules:"
]
},
{
"metadata": {
"id": "NQgONe5ecYvE",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc \n",
"\n",
"import os\n",
"import sys\n",
"\n",
"import matplotlib.pyplot as plt\n",
"from IPython.display import clear_output"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Rpb1JSMj1nqk",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And let's enable [eager execution](https://www.tensorflow.org/guide/eager) to inspect this program as we run it:"
]
},
{
"metadata": {
"id": "tQzxON782Eby",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"tf.enable_eager_execution()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "-MPr95UccYvL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Download the official implementation\n",
"\n",
"We'll use the [wide and deep model](https://github.com/tensorflow/models/tree/master/official/wide_deep/) available in TensorFlow's [model repository](https://github.com/tensorflow/models/). Download the code, add the root directory to your Python path, and jump to the `wide_deep` directory:"
]
},
{
"metadata": {
"id": "tTwQzWcn8aBu",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"! pip install requests\n",
"! git clone --depth 1 https://github.com/tensorflow/models"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "sRpuysc73Eb-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Add the root directory of the repository to your Python path:"
]
},
{
"metadata": {
"id": "yVvFyhnkcYvL",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"models_path = os.path.join(os.getcwd(), 'models')\n",
"\n",
"sys.path.append(models_path)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "15Ethw-wcYvP",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Download the dataset:"
]
},
{
"metadata": {
"id": "6QilS4-0cYvQ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"from official.wide_deep import census_dataset\n",
"from official.wide_deep import census_main\n",
"\n",
"census_dataset.download(\"/tmp/census_data/\")"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "cD5e3ibAcYvS",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Command line usage\n",
"\n",
"The repo includes a complete program for experimenting with this type of model.\n",
"\n",
"To execute the tutorial code from the command line first add the path to tensorflow/models to your `PYTHONPATH`."
]
},
{
"metadata": {
"id": "DYOkY8boUptJ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"#export PYTHONPATH=${PYTHONPATH}:\"$(pwd)/models\"\n",
"#running from python you need to set the `os.environ` or the subprocess will not see the directory.\n",
"\n",
"if \"PYTHONPATH\" in os.environ:\n",
" os.environ['PYTHONPATH'] += os.pathsep + models_path\n",
"else:\n",
" os.environ['PYTHONPATH'] = models_path"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "5r0V9YUMUyoh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Use `--help` to see what command line options are available: "
]
},
{
"metadata": {
"id": "1_3tBaLW4YM4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!python -m official.wide_deep.census_main --help"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "RrMLazEN6DMj",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now run the model:\n"
]
},
{
"metadata": {
"id": "py7MarZl5Yh6",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!python -m official.wide_deep.census_main --model_type=wide --train_epochs=2"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "AmZ4CpaOcYvV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Read the U.S. Census data\n",
"\n",
"This example uses the [U.S Census Income Dataset](https://archive.ics.uci.edu/ml/datasets/Census+Income) from 1994 and 1995. We have provided the [census_dataset.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/census_dataset.py) script to download the data and perform a little cleanup.\n",
"\n",
"Since the task is a *binary classification problem*, we'll construct a label column named \"label\" whose value is 1 if the income is over 50K, and 0 otherwise. For reference, see the `input_fn` in [census_main.py](https://github.com/tensorflow/models/tree/master/official/wide_deep/census_main.py).\n",
"\n",
"Let's look at the data to see which columns we can use to predict the target label:"
]
},
{
"metadata": {
"id": "N6Tgye8bcYvX",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!ls /tmp/census_data/"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "6y3mj9zKcYva",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_file = \"/tmp/census_data/adult.data\"\n",
"test_file = \"/tmp/census_data/adult.test\""
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "EO_McKgE5il2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"[pandas](https://pandas.pydata.org/) provides some convenient utilities for data analysis. Here's a list of columns available in the Census Income dataset:"
]
},
{
"metadata": {
"id": "vkn1FNmpcYvb",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import pandas\n",
"\n",
"train_df = pandas.read_csv(train_file, header = None, names = census_dataset._CSV_COLUMNS)\n",
"test_df = pandas.read_csv(test_file, header = None, names = census_dataset._CSV_COLUMNS)\n",
"\n",
"train_df.head()"
],
"execution_count": 0,
"outputs": []
},
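{
"metadata": {
"id": "pdTypesPeekXQ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a quick, optional check, the pandas dtypes show how each column was parsed: `object` columns hold strings and `int64` columns hold numbers:"
]
},
{
"metadata": {
"id": "pdTypesPeekXR",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# String-valued (object) columns are candidates for categorical features;\n",
"# integer columns are candidates for continuous features.\n",
"train_df.dtypes"
],
"execution_count": 0,
"outputs": []
},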
{
"metadata": {
"id": "QZZtXes4cYvf",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The columns are grouped into two types: *categorical* and *continuous* columns:\n",
"\n",
"* A column is called *categorical* if its value can only be one of the categories in a finite set. For example, the relationship status of a person (wife, husband, unmarried, etc.) or the education level (high school, college, etc.) are categorical columns.\n",
"* A column is called *continuous* if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.\n",
"\n",
"## Converting Data into Tensors\n",
"\n",
"When building a `tf.estimator` model, the input data is specified by using an *input function* (or `input_fn`). This builder function returns a `tf.data.Dataset` of batches of `(features-dict, label)` pairs. It is not called until it is passed to `tf.estimator.Estimator` methods such as `train` and `evaluate`.\n",
"\n",
"The input builder function returns the following pair:\n",
"\n",
"1. `features`: A dict from feature names to `Tensors` or `SparseTensors` containing batches of features.\n",
"2. `labels`: A `Tensor` containing batches of labels.\n",
"\n",
"The keys of the `features` are used to configure the model's input layer.\n",
"\n",
"Note: The input function is called while constructing the TensorFlow graph, *not* while running the graph. It is returning a representation of the input data as a sequence of TensorFlow graph operations.\n",
"\n",
"For small problems like this, it's easy to make a `tf.data.Dataset` by slicing the `pandas.DataFrame`:"
]
},
{
"metadata": {
"id": "N7zNJflKcYvg",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def easy_input_function(df, label_key, num_epochs, shuffle, batch_size):\n",
" label = df[label_key]\n",
" ds = tf.data.Dataset.from_tensor_slices((dict(df),label))\n",
"\n",
" if shuffle:\n",
" ds = ds.shuffle(10000)\n",
"\n",
" ds = ds.batch(batch_size).repeat(num_epochs)\n",
"\n",
" return ds"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "WeEgNR9AcYvh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Since we have eager execution enabled, it's easy to inspect the resulting dataset:"
]
},
{
"metadata": {
"id": "ygaKuikecYvi",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"ds = easy_input_function(train_df, label_key='income_bracket', num_epochs=5, shuffle=True, batch_size=10)\n",
"\n",
"for feature_batch, label_batch in ds.take(1):\n",
" print('Some feature keys:', list(feature_batch.keys())[:5])\n",
" print()\n",
" print('A batch of Ages :', feature_batch['age'])\n",
" print()\n",
" print('A batch of Labels:', label_batch )"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "O_KZxQUucYvm",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"But this approach has severly-limited scalability. Larger datasets should be streamed from disk. The `census_dataset.input_fn` provides an example of how to do this using `tf.decode_csv` and `tf.data.TextLineDataset`: \n",
"\n",
"<!-- TODO(markdaoust): This `input_fn` should use `tf.contrib.data.make_csv_dataset` -->"
]
},
{
"metadata": {
"id": "vUTeXaEUcYvn",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import inspect\n",
"print(inspect.getsource(census_dataset.input_fn))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yyGcv_e-cYvq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This `input_fn` returns equivalent output:"
]
},
{
"metadata": {
"id": "Mv3as_CEcYvu",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"ds = census_dataset.input_fn(train_file, num_epochs=5, shuffle=True, batch_size=10)\n",
"\n",
"for feature_batch, label_batch in ds.take(1):\n",
" print('Feature keys:', list(feature_batch.keys())[:5])\n",
" print()\n",
" print('Age batch :', feature_batch['age'])\n",
" print()\n",
" print('Label batch :', label_batch )"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "810fnfY5cYvz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Because `Estimators` expect an `input_fn` that takes no arguments, we typically wrap configurable input function into an obejct with the expected signature. For this notebook configure the `train_inpf` to iterate over the data twice:"
]
},
{
"metadata": {
"id": "wnQdpEcVcYv0",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import functools\n",
"\n",
"train_inpf = functools.partial(census_dataset.input_fn, train_file, num_epochs=2, shuffle=True, batch_size=64)\n",
"test_inpf = functools.partial(census_dataset.input_fn, test_file, num_epochs=1, shuffle=False, batch_size=64)"
],
"execution_count": 0,
"outputs": []
},
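{
"metadata": {
"id": "inpfCheckXQ1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As a quick sanity check (optional), the wrapped function now takes no arguments and returns the batched dataset directly:"
]
},
{
"metadata": {
"id": "inpfCheckXQ2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Calling the zero-argument wrapper yields the same kind of dataset as before.\n",
"ds = train_inpf()\n",
"\n",
"for feature_batch, label_batch in ds.take(1):\n",
"  print('Feature keys:', list(feature_batch.keys())[:5])\n",
"  print('Label batch :', label_batch)"
],
"execution_count": 0,
"outputs": []
},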
{
"metadata": {
"id": "pboNpNWhcYv4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Selecting and Engineering Features for the Model\n",
"\n",
"Estimators use a system called [feature columns](https://www.tensorflow.org/guide/feature_columns) to describe how the model should interpret each of the raw input features. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert each feature.\n",
"\n",
"Selecting and crafting the right set of feature columns is key to learning an effective model. A *feature column* can be either one of the raw inputs in the original features `dict` (a *base feature column*), or any new columns created using transformations defined over one or multiple base columns (a *derived feature columns*).\n",
"\n",
"A feature column is an abstract concept of any raw or derived variable that can be used to predict the target label."
]
},
{
"metadata": {
"id": "_hh-cWdU__Lq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Base Feature Columns"
]
},
{
"metadata": {
"id": "BKz6LA8_ACI7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Numeric columns\n",
"\n",
"The simplest `feature_column` is `numeric_column`. This indicates that a feature is a numeric value that should be input to the model directly. For example:"
]
},
{
"metadata": {
"id": "ZX0r2T5OcYv6",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"age = fc.numeric_column('age')"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "tnLUiaHxcYv-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The model will use the `feature_column` definitions to build the model input. You can inspect the resulting output using the `input_layer` function:"
]
},
{
"metadata": {
"id": "kREtIPfwcYv_",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"fc.input_layer(feature_batch, [age]).numpy()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "OPuLduCucYwD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The following will train and evaluate a model using only the `age` feature:"
]
},
{
"metadata": {
"id": "9R5eSJ1pcYwE",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"classifier = tf.estimator.LinearClassifier(feature_columns=[age])\n",
"classifier.train(train_inpf)\n",
"result = classifier.evaluate(test_inpf)\n",
"\n",
"clear_output() # used for display in notebook\n",
"print(result)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "YDZGcdTdcYwI",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Similarly, we can define a `NumericColumn` for each continuous feature column\n",
"that we want to use in the model:"
]
},
{
"metadata": {
"id": "uqPbUqlxcYwJ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"education_num = tf.feature_column.numeric_column('education_num')\n",
"capital_gain = tf.feature_column.numeric_column('capital_gain')\n",
"capital_loss = tf.feature_column.numeric_column('capital_loss')\n",
"hours_per_week = tf.feature_column.numeric_column('hours_per_week')\n",
"\n",
"my_numeric_columns = [age,education_num, capital_gain, capital_loss, hours_per_week]\n",
"\n",
"fc.input_layer(feature_batch, my_numeric_columns).numpy()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "cBGDN97IcYwQ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"You could retrain a model on these features by changing the `feature_columns` argument to the constructor:"
]
},
{
"metadata": {
"id": "XN8k5S95cYwR",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"classifier = tf.estimator.LinearClassifier(feature_columns=my_numeric_columns)\n",
"classifier.train(train_inpf)\n",
"\n",
"result = classifier.evaluate(test_inpf)\n",
"\n",
"clear_output()\n",
"\n",
"for key,value in sorted(result.items()):\n",
" print('%s: %s' % (key, value))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "jBRq9_AzcYwU",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Categorical columns\n",
"\n",
"To define a feature column for a categorical feature, create a `CategoricalColumn` using one of the `tf.feature_column.categorical_column*` functions.\n",
"\n",
"If you know the set of all possible feature values of a column—and there are only a few of them—use `categorical_column_with_vocabulary_list`. Each key in the list is assigned an auto-incremented ID starting from 0. For example, for the `relationship` column we can assign the feature string `Husband` to an integer ID of 0 and \"Not-in-family\" to 1, etc."
]
},
{
"metadata": {
"id": "0IjqSi9tcYwV",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"relationship = fc.categorical_column_with_vocabulary_list(\n",
" 'relationship',\n",
" ['Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried', 'Other-relative'])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "-RjoWv-7cYwW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This creates a sparse one-hot vector from the raw input feature.\n",
"\n",
"The `input_layer` function we're using is designed for DNN models and expects dense inputs. To demonstrate the categorical column we must wrap it in a `tf.feature_column.indicator_column` to create the dense one-hot output (Linear `Estimators` can often skip this dense-step).\n",
"\n",
"Note: the other sparse-to-dense option is `tf.feature_column.embedding_column`.\n",
"\n",
"Run the input layer, configured with both the `age` and `relationship` columns:"
]
},
{
"metadata": {
"id": "kI43CYlncYwY",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"fc.input_layer(feature_batch, [age, fc.indicator_column(relationship)])"
],
"execution_count": 0,
"outputs": []
},
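{
"metadata": {
"id": "embedSketchXQ1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"For comparison, here is a sketch of the `embedding_column` alternative mentioned above: instead of a one-hot indicator, each category is mapped to a small dense vector of learned values. The `dimension=3` here is an arbitrary choice for illustration:"
]
},
{
"metadata": {
"id": "embedSketchXQ2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Each relationship category gets a learned, dense 3-element vector.\n",
"relationship_embedding = fc.embedding_column(relationship, dimension=3)\n",
"\n",
"fc.input_layer(feature_batch, [age, relationship_embedding]).numpy()"
],
"execution_count": 0,
"outputs": []
},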
{
"metadata": {
"id": "tTudP7WHcYwb",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"If we don't know the set of possible values in advance, use the `categorical_column_with_hash_bucket` instead:"
]
},
{
"metadata": {
"id": "8pSBaliCcYwb",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"occupation = tf.feature_column.categorical_column_with_hash_bucket(\n",
" 'occupation', hash_bucket_size=1000)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "fSAPrqQkcYwd",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Here, each possible value in the feature column `occupation` is hashed to an integer ID as we encounter them in training. The example batch has a few different occupations:"
]
},
{
"metadata": {
"id": "dCvQNv36cYwe",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"for item in feature_batch['occupation'].numpy():\n",
" print(item.decode())"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "KP5hN2rAcYwh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"If we run `input_layer` with the hashed column, we see that the output shape is `(batch_size, hash_bucket_size)`:"
]
},
{
"metadata": {
"id": "0Y16peWacYwh",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"occupation_result = fc.input_layer(feature_batch, [fc.indicator_column(occupation)])\n",
"\n",
"occupation_result.numpy().shape"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "HMW2MzWAcYwk",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"It's easier to see the actual results if we take the `tf.argmax` over the `hash_bucket_size` dimension. Notice how any duplicate occupations are mapped to the same pseudo-random index:"
]
},
{
"metadata": {
"id": "q_ryRglmcYwk",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"tf.argmax(occupation_result, axis=1).numpy()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "j1e5NfyKcYwn",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Note: Hash collisions are unavoidable, but often have minimal impact on model quiality. The effect may be noticable if the hash buckets are being used to compress the input space. See [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb) for a more visual example of the effect of these hash collisions.\n",
"\n",
"No matter how we choose to define a `SparseColumn`, each feature string is mapped into an integer ID by looking up a fixed mapping or by hashing. Under the hood, the `LinearModel` class is responsible for managing the mapping and creating `tf.Variable` to store the model parameters (model *weights*) for each feature ID. The model parameters are learned through the model training process described later.\n",
"\n",
"Let's do the similar trick to define the other categorical features:"
]
},
{
"metadata": {
"id": "0Z5eUrd_cYwo",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"education = tf.feature_column.categorical_column_with_vocabulary_list(\n",
" 'education', [\n",
" 'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',\n",
" 'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',\n",
" '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])\n",
"\n",
"marital_status = tf.feature_column.categorical_column_with_vocabulary_list(\n",
" 'marital_status', [\n",
" 'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',\n",
" 'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])\n",
"\n",
"workclass = tf.feature_column.categorical_column_with_vocabulary_list(\n",
" 'workclass', [\n",
" 'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',\n",
" 'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])\n",
"\n",
"\n",
"my_categorical_columns = [relationship, occupation, education, marital_status, workclass]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ASQJM1pEcYwr",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"It's easy to use both sets of columns to configure a model that uses all these features:"
]
},
{
"metadata": {
"id": "_i_MLoo9cYws",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"classifier = tf.estimator.LinearClassifier(feature_columns=my_numeric_columns+my_categorical_columns)\n",
"classifier.train(train_inpf)\n",
"result = classifier.evaluate(test_inpf)\n",
"\n",
"clear_output()\n",
"\n",
"for key,value in sorted(result.items()):\n",
" print('%s: %s' % (key, value))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "zdKEqF6xcYwv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Derived feature columns"
]
},
{
"metadata": {
"id": "RgYaf_48FSU2",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Make Continuous Features Categorical through Bucketization\n",
"\n",
"Sometimes the relationship between a continuous feature and the label is not linear. For example, *age* and *income*—a person's income may grow in the early stage of their career, then the growth may slow at some point, and finally, the income decreases after retirement. In this scenario, using the raw `age` as a real-valued feature column might not be a good choice because the model can only learn one of the three cases:\n",
"\n",
"1. Income always increases at some rate as age grows (positive correlation),\n",
"2. Income always decreases at some rate as age grows (negative correlation), or\n",
"3. Income stays the same no matter at what age (no correlation).\n",
"\n",
"If we want to learn the fine-grained correlation between income and each age group separately, we can leverage *bucketization*. Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into. So, we can define a `bucketized_column` over `age` as:"
]
},
{
"metadata": {
"id": "KT4pjD9AcYww",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"age_buckets = tf.feature_column.bucketized_column(\n",
" age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "S-XOscrEcYwx",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"`boundaries` is a list of bucket boundaries. In this case, there are 10 boundaries, resulting in 11 age group buckets (from age 17 and below, 18-24, 25-29, ..., to 65 and over).\n",
"\n",
"With bucketing, the model sees each bucket a one-hot feature:"
]
},
{
"metadata": {
"id": "Lr40vm3qcYwy",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"fc.input_layer(feature_batch, [age, age_buckets]).numpy()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Z_tQI9j8cYw1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"#### Learn complex relationships with crossed column\n",
"\n",
"Using each base feature column separately may not be enough to explain the data. For example, the correlation between education and the label (earning > 50,000 dollars) may be different for different occupations. Therefore, if we only learn a single model weight for `education=\"Bachelors\"` and `education=\"Masters\"`, we won't capture every education-occupation combination (e.g. distinguishing between `education=\"Bachelors\"` AND `occupation=\"Exec-managerial\"` AND `education=\"Bachelors\" AND occupation=\"Craft-repair\"`).\n",
"\n",
"To learn the differences between different feature combinations, we can add *crossed feature columns* to the model:"
]
},
{
"metadata": {
"id": "IAPhPzXscYw1",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"education_x_occupation = tf.feature_column.crossed_column(\n",
" ['education', 'occupation'], hash_bucket_size=1000)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "UeTxMunbcYw5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We can also create a `crossed_column` over more than two columns. Each constituent column can be either a base feature column that is categorical (`SparseColumn`), a bucketized real-valued feature column, or even another `CrossColumn`. For example:"
]
},
{
"metadata": {
"id": "y8UaBld9cYw7",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"age_buckets_x_education_x_occupation = tf.feature_column.crossed_column(\n",
" [age_buckets, 'education', 'occupation'], hash_bucket_size=1000)"
],
"execution_count": 0,
"outputs": []
},
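{
"metadata": {
"id": "crossedPeekXQ1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As with the hashed `occupation` column earlier, we can inspect a crossed column's one-hot output by wrapping it in an `indicator_column` (a quick check; training doesn't require this step):"
]
},
{
"metadata": {
"id": "crossedPeekXQ2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# The crossed column one-hot output has one slot per hash bucket.\n",
"crossed_result = fc.input_layer(\n",
"    feature_batch,\n",
"    [fc.indicator_column(age_buckets_x_education_x_occupation)])\n",
"\n",
"crossed_result.numpy().shape  # (batch_size, hash_bucket_size)"
],
"execution_count": 0,
"outputs": []
},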
{
"metadata": {
"id": "HvKmW6U5cYw8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"These crossed columns always use hash buckets to avoid the exponential explosion in the number of categories, and put the control over number of model weights in the hands of the user.\n",
"\n",
"For a visual example the effect of hash-buckets with crossed columns see [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb)\n"
]
},
{
"metadata": {
"id": "HtjpheB6cYw9",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Define the logistic regression model\n",
"\n",
"After processing the input data and defining all the feature columns, we can put them together and build a *logistic regression* model. The previous section showed several types of base and derived feature columns, including:\n",
"\n",
"* `CategoricalColumn`\n",
"* `NumericColumn`\n",
"* `BucketizedColumn`\n",
"* `CrossedColumn`\n",
"\n",
"All of these are subclasses of the abstract `FeatureColumn` class and can be added to the `feature_columns` field of a model:"
]
},
{
"metadata": {
"id": "Klmf3OxpcYw-",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import tempfile\n",
"\n",
"base_columns = [\n",
" education, marital_status, relationship, workclass, occupation,\n",
" age_buckets,\n",
"]\n",
"\n",
"crossed_columns = [\n",
" tf.feature_column.crossed_column(\n",
" ['education', 'occupation'], hash_bucket_size=1000),\n",
" tf.feature_column.crossed_column(\n",
" [age_buckets, 'education', 'occupation'], hash_bucket_size=1000),\n",
"]\n",
"\n",
"model = tf.estimator.LinearClassifier(\n",
" model_dir=tempfile.mkdtemp(), \n",
" feature_columns=base_columns + crossed_columns,\n",
" optimizer=tf.train.FtrlOptimizer(learning_rate=0.1))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "jRhnPxUucYxC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The model automatically learns a bias term, which controls the prediction made without observing any features. The learned model files are stored in `model_dir`.\n",
"\n",
"## Train and evaluate the model\n",
"\n",
"After adding all the features to the model, let's train the model. Training a model is just a single command using the `tf.estimator` API:"
]
},
{
"metadata": {
"id": "ZlrIBuoecYxD",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_inpf = functools.partial(census_dataset.input_fn, train_file, \n",
" num_epochs=40, shuffle=True, batch_size=64)\n",
"\n",
"model.train(train_inpf)\n",
"\n",
"clear_output() # used for notebook display"
],
"execution_count": 0,
"outputs": []
},
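{
"metadata": {
"id": "varNamesPeekX1",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Optionally, you can list the variables the Estimator created during training: the weights for each feature column plus the bias term mentioned above (the exact names are an implementation detail of the Estimator):"
]
},
{
"metadata": {
"id": "varNamesPeekX2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Inspection aid only: variable names are an Estimator implementation detail.\n",
"print(model.get_variable_names())"
],
"execution_count": 0,
"outputs": []
},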
{
"metadata": {
"id": "IvY3a9pzcYxH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"After the model is trained, evaluate the accuracy of the model by predicting the labels of the holdout data:"
]
},
{
"metadata": {
"id": "L9nVJEO8cYxI",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"results = model.evaluate(test_inpf)\n",
"\n",
"clear_output()\n",
"\n",
"for key,value in sorted(result.items()):\n",
" print('%s: %0.2f' % (key, value))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "E0fAibNDcYxL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The first line of the output should display something like: `accuracy: 0.83`, which means the accuracy is 83%. You can try using more features and transformations to see if you can do better!\n",
"\n",
"After the model is evaluated, we can use it to predict whether an individual has an annual income of over 50,000 dollars given an individual's information input.\n",
"\n",
"Let's look in more detail how the model performed:"
]
},
{
"metadata": {
"id": "8R5bz5CxcYxL",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"predict_df = test_df[:20].copy()\n",
"\n",
"pred_iter = model.predict(\n",
" lambda:easy_input_function(predict_df, label_key='income_bracket',\n",
" num_epochs=1, shuffle=False, batch_size=10))\n",
"\n",
"classes = np.array(['<=50K', '>50K'])\n",
"pred_class_id = []\n",
"\n",
"for pred_dict in pred_iter:\n",
" pred_class_id.append(pred_dict['class_ids'])\n",
"\n",
"predict_df['predicted_class'] = classes[np.array(pred_class_id)]\n",
"predict_df['correct'] = predict_df['predicted_class'] == predict_df['income_bracket']\n",
"\n",
"clear_output()\n",
"\n",
"predict_df[['income_bracket','predicted_class', 'correct']]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "N_uCpFTicYxN",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"For a working end-to-end example, download our [example code](https://github.com/tensorflow/models/tree/master/official/wide_deep/census_main.py) and set the `model_type` flag to `wide`."
]
},
{
"metadata": {
"id": "oyKy1lM_3gkL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Adding Regularization to Prevent Overfitting\n",
"\n",
"Regularization is a technique used to avoid overfitting. Overfitting happens when a model performs well on the data it is trained on, but worse on test data that the model has not seen before. Overfitting can occur when a model is excessively complex, such as having too many parameters relative to the number of observed training data. Regularization allows you to control the model's complexity and make the model more generalizable to unseen data.\n",
"\n",
"You can add L1 and L2 regularizations to the model with the following code:"
]
},
{
"metadata": {
"id": "lzMUSBQ03hHx",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model_l1 = tf.estimator.LinearClassifier(\n",
" feature_columns=base_columns + crossed_columns,\n",
" optimizer=tf.train.FtrlOptimizer(\n",
" learning_rate=0.1,\n",
" l1_regularization_strength=10.0,\n",
" l2_regularization_strength=0.0))\n",
"\n",
"model_l1.train(train_inpf)\n",
"\n",
"results = model_l1.evaluate(test_inpf)\n",
"clear_output()\n",
"for key in sorted(results):\n",
" print('%s: %0.2f' % (key, results[key]))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ofmPL212JIy2",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model_l2 = tf.estimator.LinearClassifier(\n",
" feature_columns=base_columns + crossed_columns,\n",
" optimizer=tf.train.FtrlOptimizer(\n",
" learning_rate=0.1,\n",
" l1_regularization_strength=0.0,\n",
" l2_regularization_strength=10.0))\n",
"\n",
"model_l2.train(train_inpf)\n",
"\n",
"results = model_l2.evaluate(test_inpf)\n",
"clear_output()\n",
"for key in sorted(results):\n",
" print('%s: %0.2f' % (key, results[key]))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Lp1Rfy_k4e7w",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"These regularized models don't perform much better than the base model. Let's look at the model's weight distributions to better see the effect of the regularization:"
]
},
{
"metadata": {
"id": "Wb6093N04XlS",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def get_flat_weights(model):\n",
" weight_names = [\n",
" name for name in model.get_variable_names()\n",
" if \"linear_model\" in name and \"Ftrl\" not in name]\n",
"\n",
" weight_values = [model.get_variable_value(name) for name in weight_names]\n",
"\n",
" weights_flat = np.concatenate([item.flatten() for item in weight_values], axis=0)\n",
"\n",
" return weights_flat\n",
"\n",
"weights_flat = get_flat_weights(model)\n",
"weights_flat_l1 = get_flat_weights(model_l1)\n",
"weights_flat_l2 = get_flat_weights(model_l2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "GskJmtfmL0p-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The models have many zero-valued weights caused by unused hash bins (there are many more hash bins than categories in some columns). We can mask these weights when viewing the weight distributions:"
]
},
{
"metadata": {
"id": "rM3agZe3MT3D",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"weight_mask = weights_flat != 0\n",
"\n",
"weights_base = weights_flat[weight_mask]\n",
"weights_l1 = weights_flat_l1[weight_mask]\n",
"weights_l2 = weights_flat_l2[weight_mask]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "NqBpxLLQNEBE",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now plot the distributions:"
]
},
{
"metadata": {
"id": "IdFK7wWa5_0K",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"plt.figure()\n",
"_ = plt.hist(weights_base, bins=np.linspace(-3,3,30))\n",
"plt.title('Base Model')\n",
"plt.ylim([0,500])\n",
"\n",
"plt.figure()\n",
"_ = plt.hist(weights_l1, bins=np.linspace(-3,3,30))\n",
"plt.title('L1 - Regularization')\n",
"plt.ylim([0,500])\n",
"\n",
"plt.figure()\n",
"_ = plt.hist(weights_l2, bins=np.linspace(-3,3,30))\n",
"plt.title('L2 - Regularization')\n",
"_=plt.ylim([0,500])\n",
"\n"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Mv6knhFa5-iJ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Both types of regularization squeeze the distribution of weights towards zero. L2 regularization has a greater effect in the tails of the distribution eliminating extreme weights. L1 regularization produces more exactly-zero values, in this case it sets ~200 to zero."
]
} }
] ]
} }
\ No newline at end of file
...@@ -94,6 +94,16 @@ ...@@ -94,6 +94,16 @@
"# Train your first neural network: basic classification" "# Train your first neural network: basic classification"
] ]
}, },
{
"metadata": {
"id": "-VrEyagzFU63",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This file has moved."
]
},
{ {
"metadata": { "metadata": {
"id": "S5Uhzt6vVIB2", "id": "S5Uhzt6vVIB2",
...@@ -106,890 +116,13 @@ ...@@ -106,890 +116,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_classification\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n", " <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_classification\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n", " </td>\n",
"</table>" "</table>"
] ]
},
{
"metadata": {
"id": "FbVhjPpzn6BM",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This guide trains a neural network model to classify images of clothing, like sneakers and shirts. It's okay if you don't understand all the details, this is a fast-paced overview of a complete TensorFlow program with the details explained as we go.\n",
"\n",
"This guide uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow."
]
},
{
"metadata": {
"id": "dzLKpmZICaWN",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# TensorFlow and tf.keras\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"# Helper libraries\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"print(tf.__version__)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yR0EdgrLCaWR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Import the Fashion MNIST dataset"
]
},
{
"metadata": {
"id": "DLdCchMdCaWQ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This guide uses the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels), as seen here:\n",
"\n",
"<table>\n",
" <tr><td>\n",
" <img src=\"https://tensorflow.org/images/fashion-mnist-sprite.png\"\n",
" alt=\"Fashion MNIST sprite\" width=\"600\">\n",
" </td></tr>\n",
" <tr><td align=\"center\">\n",
" <b>Figure 1.</b> <a href=\"https://github.com/zalandoresearch/fashion-mnist\">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;\n",
" </td></tr>\n",
"</table>\n",
"\n",
"Fashion MNIST is intended as a drop-in replacement for the classic [MNIST](http://yann.lecun.com/exdb/mnist/) dataset—often used as the \"Hello, World\" of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc) in an identical format to the articles of clothing we'll use here.\n",
"\n",
"This guide uses Fashion MNIST for variety, and because it's a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code. \n",
"\n",
"We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access the Fashion MNIST directly from TensorFlow, just import and load the data:"
]
},
{
"metadata": {
"id": "7MqDQO0KCaWS",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"fashion_mnist = keras.datasets.fashion_mnist\n",
"\n",
"(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "t9FDsUlxCaWW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Loading the dataset returns four NumPy arrays:\n",
"\n",
"* The `train_images` and `train_labels` arrays are the *training set*—the data the model uses to learn.\n",
"* The model is tested against the *test set*, the `test_images`, and `test_labels` arrays.\n",
"\n",
"The images are 28x28 NumPy arrays, with pixel values ranging between 0 and 255. The *labels* are an array of integers, ranging from 0 to 9. These correspond to the *class* of clothing the image represents:\n",
"\n",
"<table>\n",
" <tr>\n",
" <th>Label</th>\n",
" <th>Class</th> \n",
" </tr>\n",
" <tr>\n",
" <td>0</td>\n",
" <td>T-shirt/top</td> \n",
" </tr>\n",
" <tr>\n",
" <td>1</td>\n",
" <td>Trouser</td> \n",
" </tr>\n",
" <tr>\n",
" <td>2</td>\n",
" <td>Pullover</td> \n",
" </tr>\n",
" <tr>\n",
" <td>3</td>\n",
" <td>Dress</td> \n",
" </tr>\n",
" <tr>\n",
" <td>4</td>\n",
" <td>Coat</td> \n",
" </tr>\n",
" <tr>\n",
" <td>5</td>\n",
" <td>Sandal</td> \n",
" </tr>\n",
" <tr>\n",
" <td>6</td>\n",
" <td>Shirt</td> \n",
" </tr>\n",
" <tr>\n",
" <td>7</td>\n",
" <td>Sneaker</td> \n",
" </tr>\n",
" <tr>\n",
" <td>8</td>\n",
" <td>Bag</td> \n",
" </tr>\n",
" <tr>\n",
" <td>9</td>\n",
" <td>Ankle boot</td> \n",
" </tr>\n",
"</table>\n",
"\n",
"Each image is mapped to a single label. Since the *class names* are not included with the dataset, store them here to use later when plotting the images:"
]
},
{
"metadata": {
"id": "IjnLH5S2CaWx",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', \n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Brm0b_KACaWX",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Explore the data\n",
"\n",
"Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set, with each image represented as 28 x 28 pixels:"
]
},
{
"metadata": {
"id": "zW5k_xz1CaWX",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_images.shape"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "cIAcvQqMCaWf",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Likewise, there are 60,000 labels in the training set:"
]
},
{
"metadata": {
"id": "TRFYHB2mCaWb",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"len(train_labels)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "YSlYxFuRCaWk",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Each label is an integer between 0 and 9:"
]
},
{
"metadata": {
"id": "XKnCTHz4CaWg",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_labels"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "TMPI88iZpO2T",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels:"
]
},
{
"metadata": {
"id": "2KFnYlcwCaWl",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"test_images.shape"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "rd0A0Iu0CaWq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And the test set contains 10,000 images labels:"
]
},
{
"metadata": {
"id": "iJmPr5-ACaWn",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"len(test_labels)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ES6uQoLKCaWr",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Preprocess the data\n",
"\n",
"The data must be preprocessed before training the network. If you inspect the first image in the training set, you will see that the pixel values fall in the range of 0 to 255:"
]
},
{
"metadata": {
"id": "m4VEw8Ud9Quh",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"plt.figure()\n",
"plt.imshow(train_images[0])\n",
"plt.colorbar()\n",
"plt.grid(False)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Wz7l27Lz9S1P",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We scale these values to a range of 0 to 1 before feeding to the neural network model. For this, cast the datatype of the image components from an integer to a float, and divide by 255. Here's the function to preprocess the images:"
]
},
{
"metadata": {
"id": "3jCZdQNNCaWv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"It's important that the *training set* and the *testing set* are preprocessed in the same way:"
]
},
{
"metadata": {
"id": "bW5WzIPlCaWv",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"train_images = train_images / 255.0\n",
"\n",
"test_images = test_images / 255.0"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Ee638AlnCaWz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Display the first 25 images from the *training set* and display the class name below each image. Verify that the data is in the correct format and we're ready to build and train the network."
]
},
{
"metadata": {
"id": "oZTImqg_CaW1",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"plt.figure(figsize=(10,10))\n",
"for i in range(25):\n",
" plt.subplot(5,5,i+1)\n",
" plt.xticks([])\n",
" plt.yticks([])\n",
" plt.grid(False)\n",
" plt.imshow(train_images[i], cmap=plt.cm.binary)\n",
" plt.xlabel(class_names[train_labels[i]])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "59veuiEZCaW4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Build the model\n",
"\n",
"Building the neural network requires configuring the layers of the model, then compiling the model."
]
},
{
"metadata": {
"id": "Gxg1XGm0eOBy",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Setup the layers\n",
"\n",
"The basic building block of a neural network is the *layer*. Layers extract representations from the data fed into them. And, hopefully, these representations are more meaningful for the problem at hand.\n",
"\n",
"Most of deep learning consists of chaining together simple layers. Most layers, like `tf.keras.layers.Dense`, have parameters that are learned during training."
]
},
{
"metadata": {
"id": "9ODch-OFCaW4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model = keras.Sequential([\n",
" keras.layers.Flatten(input_shape=(28, 28)),\n",
" keras.layers.Dense(128, activation=tf.nn.relu),\n",
" keras.layers.Dense(10, activation=tf.nn.softmax)\n",
"])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "gut8A_7rCaW6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The first layer in this network, `tf.keras.layers.Flatten`, transforms the format of the images from a 2d-array (of 28 by 28 pixels), to a 1d-array of 28 * 28 = 784 pixels. Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.\n",
"\n",
"After the pixels are flattened, the network consists of a sequence of two `tf.keras.layers.Dense` layers. These are densely-connected, or fully-connected, neural layers. The first `Dense` layer has 128 nodes (or neurons). The second (and last) layer is a 10-node *softmax* layer—this returns an array of 10 probability scores that sum to 1. Each node contains a score that indicates the probability that the current image belongs to one of the 10 classes.\n",
"\n",
"### Compile the model\n",
"\n",
"Before the model is ready for training, it needs a few more settings. These are added during the model's *compile* step:\n",
"\n",
"* *Loss function* —This measures how accurate the model is during training. We want to minimize this function to \"steer\" the model in the right direction.\n",
"* *Optimizer* —This is how the model is updated based on the data it sees and its loss function.\n",
"* *Metrics* —Used to monitor the training and testing steps. The following example uses *accuracy*, the fraction of the images that are correctly classified."
]
},
{
"metadata": {
"id": "Lhan11blCaW7",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model.compile(optimizer=tf.train.AdamOptimizer(), \n",
" loss='sparse_categorical_crossentropy',\n",
" metrics=['accuracy'])"
],
"execution_count": 0,
"outputs": []
},
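{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Before training, it can help to confirm the layer stack and parameter counts described above. A quick, optional check on the `model` just compiled:"
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Flatten adds no parameters; Dense(128) has 784*128 + 128; Dense(10) has 128*10 + 10\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},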
{
"metadata": {
"id": "qKF6uW-BCaW-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Train the model\n",
"\n",
"Training the neural network model requires the following steps:\n",
"\n",
"1. Feed the training data to the model—in this example, the `train_images` and `train_labels` arrays.\n",
"2. The model learns to associate images and labels.\n",
"3. We ask the model to make predictions about a test set—in this example, the `test_images` array. We verify that the predictions match the labels from the `test_labels` array. \n",
"\n",
"To start training, call the `model.fit` method—the model is \"fit\" to the training data:"
]
},
{
"metadata": {
"id": "xvwvpA64CaW_",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model.fit(train_images, train_labels, epochs=5)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "W3ZVOhugCaXA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.88 (or 88%) on the training data."
]
},
{
"metadata": {
"id": "oEw4bZgGCaXB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Evaluate accuracy\n",
"\n",
"Next, compare how the model performs on the test dataset:"
]
},
{
"metadata": {
"id": "VflXLEeECaXC",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"test_loss, test_acc = model.evaluate(test_images, test_labels)\n",
"\n",
"print('Test accuracy:', test_acc)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yWfgsmVXCaXG",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"It turns out, the accuracy on the test dataset is a little less than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of *overfitting*. Overfitting is when a machine learning model performs worse on new data than on their training data. "
]
},
{
"metadata": {
"id": "xsoS7CPDCaXH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Make predictions\n",
"\n",
"With the model trained, we can use it to make predictions about some images."
]
},
{
"metadata": {
"id": "Gl91RPhdCaXI",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"predictions = model.predict(test_images)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "x9Kk1voUCaXJ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Here, the model has predicted the label for each image in the testing set. Let's take a look at the first prediction:"
]
},
{
"metadata": {
"id": "3DmJEUinCaXK",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"predictions[0]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "-hw1hgeSCaXN",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"A prediction is an array of 10 numbers. These describe the \"confidence\" of the model that the image corresponds to each of the 10 different articles of clothing. We can see which label has the highest confidence value:"
]
},
{
"metadata": {
"id": "qsqenuPnCaXO",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"np.argmax(predictions[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "E51yS7iCCaXO",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"So the model is most confident that this image is an ankle boot, or `class_names[9]`. And we can check the test label to see this is correct:"
]
},
{
"metadata": {
"id": "Sd7Pgsu6CaXP",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"test_labels[0]"
],
"execution_count": 0,
"outputs": []
},
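{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Since the output layer is a softmax, the 10 confidence scores for each prediction should sum to (approximately) 1. A quick sanity check:"
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Softmax outputs form a probability distribution over the 10 classes\n",
"np.sum(predictions[0])"
],
"execution_count": 0,
"outputs": []
},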
{
"metadata": {
"id": "ygh2yYC972ne",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"We can graph this to look at the full set of 10 channels"
]
},
{
"metadata": {
"id": "DvYmmrpIy6Y1",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def plot_image(i, predictions_array, true_label, img):\n",
" predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]\n",
" plt.grid(False)\n",
" plt.xticks([])\n",
" plt.yticks([])\n",
" \n",
" plt.imshow(img, cmap=plt.cm.binary)\n",
"\n",
" predicted_label = np.argmax(predictions_array)\n",
" if predicted_label == true_label:\n",
" color = 'blue'\n",
" else:\n",
" color = 'red'\n",
" \n",
" plt.xlabel(\"{} {:2.0f}% ({})\".format(class_names[predicted_label],\n",
" 100*np.max(predictions_array),\n",
" class_names[true_label]),\n",
" color=color)\n",
"\n",
"def plot_value_array(i, predictions_array, true_label):\n",
" predictions_array, true_label = predictions_array[i], true_label[i]\n",
" plt.grid(False)\n",
" plt.xticks([])\n",
" plt.yticks([])\n",
" thisplot = plt.bar(range(10), predictions_array, color=\"#777777\")\n",
" plt.ylim([0, 1]) \n",
" predicted_label = np.argmax(predictions_array)\n",
" \n",
" thisplot[predicted_label].set_color('red')\n",
" thisplot[true_label].set_color('blue')"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "d4Ov9OFDMmOD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's look at the 0th image, predictions, and prediction array. "
]
},
{
"metadata": {
"id": "HV5jw-5HwSmO",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"i = 0\n",
"plt.figure(figsize=(6,3))\n",
"plt.subplot(1,2,1)\n",
"plot_image(i, predictions, test_labels, test_images)\n",
"plt.subplot(1,2,2)\n",
"plot_value_array(i, predictions, test_labels)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Ko-uzOufSCSe",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"i = 12\n",
"plt.figure(figsize=(6,3))\n",
"plt.subplot(1,2,1)\n",
"plot_image(i, predictions, test_labels, test_images)\n",
"plt.subplot(1,2,2)\n",
"plot_value_array(i, predictions, test_labels)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "kgdvGD52CaXR",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's plot several images with their predictions. Correct prediction labels are blue and incorrect prediction labels are red. The number gives the percent (out of 100) for the predicted label. Note that it can be wrong even when very confident. "
]
},
{
"metadata": {
"id": "hQlnbqaw2Qu_",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Plot the first X test images, their predicted label, and the true label\n",
"# Color correct predictions in blue, incorrect predictions in red\n",
"num_rows = 5\n",
"num_cols = 3\n",
"num_images = num_rows*num_cols\n",
"plt.figure(figsize=(2*2*num_cols, 2*num_rows))\n",
"for i in range(num_images):\n",
" plt.subplot(num_rows, 2*num_cols, 2*i+1)\n",
" plot_image(i, predictions, test_labels, test_images)\n",
" plt.subplot(num_rows, 2*num_cols, 2*i+2)\n",
" plot_value_array(i, predictions, test_labels)\n"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "R32zteKHCaXT",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Finally, use the trained model to make a prediction about a single image. "
]
},
{
"metadata": {
"id": "yRJ7JU7JCaXT",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Grab an image from the test dataset\n",
"img = test_images[0]\n",
"\n",
"print(img.shape)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "vz3bVp21CaXV",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"`tf.keras` models are optimized to make predictions on a *batch*, or collection, of examples at once. So even though we're using a single image, we need to add it to a list:"
]
},
{
"metadata": {
"id": "lDFh5yF_CaXW",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Add the image to a batch where it's the only member.\n",
"img = (np.expand_dims(img,0))\n",
"\n",
"print(img.shape)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "EQ5wLTkcCaXY",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now predict the image:"
]
},
{
"metadata": {
"id": "o_rzNSdrCaXY",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"predictions_single = model.predict(img)\n",
"\n",
"print(predictions_single)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "6Ai-cpLjO-3A",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"plot_value_array(0, predictions_single, test_labels)\n",
"_ = plt.xticks(range(10), class_names, rotation=45)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "cU1Y2OAMCaXb",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"`model.predict` returns a list of lists, one for each image in the batch of data. Grab the predictions for our (only) image in the batch:"
]
},
{
"metadata": {
"id": "2tRmdq_8CaXb",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"np.argmax(predictions_single[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "YFc2HbEVCaXd",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And, as before, the model predicts a label of 9."
]
} }
] ]
} }
\ No newline at end of file
...@@ -94,6 +94,16 @@ ...@@ -94,6 +94,16 @@
"# Predict house prices: regression" "# Predict house prices: regression"
] ]
}, },
{
"metadata": {
"id": "dmtJAyM7FwUf",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This file has moved."
]
},
{ {
"metadata": { "metadata": {
"id": "bBIlTPscrIT9", "id": "bBIlTPscrIT9",
...@@ -106,484 +116,13 @@ ...@@ -106,484 +116,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_regression\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n", " <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_regression\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_regression.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_regression.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_regression.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_regression.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n", " </td>\n",
"</table>" "</table>"
] ]
},
{
"metadata": {
"id": "AHp3M9ZmrIxj",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"In a *regression* problem, we aim to predict the output of a continuous value, like a price or a probability. Contrast this with a *classification* problem, where we aim to predict a discrete label (for example, where a picture contains an apple or an orange). \n",
"\n",
"This notebook builds a model to predict the median price of homes in a Boston suburb during the mid-1970s. To do this, we'll provide the model with some data points about the suburb, such as the crime rate and the local property tax rate.\n",
"\n",
"This example uses the `tf.keras` API, see [this guide](https://www.tensorflow.org/guide/keras) for details."
]
},
{
"metadata": {
"id": "1rRo8oNqZ-Rj",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"from __future__ import absolute_import, division, print_function\n",
"\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import numpy as np\n",
"\n",
"print(tf.__version__)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "F_72b0LCNbjx",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## The Boston Housing Prices dataset\n",
"\n",
"This [dataset](https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html) is accessible directly in TensorFlow. Download and shuffle the training set:"
]
},
{
"metadata": {
"id": "p9kxxgzvzlyz",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"boston_housing = keras.datasets.boston_housing\n",
"\n",
"(train_data, train_labels), (test_data, test_labels) = boston_housing.load_data()\n",
"\n",
"# Shuffle the training set\n",
"order = np.argsort(np.random.random(train_labels.shape))\n",
"train_data = train_data[order]\n",
"train_labels = train_labels[order]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "PwEKwRJgsgJ6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Examples and features\n",
"\n",
"This dataset is much smaller than the others we've worked with so far: it has 506 total examples are split between 404 training examples and 102 test examples:"
]
},
{
"metadata": {
"id": "Ujqcgkipr65P",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print(\"Training set: {}\".format(train_data.shape)) # 404 examples, 13 features\n",
"print(\"Testing set: {}\".format(test_data.shape)) # 102 examples, 13 features"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "0LRPXE3Oz3Nq",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The dataset contains 13 different features:\n",
"\n",
"1. Per capita crime rate.\n",
"2. The proportion of residential land zoned for lots over 25,000 square feet.\n",
"3. The proportion of non-retail business acres per town.\n",
"4. Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).\n",
"5. Nitric oxides concentration (parts per 10 million).\n",
"6. The average number of rooms per dwelling.\n",
"7. The proportion of owner-occupied units built before 1940.\n",
"8. Weighted distances to five Boston employment centers.\n",
"9. Index of accessibility to radial highways.\n",
"10. Full-value property-tax rate per $10,000.\n",
"11. Pupil-teacher ratio by town.\n",
"12. 1000 * (Bk - 0.63) ** 2 where Bk is the proportion of Black people by town.\n",
"13. Percentage lower status of the population.\n",
"\n",
"Each one of these input data features is stored using a different scale. Some features are represented by a proportion between 0 and 1, other features are ranges between 1 and 12, some are ranges between 0 and 100, and so on. This is often the case with real-world data, and understanding how to explore and clean such data is an important skill to develop.\n",
"\n",
"Key Point: As a modeler and developer, think about how this data is used and the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and disparities. Is a feature relevant to the problem you want to solve or will it introduce bias? For more information, read about [ML fairness](https://developers.google.com/machine-learning/fairness-overview/)."
]
},
{
"metadata": {
"id": "8tYsm8Gs03J4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print(train_data[0]) # Display sample features, notice the different scales"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Q7muNf-d1-ne",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Use the [pandas](https://pandas.pydata.org) library to display the first few rows of the dataset in a nicely formatted table:"
]
},
{
"metadata": {
"id": "pYVyGhdyCpIM",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import pandas as pd\n",
"\n",
"column_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',\n",
" 'TAX', 'PTRATIO', 'B', 'LSTAT']\n",
"\n",
"df = pd.DataFrame(train_data, columns=column_names)\n",
"df.head()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "wb9S7Mia2lpf",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Labels\n",
"\n",
"The labels are the house prices in thousands of dollars. (You may notice the mid-1970s prices.)"
]
},
{
"metadata": {
"id": "I8NwI2ND2t4Y",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"print(train_labels[0:10]) # Display first 10 entries"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "mRklxK5s388r",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Normalize features\n",
"\n",
"It's recommended to normalize features that use different scales and ranges. For each feature, subtract the mean of the feature and divide by the standard deviation:"
]
},
{
"metadata": {
"id": "ze5WQP8R1TYg",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Test data is *not* used when calculating the mean and std\n",
"\n",
"mean = train_data.mean(axis=0)\n",
"std = train_data.std(axis=0)\n",
"train_data = (train_data - mean) / std\n",
"test_data = (test_data - mean) / std\n",
"\n",
"print(train_data[0]) # First training sample, normalized"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "BuiClDk45eS4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Although the model *might* converge without feature normalization, it makes training more difficult, and it makes the resulting model more dependent on the choice of units used in the input."
]
},
{
"metadata": {
"id": "SmjdzxKzEu1-",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Create the model\n",
"\n",
"Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers, and an output layer that returns a single, continuous value. The model building steps are wrapped in a function, `build_model`, since we'll create a second model, later on."
]
},
{
"metadata": {
"id": "c26juK7ZG8j-",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"def build_model():\n",
" model = keras.Sequential([\n",
" keras.layers.Dense(64, activation=tf.nn.relu,\n",
" input_shape=(train_data.shape[1],)),\n",
" keras.layers.Dense(64, activation=tf.nn.relu),\n",
" keras.layers.Dense(1)\n",
" ])\n",
"\n",
" optimizer = tf.train.RMSPropOptimizer(0.001)\n",
"\n",
" model.compile(loss='mse',\n",
" optimizer=optimizer,\n",
" metrics=['mae'])\n",
" return model\n",
"\n",
"model = build_model()\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "0-qWCsh6DlyH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Train the model\n",
"\n",
"The model is trained for 500 epochs, and record the training and validation accuracy in the `history` object."
]
},
{
"metadata": {
"id": "sD7qHCmNIOY0",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Display training progress by printing a single dot for each completed epoch\n",
"class PrintDot(keras.callbacks.Callback):\n",
" def on_epoch_end(self, epoch, logs):\n",
" if epoch % 100 == 0: print('')\n",
" print('.', end='')\n",
"\n",
"EPOCHS = 500\n",
"\n",
"# Store training stats\n",
"history = model.fit(train_data, train_labels, epochs=EPOCHS,\n",
" validation_split=0.2, verbose=0,\n",
" callbacks=[PrintDot()])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "tQm3pc0FYPQB",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Visualize the model's training progress using the stats stored in the `history` object. We want to use this data to determine how long to train *before* the model stops making progress."
]
},
{
"metadata": {
"id": "B6XriGbVPh2t",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"def plot_history(history):\n",
" plt.figure()\n",
" plt.xlabel('Epoch')\n",
" plt.ylabel('Mean Abs Error [1000$]')\n",
" plt.plot(history.epoch, np.array(history.history['mean_absolute_error']),\n",
" label='Train Loss')\n",
" plt.plot(history.epoch, np.array(history.history['val_mean_absolute_error']),\n",
" label = 'Val loss')\n",
" plt.legend()\n",
" plt.ylim([0, 5])\n",
"\n",
"plot_history(history)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "AqsuANc11FYv",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This graph shows little improvement in the model after about 200 epochs. Let's update the `model.fit` method to automatically stop training when the validation score doesn't improve. We'll use a *callback* that tests a training condition for every epoch. If a set amount of epochs elapses without showing improvement, then automatically stop the training.\n",
"\n",
"You can learn more about this callback [here](https://www.tensorflow.org/versions/master/api_docs/python/tf/keras/callbacks/EarlyStopping)."
]
},
{
"metadata": {
"id": "fdMZuhUgzMZ4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"model = build_model()\n",
"\n",
"# The patience parameter is the amount of epochs to check for improvement\n",
"early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)\n",
"\n",
"history = model.fit(train_data, train_labels, epochs=EPOCHS,\n",
" validation_split=0.2, verbose=0,\n",
" callbacks=[early_stop, PrintDot()])\n",
"\n",
"plot_history(history)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "3St8-DmrX8P4",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The graph shows the average error is about \\\\$2,500 dollars. Is this good? Well, \\$2,500 is not an insignificant amount when some of the labels are only $15,000.\n",
"\n",
"Let's see how did the model performs on the test set:"
]
},
{
"metadata": {
"id": "jl_yNr5n1kms",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"[loss, mae] = model.evaluate(test_data, test_labels, verbose=0)\n",
"\n",
"print(\"Testing set Mean Abs Error: ${:7.2f}\".format(mae * 1000))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "ft603OzXuEZC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Predict\n",
"\n",
"Finally, predict some housing prices using data in the testing set:"
]
},
{
"metadata": {
"id": "Xe7RXH3N3CWU",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"test_predictions = model.predict(test_data).flatten()\n",
"\n",
"plt.scatter(test_labels, test_predictions)\n",
"plt.xlabel('True Values [1000$]')\n",
"plt.ylabel('Predictions [1000$]')\n",
"plt.axis('equal')\n",
"plt.xlim(plt.xlim())\n",
"plt.ylim(plt.ylim())\n",
"_ = plt.plot([-100, 100], [-100, 100])\n"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "f-OHX4DiXd8x",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"error = test_predictions - test_labels\n",
"plt.hist(error, bins = 50)\n",
"plt.xlabel(\"Prediction Error [1000$]\")\n",
"_ = plt.ylabel(\"Count\")"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "vgGQuV-yqYZH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Conclusion\n",
"\n",
"This notebook introduced a few techniques to handle a regression problem.\n",
"\n",
"* Mean Squared Error (MSE) is a common loss function used for regression problems (different than classification problems).\n",
"* Similarly, evaluation metrics used for regression differ from classification. A common regression metric is Mean Absolute Error (MAE).\n",
"* When input data features have values with different ranges, each feature should be scaled independently.\n",
"* If there is not much training data, prefer a small network with few hidden layers to avoid overfitting.\n",
"* Early stopping is a useful technique to prevent overfitting."
]
} }
] ]
} }
\ No newline at end of file
...@@ -5,8 +5,6 @@ ...@@ -5,8 +5,6 @@
"colab": { "colab": {
"name": "basic-text-classification.ipynb", "name": "basic-text-classification.ipynb",
"version": "0.3.2", "version": "0.3.2",
"views": {},
"default_view": {},
"provenance": [], "provenance": [],
"private_outputs": true, "private_outputs": true,
"collapsed_sections": [], "collapsed_sections": [],
...@@ -32,12 +30,7 @@ ...@@ -32,12 +30,7 @@
"metadata": { "metadata": {
"id": "ioaprt5q5US7", "id": "ioaprt5q5US7",
"colab_type": "code", "colab_type": "code",
"colab": { "colab": {},
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -61,12 +54,7 @@ ...@@ -61,12 +54,7 @@
"metadata": { "metadata": {
"id": "yCl0eTNH5RS3", "id": "yCl0eTNH5RS3",
"colab_type": "code", "colab_type": "code",
"colab": { "colab": {},
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -106,6 +94,16 @@ ...@@ -106,6 +94,16 @@
"# Text classification with movie reviews" "# Text classification with movie reviews"
] ]
}, },
{
"metadata": {
"id": "NBM1gMpEGN_d",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This file has moved."
]
},
{ {
"metadata": { "metadata": {
"id": "hKY4XMc9o8iB", "id": "hKY4XMc9o8iB",
...@@ -118,699 +116,13 @@ ...@@ -118,699 +116,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_text_classification\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n", " <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/basic_text_classification\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_text_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_text_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_text_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/basic_text_classification.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n", " </td>\n",
"</table>" "</table>"
] ]
},
{
"metadata": {
"id": "Eg62Pmz3o83v",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"This notebook classifies movie reviews as *positive* or *negative* using the text of the review. This is an example of *binary*—or two-class—classification, an important and widely applicable kind of machine learning problem. \n",
"\n",
"We'll use the [IMDB dataset](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb) that contains the text of 50,000 movie reviews from the [Internet Movie Database](https://www.imdb.com/). These are split into 25,000 reviews for training and 25,000 reviews for testing. The training and testing sets are *balanced*, meaning they contain an equal number of positive and negative reviews. \n",
"\n",
"This notebook uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow. For a more advanced text classification tutorial using `tf.keras`, see the [MLCC Text Classification Guide](https://developers.google.com/machine-learning/guides/text-classification/)."
]
},
{
"metadata": {
"id": "2ew7HTbPpCJH",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import numpy as np\n",
"\n",
"print(tf.__version__)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "iAsKG535pHep",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Download the IMDB dataset\n",
"\n",
"The IMDB dataset comes packaged with TensorFlow. It has already been preprocessed such that the reviews (sequences of words) have been converted to sequences of integers, where each integer represents a specific word in a dictionary.\n",
"\n",
"The following code downloads the IMDB dataset to your machine (or uses a cached copy if you've already downloaded it):"
]
},
{
"metadata": {
"id": "zXXx5Oc3pOmN",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"imdb = keras.datasets.imdb\n",
"\n",
"(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "odr-KlzO-lkL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The argument `num_words=10000` keeps the top 10,000 most frequently occurring words in the training data. The rare words are discarded to keep the size of the data manageable."
]
},
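{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"To verify this cap, the largest word index appearing in the training data should be below 10,000. A quick, optional check:"
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Each review is a list of word indices; find the largest index used anywhere\n",
"max([max(sequence) for sequence in train_data])"
],
"execution_count": 0,
"outputs": []
},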
{
"metadata": {
"id": "l50X3GfjpU4r",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Explore the data \n",
"\n",
"Let's take a moment to understand the format of the data. The dataset comes preprocessed: each example is an array of integers representing the words of the movie review. Each label is an integer value of either 0 or 1, where 0 is a negative review, and 1 is a positive review."
]
},
{
"metadata": {
"id": "y8qCnve_-lkO",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"print(\"Training entries: {}, labels: {}\".format(len(train_data), len(train_labels)))"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "RnKvHWW4-lkW",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The text of reviews have been converted to integers, where each integer represents a specific word in a dictionary. Here's what the first review looks like:"
]
},
{
"metadata": {
"id": "QtTS4kpEpjbi",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"print(train_data[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "hIE4l_72x7DP",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Movie reviews may be different lengths. The below code shows the number of words in the first and second reviews. Since inputs to a neural network must be the same length, we'll need to resolve this later."
]
},
{
"metadata": {
"id": "X-6Ii9Pfx6Nr",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"len(train_data[0]), len(train_data[1])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "4wJg2FiYpuoX",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Convert the integers back to words\n",
"\n",
"It may be useful to know how to convert integers back to text. Here, we'll create a helper function to query a dictionary object that contains the integer to string mapping:"
]
},
{
"metadata": {
"id": "tr5s_1alpzop",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"# A dictionary mapping words to an integer index\n",
"word_index = imdb.get_word_index()\n",
"\n",
"# The first indices are reserved\n",
"word_index = {k:(v+3) for k,v in word_index.items()} \n",
"word_index[\"<PAD>\"] = 0\n",
"word_index[\"<START>\"] = 1\n",
"word_index[\"<UNK>\"] = 2 # unknown\n",
"word_index[\"<UNUSED>\"] = 3\n",
"\n",
"reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])\n",
"\n",
"def decode_review(text):\n",
" return ' '.join([reverse_word_index.get(i, '?') for i in text])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "U3CNRvEZVppl",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now we can use the `decode_review` function to display the text for the first review:"
]
},
{
"metadata": {
"id": "s_OqxmH6-lkn",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"decode_review(train_data[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "lFP_XKVRp4_S",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Prepare the data\n",
"\n",
"The reviews—the arrays of integers—must be converted to tensors before fed into the neural network. This conversion can be done a couple of ways:\n",
"\n",
"* One-hot-encode the arrays to convert them into vectors of 0s and 1s. For example, the sequence [3, 5] would become a 10,000-dimensional vector that is all zeros except for indices 3 and 5, which are ones. Then, make this the first layer in our network—a Dense layer—that can handle floating point vector data. This approach is memory intensive, though, requiring a `num_words * num_reviews` size matrix.\n",
"\n",
"* Alternatively, we can pad the arrays so they all have the same length, then create an integer tensor of shape `num_examples * max_length`. We can use an embedding layer capable of handling this shape as the first layer in our network.\n",
"\n",
"In this tutorial, we will use the second approach. \n",
"\n",
"Since the movie reviews must be the same length, we will use the [pad_sequences](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/sequence/pad_sequences) function to standardize the lengths:"
]
},
{
"metadata": {
"id": "2jQv-omsHurp",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"train_data = keras.preprocessing.sequence.pad_sequences(train_data,\n",
" value=word_index[\"<PAD>\"],\n",
" padding='post',\n",
" maxlen=256)\n",
"\n",
"test_data = keras.preprocessing.sequence.pad_sequences(test_data,\n",
" value=word_index[\"<PAD>\"],\n",
" padding='post',\n",
" maxlen=256)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "VO5MBpyQdipD",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's look at the length of the examples now:"
]
},
{
"metadata": {
"id": "USSSBnkE-lky",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"len(train_data[0]), len(train_data[1])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "QJoxZGyfjT5V",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And inspect the (now padded) first review:"
]
},
{
"metadata": {
"id": "TG8X9cqi-lk9",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"print(train_data[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "LLC02j2g-llC",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Build the model\n",
"\n",
"The neural network is created by stacking layers—this requires two main architectural decisions:\n",
"\n",
"* How many layers to use in the model?\n",
"* How many *hidden units* to use for each layer?\n",
"\n",
"In this example, the input data consists of an array of word-indices. The labels to predict are either 0 or 1. Let's build a model for this problem:"
]
},
{
"metadata": {
"id": "xpKOoWgu-llD",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"# input shape is the vocabulary count used for the movie reviews (10,000 words)\n",
"vocab_size = 10000\n",
"\n",
"model = keras.Sequential()\n",
"model.add(keras.layers.Embedding(vocab_size, 16))\n",
"model.add(keras.layers.GlobalAveragePooling1D())\n",
"model.add(keras.layers.Dense(16, activation=tf.nn.relu))\n",
"model.add(keras.layers.Dense(1, activation=tf.nn.sigmoid))\n",
"\n",
"model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "6PbKQ6mucuKL",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The layers are stacked sequentially to build the classifier:\n",
"\n",
"1. The first layer is an `Embedding` layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word-index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: `(batch, sequence, embedding)`.\n",
"2. Next, a `GlobalAveragePooling1D` layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model can handle input of variable length, in the simplest way possible.\n",
"3. This fixed-length output vector is piped through a fully-connected (`Dense`) layer with 16 hidden units.\n",
"4. The last layer is densely connected with a single output node. Using the `sigmoid` activation function, this value is a float between 0 and 1, representing a probability, or confidence level."
]
},
{
"metadata": {
"id": "0XMwnDOp-llH",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Hidden units\n",
"\n",
"The above model has two intermediate or \"hidden\" layers, between the input and output. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, the amount of freedom the network is allowed when learning an internal representation.\n",
"\n",
"If a model has more hidden units (a higher-dimensional representation space), and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns—patterns that improve performance on training data but not on the test data. This is called *overfitting*, and we'll explore it later."
]
},
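{
"metadata": {
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"For instance, a lower-capacity variant of the model above would shrink the hidden `Dense` layer. The following sketch is purely illustrative; we won't train it here:"
]
},
{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Same architecture as before, but with a 4-unit hidden layer instead of 16\n",
"smaller_model = keras.Sequential([\n",
"    keras.layers.Embedding(vocab_size, 16),\n",
"    keras.layers.GlobalAveragePooling1D(),\n",
"    keras.layers.Dense(4, activation=tf.nn.relu),\n",
"    keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])"
],
"execution_count": 0,
"outputs": []
},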
{
"metadata": {
"id": "L4EqVWg4-llM",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Loss function and optimizer\n",
"\n",
"A model needs a loss function and an optimizer for training. Since this is a binary classification problem and the model outputs of a probability (a single-unit layer with a sigmoid activation), we'll use the `binary_crossentropy` loss function. \n",
"\n",
"This isn't the only choice for a loss function, you could, for instance, choose `mean_squared_error`. But, generally, `binary_crossentropy` is better for dealing with probabilities—it measures the \"distance\" between probability distributions, or in our case, between the ground-truth distribution and the predictions.\n",
"\n",
"Later, when we are exploring regression problems (say, to predict the price of a house), we will see how to use another loss function called mean squared error.\n",
"\n",
"Now, configure the model to use an optimizer and a loss function:"
]
},
{
"metadata": {
"id": "Mr0GP-cQ-llN",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"model.compile(optimizer=tf.train.AdamOptimizer(),\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy'])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "hCWYwkug-llQ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Create a validation set\n",
"\n",
"When training, we want to check the accuracy of the model on data it hasn't seen before. Create a *validation set* by setting apart 10,000 examples from the original training data. (Why not use the testing set now? Our goal is to develop and tune our model using only the training data, then use the test data just once to evaluate our accuracy)."
]
},
{
"metadata": {
"id": "-NpcXY9--llS",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"x_val = train_data[:10000]\n",
"partial_x_train = train_data[10000:]\n",
"\n",
"y_val = train_labels[:10000]\n",
"partial_y_train = train_labels[10000:]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "35jv_fzP-llU",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Train the model\n",
"\n",
"Train the model for 40 epochs in mini-batches of 512 samples. This is 40 iterations over all samples in the `x_train` and `y_train` tensors. While training, monitor the model's loss and accuracy on the 10,000 samples from the validation set:"
]
},
{
"metadata": {
"id": "tXSGrjWZ-llW",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"history = model.fit(partial_x_train,\n",
" partial_y_train,\n",
" epochs=40,\n",
" batch_size=512,\n",
" validation_data=(x_val, y_val),\n",
" verbose=1)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "9EEGuDVuzb5r",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Evaluate the model\n",
"\n",
"And let's see how the model performs. Two values will be returned. Loss (a number which represents our error, lower values are better), and accuracy."
]
},
{
"metadata": {
"id": "zOMKywn4zReN",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"results = model.evaluate(test_data, test_labels)\n",
"\n",
"print(results)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "z1iEXVTR0Z2t",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This fairly naive approach achieves an accuracy of about 87%. WIth more advanced approaches, the model should get closer to 95%."
]
},
{
"metadata": {
"id": "5KggXVeL-llZ",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Create a graph of accuracy and loss over time\n",
"\n",
"`model.fit()` returns a `History` object that contains a dictionary with everything that happened during training:"
]
},
{
"metadata": {
"id": "VcvSXvhp-llb",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"history_dict = history.history\n",
"history_dict.keys()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "nRKsqL40-lle",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"There are four entries: one for each monitored metric during training and validation. We can use these to plot the training and validation loss for comparison, as well as the training and validation accuracy:"
]
},
{
"metadata": {
"id": "nGoYf2Js-lle",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"acc = history.history['acc']\n",
"val_acc = history.history['val_acc']\n",
"loss = history.history['loss']\n",
"val_loss = history.history['val_loss']\n",
"\n",
"epochs = range(1, len(acc) + 1)\n",
"\n",
"# \"bo\" is for \"blue dot\"\n",
"plt.plot(epochs, loss, 'bo', label='Training loss')\n",
"# b is for \"solid blue line\"\n",
"plt.plot(epochs, val_loss, 'b', label='Validation loss')\n",
"plt.title('Training and validation loss')\n",
"plt.xlabel('Epochs')\n",
"plt.ylabel('Loss')\n",
"plt.legend()\n",
"\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "6hXx-xOv-llh",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"plt.clf() # clear figure\n",
"acc_values = history_dict['acc']\n",
"val_acc_values = history_dict['val_acc']\n",
"\n",
"plt.plot(epochs, acc, 'bo', label='Training acc')\n",
"plt.plot(epochs, val_acc, 'b', label='Validation acc')\n",
"plt.title('Training and validation accuracy')\n",
"plt.xlabel('Epochs')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend()\n",
"\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "oFEmZ5zq-llk",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"\n",
"In this plot, the dots represent the training loss and accuracy, and the solid lines are the validation loss and accuracy.\n",
"\n",
"Notice the training loss *decreases* with each epoch and the training accuracy *increases* with each epoch. This is expected when using a gradient descent optimization—it should minimize the desired quantity on every iteration.\n",
"\n",
"This isn't the case for the validation loss and accuracy—they seem to peak after about twenty epochs. This is an example of overfitting: the model performs better on the training data than it does on data it has never seen before. After this point, the model over-optimizes and learns representations *specific* to the training data that do not *generalize* to test data.\n",
"\n",
"For this particular case, we could prevent overfitting by simply stopping the training after twenty or so epochs. Later, you'll see how to do this automatically with a callback."
]
} }
] ]
} }
\ No newline at end of file
...@@ -5,8 +5,6 @@ ...@@ -5,8 +5,6 @@
"colab": { "colab": {
"name": "overfit-and-underfit.ipynb", "name": "overfit-and-underfit.ipynb",
"version": "0.3.2", "version": "0.3.2",
"views": {},
"default_view": {},
"provenance": [], "provenance": [],
"private_outputs": true, "private_outputs": true,
"collapsed_sections": [ "collapsed_sections": [
...@@ -35,12 +33,7 @@ ...@@ -35,12 +33,7 @@
"metadata": { "metadata": {
"id": "lzyBOpYMdp3F", "id": "lzyBOpYMdp3F",
"colab_type": "code", "colab_type": "code",
"colab": { "colab": {},
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -64,12 +57,7 @@ ...@@ -64,12 +57,7 @@
"metadata": { "metadata": {
"id": "m_x4KfSJ7Vt7", "id": "m_x4KfSJ7Vt7",
"colab_type": "code", "colab_type": "code",
"colab": { "colab": {},
"autoexec": {
"startup": false,
"wait_interval": 0
}
},
"cellView": "form" "cellView": "form"
}, },
"cell_type": "code", "cell_type": "code",
...@@ -109,6 +97,16 @@ ...@@ -109,6 +97,16 @@
"# Explore overfitting and underfitting" "# Explore overfitting and underfitting"
] ]
}, },
{
"metadata": {
"id": "Xy5k1IC0GrV9",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"This file has moved."
]
},
{ {
"metadata": { "metadata": {
"id": "kRTxFhXAlnl1", "id": "kRTxFhXAlnl1",
...@@ -121,641 +119,13 @@ ...@@ -121,641 +119,13 @@
" <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/overfit_and_underfit\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n", " <a target=\"_blank\" href=\"https://www.tensorflow.org/tutorials/keras/overfit_and_underfit\"><img src=\"https://www.tensorflow.org/images/tf_logo_32px.png\" />View on TensorFlow.org</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/keras/overfit_and_underfit.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n", " <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/keras/overfit_and_underfit.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" </td>\n", " </td>\n",
" <td>\n", " <td>\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/samples/core/tutorials/keras/overfit_and_underfit.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n", " <a target=\"_blank\" href=\"https://github.com/tensorflow/docs/blob/master/site/en/tutorials/keras/overfit_and_underfit.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
" </td>\n", " </td>\n",
"</table>" "</table>"
] ]
},
{
"metadata": {
"id": "19rPukKZsPG6",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As always, the code in this example will use the `tf.keras` API, which you can learn more about in the TensorFlow [Keras guide](https://www.tensorflow.org/guide/keras).\n",
"\n",
"In both of the previous examples—classifying movie reviews, and predicting housing prices—we saw that the accuracy of our model on the validation data would peak after training for a number of epochs, and would then start decreasing. \n",
"\n",
"In other words, our model would *overfit* to the training data. Learning how to deal with overfitting is important. Although it's often possible to achieve high accuracy on the *training set*, what we really want is to develop models that generalize well to a *testing data* (or data they haven't seen before).\n",
"\n",
"The opposite of overfitting is *underfitting*. Underfitting occurs when there is still room for improvement on the test data. This can happen for a number of reasons: If the model is not powerful enough, is over-regularized, or has simply not been trained long enough. This means the network has not learned the relevant patterns in the training data. \n",
"\n",
"If you train for too long though, the model will start to overfit and learn patterns from the training data that don't generalize to the test data. We need to strike a balance. Understanding how to train for an appropriate number of epochs as we'll explore below is a useful skill.\n",
"\n",
"To prevent overfitting, the best solution is to use more training data. A model trained on more data will naturally generalize better. When that is no longer possible, the next best solution is to use techniques like regularization. These place constraints on the quantity and type of information your model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which have a better chance of generalizing well.\n",
"\n",
"In this notebook, we'll explore two common regularization techniques—weight regularization and dropout—and use them to improve our IMDB movie review classification notebook."
]
},
{
"metadata": {
"id": "5pZ8A2liqvgk",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"print(tf.__version__)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "1cweoTiruj8O",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Download the IMDB dataset\n",
"\n",
"Rather than using an embedding as in the previous notebook, here we will multi-hot encode the sentences. This model will quickly overfit to the training set. It will be used to demonstrate when overfitting occurs, and how to fight it. \n",
"\n",
"Multi-hot-encoding our lists means turning them into vectors of 0s and 1s. Concretely, this would mean for instance turning the sequence `[3, 5]` into a 10,000-dimensional vector that would be all-zeros except for indices 3 and 5, which would be ones. "
]
},
{
"metadata": {
"id": "QpzE4iqZtJly",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"NUM_WORDS = 10000\n",
"\n",
"(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS)\n",
"\n",
"def multi_hot_sequences(sequences, dimension):\n",
" # Create an all-zero matrix of shape (len(sequences), dimension)\n",
" results = np.zeros((len(sequences), dimension))\n",
" for i, word_indices in enumerate(sequences):\n",
" results[i, word_indices] = 1.0 # set specific indices of results[i] to 1s\n",
" return results\n",
"\n",
"\n",
"train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS)\n",
"test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "MzWVeXe3NBTn",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's look at one of the resulting multi-hot vectors. The word indices are sorted by frequency, so it is expected that there are more 1-values near index zero, as we can see in this plot:"
]
},
{
"metadata": {
"id": "71kr5rG4LkGM",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"plt.plot(train_data[0])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "lglk41MwvU5o",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Demonstrate overfitting\n",
"\n",
"The simplest way to prevent overfitting is to reduce the size of the model, i.e. the number of learnable parameters in the model (which is determined by the number of layers and the number of units per layer). In deep learning, the number of learnable parameters in a model is often referred to as the model's \"capacity\". Intuitively, a model with more parameters will have more \"memorization capacity\" and therefore will be able to easily learn a perfect dictionary-like mapping between training samples and their targets, a mapping without any generalization power, but this would be useless when making predictions on previously unseen data. \n",
"\n",
"Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.\n",
"\n",
"On the other hand, if the network has limited memorization resources, it will not be able to learn the mapping as easily. To minimize its loss, it will have to learn compressed representations that have more predictive power. At the same time, if you make your model too small, it will have difficulty fitting to the training data. There is a balance between \"too much capacity\" and \"not enough capacity\".\n",
"\n",
"Unfortunately, there is no magical formula to determine the right size or architecture of your model (in terms of the number of layers, or what the right size for each layer). You will have to experiment using a series of different architectures.\n",
"\n",
"To find an appropriate model size, it's best to start with relatively few layers and parameters, then begin increasing the size of the layers or adding new layers until you see diminishing returns on the validation loss. Let's try this on our movie review classification network. \n",
"\n",
"We'll create a simple model using only ```Dense``` layers as a baseline, then create smaller and larger versions, and compare them."
]
},
{
"metadata": {
"id": "_ReKHdC2EgVu",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create a baseline model"
]
},
{
"metadata": {
"id": "QKgdXPx9usBa",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"baseline_model = keras.Sequential([\n",
" # `input_shape` is only required here so that `.summary` works. \n",
" keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),\n",
" keras.layers.Dense(16, activation=tf.nn.relu),\n",
" keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])\n",
"\n",
"baseline_model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy', 'binary_crossentropy'])\n",
"\n",
"baseline_model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "LqG3MXF5xSjR",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"baseline_history = baseline_model.fit(train_data,\n",
" train_labels,\n",
" epochs=20,\n",
" batch_size=512,\n",
" validation_data=(test_data, test_labels),\n",
" verbose=2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "L-DGRBbGxI6G",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create a smaller model"
]
},
{
"metadata": {
"id": "SrfoVQheYSO5",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's create a model with less hidden units to compare against the baseline model that we just created:"
]
},
{
"metadata": {
"id": "jksi-XtaxDAh",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"smaller_model = keras.Sequential([\n",
" keras.layers.Dense(4, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),\n",
" keras.layers.Dense(4, activation=tf.nn.relu),\n",
" keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])\n",
"\n",
"smaller_model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy', 'binary_crossentropy'])\n",
"\n",
"smaller_model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "jbngCZliYdma",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And train the model using the same data:"
]
},
{
"metadata": {
"id": "Ofn1AwDhx-Fe",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"smaller_history = smaller_model.fit(train_data,\n",
" train_labels,\n",
" epochs=20,\n",
" batch_size=512,\n",
" validation_data=(test_data, test_labels),\n",
" verbose=2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "vIPuf23FFaVn",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Create a bigger model\n",
"\n",
"As an exercise, you can create an even larger model, and see how quickly it begins overfitting. Next, let's add to this benchmark a network that has much more capacity, far more than the problem would warrant:"
]
},
{
"metadata": {
"id": "ghQwwqwqvQM9",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"bigger_model = keras.models.Sequential([\n",
" keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),\n",
" keras.layers.Dense(512, activation=tf.nn.relu),\n",
" keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])\n",
"\n",
"bigger_model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy','binary_crossentropy'])\n",
"\n",
"bigger_model.summary()"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "D-d-i5DaYmr7",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"And, again, train the model using the same data:"
]
},
{
"metadata": {
"id": "U1A99dhqvepf",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"bigger_history = bigger_model.fit(train_data, train_labels,\n",
" epochs=20,\n",
" batch_size=512,\n",
" validation_data=(test_data, test_labels),\n",
" verbose=2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Fy3CMUZpzH3d",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Plot the training and validation loss\n",
"\n",
"<!--TODO(markdaoust): This should be a one-liner with tensorboard -->"
]
},
{
"metadata": {
"id": "HSlo1F4xHuuM",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"The solid lines show the training loss, and the dashed lines show the validation loss (remember: a lower validation loss indicates a better model). Here, the smaller network begins overfitting later than the baseline model (after 6 epochs rather than 4) and its performance degrades much more slowly once it starts overfitting. "
]
},
{
"metadata": {
"id": "0XmKDtOWzOpk",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"def plot_history(histories, key='binary_crossentropy'):\n",
" plt.figure(figsize=(16,10))\n",
" \n",
" for name, history in histories:\n",
" val = plt.plot(history.epoch, history.history['val_'+key],\n",
" '--', label=name.title()+' Val')\n",
" plt.plot(history.epoch, history.history[key], color=val[0].get_color(),\n",
" label=name.title()+' Train')\n",
"\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel(key.replace('_',' ').title())\n",
" plt.legend()\n",
"\n",
" plt.xlim([0,max(history.epoch)])\n",
"\n",
"\n",
"plot_history([('baseline', baseline_history),\n",
" ('smaller', smaller_history),\n",
" ('bigger', bigger_history)])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Bi6hBhdnSfjA",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Notice that the larger network begins overfitting almost right away, after just one epoch, and overfits much more severely. The more capacity the network has, the quicker it will be able to model the training data (resulting in a low training loss), but the more susceptible it is to overfitting (resulting in a large difference between the training and validation loss)."
]
},
{
"metadata": {
"id": "ASdv7nsgEFhx",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"## Strategies"
]
},
{
"metadata": {
"id": "4rHoVWcswFLa",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Add weight regularization\n",
"\n"
]
},
{
"metadata": {
"id": "kRxWepNawbBK",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"You may be familiar with Occam's Razor principle: given two explanations for something, the explanation most likely to be correct is the \"simplest\" one, the one that makes the least amount of assumptions. This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weights values (multiple models) that could explain the data, and simpler models are less likely to overfit than complex ones.\n",
"\n",
"A \"simple model\" in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters altogether, as we saw in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights only to take small values, which makes the distribution of weight values more \"regular\". This is called \"weight regularization\", and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:\n",
"\n",
"* L1 regularization, where the cost added is proportional to the absolute value of the weights coefficients (i.e. to what is called the \"L1 norm\" of the weights).\n",
"\n",
"* L2 regularization, where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the \"L2 norm\" of the weights). L2 regularization is also called weight decay in the context of neural networks. Don't let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.\n",
"\n",
"In `tf.keras`, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now."
]
},
{
"metadata": {
"id": "HFGmcwduwVyQ",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"l2_model = keras.models.Sequential([\n",
" keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),\n",
" activation=tf.nn.relu, input_shape=(NUM_WORDS,)),\n",
" keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001),\n",
" activation=tf.nn.relu),\n",
" keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])\n",
"\n",
"l2_model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy', 'binary_crossentropy'])\n",
"\n",
"l2_model_history = l2_model.fit(train_data, train_labels,\n",
" epochs=20,\n",
" batch_size=512,\n",
" validation_data=(test_data, test_labels),\n",
" verbose=2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "bUUHoXb7w-_C",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"```l2(0.001)``` means that every coefficient in the weight matrix of the layer will add ```0.001 * weight_coefficient_value``` to the total loss of the network. Note that because this penalty is only added at training time, the loss for this network will be much higher at training than at test time.\n",
"\n",
"Here's the impact of our L2 regularization penalty:"
]
},
{
"metadata": {
"id": "7wkfLyxBZdh_",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"plot_history([('baseline', baseline_history),\n",
" ('l2', l2_model_history)])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "Kx1YHMsVxWjP",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"As you can see, the L2 regularized model has become much more resistant to overfitting than the baseline model, even though both models have the same number of parameters."
]
},
{
"metadata": {
"id": "HmnBNOOVxiG8",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Add dropout\n",
"\n",
"Dropout is one of the most effective and most commonly used regularization techniques for neural networks, developed by Hinton and his students at the University of Toronto. Dropout, applied to a layer, consists of randomly \"dropping out\" (i.e. set to zero) a number of output features of the layer during training. Let's say a given layer would normally have returned a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training; after applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, \n",
"1.3, 0, 1.1]. The \"dropout rate\" is the fraction of the features that are being zeroed-out; it is usually set between 0.2 and 0.5. At test time, no units are dropped out, and instead the layer's output values are scaled down by a factor equal to the dropout rate, so as to balance for the fact that more units are active than at training time.\n",
"\n",
"In tf.keras you can introduce dropout in a network via the Dropout layer, which gets applied to the output of layer right before.\n",
"\n",
"Let's add two Dropout layers in our IMDB network to see how well they do at reducing overfitting:"
]
},
{
"metadata": {
"id": "OFEYvtrHxSWS",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"dpt_model = keras.models.Sequential([\n",
" keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(16, activation=tf.nn.relu),\n",
" keras.layers.Dropout(0.5),\n",
" keras.layers.Dense(1, activation=tf.nn.sigmoid)\n",
"])\n",
"\n",
"dpt_model.compile(optimizer='adam',\n",
" loss='binary_crossentropy',\n",
" metrics=['accuracy','binary_crossentropy'])\n",
"\n",
"dpt_model_history = dpt_model.fit(train_data, train_labels,\n",
" epochs=20,\n",
" batch_size=512,\n",
" validation_data=(test_data, test_labels),\n",
" verbose=2)"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "SPZqwVchx5xp",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"plot_history([('baseline', baseline_history),\n",
" ('dropout', dpt_model_history)])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "gjfnkEeQyAFG",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Adding dropout is a clear improvement over the baseline model. \n",
"\n",
"\n",
"To recap: here the most common ways to prevent overfitting in neural networks:\n",
"\n",
"* Get more training data.\n",
"* Reduce the capacity of the network.\n",
"* Add weight regularization.\n",
"* Add dropout.\n",
"\n",
"And two important approaches not covered in this guide are data-augmentation and batch normalization."
]
} }
] ]
} }
\ No newline at end of file