"This tutorial describes how to use machine learning to *categorize* Iris flowers by species. It uses [TensorFlow](https://www.tensorflow.org)'s eager execution to (1) build a *model*, (2) *train* the model on example data, and (3) use the model to make *predictions* on unknown data. Machine learning experience isn't required to follow this guide, but you'll need to read some Python code.\n",
"This guide uses machine learning to *categorize* Iris flowers by species. It uses [TensorFlow](https://www.tensorflow.org)'s eager execution to:\n",
"1. Build a *model*,\n",
"2. *Train* this model on example data, and\n",
"3. Use the model to make *predictions* about unknown data.\n",
"\n",
"Machine learning experience isn't required, but you'll need to read some Python code.\n",
"\n",
"\n",
"## TensorFlow programming\n",
"## TensorFlow programming\n",
"\n",
"\n",
...
"* Import data with the [Datasets API](https://www.tensorflow.org/programmers_guide/datasets),\n",
"* Import data with the [Datasets API](https://www.tensorflow.org/programmers_guide/datasets),\n",
"* Build models and layers with TensorFlow's [Keras API](https://keras.io/getting-started/sequential-model-guide/).\n",
"* Build models and layers with TensorFlow's [Keras API](https://keras.io/getting-started/sequential-model-guide/).\n",
"\n",
"\n",
"This tutorial shows these APIs and is structured like many other TensorFlow programs:\n",
"This tutorial is structured like many TensorFlow programs:\n",
"\n",
"\n",
"1. Import and parse the data sets.\n",
"1. Import and parse the data sets.\n",
"2. Select the type of model.\n",
"2. Select the type of model.\n",
...
"4. Evaluate the model's effectiveness.\n",
"4. Evaluate the model's effectiveness.\n",
"5. Use the trained model to make predictions.\n",
"5. Use the trained model to make predictions.\n",
"\n",
"\n",
"To learn more about using TensorFlow, see the [Getting Started guide](https://www.tensorflow.org/get_started/) and the [example tutorials](https://www.tensorflow.org/tutorials/). If you'd like to learn about the basics of machine learning, consider taking the [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/).\n",
"For more TensorFlow examples, see the [Get Started](https://www.tensorflow.org/get_started/) and [Tutorials](https://www.tensorflow.org/tutorials/) sections. To learn machine learning basics, consider taking the [Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/).\n",
"\n",
"\n",
"## Run the notebook\n",
"## Run the notebook\n",
"\n",
"\n",
"This tutorial is available as an interactive [Colab notebook](https://colab.research.google.com) for you to run and change the Python code directly in the browser. The notebook handles setup and dependencies while you \"play\" cells to execute the code blocks. This is a fun way to explore the program and test ideas. If you are unfamiliar with Python notebook environments, there are a couple of things to keep in mind:\n",
"This tutorial is available as an interactive [Colab notebook](https://colab.research.google.com) that can execute and modify Python code directly in the browser. The notebook handles setup and dependencies while you \"play\" cells to run the code blocks. This is a fun way to explore the program and test ideas.\n",
"\n",
"If you are unfamiliar with Python notebook environments, there are a couple of things to keep in mind:\n",
"\n",
"\n",
"1. Executing code requires connecting to a runtime environment. In the Colab notebook menu, select *Runtime > Connect to runtime...*\n",
"1. Executing code requires connecting to a runtime environment. In the Colab notebook menu, select *Runtime > Connect to runtime...*\n",
"2. Notebook cells are arranged sequentially to gradually build the program. Typically, later code cells depend on prior code cells, though you can always rerun a code block. To execute the entire notebook in order, select *Runtime > Run all*. To rerun a code cell, select the cell and click the *play icon* on the left."
"2. Notebook cells are arranged sequentially to gradually build the program. Typically, later code cells depend on prior code cells, though you can always rerun a code block. To execute the entire notebook in order, select *Runtime > Run all*. To rerun a code cell, select the cell and click the *play icon* on the left."
...
"source": [
"source": [
"### Configure imports and eager execution\n",
"### Configure imports and eager execution\n",
"\n",
"\n",
"Import the required Python modules, including TensorFlow, and enable eager execution for this program. Eager execution makes TensorFlow evaluate operations immediately, returning concrete values instead of creating a [computational graph](https://www.tensorflow.org/programmers_guide/graphs) that is executed later. If you are used to a REPL or the `python` interactive console, you'll feel at home.\n",
"Import the required Python modules—including TensorFlow—and enable eager execution for this program. Eager execution makes TensorFlow evaluate operations immediately, returning concrete values instead of creating a [computational graph](https://www.tensorflow.org/programmers_guide/graphs) that is executed later. If you are used to a REPL or the `python` interactive console, this feels familiar.\n",
"\n",
"\n",
"Once eager execution is enabled, it *cannot* be disabled within the same program. See the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager) for more details."
"Once eager execution is enabled, it *cannot* be disabled within the same program. See the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager) for more details."
]
...
"\n",
"\n",
"Imagine you are a botanist seeking an automated way to categorize each Iris flower you find. Machine learning provides many algorithms to statistically classify flowers. For instance, a sophisticated machine learning program could classify flowers based on photographs. Our ambitions are more modest—we're going to classify Iris flowers based on the length and width measurements of their [sepals](https://en.wikipedia.org/wiki/Sepal) and [petals](https://en.wikipedia.org/wiki/Petal).\n",
"Imagine you are a botanist seeking an automated way to categorize each Iris flower you find. Machine learning provides many algorithms to statistically classify flowers. For instance, a sophisticated machine learning program could classify flowers based on photographs. Our ambitions are more modest—we're going to classify Iris flowers based on the length and width measurements of their [sepals](https://en.wikipedia.org/wiki/Sepal) and [petals](https://en.wikipedia.org/wiki/Petal).\n",
"\n",
"\n",
"The Iris genus entails about 300 species, but our program will classify only the following three:\n",
"The Iris genus entails about 300 species, but our program will only classify the following three:\n",
"\n",
"\n",
"* Iris setosa\n",
"* Iris setosa\n",
"* Iris virginica\n",
"* Iris virginica\n",
...
"source": [
"source": [
"## Import and parse the training dataset\n",
"## Import and parse the training dataset\n",
"\n",
"\n",
"We need to download the dataset file and convert it to a structure that can be used by this Python program.\n",
"Download the dataset file and convert it to a structure that can be used by this Python program.\n",
"\n",
"\n",
"### Download the dataset\n",
"### Download the dataset\n",
"\n",
"\n",
...
},
"cell_type": "markdown",
"source": [
"From this view of the dataset, notice the following:\n",
"\n",
"1. The first line is a header containing information about the dataset:\n",
" * There are 120 total examples. Each example has four features and one of three possible label names.\n",
...
" * The first four fields are *[features](https://developers.google.com/machine-learning/glossary/#feature)*: these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.\n",
" * The first four fields are *[features](https://developers.google.com/machine-learning/glossary/#feature)*: these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.\n",
" * The last column is the *[label](https://developers.google.com/machine-learning/glossary/#label)*: this is the value we want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a flower name.\n",
" * The last column is the *[label](https://developers.google.com/machine-learning/glossary/#label)*: this is the value we want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a flower name.\n",
"Each label is associated with string name (for example, \"setosa\"), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:\n",
"Each label is associated with string name (for example, \"setosa\"), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:\n",
"\n",
"\n",
"* `0`: Iris setosa\n",
"* `0`: Iris setosa\n",
...
"For more information about features and labels, see the [ML Terminology section of the Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/framing/ml-terminology)."
"For more information about features and labels, see the [ML Terminology section of the Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/framing/ml-terminology)."
"Since our dataset is a CSV-formatted text file, we'll parse the feature and label values into a format our Python model can use. Each line—or row—in the file is passed to the `parse_csv` function which grabs the first four feature fields and combines them into a single tensor. Then, the last field is parsed as the label. The function returns *both* the `features` and `label` tensors:"
"TensorFlow's [Dataset API](https://www.tensorflow.org/programmers_guide/datasets) handles many common cases for loading data into a model. This is a high-level API for reading data and transforming it into a form used for training. See the [Datasets Quick Start guide](https://www.tensorflow.org/get_started/datasets_quickstart) for more information.\n",
"\n",
"\n",
"Since the dataset is a CSV-formatted text file, use the the [make_csv_dataset](https://www.tensorflow.org/api_docs/python/tf/contrib/data/make_csv_dataset) function to parse the data into a suitable format. Since this function generates data for training models, the default behavior is to shuffle the data (`shuffle=True, shuffle_buffer_size=10000`), and repeat the dataset forever (`num_epochs=None`). We also set the [batch_size](https://developers.google.com/machine-learning/glossary/#batch_size) parameter."
"The `make_csv_dataset` function returns a `tf.data.Dataset` of `(features, label)` pairs, where `features` is a dictionary: `{'feature_name': value}`\n",
"\n",
"\n",
"TensorFlow's [Dataset API](https://www.tensorflow.org/programmers_guide/datasets) handles many common cases for feeding data into a model. This is a high-level API for reading data and transforming it into a form used for training. See the [Datasets Quick Start guide](https://www.tensorflow.org/get_started/datasets_quickstart) for more information.\n",
"With eager execution enabled, these `Dataset` objects are iterable. Let's look at a batch of features:"
]
},
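{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# A hedged sketch of the dataset-creation step described above. The column\n",
"# names and the `train_dataset_fp` path are assumptions carried over from\n",
"# the download step; `num_epochs=1` overrides the repeat-forever default so\n",
"# one pass equals one epoch.\n",
"column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']\n",
"label_name = column_names[-1]\n",
"batch_size = 32\n",
"\n",
"train_dataset = tf.contrib.data.make_csv_dataset(\n",
"    train_dataset_fp,\n",
"    batch_size,\n",
"    column_names=column_names,\n",
"    label_name=label_name,\n",
"    num_epochs=1)"
],
"execution_count": 0,
"outputs": []
},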
{
"metadata": {
"id": "iDuG94H-C122",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"features, labels = next(iter(train_dataset))\n",
"\n",
"\n",
"This program uses [tf.data.TextLineDataset](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset) to load a CSV-formatted text file and is parsed with our `parse_csv` function. A [tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) represents an input pipeline as a collection of elements and a series of transformations that act on those elements. Transformation methods are chained together or called sequentially—just make sure to keep a reference to the returned `Dataset` object.\n",
"features"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "E63mArnQaAGz",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Notice that like-features are grouped together, or *batched*. Each example row's fields are appended to the corresponding feature array. Change the `batch_size` to set the number of examples stored in these feature arrays.\n",
"\n",
"\n",
"Training works best if the examples are in random order. Use `tf.data.Dataset.shuffle` to randomize entries, setting `buffer_size` to a value larger than the number of examples (120 in this case). To train the model faster, the dataset's *[batch size](https://developers.google.com/machine-learning/glossary/#batch_size)* is set to `32` examples to train at once."
"You can start to see some clusters by plotting a few features from the batch:"
"To simplify the model building step, create a function to repackage the features dictionary into a single array with shape: `(batch_size, num_features)`.\n",
"\n",
"This function uses the [tf.stack](https://www.tensorflow.org/api_docs/python/tf/stack) method which takes values from a list of tensors and creates a combined tensor at the specified dimension."
]
},
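{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# A hedged sketch: scatter two of the four features for this batch to show\n",
"# the clusters mentioned above. Assumes matplotlib.pyplot was imported as\n",
"# `plt`, and that the column names match this dataset's CSV header.\n",
"plt.scatter(features['petal_length'].numpy(),\n",
"            features['sepal_length'].numpy(),\n",
"            c=labels.numpy(),\n",
"            cmap='viridis')\n",
"\n",
"plt.xlabel(\"Petal length\")\n",
"plt.ylabel(\"Sepal length\")\n",
"plt.show()"
],
"execution_count": 0,
"outputs": []
},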
{
"metadata": {
"id": "jm932WINcaGU",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"def pack_features_vector(features, labels):\n",
" \"\"\"Pack the features into a single array.\"\"\"\n",
" features = tf.stack(list(features.values()), axis=1)\n",
" return features, labels"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "V1Vuph_eDl8x",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Then use the [tf.data.Dataset.map](https://www.tensorflow.org/api_docs/python/tf/data/dataset/map) method to pack the `features` of each `(features,label)` pair into the training dataset:"
"The features element of the `Dataset` are now arrays with shape `(batch_size, num_features)`. Let's look at the first few examples:"
]
},
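{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# Sketch of the mapping step: repackage each (features-dict, label) pair\n",
"# into a (feature-array, label) pair using the function defined above.\n",
"train_dataset = train_dataset.map(pack_features_vector)"
],
"execution_count": 0,
"outputs": []
},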
{
"metadata": {
"id": "kex9ibEek6Tr",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"features, labels = next(iter(train_dataset))\n",
"\n",
"print(features[:5])"
],
"execution_count": 0,
"outputs": []
...
"source": [
"source": [
"### Create a model using Keras\n",
"### Create a model using Keras\n",
"\n",
"\n",
"The TensorFlow [tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras) API is the preferred way to create models and layers. This makes it easy to build models and experiment while Keras handles the complexity of connecting everything together. See the [Keras documentation](https://keras.io/) for details.\n",
"The TensorFlow [tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras) API is the preferred way to create models and layers. This makes it easy to build models and experiment while Keras handles the complexity of connecting everything together.\n",
"\n",
"\n",
"The [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model is a linear stack of layers. Its constructor takes a list of layer instances, in this case, two [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the amount of features from the dataset, and is required."
"The [tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model is a linear stack of layers. Its constructor takes a list of layer instances, in this case, two [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the number of features from the dataset, and is required."
]
},
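{
"metadata": {
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"# A hedged sketch of the model described above: two Dense hidden layers with\n",
"# 10 nodes each and a 3-node output layer. input_shape=(4,) matches the four\n",
"# features in this dataset and is required on the first layer.\n",
"model = tf.keras.Sequential([\n",
"  tf.keras.layers.Dense(10, activation=tf.nn.relu, input_shape=(4,)),\n",
"  tf.keras.layers.Dense(10, activation=tf.nn.relu),\n",
"  tf.keras.layers.Dense(3)\n",
"])"
],
"execution_count": 0,
"outputs": []
},
{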
...
},
"cell_type": "markdown",
"source": [
"The *[activation function](https://developers.google.com/machine-learning/crash-course/glossary#activation_function)* determines the output shape of each node in the layer. These non-linearities are important—without them the model would be equivalent to a single layer. There are many [available activations](https://www.tensorflow.org/api_docs/python/tf/keras/activations), but [ReLU](https://developers.google.com/machine-learning/crash-course/glossary#ReLU) is common for hidden layers.\n",
"\n",
"The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively."
]
},
{
"metadata": {
"id": "2wFKnhWCpDSS",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"### Using the model\n",
"\n",
"Let's have a quick look at what this model does to a batch of features:"
]
},
{
"metadata": {
"id": "xe6SQ5NrpB-I",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"predictions = model(features)\n",
"predictions[:5]"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "wxyXOhwVr5S3",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Here, each example returns a [logit](https://developers.google.com/machine-learning/crash-course/glossary#logit) for each class. \n",
"\n",
"To convert these logits to a probability for each class, use the [softmax](https://developers.google.com/machine-learning/crash-course/glossary#softmax) function:"
]
},
{
"metadata": {
"id": "_tRwHZmTNTX2",
"colab_type": "code",
"colab": {
"autoexec": {
"startup": false,
"wait_interval": 0
}
}
},
"cell_type": "code",
"source": [
"tf.nn.softmax(predictions[:5])"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "uRZmchElo481",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Taking the `tf.argmax` across classes gives us the predicted class index. But, the model hasn't been trained yet, so these aren't good predictions."
"Both training and evaluation stages need to calculate the model's *[loss](https://developers.google.com/machine-learning/crash-course/glossary#loss)*. This measures how off a model's predictions are from the desired label, in other words, how bad the model is performing. We want to minimize, or optimize, this value.\n",
"Both training and evaluation stages need to calculate the model's *[loss](https://developers.google.com/machine-learning/crash-course/glossary#loss)*. This measures how off a model's predictions are from the desired label, in other words, how bad the model is performing. We want to minimize, or optimize, this value.\n",
"\n",
"\n",
"Our model will calculate its loss using the [tf.losses.sparse_softmax_cross_entropy](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy) function which takes the model's prediction and the desired label. The returned loss value is progressively larger as the prediction gets worse."
"Our model will calculate its loss using the [tf.keras.losses.categorical_crossentropy](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy) function which takes the model's class probability predictions and the desired label, and returns the average loss across the examples."
"The `grad` function uses the `loss` function and the [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) to record operations that compute the *[gradients](https://developers.google.com/machine-learning/crash-course/glossary#gradient)* used to optimize our model. For more examples of this, see the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager)."
"Use the [tf.GradientTape](https://www.tensorflow.org/api_docs/python/tf/GradientTape) context to calculate the *[gradients](https://developers.google.com/machine-learning/crash-course/glossary#gradient)* used to optimize our model. For more examples of this, see the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager)."
"An *[optimizer](https://developers.google.com/machine-learning/crash-course/glossary#optimizer)* applies the computed gradients to the model's variables to minimize the `loss` function. You can think of a curved surface (see Figure 3) and we want to find its lowest point by walking around. The gradients point in the direction of steepest ascent—so we'll travel the opposite way and move down the hill. By iteratively calculating the loss and gradient for each batch, we'll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model's predictions.\n",
"An *[optimizer](https://developers.google.com/machine-learning/crash-course/glossary#optimizer)* applies the computed gradients to the model's variables to minimize the `loss` function. You can think of the loss function as a curved surface (see Figure 3) and we want to find its lowest point by walking around. The gradients point in the direction of steepest ascent—so we'll travel the opposite way and move down the hill. By iteratively calculating the loss and gradient for each batch, we'll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model's predictions.\n",
"\n",
"\n",
"<table>\n",
"<table>\n",
" <tr><td>\n",
" <tr><td>\n",
...
"TensorFlow has many [optimization algorithms](https://www.tensorflow.org/api_guides/python/train) available for training. This model uses the [tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) that implements the *[stochastic gradient descent](https://developers.google.com/machine-learning/crash-course/glossary#gradient_descent)* (SGD) algorithm. The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results."
"TensorFlow has many [optimization algorithms](https://www.tensorflow.org/api_guides/python/train) available for training. This model uses the [tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) that implements the *[stochastic gradient descent](https://developers.google.com/machine-learning/crash-course/glossary#gradient_descent)* (SGD) algorithm. The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results."
]
]
},
},
{
"metadata": {
"id": "XkUd6UiZa_dF",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Let's setup the optimizer and the `global_step` counter:"
"With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:\n",
"With all the pieces in place, the model is ready for training! A training loop feeds the dataset examples into the model to help it make better predictions. The following code block sets up these training steps:\n",
"\n",
"\n",
"1. Iterate each epoch. An epoch is one pass through the dataset.\n",
"1. Iterate each *epoch*. An epoch is one pass through the dataset.\n",
"2. Within an epoch, iterate over each example in the training `Dataset` grabbing its *features* (`x`) and *label* (`y`).\n",
"2. Within an epoch, iterate over each example in the training `Dataset` grabbing its *features* (`x`) and *label* (`y`).\n",
"3. Using the example's features, make a prediction and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients.\n",
"3. Using the example's features, make a prediction and compare it with the label. Measure the inaccuracy of the prediction and use that to calculate the model's loss and gradients.\n",
"4. Use an `optimizer` to update the model's variables.\n",
"4. Use an `optimizer` to update the model's variables.\n",