"Note: you can run [**this notebook, live in Google Colab**](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/get_started/eager.ipynb) with zero setup.\n",
"Note: you can run **[this notebook, live in Google Colab](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/get_started/eager.ipynb)** with zero setup.\n",
"\n",
"This tutorial describes how to use machine learning to *categorize* Iris flowers by species. It uses [TensorFlow](https://www.tensorflow.org)'s eager execution to 1. build a *model*, 2. *train* the model on example data, and 3. use the model to make *predictions* on unknown data. Machine Learning experience isn't required to follow this guide, but you'll need to read some Python code.\n",
"\n",
...
...
@@ -219,7 +219,7 @@
"\n",
"### Download the dataset\n",
"\n",
"Download the training dataset file using the [`tf.keras.utils.get_file`](https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file) function. This returns the file path of the downloaded file."
"Download the training dataset file using the `[tf.keras.utils.get_file](https://www.tensorflow.org/api_docs/python/tf/keras/utils/get_file)` function. This returns the file path of the downloaded file."
]
},
{
...
...
@@ -286,9 +286,9 @@
"\n",
"1. The first line is a header containing information about the dataset:\n",
" * There are 120 total examples. Each example has four features and one of three possible label names. \n",
"2. Subsequent rows are data records, one [*example*](https://developers.google.com/machine-learning/glossary/#example) per line, where:\n",
" * The first four fields are [*features*](https://developers.google.com/machine-learning/glossary/#feature): these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.\n",
" * The last column is the [*label*](https://developers.google.com/machine-learning/glossary/#label): this is the value we want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a flower name.\n",
"2. Subsequent rows are data records, one *[example](https://developers.google.com/machine-learning/glossary/#example)* per line, where:\n",
" * The first four fields are *[features](https://developers.google.com/machine-learning/glossary/#feature)*: these are characteristics of an example. Here, the fields hold float numbers representing flower measurements.\n",
" * The last column is the *[label](https://developers.google.com/machine-learning/glossary/#label)*: this is the value we want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a flower name.\n",
"\n",
"Each label is associated with string name (for example, \"setosa\"), but machine learning typically relies on numeric values. The label numbers are mapped to a named representation, such as:\n",
"\n",
...
...
@@ -347,9 +347,9 @@
"\n",
"TensorFlow's [Dataset API](https://www.tensorflow.org/programmers_guide/datasets) handles many common cases for feeding data into a model. This is a high-level API for reading data and transforming it into a form used for training. See the [Datasets Quick Start guide](https://www.tensorflow.org/get_started/datasets_quickstart) for more information.\n",
"\n",
"This program uses [`tf.data.TextLineDataset`](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset) to load a CSV-formatted text file and is parsed with our `parse_csv` function. A [`tf.data.Dataset`](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) represents an input pipeline as a collection of elements and a series of transformations that act on those elements. Transformation methods are chained together or called sequentially—just make sure to keep a reference to the returned `Dataset` object.\n",
"This program uses `[tf.data.TextLineDataset](https://www.tensorflow.org/api_docs/python/tf/data/TextLineDataset)` to load a CSV-formatted text file and is parsed with our `parse_csv` function. A `[tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset)` represents an input pipeline as a collection of elements and a series of transformations that act on those elements. Transformation methods are chained together or called sequentially—just make sure to keep a reference to the returned `Dataset` object.\n",
"\n",
"Training works best if the examples are in random order. Use `tf.data.Dataset.shuffle` to randomize entries, setting `buffer_size` to a value larger than the number of examples (120 in this case). To train the model faster, the dataset's [*batch size*](https://developers.google.com/machine-learning/glossary/#batch_size) is set to `32` examples to train at once."
"Training works best if the examples are in random order. Use `tf.data.Dataset.shuffle` to randomize entries, setting `buffer_size` to a value larger than the number of examples (120 in this case). To train the model faster, the dataset's *[batch size](https://developers.google.com/machine-learning/glossary/#batch_size)* is set to `32` examples to train at once."
]
},
{
...
...
@@ -390,13 +390,13 @@
"\n",
"### Why model?\n",
"\n",
"A [*model*](https://developers.google.com/machine-learning/crash-course/glossary#model) is the relationship between features and the label. For the Iris classification problem, the model defines the relationship between the sepal and petal measurements and the predicted Iris species. Some simple models can be described with a few lines of algebra, but complex machine learning models have a large number of parameters that are difficult to summarize.\n",
"A *[model](https://developers.google.com/machine-learning/crash-course/glossary#model)* is the relationship between features and the label. For the Iris classification problem, the model defines the relationship between the sepal and petal measurements and the predicted Iris species. Some simple models can be described with a few lines of algebra, but complex machine learning models have a large number of parameters that are difficult to summarize.\n",
"\n",
"Could you determine the relationship between the four features and the Iris species *without* using machine learning? That is, could you use traditional programming techniques (for example, a lot of conditional statements) to create a model? Perhaps—if you analyzed the dataset long enough to determine the relationships between petal and sepal measurements to a particular species. And this becomes difficult—maybe impossible—on more complicated datasets. A good machine learning approach *determines the model for you*. If you feed enough representative examples into the right machine learning model type, the program will figure out the relationships for you.\n",
"\n",
"### Select the model\n",
"\n",
"We need to select the kind of model to train. There are many types of models and picking a good one takes experience. This tutorial uses a neural network to solve the Iris classification problem. [*Neural networks*](https://developers.google.com/machine-learning/glossary/#neural_network) can find complex relationships between features and the label. It is a highly-structured graph, organized into one or more [*hidden layers*](https://developers.google.com/machine-learning/glossary/#hidden_layer). Each hidden layer consists of one or more [*neurons*](https://developers.google.com/machine-learning/glossary/#neuron). There are several categories of neural networks and this program uses a dense, or [*fully-connected neural network*](https://developers.google.com/machine-learning/glossary/#fully_connected_layer): the neurons in one layer receive input connections from *every* neuron in the previous layer. For example, Figure 2 illustrates a dense neural network consisting of an input layer, two hidden layers, and an output layer:\n",
"We need to select the kind of model to train. There are many types of models and picking a good one takes experience. This tutorial uses a neural network to solve the Iris classification problem. *[Neural networks](https://developers.google.com/machine-learning/glossary/#neural_network)* can find complex relationships between features and the label. It is a highly-structured graph, organized into one or more *[hidden layers](https://developers.google.com/machine-learning/glossary/#hidden_layer)*. Each hidden layer consists of one or more *[neurons](https://developers.google.com/machine-learning/glossary/#neuron)*. There are several categories of neural networks and this program uses a dense, or *[fully-connected neural network](https://developers.google.com/machine-learning/glossary/#fully_connected_layer)*: the neurons in one layer receive input connections from *every* neuron in the previous layer. For example, Figure 2 illustrates a dense neural network consisting of an input layer, two hidden layers, and an output layer:\n",
"When the model from Figure 2 is trained and fed an unlabeled example, it yields three predictions: the likelihood that this flower is the given Iris species. This prediction is called [*inference*](https://developers.google.com/machine-learning/crash-course/glossary#inference). For this example, the sum of the output predictions are 1.0. In Figure 2, this prediction breaks down as: `0.03` for *Iris setosa*, `0.95` for *Iris versicolor*, and `0.02` for *Iris virginica*. This means that the model predicts—with 95% probability—that an unlabeled example flower is an *Iris versicolor*."
"When the model from Figure 2 is trained and fed an unlabeled example, it yields three predictions: the likelihood that this flower is the given Iris species. This prediction is called *[inference](https://developers.google.com/machine-learning/crash-course/glossary#inference)*. For this example, the sum of the output predictions are 1.0. In Figure 2, this prediction breaks down as: `0.03` for *Iris setosa*, `0.95` for *Iris versicolor*, and `0.02` for *Iris virginica*. This means that the model predicts—with 95% probability—that an unlabeled example flower is an *Iris versicolor*."
]
},
{
...
...
@@ -418,9 +418,9 @@
"source": [
"### Create a model using Keras\n",
"\n",
"The TensorFlow [`tf.keras`](https://www.tensorflow.org/api_docs/python/tf/keras) API is the preferred way to create models and layers. This makes it easy to build models and experiment while Keras handles the complexity of connecting everything together. See the [Keras documentation](https://keras.io/) for details.\n",
"The TensorFlow `[tf.keras](https://www.tensorflow.org/api_docs/python/tf/keras)` API is the preferred way to create models and layers. This makes it easy to build models and experiment while Keras handles the complexity of connecting everything together. See the [Keras documentation](https://keras.io/) for details.\n",
"\n",
"The [`tf.keras.Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) model is a linear stack of layers. Its constructor takes a list of layer instances, in this case, two [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the amount of features from the dataset, and is required."
"The `[tf.keras.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential)` model is a linear stack of layers. Its constructor takes a list of layer instances, in this case, two `[Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)` layers with 10 nodes each, and an output layer with 3 nodes representing our label predictions. The first layer's `input_shape` parameter corresponds to the amount of features from the dataset, and is required."
]
},
{
...
...
@@ -452,7 +452,7 @@
},
"cell_type": "markdown",
"source": [
"The [*activation function*](https://developers.google.com/machine-learning/crash-course/glossary#activation_function) determines the output of a single neuron to the next layer. This is loosely based on how brain neurons are connected. There are many [available activations](https://www.tensorflow.org/api_docs/python/tf/keras/activations), but [ReLU](https://developers.google.com/machine-learning/crash-course/glossary#ReLU) is common for hidden layers.\n",
"The *[activation function](https://developers.google.com/machine-learning/crash-course/glossary#activation_function)* determines the output of a single neuron to the next layer. This is loosely based on how brain neurons are connected. There are many [available activations](https://www.tensorflow.org/api_docs/python/tf/keras/activations), but [ReLU](https://developers.google.com/machine-learning/crash-course/glossary#ReLU) is common for hidden layers.\n",
"\n",
"The ideal number of hidden layers and neurons depends on the problem and the dataset. Like many aspects of machine learning, picking the best shape of the neural network requires a mixture of knowledge and experimentation. As a rule of thumb, increasing the number of hidden layers and neurons typically creates a more powerful model, which requires more data to train effectively."
]
...
...
@@ -466,9 +466,9 @@
"source": [
"## Train the model\n",
"\n",
"[*Training*](https://developers.google.com/machine-learning/crash-course/glossary#training) is the stage of machine learning when the model is gradually optimized, or the model *learns* the dataset. The goal is to learn enough about the structure of the training dataset to make predictions about unseen data. If you learn *too much* about the training dataset, then the predictions only work for the data it has seen and will not be generalizable. This problem is called [*overfitting*](https://developers.google.com/machine-learning/crash-course/glossary#overfitting)—it's like memorizing the answers instead of understanding how to solve a problem.\n",
"*[Training](https://developers.google.com/machine-learning/crash-course/glossary#training)* is the stage of machine learning when the model is gradually optimized, or the model *learns* the dataset. The goal is to learn enough about the structure of the training dataset to make predictions about unseen data. If you learn *too much* about the training dataset, then the predictions only work for the data it has seen and will not be generalizable. This problem is called *[overfitting](https://developers.google.com/machine-learning/crash-course/glossary#overfitting)*—it's like memorizing the answers instead of understanding how to solve a problem.\n",
"\n",
"The Iris classification problem is an example of [*supervised machine learning*](https://developers.google.com/machine-learning/glossary/#supervised_machine_learning): the model is trained from examples that contain labels. In [*unsupervised machine learning*](https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning), the examples don't contain labels. Instead, the model typically finds patterns among the features."
"The Iris classification problem is an example of *[supervised machine learning](https://developers.google.com/machine-learning/glossary/#supervised_machine_learning)*: the model is trained from examples that contain labels. In *[unsupervised machine learning](https://developers.google.com/machine-learning/glossary/#unsupervised_machine_learning)*, the examples don't contain labels. Instead, the model typically finds patterns among the features."
]
},
{
...
...
@@ -480,9 +480,9 @@
"source": [
"### Define the loss and gradient function\n",
"\n",
"Both training and evaluation stages need to calculate the model's [*loss*](https://developers.google.com/machine-learning/crash-course/glossary#loss). This measures how off a model's predictions are from the desired label, in other words, how bad the model is performing. We want to minimize, or optimize, this value.\n",
"Both training and evaluation stages need to calculate the model's *[loss](https://developers.google.com/machine-learning/crash-course/glossary#loss)*. This measures how off a model's predictions are from the desired label, in other words, how bad the model is performing. We want to minimize, or optimize, this value.\n",
"\n",
"Our model will calculate its loss using the [`tf.losses.sparse_softmax_cross_entropy`](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy) function which takes the model's prediction and the desired label. The returned loss value is progressively larger as the prediction gets worse."
"Our model will calculate its loss using the `[tf.losses.sparse_softmax_cross_entropy](https://www.tensorflow.org/api_docs/python/tf/losses/sparse_softmax_cross_entropy)` function which takes the model's prediction and the desired label. The returned loss value is progressively larger as the prediction gets worse."
]
},
{
...
...
@@ -518,7 +518,7 @@
},
"cell_type": "markdown",
"source": [
"The `grad` function uses the `loss` function and the [`tfe.GradientTape`](https://www.tensorflow.org/api_docs/python/tf/contrib/eager/GradientTape) to record operations that compute the [*gradients*](https://developers.google.com/machine-learning/crash-course/glossary#gradient) used to optimize our model. For more examples of this, see the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager)."
"The `grad` function uses the `loss` function and the `[tfe.GradientTape](https://www.tensorflow.org/api_docs/python/tf/contrib/eager/GradientTape)` to record operations that compute the *[gradients](https://developers.google.com/machine-learning/crash-course/glossary#gradient)* used to optimize our model. For more examples of this, see the [eager execution guide](https://www.tensorflow.org/programmers_guide/eager)."
]
},
{
...
...
@@ -530,7 +530,7 @@
"source": [
"### Create an optimizer\n",
"\n",
"An [*optimizer*](https://developers.google.com/machine-learning/crash-course/glossary#optimizer) applies the computed gradients to the model's variables to minimize the `loss` function. You can think of a curved surface (see Figure 3) and we want to find its lowest point by walking around. The gradients point in the direction of steepest the ascent—so we'll travel the opposite way and move down the hill. By iteratively calculating the loss and gradients for each *step* (or [*learning rate*](https://developers.google.com/machine-learning/crash-course/glossary#learning_rate)), we'll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model's predictions.\n",
"An *[optimizer](https://developers.google.com/machine-learning/crash-course/glossary#optimizer)* applies the computed gradients to the model's variables to minimize the `loss` function. You can think of a curved surface (see Figure 3) and we want to find its lowest point by walking around. The gradients point in the direction of steepest the ascent—so we'll travel the opposite way and move down the hill. By iteratively calculating the loss and gradients for each *step* (or *[learning rate](https://developers.google.com/machine-learning/crash-course/glossary#learning_rate)*), we'll adjust the model during training. Gradually, the model will find the best combination of weights and bias to minimize loss. And the lower the loss, the better the model's predictions.\n",
"TensorFlow has many [optimization algorithms](https://www.tensorflow.org/api_guides/python/train) available for training. This model uses the [`tf.train.GradientDescentOptimizer`](https://www.tensorflow.org/versions/r1.0/api_docs/python/tf/train/GradientDescentOptimizer) that implements the [*standard gradient descent*](https://developers.google.com/machine-learning/crash-course/glossary#gradient_descent) (SGD) algorithm. The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results."
"TensorFlow has many [optimization algorithms](https://www.tensorflow.org/api_guides/python/train) available for training. This model uses the `[tf.train.GradientDescentOptimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer)` that implements the *[standard gradient descent](https://developers.google.com/machine-learning/crash-course/glossary#gradient_descent)* (SGD) algorithm. The `learning_rate` sets the step size to take for each iteration down the hill. This is a *hyperparameter* that you'll commonly adjust to achieve better results."
]
},
{
...
...
@@ -578,7 +578,7 @@
"5. Keep track of some stats for visualization.\n",
"6. Repeat for each epoch.\n",
"\n",
"The `num_epochs` variable is the amount of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. `num_epochs` is a [*hyperparameter*](https://developers.google.com/machine-learning/glossary/#hyperparameter) that you can tune. Choosing the right number usually requires both experience and experimentation."
"The `num_epochs` variable is the amount of times to loop over the dataset collection. Counter-intuitively, training a model longer does not guarantee a better model. `num_epochs` is a *[hyperparameter](https://developers.google.com/machine-learning/glossary/#hyperparameter)* that you can tune. Choosing the right number usually requires both experience and experimentation."
]
},
{
...
...
@@ -691,7 +691,7 @@
"\n",
"Now that the model is trained, we can get some statistics on its performance.\n",
"\n",
"*Evaluating* means determining how effectively the model makes predictions. To determine the model's effectiveness at Iris classification, pass some sepal and petal measurements to the model and ask the model to predict what Iris species they represent. Then compare the model's prediction against the actual label. For example, a model that picked the correct species on half the input examples has an [*accuracy*](https://developers.google.com/machine-learning/glossary/#accuracy) of `0.5`. Figure 4 shows a slightly more effective model, getting 4 out of 5 predictions correct at 80% accuracy:\n",
"*Evaluating* means determining how effectively the model makes predictions. To determine the model's effectiveness at Iris classification, pass some sepal and petal measurements to the model and ask the model to predict what Iris species they represent. Then compare the model's prediction against the actual label. For example, a model that picked the correct species on half the input examples has an *[accuracy](https://developers.google.com/machine-learning/glossary/#accuracy)* of `0.5`. Figure 4 shows a slightly more effective model, getting 4 out of 5 predictions correct at 80% accuracy:\n",
"\n",
"<figure>\n",
" <table cellpadding=\"8\" border=\"0\">\n",
...
...
@@ -734,7 +734,7 @@
"source": [
"### Setup the test dataset\n",
"\n",
"Evaluating the model is similiar to training the model. The biggest difference is the examples come from a separate [*test set*](https://developers.google.com/machine-learning/crash-course/glossary#test_set) rather than the training set. To fairly assess a model's effectiveness, the examples used to evaluate a model must be different from the examples used to train the model.\n",
"Evaluating the model is similiar to training the model. The biggest difference is the examples come from a separate *[test set](https://developers.google.com/machine-learning/crash-course/glossary#test_set)* rather than the training set. To fairly assess a model's effectiveness, the examples used to evaluate a model must be different from the examples used to train the model.\n",
"\n",
"The setup for the test `Dataset` is similiar to the setup for training `Dataset`. Download the CSV text file and parse that values, then give it a little shuffle:"
"Note: you can run [**this notebook, live in Google Colab**](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/demos/eager_execution.ipynb) with zero setup. \n",
"Note: you can run **[this notebook, live in Google Colab](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/demos/eager_execution.ipynb)** with zero setup. \n",