"\u001b[0;32m~/venv3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py\u001b[0m in \u001b[0;36menable_eager_execution_internal\u001b[0;34m(config, device_policy, execution_mode, server_def)\u001b[0m\n\u001b[1;32m 5306\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5307\u001b[0m raise ValueError(\n\u001b[0;32m-> 5308\u001b[0;31m \"tf.enable_eager_execution must be called at program startup.\")\n\u001b[0m\u001b[1;32m 5309\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5310\u001b[0m \u001b[0;31m# Monkey patch to get rid of an unnecessary conditional since the context is\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: tf.enable_eager_execution must be called at program startup."
]
}
],
"source": [
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc\n",
"tf.enable_eager_execution()\n",
"\n",
"import os\n",
"import sys\n",
"from IPython.display import clear_output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the [tutorial code from github](https://github.com/tensorflow/models/tree/master/official/wide_deep/),\n",
" add the root directory to your python path, and jump to the `wide_deep` directory:"
]
},
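{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that setup, assuming the repository was cloned into a local `models` directory (the directory names here mirror the repository layout):\n",
"\n",
"```python\n",
"import os\n",
"import sys\n",
"\n",
"# Make the cloned repo importable, then move into the tutorial directory.\n",
"# 'models' is the default checkout directory created by git clone.\n",
"models_root = os.path.abspath('models')\n",
"sys.path.append(models_root)\n",
"\n",
"wide_deep_dir = os.path.join(models_root, 'official', 'wide_deep')\n",
"if os.path.isdir(wide_deep_dir):\n",
"    os.chdir(wide_deep_dir)\n",
"```"
]
},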
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fatal: destination path 'models' already exists and is not an empty directory.\r\n"
]
}
],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `Estimators` expect an `input_fn` that takes no arguments, we typically wrap the configurable input function in an object with the expected signature. For this notebook, configure `train_inpf` to iterate over the data twice:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting and Engineering Features for the Model\n",
"\n",
"Estimators use a system called `feature_columns` to describe how the model\n",
"should interpret each of the raw input features. An Estimator expects a vector\n",
"of numeric inputs, and feature columns describe how the model should convert\n",
"each feature.\n",
"\n",
"Selecting and crafting the right set of feature columns is key to learning an\n",
"effective model. A **feature column** can be either one of the raw columns in\n",
"the original dataframe (call these **base feature columns**), or any new\n",
"column created by a transformation defined over one or more base\n",
"columns (call these **derived feature columns**). Basically, a \"feature\n",
"column\" is an abstraction for any raw or derived variable that can be used\n",
"to predict the target label.\n",
"\n",
"### Base Feature Columns\n",
"\n",
"#### Numeric columns\n",
"\n",
"The simplest `feature_column` is `numeric_column`. This indicates that a feature is a numeric value that should be fed to the model directly. For example:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"age = fc.numeric_column('age')"
]
},
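{
"cell_type": "markdown",
"metadata": {},
"source": [
"With eager execution enabled at the top of this notebook, you can inspect what a column produces by feeding `input_layer` a small hand-made feature batch. A minimal sketch (the sample ages are made up):\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc\n",
"\n",
"age = fc.numeric_column('age')\n",
"\n",
"# A stand-in for a real feature batch from the dataset.\n",
"features = {'age': [23., 31., 46.]}\n",
"\n",
"# input_layer builds the dense input tensor described by the columns:\n",
"# one value per example for a single numeric column, so shape (3, 1).\n",
"tf.feature_column.input_layer(features, [age])\n",
"```"
]
},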
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model will use the `feature_column` definitions to build the model input. You can inspect the resulting output using the `input_layer` function:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To define a feature column for a categorical feature, we can create a\n",
"`CategoricalColumn` using one of the `tf.feature_column.categorical_column*` functions.\n",
"\n",
"If you know the set of all possible feature values of a column, and there are only a few of them, you can use `categorical_column_with_vocabulary_list`. Each key in the list is assigned an auto-incremented ID starting from 0. For example, for the `relationship` column we can assign the feature string `Husband` to an integer ID of 0, \"Not-in-family\" to 1, and so on, by doing:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This will create a sparse one-hot vector from the raw input feature.\n",
"\n",
"The `input_layer` function we're using for demonstration is designed for DNN models, and so expects dense inputs. To demonstrate the categorical column we must wrap it in a `tf.feature_column.indicator_column` to create the dense one-hot output (linear `Estimators` can often skip this dense step).\n",
"\n",
"Note: the other sparse-to-dense option is `tf.feature_column.embedding_column`.\n",
"\n",
"Run the input layer, configured with both the `age` and `relationship` columns:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's easier to see the actual results if we take the `tf.argmax` over the `hash_bucket_size` dimension.\n",
"\n",
"In the output below, note how any duplicate occupations are mapped to the same pseudo-random index:\n",
"\n",
"Note: Hash collisions are unavoidable, but often have minimal impact on model quality. The effect may be noticeable if the hash buckets are being used to compress the input space. See [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb) for a more visual example of the effect of these hash collisions."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These crossed columns always use hash buckets to avoid the exponential explosion in the number of categories, and put control over the number of model weights in the hands of the user.\n",
"\n",
"For a visual example of the effect of hash buckets with crossed columns, see [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb)."
]
},
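{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, the categorical, hashed, and crossed columns described above can be created like this. The `relationship` vocabulary beyond the first two entries, and the `education`/`occupation` names, follow the census dataset used in this tutorial and are illustrative:\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc\n",
"\n",
"# Vocabulary-based categorical column: each listed value gets an ID, in order,\n",
"# so 'Husband' maps to 0, 'Not-in-family' to 1, and so on.\n",
"relationship = fc.categorical_column_with_vocabulary_list(\n",
"    'relationship',\n",
"    ['Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried',\n",
"     'Other-relative'])\n",
"\n",
"# When the full vocabulary is unknown or large, hash the strings into buckets.\n",
"occupation = fc.categorical_column_with_hash_bucket(\n",
"    'occupation', hash_bucket_size=1000)\n",
"\n",
"# Crossed column: feature combinations, again hashed into a fixed bucket count.\n",
"education_x_occupation = fc.crossed_column(\n",
"    ['education', 'occupation'], hash_bucket_size=1000)\n",
"```"
]
},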
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Defining The Logistic Regression Model\n",
"\n",
"After processing the input data and defining all the feature columns, we're now\n",
"ready to put them all together and build a Logistic Regression model. In the\n",
"previous section we've seen several types of base and derived feature columns,\n",
"including:\n",
"\n",
"* `CategoricalColumn`\n",
"* `NumericColumn`\n",
"* `BucketizedColumn`\n",
"* `CrossedColumn`\n",
"\n",
"All of these are subclasses of the abstract `FeatureColumn` class, and can be\n",
"added to the `feature_columns` field of a model:"
]
},
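{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that wiring, using an illustrative pair of columns (the tutorial's full column set and its `train_inpf` input function are defined in the surrounding cells):\n",
"\n",
"```python\n",
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc\n",
"\n",
"age = fc.numeric_column('age')\n",
"relationship = fc.categorical_column_with_vocabulary_list(\n",
"    'relationship', ['Husband', 'Not-in-family', 'Wife'])\n",
"\n",
"# A LinearClassifier trained with logistic loss is the 'wide' part of the model;\n",
"# it accepts both numeric and categorical columns directly.\n",
"classifier = tf.estimator.LinearClassifier(\n",
"    feature_columns=[age, relationship])\n",
"```"
]
},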
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Using default config.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"I0711 14:48:54.071429 140466218788608 tf_logging.py:115] Using default config.\n"
]
}
],
"source": []
},
{
"metadata": {
"id": "HtjpheB6cYw9",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
""