"\u001b[0;32m~/venv3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py\u001b[0m in \u001b[0;36menable_eager_execution_internal\u001b[0;34m(config, device_policy, execution_mode, server_def)\u001b[0m\n\u001b[1;32m 5306\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5307\u001b[0m raise ValueError(\n\u001b[0;32m-> 5308\u001b[0;31m \"tf.enable_eager_execution must be called at program startup.\")\n\u001b[0m\u001b[1;32m 5309\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5310\u001b[0m \u001b[0;31m# Monkey patch to get rid of an unnecessary conditional since the context is\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: tf.enable_eager_execution must be called at program startup."
]
}
],
"source": [
"import tensorflow as tf\n",
"import tensorflow.feature_column as fc \n",
"tf.enable_eager_execution()\n",
"\n",
"\n",
"3. Execute the data download script we provide to you:"
"import os\n",
"import sys\n",
"from IPython.display import clear_output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Download the [tutorial code from github](https://github.com/tensorflow/models/tree/master/official/wide_deep/),\n",
" add the root directory to your python path, and jump to the `wide_deep` directory:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"fatal: destination path 'models' already exists and is not an empty directory.\r\n"
"Because `Estimators` expect an `input_fn` that takes no arguments, we typically wrap configurable input function into an obejct with the expected signature. For this notebook configure the `train_inpf` to iterate over the data twice:"
"## Selecting and Engineering Features for the Model\n",
"## Selecting and Engineering Features for the Model\n",
"\n",
"\n",
"Estimators use a system called `feature_columns` to describe how the model\n",
"should interpret each of the raw input features. An Estimator exepcts a vector\n",
"of numeric inputs, and feature columns describe how the model shoukld convert\n",
"each feature.\n",
"\n",
"Selecting and crafting the right set of feature columns is key to learning an\n",
"Selecting and crafting the right set of feature columns is key to learning an\n",
"effective model. A **feature column** can be either one of the raw columns in\n",
"effective model. A **feature column** can be either one of the raw columns in\n",
"the original dataframe (let's call them **base feature columns**), or any new\n",
"the original dataframe (let's call them **base feature columns**), or any new\n",
"columns created based on some transformations defined over one or multiple base\n",
"columns (let's call them **derived feature columns**). Basically, \"feature\n",
"column\" is an abstract concept of any raw or derived variable that can be used\n",
"column\" is an abstract concept of any raw or derived variable that can be used\n",
"to predict the target label.\n",
"to predict the target label.\n",
"\n",
"\n",
"### Base Categorical Feature Columns\n",
"### Base Feature Columns\n",
"\n",
"#### Numeric columns\n",
"\n",
"The simplest `feature_column` is `numeric_column`. This indicates that a feature is a numeric value that should be input to the model directly. For example:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"age = fc.numeric_column('age')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model will use the `feature_column` definitions to build the model input. You can inspect the resulting output using the `input_layer` function:"
"To define a feature column for a categorical feature, we can create a\n",
"To define a feature column for a categorical feature, we can create a\n",
"`CategoricalColumn` using the tf.feature_column API. If you know the set of all\n",
"`CategoricalColumn` using one of the `tf.feature_column.categorical_column*` functions.\n",
"possible feature values of a column and there are only a few of them, you can\n",
"\n",
"use `categorical_column_with_vocabulary_list`. Each key in the list will get\n",
"If you know the set of all possible feature values of a column and there are only a few of them, you can use `categorical_column_with_vocabulary_list`. Each key in the list will get assigned an auto-incremental ID starting from 0. For example, for the `relationship` column we can assign the feature string `Husband` to an integer ID of 0 and \"Not-in-family\" to 1, etc., by doing:"
"assigned an auto-incremental ID starting from 0. For example, for the\n",
"`relationship` column we can assign the feature string \"Husband\" to an integer\n",
"ID of 0 and \"Not-in-family\" to 1, etc., by doing:"
"This will create a sparse one-hot vector from the raw input feature.\n",
"\n",
"The `input_layer` function we're using for demonstration is designed for DNN models, and so expects dense inputs. To demonstrate the categorical column we must wrap it in a `tf.feature_column.indicator_column` to create the dense one-hot output (Linear `Estimators` can often skip this dense-step).\n",
"\n",
"Note: the other sparse-to-dense option is `tf.feature_column.embedding_column`.\n",
"\n",
"Run the input layer, configured with both the `age` and `relationship` columns:"
"It's easier to see the actual results if we take the tf.argmax over the `hash_bucket_size` dimension.\n",
"\n",
"In the output below, note how any duplicate occupations are mapped to the same pseudo-random index:\n",
"\n",
"Note: Hash collisions are unavoidable, but often have minimal impact on model quiality. The effeect may be noticable if the hash buckets are being used to compress the input space. See [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb) for a more visual example of the effect of these hash collisions."
"These crossed columns always use hash buckets to avoid the exponential explosion in the number of categories, and put the control over number of model weights in the hands of the user.\n",
"\n",
"For a visual example the effect of hash-buckets with crossed columns see [this notebook](https://colab.research.google.com/github/tensorflow/models/blob/master/samples/outreach/blogs/housing_prices.ipynb)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
...
@@ -428,10 +1211,41 @@
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"INFO:tensorflow:Using default config.\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"I0711 14:48:54.071429 140466218788608 tf_logging.py:115] Using default config.\n"
"The model prediction output would be like `[b'1']` or `[b'0']` which means whether corresponding individual has an annual income of over 50,000 dollars or not.\n",
"\n",
"If you'd like to see a working end-to-end example, you can download our\n",
"If you'd like to see a working end-to-end example, you can download our\n",