Commit 18fc407e authored by Mark Daoust's avatar Mark Daoust

Fix review comments - part 2

parent 957ee975
......@@ -122,6 +122,19 @@
" add the root directory to your python path, and jump to the `wide_deep` directory:"
]
},
{
"metadata": {
"id": "tTwQzWcn8aBu",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"! git clone --depth 1 https://github.com/tensorflow/models"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yVvFyhnkcYvL",
......@@ -130,12 +143,9 @@
},
"cell_type": "code",
"source": [
"if \"wide_deep\" not in os.getcwd():\n",
" ! git clone --depth 1 https://github.com/tensorflow/models\n",
" models_path = os.path.join(os.getcwd(), 'models')\n",
" sys.path.append(models_path) \n",
" os.environ['PYTHONPATH'] += os.pathsep+models_path\n",
" os.chdir(\"models/official/wide_deep\")"
"models_path = os.path.join(os.getcwd(), 'models')\n",
"\n",
"sys.path.append(models_path)"
],
"execution_count": 0,
"outputs": []
......@@ -158,8 +168,8 @@
},
"cell_type": "code",
"source": [
"import census_dataset\n",
"import census_main\n",
"from official.wide_deep import census_dataset\n",
"from official.wide_deep import census_main\n",
"\n",
"census_dataset.download(\"/tmp/census_data/\")"
],
......@@ -173,19 +183,65 @@
},
"cell_type": "markdown",
"source": [
"Execute the tutorial code with the following command to train the model described in this tutorial, from the command line:"
"To execute the tutorial code from the command line, first add the path to `tensorflow/models` to your `PYTHONPATH`."
]
},
{
"metadata": {
"id": "DYOkY8boUptJ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"#export PYTHONPATH=${PYTHONPATH}:\"$(pwd)/models\"\n",
"os.environ['PYTHONPATH'] += os.pathsep+models_path"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "5r0V9YUMUyoh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Use `--help` to see what command-line options are available:"
]
},
{
"metadata": {
"id": "vbJ8jPAhcYvT",
"id": "1_3tBaLW4YM4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"output = !python -m census_main --model_type=wide --train_epochs=2\n",
"print([line for line in output if 'accuracy:' in line])"
"!python -m official.wide_deep.census_main --help"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "RrMLazEN6DMj",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now run the model:\n"
]
},
{
"metadata": {
"id": "py7MarZl5Yh6",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!python -m official.wide_deep.census_main --model_type=wide --train_epochs=2"
],
"execution_count": 0,
"outputs": []
......@@ -322,8 +378,7 @@
"cell_type": "code",
"source": [
"def easy_input_function(df, label_key, num_epochs, shuffle, batch_size):\n",
" df = df.copy()\n",
" label = df.pop(label_key)\n",
" label = df[label_key]\n",
" ds = tf.data.Dataset.from_tensor_slices((dict(df),label))\n",
"\n",
" if shuffle:\n",
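One subtlety in the change above: `df.pop(label_key)` removes the label column from the features, while plain indexing (`df[label_key]`) leaves it in place, so `dict(df)` will still contain the label. A minimal pure-Python sketch of the two behaviors, using a plain dict of columns in place of a `DataFrame` so neither TensorFlow nor pandas is needed:

```python
# Columns of a tiny toy dataset, standing in for a pandas DataFrame.
def make_columns():
    return {"age": [25, 40], "hours": [40, 50], "income": ["<=50K", ">50K"]}

# pop-style extraction: the label column is removed from the features.
features = make_columns()
label = features.pop("income")
assert "income" not in features

# index-style extraction (as in the new code): the label column also
# stays in the features, so it would still appear in dict(df).
features = make_columns()
label = features["income"]
assert "income" in features
```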
......@@ -374,9 +429,9 @@
"cell_type": "markdown",
"source": [
"But this approach has severely limited scalability. For larger data it should be streamed off disk.\n",
"the `census_dataset.input_fn` provides an example of how to do this using `tf.decode_csv` and `tf.data.TextLineDataset`: \n",
"The `census_dataset.input_fn` provides an example of how to do this using `tf.decode_csv` and `tf.data.TextLineDataset`: \n",
"\n",
"TODO(markdaoust): This `input_fn` should use `tf.contrib.data.make_csv_dataset`"
"<!-- TODO(markdaoust): This `input_fn` should use `tf.contrib.data.make_csv_dataset` -->"
]
},
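The streaming idea behind `tf.data.TextLineDataset` can be sketched without TensorFlow: read and parse one CSV line at a time with a generator, so the whole file never has to fit in memory. The column names below are hypothetical stand-ins, not the actual census columns:

```python
import csv
import io

def stream_csv_rows(fileobj, column_names):
    """Yield one {column: value} dict per CSV line, without loading the file."""
    reader = csv.reader(fileobj)
    for row in reader:
        yield dict(zip(column_names, row))

# Toy in-memory "file"; a real input_fn would open a file on disk.
data = io.StringIO("39,State-gov\n50,Self-emp\n")
rows = stream_csv_rows(data, ["age", "workclass"])
first = next(rows)
# first == {"age": "39", "workclass": "State-gov"}
```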
{
......@@ -470,7 +525,7 @@
"\n",
"Estimators use a system called `feature_columns` to describe how the model\n",
"should interpret each of the raw input features. An Estimator expects a vector\n",
"of numeric inputs, and feature columns describe how the model shoukld convert\n",
"of numeric inputs, and feature columns describe how the model should convert\n",
"each feature.\n",
"\n",
"Selecting and crafting the right set of feature columns is key to learning an\n",
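The conversion that feature columns perform can be illustrated with a small hand-rolled sketch (not the `tf.feature_column` API): a numeric feature passes through unchanged, while a categorical feature is one-hot encoded over its vocabulary. The raw values and vocabulary here are hypothetical:

```python
def numeric_column(value):
    # Numeric features pass through as a length-1 vector.
    return [float(value)]

def categorical_column(value, vocabulary):
    # Categorical features become a one-hot vector over the vocabulary.
    return [1.0 if value == v else 0.0 for v in vocabulary]

# One raw input row converted to the numeric vector a model consumes.
vector = numeric_column(39) + categorical_column("State-gov", ["Private", "State-gov"])
# vector == [39.0, 0.0, 1.0]
```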
......@@ -752,7 +807,7 @@
},
"cell_type": "markdown",
"source": [
"if we run `input_layer` with the hashed column we see that the output shape is `(batch_size, hash_bucket_size)`"
"If we run `input_layer` with the hashed column we see that the output shape is `(batch_size, hash_bucket_size)`"
]
},
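The output shape noted above follows from the hashing trick: each string is hashed into one of `hash_bucket_size` buckets, and the one-hot of that bucket gives a row of length `hash_bucket_size`. A hedged sketch, where Python's built-in `hash` stands in for TensorFlow's fingerprint hash (so bucket assignments will differ from the real column):

```python
def hashed_one_hot(values, hash_bucket_size):
    # One one-hot row of length hash_bucket_size per input string.
    rows = []
    for v in values:
        bucket = hash(v) % hash_bucket_size
        rows.append([1.0 if i == bucket else 0.0 for i in range(hash_bucket_size)])
    return rows

batch = ["Adm-clerical", "Exec-managerial", "Sales"]
output = hashed_one_hot(batch, hash_bucket_size=10)
# Output shape is (batch_size, hash_bucket_size) == (3, 10).
assert len(output) == 3 and all(len(row) == 10 for row in output)
```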
{
......@@ -1259,11 +1314,24 @@
"\n",
"For more about estimators:\n",
"\n",
"- The [TensorFlow Hub transfer-learning tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub)\n",
"- The [TensorFlow Hub text classification tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub) uses `hub.text_embedding_column` to easily ingest free form text. \n",
"- The [Gradient-boosted-trees estimator tutorial](https://github.com/tensorflow/models/tree/master/official/boosted_trees)\n",
"- This [blog post]( https://medium.com/tensorflow/classifying-text-with-tensorflow-estimators) on processing text with `Estimators`\n",
"- How to [build a custom CNN estimator](https://www.tensorflow.org/tutorials/estimators/cnn)"
]
},
{
"metadata": {
"id": "amMnupRPVtsa",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}
\ No newline at end of file