Commit 18fc407e authored by Mark Daoust's avatar Mark Daoust

Fix review comments - part 2

parent 957ee975
......@@ -122,6 +122,19 @@
" add the root directory to your python path, and jump to the `wide_deep` directory:"
]
},
{
"metadata": {
"id": "tTwQzWcn8aBu",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"! git clone --depth 1 https://github.com/tensorflow/models"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "yVvFyhnkcYvL",
......@@ -130,12 +143,9 @@
},
"cell_type": "code",
"source": [
"if \"wide_deep\" not in os.getcwd():\n",
" ! git clone --depth 1 https://github.com/tensorflow/models\n",
" models_path = os.path.join(os.getcwd(), 'models')\n",
" sys.path.append(models_path) \n",
" os.environ['PYTHONPATH'] += os.pathsep+models_path\n",
" os.chdir(\"models/official/wide_deep\")"
"models_path = os.path.join(os.getcwd(), 'models')\n",
"\n",
"sys.path.append(models_path)"
],
"execution_count": 0,
"outputs": []
......@@ -158,8 +168,8 @@
},
"cell_type": "code",
"source": [
"import census_dataset\n",
"import census_main\n",
"from official.wide_deep import census_dataset\n",
"from official.wide_deep import census_main\n",
"\n",
"census_dataset.download(\"/tmp/census_data/\")"
],
......@@ -173,19 +183,65 @@
},
"cell_type": "markdown",
"source": [
"Execute the tutorial code with the following command to train the model described in this tutorial, from the command line:"
"To execute the tutorial code from the command line, first add the path to `tensorflow/models` to your `PYTHONPATH`."
]
},
{
"metadata": {
"id": "DYOkY8boUptJ",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"#export PYTHONPATH=${PYTHONPATH}:\"$(pwd)/models\"\n",
"os.environ['PYTHONPATH'] += os.pathsep+models_path"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "5r0V9YUMUyoh",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Use `--help` to see what command-line options are available:"
]
},
{
"metadata": {
"id": "vbJ8jPAhcYvT",
"id": "1_3tBaLW4YM4",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"output = !python -m census_main --model_type=wide --train_epochs=2\n",
"print([line for line in output if 'accuracy:' in line])"
"!python -m official.wide_deep.census_main --help"
],
"execution_count": 0,
"outputs": []
},
{
"metadata": {
"id": "RrMLazEN6DMj",
"colab_type": "text"
},
"cell_type": "markdown",
"source": [
"Now run the model:\n"
]
},
{
"metadata": {
"id": "py7MarZl5Yh6",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
"!python -m official.wide_deep.census_main --model_type=wide --train_epochs=2"
],
"execution_count": 0,
"outputs": []
......@@ -322,8 +378,7 @@
"cell_type": "code",
"source": [
"def easy_input_function(df, label_key, num_epochs, shuffle, batch_size):\n",
" df = df.copy()\n",
" label = df.pop(label_key)\n",
" label = df[label_key]\n",
" ds = tf.data.Dataset.from_tensor_slices((dict(df),label))\n",
"\n",
" if shuffle:\n",
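One subtlety in the change above: `df.pop(label_key)` removes the label column from the features, while plain indexing (`df[label_key]`) leaves it in place, so `dict(df)` will still contain the label. A minimal pure-Python sketch of the two behaviors, using a plain dict of columns in place of a `DataFrame` so neither TensorFlow nor pandas is needed:

```python
# Columns of a tiny toy dataset, standing in for a pandas DataFrame.
def make_columns():
    return {"age": [25, 40], "hours": [40, 50], "income": ["<=50K", ">50K"]}

# pop-style extraction: the label column is removed from the features.
features = make_columns()
label = features.pop("income")
assert "income" not in features

# index-style extraction (as in the new code): the label column also
# stays in the features, so it would still appear in dict(df).
features = make_columns()
label = features["income"]
assert "income" in features
```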
......@@ -374,9 +429,9 @@
"cell_type": "markdown",
"source": [
"But this approach has severely limited scalability. For larger data it should be streamed off disk.\n",
"the `census_dataset.input_fn` provides an example of how to do this using `tf.decode_csv` and `tf.data.TextLineDataset`: \n",
"The `census_dataset.input_fn` provides an example of how to do this using `tf.decode_csv` and `tf.data.TextLineDataset`: \n",
"\n",
"TODO(markdaoust): This `input_fn` should use `tf.contrib.data.make_csv_dataset`"
"<!-- TODO(markdaoust): This `input_fn` should use `tf.contrib.data.make_csv_dataset` -->"
]
},
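The streaming idea behind `tf.data.TextLineDataset` can be sketched without TensorFlow: read and parse one CSV line at a time with a generator, so the whole file never has to fit in memory. The column names below are hypothetical stand-ins, not the actual census columns:

```python
import csv
import io

def stream_csv_rows(fileobj, column_names):
    """Yield one {column: value} dict per CSV line, without loading the file."""
    reader = csv.reader(fileobj)
    for row in reader:
        yield dict(zip(column_names, row))

# Toy in-memory "file"; a real input_fn would open a file on disk.
data = io.StringIO("39,State-gov\n50,Self-emp\n")
rows = stream_csv_rows(data, ["age", "workclass"])
first = next(rows)
# first == {"age": "39", "workclass": "State-gov"}
```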
{
......@@ -470,7 +525,7 @@
"\n",
"Estimators use a system called `feature_columns` to describe how the model\n",
"should interpret each of the raw input features. An Estimator expects a vector\n",
"of numeric inputs, and feature columns describe how the model shoukld convert\n",
"of numeric inputs, and feature columns describe how the model should convert\n",
"each feature.\n",
"\n",
"Selecting and crafting the right set of feature columns is key to learning an\n",
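The conversion that feature columns perform can be illustrated with a small hand-rolled sketch (not the `tf.feature_column` API): a numeric feature passes through unchanged, while a categorical feature is one-hot encoded over its vocabulary. The raw values and vocabulary here are hypothetical:

```python
def numeric_column(value):
    # Numeric features pass through as a length-1 vector.
    return [float(value)]

def categorical_column(value, vocabulary):
    # Categorical features become a one-hot vector over the vocabulary.
    return [1.0 if value == v else 0.0 for v in vocabulary]

# One raw input row converted to the numeric vector a model consumes.
vector = numeric_column(39) + categorical_column("State-gov", ["Private", "State-gov"])
# vector == [39.0, 0.0, 1.0]
```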
......@@ -752,7 +807,7 @@
},
"cell_type": "markdown",
"source": [
"if we run `input_layer` with the hashed column we see that the output shape is `(batch_size, hash_bucket_size)`"
"If we run `input_layer` with the hashed column we see that the output shape is `(batch_size, hash_bucket_size)`"
]
},
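The output shape noted above follows from the hashing trick: each string is hashed into one of `hash_bucket_size` buckets, and the one-hot of that bucket gives a row of length `hash_bucket_size`. A hedged sketch, where Python's built-in `hash` stands in for TensorFlow's fingerprint hash (so bucket assignments will differ from the real column):

```python
def hashed_one_hot(values, hash_bucket_size):
    # One one-hot row of length hash_bucket_size per input string.
    rows = []
    for v in values:
        bucket = hash(v) % hash_bucket_size
        rows.append([1.0 if i == bucket else 0.0 for i in range(hash_bucket_size)])
    return rows

batch = ["Adm-clerical", "Exec-managerial", "Sales"]
output = hashed_one_hot(batch, hash_bucket_size=10)
# Output shape is (batch_size, hash_bucket_size) == (3, 10).
assert len(output) == 3 and all(len(row) == 10 for row in output)
```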
{
......@@ -1259,11 +1314,24 @@
"\n",
"For more about estimators:\n",
"\n",
"- The [TensorFlow Hub transfer-learning tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub)\n",
"- The [TensorFlow Hub text classification tutorial](https://www.tensorflow.org/hub/tutorials/text_classification_with_tf_hub) uses `hub.text_embedding_column` to easily ingest free form text. \n",
"- The [Gradient-boosted-trees estimator tutorial](https://github.com/tensorflow/models/tree/master/official/boosted_trees)\n",
"- This [blog post]( https://medium.com/tensorflow/classifying-text-with-tensorflow-estimators) on processing text with `Estimators`\n",
"- How to [build a custom CNN estimator](https://www.tensorflow.org/tutorials/estimators/cnn)"
]
},
{
"metadata": {
"id": "amMnupRPVtsa",
"colab_type": "code",
"colab": {}
},
"cell_type": "code",
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}
\ No newline at end of file