Commit c539b46d authored by Neal Wu

Additional inception fixes

parent 9da44850
@@ -111,15 +111,12 @@ ready to train or evaluate with the ImageNet data set.
 intensive task and depending on your compute setup may take several days or even
 weeks.
-*Before proceeding* please read the [Convolutional Neural Networks]
-(https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial in
-particular focus on [Training a Model Using Multiple GPU Cards]
-(https://www.tensorflow.org/tutorials/deep_cnn/index.html#training-a-model-using-multiple-gpu-cards)
-. The model training method is nearly identical to that described in the
+*Before proceeding* please read the [Convolutional Neural Networks](https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial; in
+particular, focus on [Training a Model Using Multiple GPU Cards](https://www.tensorflow.org/tutorials/deep_cnn/index.html#launching_and_training_the_model_on_multiple_gpu_cards). The model training method is nearly identical to that described in the
 CIFAR-10 multi-GPU model training. Briefly, the model training
-* Places an individual model replica on each GPU. Split the batch across the
-GPUs.
+* Places an individual model replica on each GPU.
+* Splits the batch across the GPUs.
 * Updates model parameters synchronously by waiting for all GPUs to finish
 processing a batch of data.
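
The three steps listed in this hunk (one replica per GPU, a split batch, a synchronous update) can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the Inception training code: the 1-D model `y = w * x`, the helper names `gradient`, `split_batch`, and `synchronous_step`, and the learning rate are all assumptions made for illustration, and list slices stand in for per-GPU shards.

```python
def gradient(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def split_batch(items, num_replicas):
    """Split a batch into equal contiguous shards, one per replica."""
    if len(items) % num_replicas != 0:
        raise ValueError("batch size must be divisible by the number of replicas")
    size = len(items) // num_replicas
    return [items[i * size:(i + 1) * size] for i in range(num_replicas)]


def synchronous_step(w, xs, ys, num_replicas, learning_rate):
    """One synchronous update: every replica finishes before w changes."""
    x_shards = split_batch(xs, num_replicas)
    y_shards = split_batch(ys, num_replicas)
    # Each replica sees the same parameter w and its own shard of the batch.
    replica_grads = [gradient(w, x_shard, y_shard)
                     for x_shard, y_shard in zip(x_shards, y_shards)]
    # Only after all replicas have reported is the averaged gradient applied.
    mean_grad = sum(replica_grads) / num_replicas
    return w - learning_rate * mean_grad


# Hypothetical data generated by y = 3 * x; 4 replicas stand in for 4 GPUs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]
w = 0.0
for _ in range(200):
    w = synchronous_step(w, xs, ys, num_replicas=4, learning_rate=0.01)
print(round(w, 3))
```

Because the shards have equal size, the average of the per-replica gradients equals the gradient over the whole batch, which is why a synchronous update behaves like single-device training on the full batch.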
@@ -245,11 +242,9 @@ We term each machine that maintains model parameters a `ps`, short for
 `ps` as the model parameters may be sharded across multiple machines.
 Variables may be updated with synchronous or asynchronous gradient updates. One
-may construct a an [`Optimizer`]
-(https://www.tensorflow.org/api_docs/python/train.html#optimizers) in TensorFlow
-that constructs the necessary graph for either case diagrammed below from
-TensorFlow [Whitepaper]
-(http://download.tensorflow.org/paper/whitepaper2015.pdf):
+may construct a an [`Optimizer`](https://www.tensorflow.org/api_docs/python/train.html#optimizers) in TensorFlow
+that constructs the necessary graph for either case diagrammed below from the
+TensorFlow [Whitepaper](http://download.tensorflow.org/paper/whitepaper2015.pdf):
 <div style="width:40%; margin:auto; margin-bottom:10px; margin-top:20px;">
 <img style="width:100%"
@@ -380,10 +375,8 @@ training Inception in a distributed manner.
 Evaluating an Inception v3 model on the ImageNet 2012 validation data set
 requires running a separate binary.
-The evaluation procedure is nearly identical to [Evaluating a Model]
-(https://www.tensorflow.org/tutorials/deep_cnn/index.html#evaluating-a-model)
-described in the [Convolutional Neural Network]
-(https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial.
+The evaluation procedure is nearly identical to [Evaluating a Model](https://www.tensorflow.org/tutorials/deep_cnn/index.html#evaluating_a_model)
+described in the [Convolutional Neural Network](https://www.tensorflow.org/tutorials/deep_cnn/index.html) tutorial.
 **WARNING** Be careful not to run the evaluation and training binary on the same
 GPU or else you might run out of memory. Consider running the evaluation on a
@@ -438,8 +431,7 @@ daisy, dandelion, roses, sunflowers, tulips
 There is a single automated script that downloads the data set and converts it
 to the TFRecord format. Much like the ImageNet data set, each record in the
 TFRecord format is a serialized `tf.Example` proto whose entries include a
-JPEG-encoded string and an integer label. Please see [`parse_example_proto`]
-(inception/image_processing.py) for details.
+JPEG-encoded string and an integer label. Please see [`parse_example_proto`](inception/image_processing.py) for details.
 The script just takes a few minutes to run depending your network connection
 speed for downloading and processing the images. Your hard disk requires 200MB
@@ -471,14 +463,12 @@ and `validation-?????-of-00002`, respectively.
 **NOTE** If you wish to prepare a custom image data set for transfer learning,
 you will need to invoke [`build_image_data.py`](inception/data/build_image_data.py) on
 your custom data set. Please see the associated options and assumptions behind
-this script by reading the comments section of [`build_image_data.py`]
-(inception/data/build_image_data.py). Also, if your custom data has a different
+this script by reading the comments section of [`build_image_data.py`](inception/data/build_image_data.py). Also, if your custom data has a different
 number of examples or classes, you need to change the appropriate values in
 [`imagenet_data.py`](inception/imagenet_data.py).
 The second piece you will need is a trained Inception v3 image model. You have
-the option of either training one yourself (See [How to Train from Scratch]
-(#how-to-train-from-scratch) for details) or you can download a pre-trained
+the option of either training one yourself (See [How to Train from Scratch](#how-to-train-from-scratch) for details) or you can download a pre-trained
 model like so:
 ```shell
@@ -806,8 +796,7 @@ comments in [`image_processing.py`](inception/image_processing.py) for more deta
 #### The model runs out of CPU memory.
 In lieu of buying more CPU memory, an easy fix is to decrease
-`--input_queue_memory_factor`. See [Adjusting Memory Demands]
-(#adjusting-memory-demands).
+`--input_queue_memory_factor`. See [Adjusting Memory Demands](#adjusting-memory-demands).
 #### The model runs out of GPU memory.
......
@@ -32,7 +32,7 @@ a sharded data set consisting of TFRecord files
 train_directory/train-00000-of-01024
 train_directory/train-00001-of-01024
 ...
-train_directory/train-00127-of-01024
+train_directory/train-01023-of-01024
 and
@@ -50,7 +50,7 @@ contains the following fields:
 image/width: integer, image width in pixels
 image/colorspace: string, specifying the colorspace, always 'RGB'
 image/channels: integer, specifying the number of channels, always 3
-image/format: string, specifying the format, always'JPEG'
+image/format: string, specifying the format, always 'JPEG'
 image/filename: string containing the basename of the image file
 e.g. 'n01440764_10026.JPEG' or 'ILSVRC2012_val_00000293.JPEG'
@@ -60,7 +60,7 @@ contains the following fields:
 image/class/text: string specifying the human-readable version of the label
 e.g. 'dog'
-If you data set involves bounding boxes, please look at build_imagenet_data.py.
+If your data set involves bounding boxes, please look at build_imagenet_data.py.
 """
 from __future__ import absolute_import
 from __future__ import division
@@ -72,7 +72,6 @@ import random
 import sys
 import threading
 import numpy as np
 import tensorflow as tf
@@ -306,7 +305,7 @@ def _process_image_files(name, filenames, texts, labels, num_shards):
 spacing = np.linspace(0, len(filenames), FLAGS.num_threads + 1).astype(np.int)
 ranges = []
 for i in range(len(spacing) - 1):
-  ranges.append([spacing[i], spacing[i+1]])
+  ranges.append([spacing[i], spacing[i + 1]])
 # Launch a thread for each batch.
 print('Launching %d threads for spacings: %s' % (FLAGS.num_threads, ranges))
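
The context lines of this hunk divide the list of image files into one contiguous index range per thread. Below is a minimal sketch of that partitioning in pure Python. The function name `thread_ranges` is hypothetical, and integer floor division is used as a stand-in for the `np.linspace(...).astype(np.int)` call in the script, so results could differ from NumPy's in floating-point edge cases. Note also that the `np.int` alias used in the script was removed in NumPy 1.24; the built-in `int` is the usual substitute.

```python
def thread_ranges(num_items, num_threads):
    """Return [start, end) index pairs, one per thread, covering all items."""
    # Evenly spaced boundaries from 0 to num_items, truncated to integers.
    spacing = [(i * num_items) // num_threads for i in range(num_threads + 1)]
    ranges = []
    for i in range(len(spacing) - 1):
        ranges.append([spacing[i], spacing[i + 1]])
    return ranges


# Hypothetical example: 10 files shared among 4 threads.
ranges = thread_ranges(num_items=10, num_threads=4)
print(ranges)
```

Each range starts where the previous one ends, so every file index is assigned to exactly one thread even when the file count is not a multiple of the thread count.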
......
@@ -36,7 +36,7 @@ a sharded data set consisting of 1024 and 128 TFRecord files, respectively.
 train_directory/train-00000-of-01024
 train_directory/train-00001-of-01024
 ...
-train_directory/train-00127-of-01024
+train_directory/train-01023-of-01024
 and
@@ -54,7 +54,7 @@ serialized Example proto. The Example proto contains the following fields:
 image/width: integer, image width in pixels
 image/colorspace: string, specifying the colorspace, always 'RGB'
 image/channels: integer, specifying the number of channels, always 3
-image/format: string, specifying the format, always'JPEG'
+image/format: string, specifying the format, always 'JPEG'
 image/filename: string containing the basename of the image file
 e.g. 'n01440764_10026.JPEG' or 'ILSVRC2012_val_00000293.JPEG'
@@ -80,7 +80,7 @@ serialized Example proto. The Example proto contains the following fields:
 Note that the length of xmin is identical to the length of xmax, ymin and ymax
 for each example.
-Running this script using 16 threads may take around ~2.5 hours on a HP Z420.
+Running this script using 16 threads may take around ~2.5 hours on an HP Z420.
 """
 from __future__ import absolute_import
 from __future__ import division
@@ -92,7 +92,6 @@ import random
 import sys
 import threading
 import numpy as np
 import tensorflow as tf
@@ -435,7 +434,7 @@ def _process_image_files(name, filenames, synsets, labels, humans,
 ranges = []
 threads = []
 for i in range(len(spacing) - 1):
-  ranges.append([spacing[i], spacing[i+1]])
+  ranges.append([spacing[i], spacing[i + 1]])
 # Launch a thread for each batch.
 print('Launching %d threads for spacings: %s' % (FLAGS.num_threads, ranges))
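
The docstring correction in this file (the last training shard is `train-01023-of-01024`, not `train-00127-of-01024`) follows from zero-based shard numbering: with 1024 shards the indices run from 0 to 1023. The sketch below reproduces the naming pattern shown in the docstring. The function name `shard_filenames` and the format string are assumptions inferred from the listed file names, not code taken from the script.

```python
def shard_filenames(name, num_shards):
    """List shard file names in the form <name>-XXXXX-of-YYYYY, zero-based."""
    return ["%s-%05d-of-%05d" % (name, shard, num_shards)
            for shard in range(num_shards)]


# Shard counts taken from the docstring: 1024 training and 128 validation files.
train_names = shard_filenames("train", 1024)
validation_names = shard_filenames("validation", 128)
print(train_names[0], train_names[-1])
print(validation_names[0], validation_names[-1])
```

The index `00127` is the last shard only for the 128-file validation set, which is probably where the old docstring value came from.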
......
@@ -35,7 +35,7 @@
 set -e
 if [ -z "$1" ]; then
-  echo "usage download_and_preprocess_flowers.sh [data dir]"
+  echo "Usage: download_and_preprocess_flowers.sh [data dir]"
   exit
 fi
......
@@ -35,7 +35,7 @@
 set -e
 if [ -z "$1" ]; then
-  echo "usage download_and_preprocess_flowers.sh [data dir]"
+  echo "Usage: download_and_preprocess_flowers.sh [data dir]"
   exit
 fi
......
@@ -49,7 +49,7 @@
 set -e
 if [ -z "$1" ]; then
-  echo "usage download_and_preprocess_imagenet.sh [data dir]"
+  echo "Usage: download_and_preprocess_imagenet.sh [data dir]"
   exit
 fi
@@ -84,7 +84,7 @@ BOUNDING_BOX_FILE="${SCRATCH_DIR}/imagenet_2012_bounding_boxes.csv"
 BOUNDING_BOX_DIR="${SCRATCH_DIR}bounding_boxes/"
 "${BOUNDING_BOX_SCRIPT}" "${BOUNDING_BOX_DIR}" "${LABELS_FILE}" \
-| sort >"${BOUNDING_BOX_FILE}"
+| sort > "${BOUNDING_BOX_FILE}"
 echo "Finished downloading and preprocessing the ImageNet data."
 # Build the TFRecords version of the ImageNet data.
......
@@ -24,7 +24,7 @@
 # downloading the raw images.
 #
 # usage:
-# ./download_imagenet.sh [dirname]
+# ./download_imagenet.sh [dir name] [synsets file]
 set -e
 if [ "x$IMAGENET_ACCESS_KEY" == x -o "x$IMAGENET_USERNAME" == x ]; then
......