# Image Classification

This folder contains TF 2.0 model examples for image classification:

* [MNIST](#mnist)
* [Classifier Trainer](#classifier-trainer), a framework that uses the Keras
compile/fit methods for image classification models, including:
  * ResNet
  * EfficientNet[^1]

[^1]: Currently a work in progress. We cannot currently match the "AutoAugment (AA)" results of [the original version](https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet).

For more information about other types of models, please refer to this
[README file](../../README.md).

## Before you begin
Please make sure that you have the latest version of TensorFlow installed and
[add the models folder to your Python path](/official/#running-the-models).

### ImageNet preparation

Download the ImageNet dataset and convert it to TFRecord format.
The following [script](https://github.com/tensorflow/tpu/blob/master/tools/datasets/imagenet_to_gcs.py)
and [README](https://github.com/tensorflow/tpu/tree/master/tools/datasets#imagenet_to_gcspy)
provide a few options.

### Running on Cloud TPUs

Note: These models will **not** work with TPUs on Colab.

You can train image classification models on Cloud TPUs using
[tf.distribute.experimental.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/TPUStrategy?version=nightly).
If you are not familiar with Cloud TPUs, it is strongly recommended that you go
through the
[quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn how to
create a TPU and GCE VM.

### Running on multiple GPU hosts

You can also train these models on multiple hosts, each with GPUs, using
[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).

The easiest way to run multi-host benchmarks is to set
[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
appropriately on each host. For example, to run with
`MultiWorkerMirroredStrategy` on 2 hosts, the `cluster` in `TF_CONFIG` should
have 2 `host:port` entries, and host `i` should have the `task` in `TF_CONFIG`
set to `{"type": "worker", "index": i}`. `MultiWorkerMirroredStrategy`
automatically uses all available GPUs on each host.
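
As a rough illustration of the two-host setup described above, the following sketch builds the `TF_CONFIG` value for one worker and exports it. The helper name `build_tf_config` and the `host:port` values are assumptions made for this example; they are not part of this repository.

```python
import json
import os


def build_tf_config(hosts, index):
    """Return a TF_CONFIG dict for the worker at position `index` in `hosts`.

    `hosts` is a list of "host:port" strings, one per worker.
    """
    if not 0 <= index < len(hosts):
        raise ValueError(f"index {index} is out of range for {len(hosts)} hosts")
    return {
        "cluster": {"worker": list(hosts)},
        "task": {"type": "worker", "index": index},
    }


# Hypothetical host:port entries; replace them with your own machines.
HOSTS = ["host0.example.com:2222", "host1.example.com:2222"]

# On host 0, set TF_CONFIG before the training script creates its strategy.
os.environ["TF_CONFIG"] = json.dumps(build_tf_config(HOSTS, 0))
print(os.environ["TF_CONFIG"])
```

On the second host, the same code would be run with index `1`. Exporting `TF_CONFIG` from the shell before launching the training script should work equally well.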

## MNIST

To download the data and run the MNIST sample model locally for the first time,
run the following command:

```bash
python3 mnist_main.py \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --train_epochs=10 \
  --distribution_strategy=one_device \
  --num_gpus=$NUM_GPUS \
  --download
```

To train the model on a Cloud TPU, run the following command:

```bash
python3 mnist_main.py \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --train_epochs=10 \
  --distribution_strategy=tpu \
  --download
```

Note: the `--download` flag is only required the first time you run the model.


## Classifier Trainer
The classifier trainer is a unified framework for running image classification
models using Keras's compile/fit methods. Experiments should be provided as
YAML files. See [configs/examples](./configs/examples) for example
configurations.

The provided configuration files specify a per-replica batch size, which is
scaled by the number of devices to give the global batch size. For instance, if
the batch size is 64, then for 1 GPU the global batch size would be
64 * 1 = 64. For 8 GPUs, the global batch size would be 64 * 8 = 512.
Similarly, for a v3-8 TPU, the global batch size would be 64 * 8 = 512, and for
a v3-32, the global batch size would be 64 * 32 = 2048.
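
The scaling rule above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `global_batch_size` is made up for this example and does not correspond to a helper in the trainer.

```python
def global_batch_size(per_replica_batch_size, num_replicas):
    """Scale a per-replica batch size by the number of devices (replicas)."""
    if per_replica_batch_size <= 0 or num_replicas <= 0:
        raise ValueError("batch size and replica count must be positive")
    return per_replica_batch_size * num_replicas


# Device counts taken from the examples in the paragraph above.
for name, replicas in [("1 GPU", 1), ("8 GPUs", 8), ("v3-8 TPU", 8), ("v3-32 TPU", 32)]:
    print(f"{name}: global batch size {global_batch_size(64, replicas)}")
```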

### ResNet50

#### On GPU:
```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/gpu.yaml \
  --params_override="runtime.num_gpus=$NUM_GPUS"
```

#### On TPU:
```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=resnet \
  --dataset=imagenet \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/resnet/imagenet/tpu.yaml
```

### EfficientNet
**Note: EfficientNet development is a work in progress.**
#### On GPU:
```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=efficientnet \
  --dataset=imagenet \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-gpu.yaml \
  --params_override="runtime.num_gpus=$NUM_GPUS"
```


#### On TPU:
```bash
python3 classifier_trainer.py \
  --mode=train_and_eval \
  --model_type=efficientnet \
  --dataset=imagenet \
  --tpu=$TPU_NAME \
  --model_dir=$MODEL_DIR \
  --data_dir=$DATA_DIR \
  --config_file=configs/examples/efficientnet/imagenet/efficientnet-b0-tpu.yaml
```

Note that the number of GPU devices can be overridden on the command line using
`--params_override`. The TPU does not need this override because the device is
fixed by providing the TPU address or name with the `--tpu` flag.