Commit 70a3d96e authored by Hongkun Yu, committed by A. Unique TensorFlower

Port multi-host GPU training instructions.

PiperOrigin-RevId: 303779613
parent fc02382c
@@ -29,11 +29,25 @@ provide a few options.
Note: These models will **not** work with TPUs on Colab.

You can train image classification models on Cloud TPUs using
[tf.distribute.experimental.TPUStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/TPUStrategy?version=nightly).
If you are not familiar with Cloud TPUs, it is strongly recommended that you go
through the [quickstart](https://cloud.google.com/tpu/docs/quickstart) to learn
how to create a TPU and GCE VM.
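
As a rough illustration of the API named above (not part of this change), the
usual pattern for connecting to a Cloud TPU looks like the sketch below; the
TPU name and the toy model are placeholders for your own setup.

```python
import tensorflow as tf

# "my-tpu" is a placeholder TPU name; use your own TPU or its gRPC address.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the model inside the strategy scope so that its
    # variables are placed on the TPU replicas.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="sgd", loss="mse")
```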

### Running on multiple GPU hosts

You can also train these models on multiple hosts, each with GPUs, using
[tf.distribute.experimental.MultiWorkerMirroredStrategy](https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy).

The easiest way to run multi-host benchmarks is to set the
[`TF_CONFIG`](https://www.tensorflow.org/guide/distributed_training#TF_CONFIG)
environment variable appropriately on each host. For example, to run with
`MultiWorkerMirroredStrategy` on 2 hosts, the `cluster` entry in `TF_CONFIG`
should contain 2 `host:port` entries, and host `i` should set the `task` entry
in `TF_CONFIG` to `{"type": "worker", "index": i}`. `MultiWorkerMirroredStrategy`
will then automatically use all of the GPUs available on each host, as sketched
below.
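
As a minimal sketch of that setup (the host names, ports, and toy model are
assumptions, not part of this change), each host would set `TF_CONFIG` before
creating the strategy:

```python
import json
import os

import tensorflow as tf

# Hypothetical 2-host cluster; replace the host:port entries with your own.
# On the second host, change "index" to 1.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0},
})

# TF_CONFIG must be set before the strategy is constructed.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Models built in this scope are mirrored across all GPUs on all workers.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="sgd", loss="mse")
```

Equivalently, `TF_CONFIG` can be exported in the shell before launching the
training script on each host.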

## MNIST
To download the data and run the MNIST sample model locally for the first time,
......