Update the instruction to run TF2 object detection models in CAIP

PiperOrigin-RevId: 385042521

Commit c705089f authored Jul 15, 2021 by A. Unique TensorFlower Committed by TF Object Detection Team Jul 15, 2021
2 changed files
--- a/research/object_detection/dockerfiles/tf2_ai_platform/Dockerfile
+++ b/research/object_detection/dockerfiles/tf2_ai_platform/Dockerfile
+FROM tensorflow/tensorflow:latest-gpu
+
+ARG DEBIAN_FRONTEND=noninteractive
+
+# Install apt dependencies
+RUN apt-get update && apt-get install -y \
+    git \
+    gpg-agent \
+    python3-cairocffi \
+    protobuf-compiler \
+    python3-pil \
+    python3-lxml \
+    python3-tk \
+    python3-opencv \
+    wget
+
+# Install the Google Cloud SDK, mainly so that gsutil can be used to export the model.
+RUN wget -nv \
+    https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
+    mkdir /root/tools && \
+    tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
+    rm google-cloud-sdk.tar.gz && \
+    /root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
+        --path-update=false --bash-completion=false \
+        --disable-installation-options && \
+    rm -rf /root/.config/* && \
+    ln -s /root/.config /config && \
+    rm -rf /root/tools/google-cloud-sdk/.install/.backup
+
+# Path configuration
+ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
+# Make sure gsutil will use the default service account
+RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
+
+WORKDIR /home/tensorflow
+
+## Copy this code (make sure you are under the ../models/research directory)
+COPY . /home/tensorflow/models
+
+# Compile protobuf configs
+RUN (cd /home/tensorflow/models/ && protoc object_detection/protos/*.proto --python_out=.)
+WORKDIR /home/tensorflow/models/
+
+RUN cp object_detection/packages/tf2/setup.py ./
+ENV PATH="/home/tensorflow/.local/bin:${PATH}"
+
+RUN python -m pip install -U pip
+RUN python -m pip install .
+
+ENTRYPOINT ["python", "object_detection/model_main_tf2.py"]
--- a/research/object_detection/g3doc/tf2_training_and_evaluation.md
+++ b/research/object_detection/g3doc/tf2_training_and_evaluation.md
@@ -187,21 +187,28 @@ evaluation jobs for a few iterations [locally on their own machines](#local).

 ### Training with multiple GPUs

-A user can start a training job on Cloud AI Platform using the following
-command:
+A user can start a training job on Cloud AI Platform by following the
+instructions at https://cloud.google.com/ai-platform/training/docs/custom-containers-training.

 ```bash
+git clone https://github.com/tensorflow/models.git
+
 # From the tensorflow/models/research/ directory
-cp object_detection/packages/tf2/setup.py .
+cp object_detection/dockerfiles/tf2_ai_platform/Dockerfile .
+
+docker build -t gcr.io/${DOCKER_IMAGE_URI} .
+
+docker push gcr.io/${DOCKER_IMAGE_URI}
+```
+
+```bash
 gcloud ai-platform jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%S` \
-    --runtime-version 2.1 \
-    --python-version 3.6 \
    --job-dir=gs://${MODEL_DIR} \
-    --package-path ./object_detection \
-    --module-name object_detection.model_main_tf2 \
    --region us-central1 \
    --master-machine-type n1-highcpu-16 \
    --master-accelerator count=8,type=nvidia-tesla-v100 \
+    --master-image-uri gcr.io/${DOCKER_IMAGE_URI} \
+    --scale-tier CUSTOM \
    -- \
    --model_dir=gs://${MODEL_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH}
@@ -210,15 +217,16 @@ gcloud ai-platform jobs submit training object_detection_`date +%m_%d_%Y_%H_%M_%
 Where `gs://${MODEL_DIR}` specifies the directory on Google Cloud Storage where
 the training checkpoints and events will be written to and
 `gs://${PIPELINE_CONFIG_PATH}` points to the pipeline configuration stored on
-Google Cloud Storage.
+Google Cloud Storage, and `gcr.io/${DOCKER_IMAGE_URI}` points to the Docker
+image stored in Google Container Registry.

 Users can monitor the progress of their training job on the
 [ML Engine Dashboard](https://console.cloud.google.com/ai-platform/jobs).

 ### Training with TPU

-Launching a training job with a TPU compatible pipeline config requires using a
-similar command:
+Launching a training job with a TPU compatible pipeline config requires using
+the following command:

 ```bash
 # From the tensorflow/models/research/ directory
@@ -246,16 +254,11 @@ Evaluation jobs run on a single machine. Run the following command to start the
 evaluation job:

 ```bash
-# From the tensorflow/models/research/ directory
-cp object_detection/packages/tf2/setup.py .
 gcloud ai-platform jobs submit training object_detection_eval_`date +%m_%d_%Y_%H_%M_%S` \
-    --runtime-version 2.1 \
-    --python-version 3.6 \
    --job-dir=gs://${MODEL_DIR} \
-    --package-path ./object_detection \
-    --module-name object_detection.model_main_tf2 \
    --region us-central1 \
    --scale-tier BASIC_GPU \
+    --master-image-uri gcr.io/${DOCKER_IMAGE_URI} \
    -- \
    --model_dir=gs://${MODEL_DIR} \
    --pipeline_config_path=gs://${PIPELINE_CONFIG_PATH} \
@@ -264,8 +267,9 @@ gcloud ai-platform jobs submit training object_detection_eval_`date +%m_%d_%Y_%H

 where `gs://${MODEL_DIR}` points to the directory on Google Cloud Storage where
 training checkpoints are saved and `gs://${PIPELINE_CONFIG_PATH}` points to where
-the model configuration file stored on Google Cloud Storage. Evaluation events
-are written to `gs://${MODEL_DIR}/eval`
+the model configuration file is stored on Google Cloud Storage, and
+`gcr.io/${DOCKER_IMAGE_URI}` points to the Docker image stored in Google
+Container Registry. Evaluation events are written to `gs://${MODEL_DIR}/eval`.

 Typically one starts an evaluation job concurrently with the training job. Note
 that we do not support running evaluation on TPU.
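
The commands in this change rely on three shell variables that are never defined in the text. The following is a minimal sketch of how they might be set, assuming a hypothetical project ID, image name, tag, bucket, and paths (none of these values come from the commit). Because the commands add the `gcr.io/` and `gs://` prefixes themselves, the variables are assumed to omit them.

```bash
# All values below are hypothetical placeholders; substitute your own.
PROJECT_ID="my-gcp-project"
IMAGE_NAME="tf2-object-detection"
IMAGE_TAG="v1"
BUCKET="my-bucket"

# Container Registry images are addressed as gcr.io/<project>/<image>:<tag>,
# so DOCKER_IMAGE_URI holds everything after the "gcr.io/" prefix.
DOCKER_IMAGE_URI="${PROJECT_ID}/${IMAGE_NAME}:${IMAGE_TAG}"

# Cloud Storage locations, written without the "gs://" prefix.
MODEL_DIR="${BUCKET}/models/run1"
PIPELINE_CONFIG_PATH="${BUCKET}/configs/pipeline.config"

# Print the fully qualified names as the documented commands will expand them.
echo "image:    gcr.io/${DOCKER_IMAGE_URI}"
echo "modeldir: gs://${MODEL_DIR}"
echo "config:   gs://${PIPELINE_CONFIG_PATH}"
```

Echoing the expanded values before running `docker build` or `gcloud ai-platform jobs submit training` is a cheap way to catch a doubled prefix such as `gs://gs://...`.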