Commit 25fe395c authored by Eli Bixby

Add CMLE specific instructions

parent 79e40801
@@ -81,6 +81,40 @@ $ python cifar10_main.py --data_dir=/prefix/to/downloaded/data/cifar-10-batches-
## How to run in distributed mode
### (Optional) Running on Google Cloud Machine Learning Engine
This example can be run on Google Cloud Machine Learning Engine (ML Engine), which configures the environment and takes care of running workers, parameter servers, and masters in a fault-tolerant way.
To install the command-line tool and set up a project and billing, see the quickstart [here](https://cloud.google.com/ml-engine/docs/quickstarts/command-line).
You'll also need a Google Cloud Storage bucket for the data. If you followed the instructions above, you can just run:
```
MY_BUCKET=gs://<my-bucket-name>
gsutil cp -r cifar-10-batches-py $MY_BUCKET/
```
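If the bucket does not exist yet, it can be created first with `gsutil mb`. The region here is an assumption; pick one that matches the region you submit the training job to:

```
# Create the bucket in us-central1 (hypothetical choice) before copying the data:
gsutil mb -l us-central1 $MY_BUCKET
```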
Then run the following command from the `tutorials/image` directory of this repository (the parent directory of this README):
```
gcloud ml-engine jobs submit training cifarmultigpu \
--runtime-version 1.2 \
--staging-bucket $MY_BUCKET \
--config cifar10_estimator/job_config.yaml \
--package-path cifar10_estimator/ \
--region us-central1 \
--module-name cifar10_estimator.cifar10_main \
-- \
--data_dir=$MY_BUCKET/cifar-10-batches-py \
    --model_dir=$MY_BUCKET/model_dirs/cifarmultigpu \
--is_cpu_ps=True \
--force_gpu_compatible=True \
--num_gpus=4 \
--train_steps=1000 \
--run_experiment=True
```
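After the job is submitted, its progress can be followed from the same terminal with the standard `gcloud ml-engine jobs` subcommands:

```
# Stream the job's logs as it runs:
gcloud ml-engine jobs stream-logs cifarmultigpu

# Or check its current state:
gcloud ml-engine jobs describe cifarmultigpu
```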
### Set TF_CONFIG
Assuming you already have multiple hosts configured, all you need is a `TF_CONFIG` environment variable on each host.
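For reference, a minimal sketch of the shape of `TF_CONFIG` — the hostnames, ports, and cluster size below are hypothetical; ML Engine sets this variable for you on each VM:

```
# Hypothetical TF_CONFIG for a cluster with one master, one worker,
# and one parameter server. "task" describes the current host's role.
export TF_CONFIG='{
  "cluster": {
    "master": ["master-0:2222"],
    "worker": ["worker-0:2222"],
    "ps": ["ps-0:2222"]
  },
  "task": {"type": "worker", "index": 0}
}'

# Sanity-check that the JSON parses and print this host's role:
python -c 'import json, os; print(json.loads(os.environ["TF_CONFIG"])["task"]["type"])'
```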
The `cifar10_estimator/job_config.yaml` passed via `--config` above specifies the cluster shape:
trainingInput:
scaleTier: CUSTOM
masterType: complex_model_m_gpu
workerType: complex_model_m_gpu
parameterServerType: complex_model_m
workerCount: 1