Commit c5d3f1f2 authored by Ido Shamay, committed by drpngx

inception: Added protocol flag when running distributed (#1401)

The default is TensorFlow's default communication protocol, 'grpc'.
If TensorFlow was compiled with Verbs support, 'grpc+verbs' can be
used to accelerate tensor passing between tasks.
parent 39c59d13
@@ -367,6 +367,13 @@ I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:206] Initialize HostPo
 I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:202] Started server with target: grpc://localhost:2222
 ```
+If you compiled TensorFlow (from v1.1-rc3) with VERBS support and you have the
+required device and IB verbs SW stack, you can specify --protocol='grpc+verbs'
+in order to use Verbs RDMA for tensor passing between workers and ps.
+Add the --protocol flag to all tasks (ps and workers).
+The default protocol is the TensorFlow default protocol of grpc.
 [Congratulations!](https://www.youtube.com/watch?v=9bZkp7q19f0) You are now
 training Inception in a distributed manner.
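
Because the flag has to be passed to every task, it can help to generate all the command lines from one place. The following is a minimal sketch, not part of this commit: `build_task_commands` is a hypothetical helper, the binary name `imagenet_distributed_train` is a placeholder, and a `--ps_hosts` flag is assumed to exist alongside the `--worker_hosts` flag shown in the diff.

```python
def build_task_commands(ps_hosts, worker_hosts, protocol="grpc",
                        binary="imagenet_distributed_train"):
    """Return one argv list per task, all sharing the same --protocol.

    `binary` is a placeholder name; substitute the real training script.
    """
    # Flags that must be identical for every ps and worker task.
    common = [
        "--ps_hosts=" + ",".join(ps_hosts),
        "--worker_hosts=" + ",".join(worker_hosts),
        "--protocol=" + protocol,
    ]
    commands = []
    for job_name, hosts in (("ps", ps_hosts), ("worker", worker_hosts)):
        for task_id in range(len(hosts)):
            commands.append(
                [binary, "--job_name=" + job_name, "--task_id=%d" % task_id]
                + common
            )
    return commands


if __name__ == "__main__":
    # Hypothetical host names, used only for illustration.
    for argv in build_task_commands(["ps0:2222"],
                                    ["worker0:2222", "worker1:2222"],
                                    protocol="grpc+verbs"):
        print(" ".join(argv))
```

Generating the commands together makes it harder to start one task with `grpc` and another with `grpc+verbs` by mistake.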
......
@@ -45,7 +45,8 @@ def main(unused_args):
       {'ps': ps_hosts,
        'worker': worker_hosts},
       job_name=FLAGS.job_name,
-      task_index=FLAGS.task_id)
+      task_index=FLAGS.task_id,
+      protocol=FLAGS.protocol)
   if FLAGS.job_name == 'ps':
     # `ps` jobs wait for incoming connections from the workers.
......
@@ -42,6 +42,9 @@ tf.app.flags.DEFINE_string('worker_hosts', '',
                            """Comma-separated list of hostname:port for the """
                            """worker jobs. e.g. """
                            """'machine1:2222,machine2:1111,machine2:2222'""")
+tf.app.flags.DEFINE_string('protocol', 'grpc',
+                           """Communication protocol to use in distributed """
+                           """execution (default grpc) """)
 tf.app.flags.DEFINE_string('train_dir', '/tmp/imagenet_train',
                            """Directory where to write event logs """
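
The default behaviour of the new flag can be tried without TensorFlow. The sketch below uses the standard-library `argparse` as a stand-in for `tf.app.flags`; it is an assumption made for illustration, not how the training script itself parses flags, and `parse_flags` is a hypothetical name.

```python
import argparse


def parse_flags(argv=None):
    """Parse a subset of the script's flags with argparse (stand-in only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--protocol", default="grpc",
        help="Communication protocol to use in distributed execution "
             "(default grpc)")
    parser.add_argument("--job_name", default="")
    parser.add_argument("--task_id", type=int, default=0)
    return parser.parse_args(argv)


# Omitting the flag falls back to plain gRPC.
default_flags = parse_flags([])
# Passing it explicitly selects the Verbs-accelerated protocol.
verbs_flags = parse_flags(["--job_name=worker", "--protocol=grpc+verbs"])
```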
......