Commit e0ce190d authored by Chao Ma, committed by Minjie Wang

[Doc] Update README.MD of demo for distributed sampler (#579)

* Update README.md for the demo of distributed sampler

* Update README.MD of demo for distributed sampler
parent e0004f71
@@ -13,11 +13,18 @@
 pip install mxnet --pre
 ```
-### Usage
+### Usage Guide
 Assume that the user has already launched two instances (`instance_0` & `instance_1`) on AWS EC2, and also these two instances have the correct authority to access each other by TCP/IP protocol. Now we can treat `instance_0` as `Trainer` and `instance_1` as `Sampler`. Then, the user can start the trainer process and sampler process on these two instances separately. We have already provided a set of scripts to start the trainer and sampler process and users just need to change the `--ip` to their own IP address.
-For the sampler instance_0, users can change the `--num-sampler` option to set the number of the sampler. The `sampler.py` script will start `--num-sampler` processes concurrently to maximalize the system utilization. Users can also launch many samplers in parallel across a set of machines. For example, if we have `10` sampler instance and for each instance, we set the `--num-sampler` to `2`, we need to set the `--num-sampler` to `20`.
+Once we start the trainer process, users will see the following logging output:
+```
+[04:48:20] .../socket_communicator.cc:68: Bind to 127.0.0.1:2049
+[04:48:20] .../socket_communicator.cc:74: Listen on 127.0.0.1:2049, wait sender connect ...
+```
+After that user can start the sampler process. For the sampler instance_0, users can change the `--num-sampler` option to set the number of the sampler. The `sampler.py` script will start `--num-sampler` processes concurrently to maximalize the system utilization. Users can also launch many samplers in parallel across a set of machines. For example, if we have `10` sampler instance and for each instance, we set the `--num-sampler` to `2`, we need to set the `--num-sampler` of the trainer instance to `20`.
 ### Neighbor Sampling & Skip Connection
......
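The `--num-sampler` arithmetic described in the README text above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `trainer_num_sampler` is hypothetical and is not part of the demo scripts; it simply sums the per-machine `--num-sampler` values to get the value the trainer side would need.

```python
def trainer_num_sampler(samplers_per_instance):
    """Return the --num-sampler value for the trainer (hypothetical helper).

    samplers_per_instance: the --num-sampler value passed to sampler.py
    on each sampler machine, one entry per machine.
    """
    if any(n < 1 for n in samplers_per_instance):
        raise ValueError("each sampler instance must start at least one process")
    # The trainer has to expect every sampler process across all machines.
    return sum(samplers_per_instance)


# The README's example: 10 sampler instances, each started with --num-sampler 2.
total = trainer_num_sampler([2] * 10)
print(total)  # 20
```

Summing a list rather than multiplying two numbers also covers the case where machines run different numbers of sampler processes.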
@@ -13,11 +13,18 @@ Dependencies
 pip install torch requests
 ``
-### Usage
+### Usage Guide
 Assume that the user has already launched two instances (`instance_0` & `instance_1`) on AWS EC2, and also these two instances have the correct authority to access each other by TCP/IP protocol. Now we can treat `instance_0` as `Trainer` and `instance_1` as `Sampler`. Then, the user can start the trainer process and sampler process on these two instances separately. We have already provided a set of scripts to start the trainer and sampler process and users just need to change the `--ip` to their own IP address.
-For the sampler instance_0, users can change the `--num-sampler` option to set the number of the sampler. The `sampler.py` script will start `--num-sampler` processes concurrently to maximalize the system utilization. Users can also launch many samplers in parallel across a set of machines. For example, if we have `10` sampler instance and for each instance, we set the `--num-sampler` to `2`, we need to set the `--num-sampler` to `20`.
+Once we start the trainer process, users will see the following logging output:
+```
+[04:48:20] .../socket_communicator.cc:68: Bind to 127.0.0.1:2049
+[04:48:20] .../socket_communicator.cc:74: Listen on 127.0.0.1:2049, wait sender connect ...
+```
+After that user can start the sampler process. For the sampler instance_0, users can change the `--num-sampler` option to set the number of the sampler. The `sampler.py` script will start `--num-sampler` processes concurrently to maximalize the system utilization. Users can also launch many samplers in parallel across a set of machines. For example, if we have `10` sampler instance and for each instance, we set the `--num-sampler` to `2`, we need to set the `--num-sampler` of the trainer instance to `20`.
 ### Neighbor Sampling & Skip Connection
......
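The start order in the added README text (the trainer binds and listens first, then the sampler connects) can be illustrated with plain Python sockets. This is a minimal sketch of the general bind/listen/connect pattern, not the demo's actual communicator: the function names, the payload, and the use of an OS-assigned port on `127.0.0.1` are all assumptions made for the example.

```python
import socket
import threading


def run_receiver(ready, result, host="127.0.0.1"):
    """Play the trainer's role: bind and listen first, then accept one sender."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))  # port 0 lets the OS pick a free port (assumption)
        server.listen()
        result["port"] = server.getsockname()[1]
        ready.set()  # from here on, a sender can connect
        conn, _ = server.accept()
        with conn:
            result["payload"] = conn.recv(1024)


def run_sender(port, payload, host="127.0.0.1"):
    """Play the sampler's role: connect to an already listening receiver."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(payload)


def demo():
    ready = threading.Event()
    result = {}
    receiver = threading.Thread(target=run_receiver, args=(ready, result))
    receiver.start()
    ready.wait(timeout=5)  # start the sender only after the receiver listens
    run_sender(result["port"], b"sampled-subgraph")
    receiver.join(timeout=5)
    return result["payload"]


if __name__ == "__main__":
    print(demo())
```

If the sender were started before the receiver reached `listen()`, the connection attempt would be refused, which is why the trainer is launched first in the workflow described above.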