**Run an Experiment on Multiple Machines**
===

NNI supports running an experiment on multiple machines through an SSH channel; this is called `remote` mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code.

For example, suppose you have three machines and log in to each with the account `bob` (note: the account does not need to be the same on every machine):

| IP       | Username | Password |
| -------- | -------- | -------- |
| 10.1.1.1 | bob      | bob123   |
| 10.1.1.2 | bob      | bob123   |
| 10.1.1.3 | bob      | bob123   |

## Setup NNI environment

Install NNI on each of your machines following the install guide [here](GetStarted.md).

On remote machines that only run trials and do not run `nnictl`, you only need to install the Python SDK:

* __Install the Python SDK through pip__

  ```bash
  python3 -m pip install --user --upgrade nni-sdk
  ```
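
After the install, a quick import check on each remote machine can confirm that the interpreter used for trials can see the SDK. The following is a minimal sketch, not part of NNI: the helper name `is_module_available` is hypothetical, and it assumes the SDK installed by `nni-sdk` is importable under the module name `nni`.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the SDK is importable as `nni`.
    print("nni importable:", is_module_available("nni"))
```

Run it with the same `python3` that the trial command will use, since `--user` installs are specific to one interpreter and account.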

## Run an experiment

Install NNI on another machine that has network access to the three machines above, or use any of those three machines to run the `nnictl` command-line tool.
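
Before creating an experiment, it may help to confirm that the machine running `nnictl` can reach the SSH port of each remote machine. The sketch below uses only the Python standard library; `is_port_open` is a hypothetical helper, not an NNI function, and it only tests TCP reachability, not whether the username and password are accepted.

```python
import socket


def is_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For the example machines, calling `is_port_open("10.1.1.1")`, `is_port_open("10.1.1.2")`, and `is_port_open("10.1.1.3")` from the `nnictl` machine should each return `True` if SSH is listening on the default port 22.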

We use `examples/trials/mnist-annotation` as an example here. Run `cat ~/nni/examples/trials/mnist-annotation/config_remote.yml` to see the detailed configuration file:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using default ssh port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
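
When the list of machines is long, the `machineList` section can be generated rather than typed by hand. The following is a minimal sketch using only the standard library; `render_machine_list` is a hypothetical helper, and it emits only the keys that appear in the configuration above (`ip`, `username`, `passwd`, and the optional `port`).

```python
from typing import Dict, List


def render_machine_list(machines: List[Dict[str, object]]) -> str:
    """Render a `machineList` YAML fragment from a list of machine dicts.

    Each dict must contain `ip`, `username`, and `passwd`; `port` is optional.
    Values are written as-is, so values containing YAML special characters
    (for example `:` or `#`) would need quoting, which this sketch does not do.
    """
    lines = ["machineList:"]
    for machine in machines:
        lines.append(f"  - ip: {machine['ip']}")
        lines.append(f"    username: {machine['username']}")
        lines.append(f"    passwd: {machine['passwd']}")
        if "port" in machine:
            lines.append(f"    port: {machine['port']}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        {"ip": f"10.1.1.{i}", "username": "bob", "passwd": "bob123"}
        for i in (1, 2, 3)
    ]
    print(render_machine_list(example))
```

The printed fragment can be pasted over the `machineList` section of `config_remote.yml`.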

Fill in the `machineList` section, then run the following command to start the experiment:

```bash
nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml
```
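
If the experiment is launched from a script rather than an interactive shell, the same command can be wrapped with the standard `subprocess` module. This is a sketch under stated assumptions: `build_create_command` and `start_experiment` are hypothetical helpers, and `nnictl` is assumed to be on the `PATH` of the account running the script.

```python
import os
import subprocess
from typing import List


def build_create_command(config_path: str) -> List[str]:
    """Build the argument list for `nnictl create --config <path>`."""
    # expanduser turns a leading "~" into the home directory, as the shell would.
    return ["nnictl", "create", "--config", os.path.expanduser(config_path)]


def start_experiment(config_path: str) -> int:
    """Run `nnictl create` and return its exit code.

    Raises FileNotFoundError if `nnictl` is not installed or not on PATH.
    """
    completed = subprocess.run(build_create_command(config_path), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when the configuration path contains spaces.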