# Run an Experiment on Multiple Machines

NNI supports running an experiment on multiple machines over SSH; this is called `remote` mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code on them.

For example, suppose you have three machines and log in to each with the account `bob` (note: the account does not have to be the same on every machine):

| IP  | Username| Password |
| -------- |---------|-------|
| 10.1.1.1 | bob | bob123    |
| 10.1.1.2 | bob | bob123    |
| 10.1.1.3 | bob | bob123    |

## Set up the NNI environment

Install NNI on each of your machines by following the [installation guide](QuickStart.md).

## Run an experiment

Install NNI on another machine that has network access to the three machines above, or use any one of those three machines to run the `nnictl` command-line tool.
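
Before creating an experiment, it can help to confirm that the machine running `nnictl` can actually reach the SSH port of each remote machine. The following is a minimal sketch of such a check, not part of NNI itself; the helper names `ssh_port_open` and `check_machines` are hypothetical, and the check only tests TCP connectivity on port 22 (it does not verify the username or password).

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_machines(hosts, port=22, timeout=3.0):
    """Map each host to True/False depending on whether its SSH port is reachable."""
    return {host: ssh_port_open(host, port, timeout) for host in hosts}
```

With the example machines above, `check_machines(["10.1.1.1", "10.1.1.2", "10.1.1.3"])` would return a dictionary showing which hosts accepted a connection on port 22.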

We use `examples/trials/mnist-annotation` as an example here. Run `cat ~/nni/examples/trials/mnist-annotation/config_remote.yml` to see the detailed configuration file:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skip if using default ssh port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
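
When the list of machines is long, writing the `machineList` section by hand is error-prone. As an illustration only, the sketch below renders that section from a list of dictionaries, using the same keys as the configuration above (`ip`, `username`, `passwd`, and the optional `port`). The function name `machine_list_yaml` is hypothetical and not part of NNI, and the values are inserted as plain text without YAML escaping.

```python
def machine_list_yaml(machines):
    """Render a `machineList` section in the layout used by config_remote.yml."""
    lines = ["machineList:"]
    for m in machines:
        lines.append(f"  - ip: {m['ip']}")
        lines.append(f"    username: {m['username']}")
        lines.append(f"    passwd: {m['passwd']}")
        # Emit `port` only when it differs from the default SSH port 22.
        if m.get("port", 22) != 22:
            lines.append(f"    port: {m['port']}")
    return "\n".join(lines) + "\n"


# The three example machines from this document.
example = [
    {"ip": f"10.1.1.{i}", "username": "bob", "passwd": "bob123"}
    for i in (1, 2, 3)
]
snippet = machine_list_yaml(example)
```

The resulting `snippet` string can be pasted over the `machineList` section of the configuration file.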

You can start the experiment from Linux, macOS, or Windows.

#### Linux and macOS

Fill in the `machineList` section, then run:

```bash
nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml
```

to start the experiment.

#### Windows

Fill in the `machineList` section, then run:

```bash
nnictl create --config %userprofile%\nni\examples\trials\mnist-annotation\config_remote.yml
```

to start the experiment.
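
The two commands above differ only in how the home directory is written (`~` versus `%userprofile%`). If you launch experiments from a script, one way to avoid maintaining both forms is to resolve the path in Python. This is a sketch under the assumption that NNI's examples live in `nni/` under your home directory, as in the commands above; `config_path`, `create_command`, and `start_experiment` are hypothetical helper names, not NNI APIs.

```python
import subprocess
from pathlib import Path


def config_path(home=None):
    """Return the example config path under `home` (defaults to the user's home directory)."""
    home = Path.home() if home is None else Path(home)
    return home / "nni" / "examples" / "trials" / "mnist-annotation" / "config_remote.yml"


def create_command(config):
    """Build the `nnictl create` argument list for the given config file."""
    return ["nnictl", "create", "--config", str(config)]


def start_experiment():
    """Run `nnictl create` with the example config; raises if nnictl exits with an error."""
    subprocess.run(create_command(config_path()), check=True)
```

Calling `start_experiment()` on any of the three operating systems runs the same `nnictl create --config ...` command shown above, with the home directory resolved by `Path.home()`.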

## Version check

NNI has supported a version check feature since version 0.6. For details, see [PaiMode.md](PaiMode.md).