# Run an Experiment on Multiple Machines

NNI supports running an experiment on multiple machines through an SSH channel, called `remote` mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code.

For example, suppose you have three machines and log in to each with the account `bob` (note: the account does not have to be the same on every machine):

| IP       | Username | Password |
| -------- | -------- | -------- |
| 10.1.1.1 | bob      | bob123   |
| 10.1.1.2 | bob      | bob123   |
| 10.1.1.3 | bob      | bob123   |

## Set up NNI environment

Install NNI on each of your machines following the install guide [here](../Tutorial/QuickStart.md).

## Run an experiment

Install NNI on another machine that has network access to the three machines above, or simply run `nnictl` on any one of the three to launch the experiment.

We use `examples/trials/mnist-annotation` as an example. Its remote-mode configuration file, `examples/trials/mnist-annotation/config_remote.yml`, is:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
# search space file
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using the default SSH port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```

Files in `codeDir` are automatically uploaded to the remote machines. You can run NNI on Windows, Linux, or macOS to launch experiments on the remote machines, but the remote machines themselves must run Linux:

```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```

You can also use public/private key pairs instead of username/password for authentication. For advanced usage, please refer to the [Experiment Config Reference](../Tutorial/ExperimentConfig.md).
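
As a rough sketch of the key-pair option, a `machineList` entry might look like the fragment below. The field names `sshKeyPath` and `passphrase` are assumptions based on the Experiment Config Reference, and the key path is only a placeholder, so check both against the reference for your NNI version before relying on them.

```yaml
# Sketch only: field names assumed from the Experiment Config Reference.
machineList:
  - ip: 10.1.1.1
    username: bob
    # Used in place of `passwd`; placeholder path to the private key
    # on the machine where nnictl is run.
    sshKeyPath: ~/.ssh/id_rsa
    # Only needed if the private key is protected by a passphrase.
    #passphrase: <your passphrase>
```

With key-based authentication, no password needs to be stored in plain text in the configuration file.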

## Version check

NNI has supported the version check feature since version 0.6. For details, see the [reference](PaiMode.md).