**Run an Experiment on Multiple Machines**
===
NNI supports running an experiment on multiple machines over SSH, a mode called `remote`. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code.

For example, suppose you have three machines and log in to each with the account `bob` (note: the account does not have to be the same on every machine):

| IP  | Username| Password |
| -------- |---------|-------|
| 10.1.1.1 | bob | bob123    |
| 10.1.1.2 | bob | bob123    |
| 10.1.1.3 | bob | bob123    |
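
Because remote mode relies on SSH, a quick preflight check is to confirm that each machine in the table accepts TCP connections on its SSH port. The following is a minimal sketch using only the Python standard library. `ssh_port_open` is a hypothetical helper, not part of NNI, and it only tests that the port is open, not that the credentials work.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: it only checks that something is listening on the
    port. It does not authenticate or confirm that the service is SSH.
    """
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, run it from the machine that will host `nnictl` and call `ssh_port_open(ip)` for each of `10.1.1.1`, `10.1.1.2`, and `10.1.1.3`. Every call should return `True` before you start the experiment.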

## Set up the NNI environment
Install NNI on each of your machines by following the [installation guide](GetStarted.md).

On remote machines that only run trials (and not `nnictl`), installing the Python SDK is sufficient:

* __Install the Python SDK through pip__

      python3 -m pip install --user --upgrade nni-sdk
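
To confirm that the SDK is visible to the interpreter that will run trials, you can run a check like the one below on each remote machine, using the same `python3` and the same user account as the `pip install --user` command above. This is a minimal sketch. It assumes the `nni-sdk` package provides a top-level module named `nni`, and `module_available` is a hypothetical helper.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: nni-sdk installs an importable top-level module called "nni".
    print("nni importable:", module_available("nni"))
```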

## Run an experiment
Install NNI on another machine that has network access to the three machines above, or simply use one of those machines to run the `nnictl` command-line tool.

We use `examples/trials/mnist-annotation` as an example here. Run `cat ~/nni/examples/trials/mnist-annotation/config_remote.yml` to see the full configuration file:
```
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using the default SSH port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
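
Each `machineList` entry maps one-to-one onto a row of the machine table. If you keep that table elsewhere, a small script can produce the section for you. This is a minimal sketch: `Machine` and `render_machine_list` are hypothetical helpers, not NNI APIs, and following the comment in the config above, `port` is only written when it differs from 22.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Machine:
    """One row of the machine table (hypothetical helper type)."""
    ip: str
    username: str
    passwd: str
    port: int = 22  # default SSH port; omitted from the output when unchanged


def render_machine_list(machines: List[Machine]) -> str:
    """Render a `machineList:` section in the layout used by config_remote.yml."""
    if not machines:
        # Assumption: remote mode needs at least one machine to run trials on.
        raise ValueError("remote mode needs at least one machine")
    lines = ["machineList:"]
    for m in machines:
        for field in ("ip", "username", "passwd"):
            if not getattr(m, field):
                raise ValueError(f"machine entry is missing '{field}'")
        lines.append(f"  - ip: {m.ip}")
        lines.append(f"    username: {m.username}")
        lines.append(f"    passwd: {m.passwd}")
        if m.port != 22:
            lines.append(f"    port: {m.port}")
    return "\n".join(lines) + "\n"
```

The sketch does no YAML quoting, so a password containing characters such as `#` or `: ` would need to be quoted by hand, or the section could be generated with a YAML library instead.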
Fill in the `machineList` section and then run:
```
nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml
```
to start the experiment.
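
If you launch experiments from a script rather than an interactive shell, note that `~` in the config path is expanded by the shell, and Python's `subprocess` does not do that expansion. The sketch below expands it explicitly before running the same `nnictl create --config` command. `nnictl_create_cmd` and `start_experiment` are hypothetical helpers, not part of NNI.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def nnictl_create_cmd(config_path: str) -> List[str]:
    """Build the argv for `nnictl create --config <path>`, expanding a leading ~."""
    return ["nnictl", "create", "--config", str(Path(config_path).expanduser())]


def start_experiment(config_path: str) -> int:
    """Run `nnictl create` for the given config and return its exit code."""
    if not Path(config_path).expanduser().is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    if shutil.which("nnictl") is None:
        raise FileNotFoundError("nnictl not found on PATH; install NNI first")
    return subprocess.run(nnictl_create_cmd(config_path)).returncode
```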