**Run an Experiment on Multiple Machines**
===
NNI supports running an experiment on multiple machines; this is called remote machine mode. Suppose you have several machines, each with the account `bob` (note: the account does not have to be the same on every machine):

| IP  | Username| Password |
| -------- |---------|-------|
| 10.1.1.1 | bob | bob123    |
| 10.1.1.2 | bob | bob123    |
| 10.1.1.3 | bob | bob123    |

## Setup environment
Install NNI on each machine by following the [installation guide](GetStarted.md).

On remote machines that only run trials and never run `nnictl`, installing the Python SDK alone is enough:

* __Install the Python SDK through pip__

      python3 -m pip install --user nni

* __Install the Python SDK from source code__

      git clone https://github.com/Microsoft/NeuralNetworkIntelligence
      cd NeuralNetworkIntelligence/src/sdk/pynni
      python3 setup.py install
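
After either install path, it can help to confirm on each trial machine that the SDK is visible to `python3`. The following is a minimal sketch, not part of NNI itself; the helper name `is_module_installed` is hypothetical, and it only checks that a package named `nni` can be located, not that it works end to end.

```python
import importlib.util


def is_module_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # `nni` is the package installed by the commands above.
    status = "found" if is_module_installed("nni") else "NOT found"
    print(f"nni SDK: {status}")
```

Run it with the same `python3` that the trial command will use, since a `--user` install is only visible to the account that performed it.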

## Run an experiment
This section again uses `examples/trials/mnist-annotation` as the example. The YAML file you need is shown below:
```
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# maximum number of trials; empty means never stop
maxTrialNum: 100
# choice: local, remote, pai
trainingServicePlatform: remote 
# choice: true, false  
useAnnotation: true
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python mnist.py
  codeDir: /usr/share/nni/examples/trials/mnist-annotation
  gpuNum: 0
# machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
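
With more than a handful of machines, typing the `machineList` entries by hand becomes error-prone. As a rough sketch, the section could be generated from a list of hosts. The function name `render_machine_list` is hypothetical; the keys `ip`, `username`, and `passwd` are taken from the configuration above, and the sketch assumes the values contain no characters that would need YAML quoting.

```python
from typing import Iterable, Tuple


def render_machine_list(machines: Iterable[Tuple[str, str, str]]) -> str:
    """Render (ip, username, password) tuples as a `machineList` YAML section."""
    lines = ["machineList:"]
    for ip, username, password in machines:
        # Indentation mirrors the hand-written configuration above.
        lines.append(f"  - ip: {ip}")
        lines.append(f"    username: {username}")
        lines.append(f"    passwd: {password}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The three example machines from the table at the top of this page.
    hosts = [(f"10.1.1.{i}", "bob", "bob123") for i in (1, 2, 3)]
    print(render_machine_list(hosts), end="")
```

The printed text can be pasted over the `machineList` section of the configuration file.
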
Fill in the `machineList` section of the configuration above with your own machines, save the file as `exp_remote.yaml`, then run:
```
nnictl create --config exp_remote.yaml
```
to start the experiment. You can run this command on any of the three machines above, or on another machine that has NNI installed and network access to all three.
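
Before launching, a quick way to check the "network access" requirement is to test whether each machine accepts a TCP connection. The sketch below is an illustration under stated assumptions: it assumes the machines are reached over SSH on the default port 22 (this page does not state the port), and the helper names `can_connect` and `unreachable_hosts` are hypothetical, not part of NNI.

```python
import socket
from typing import Iterable, List


def can_connect(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_hosts(hosts: Iterable[str], port: int = 22, timeout: float = 3.0) -> List[str]:
    """Return the hosts that did not accept a TCP connection on `port`."""
    return [host for host in hosts if not can_connect(host, port, timeout)]
```

For example, `unreachable_hosts(["10.1.1.1", "10.1.1.2", "10.1.1.3"])` returns the addresses that did not answer. An empty list suggests that all three accept connections on that port, though it does not confirm that the login for `bob` succeeds.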