# Run an Experiment on Remote Machines

NNI can run one experiment on multiple remote machines over SSH; this is called `remote` mode. It works like a lightweight training platform: you start NNI from your own computer, and it dispatches trials to the remote machines in parallel.

## Remote machine requirements

* Only Linux is supported on remote machines. The requirements are the same as for NNI local mode; see the [Linux part of the system specification](../Tutorial/Installation.md).

* Follow [installation](../Tutorial/Installation.md) to install NNI on each machine.

* Make sure the remote machines meet the environment requirements of your trial code. If the default environment does not meet them, add a setup script to the `command` field of the NNI config.

* Make sure the remote machines can be reached over SSH from the machine that runs the `nnictl` command. Both password and key authentication are supported. For advanced usage, refer to the [machineList part of the configuration](../Tutorial/ExperimentConfig.md).

* Make sure the NNI version on each machine is consistent.
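
The SSH access and version checks in the list above can be scripted before launching an experiment. The following is a minimal sketch, not part of NNI: `check_nni_versions` is a hypothetical helper name, and it assumes `nnictl` is on the `PATH` of a non-interactive SSH session on each remote machine. If key authentication is not set up, `ssh` will prompt for the password of each host.

```shell
# Sketch: confirm each remote machine is reachable over SSH and that all
# of them report the same NNI version.
# check_nni_versions is a hypothetical helper, not an NNI command.
# Usage: check_nni_versions USER HOST [HOST ...]
check_nni_versions() {
  local user="$1"
  shift
  local expected="" host version
  for host in "$@"; do
    # Run `nnictl --version` remotely; a non-zero exit means the host is
    # unreachable, login failed, or nnictl was not found there.
    if ! version="$(ssh -o ConnectTimeout=5 "$user@$host" 'nnictl --version')"; then
      echo "$host: SSH failed or nnictl is not available" >&2
      return 1
    fi
    echo "$host: $version"
    # The first host sets the reference version; later hosts must match it.
    if [ -z "$expected" ]; then
      expected="$version"
    elif [ "$version" != "$expected" ]; then
      echo "$host: version $version differs from $expected" >&2
      return 1
    fi
  done
}
```

With the example machines used later on this page, the call would be `check_nni_versions bob 10.1.1.1 10.1.1.2 10.1.1.3`. The function stops at the first unreachable host or version mismatch and returns a non-zero status.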

## Run an experiment

For example, suppose there are three machines that can be logged in to with a username and password.

| IP       | Username | Password |
| -------- | -------- | -------- |
| 10.1.1.1 | bob      | bob123   |
| 10.1.1.2 | bob      | bob123   |
| 10.1.1.3 | bob      | bob123   |

Install and run NNI on one of those three machines, or on another machine that has network access to them.

This walkthrough uses `examples/trials/mnist-annotation` as the example. Below is the content of `examples/trials/mnist-annotation/config_remote.yml`:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
# search space file
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skip if using default ssh port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
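
When many machines share one account, the `machineList` section can be generated instead of typed by hand. The sketch below is illustrative only: `print_machine_list` is a hypothetical helper, not an NNI tool, and it emits just the `ip`, `username`, and `passwd` fields used in the configuration above. It does no YAML quoting, so it assumes the values contain no characters that are special in YAML.

```shell
# Sketch: print a machineList section for hosts that share one login.
# print_machine_list is a hypothetical helper, not part of NNI.
# Usage: print_machine_list USER PASSWORD IP [IP ...]
print_machine_list() {
  local user="$1" password="$2"
  shift 2
  local ip
  echo "machineList:"
  for ip in "$@"; do
    # One list entry per machine, indented as in config_remote.yml.
    printf '  - ip: %s\n    username: %s\n    passwd: %s\n' "$ip" "$user" "$password"
  done
}
```

For the three example machines, `print_machine_list bob bob123 10.1.1.1 10.1.1.2 10.1.1.3` prints entries equivalent to the `machineList` section above, which can then be pasted into the config file.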

Files in `codeDir` will be uploaded to the remote machines automatically. You can run the command below on Windows, Linux, or macOS to spawn trials on the remote Linux machines:

```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```