# Run an Experiment on Remote Machines

NNI can run a single experiment on multiple remote machines over SSH; this is called `remote` mode. It works like a lightweight training platform: you start NNI from your own computer, and it dispatches trials to the remote machines in parallel.

Remote machines can run `Linux`, `Windows 10`, or `Windows Server 2019`.

## Requirements

* Make sure the default environment of the remote machines meets the requirements of your trial code. If it does not, a setup script can be added to the `command` field of the NNI config.

* Make sure the remote machines can be accessed through SSH from the machine that runs the `nnictl` command. Both password and key authentication are supported. For advanced usage, please refer to the [machineList part of configuration](../Tutorial/ExperimentConfig.md).

* Make sure the NNI version on each machine is consistent.

* Make sure the trial command is compatible with the remote OSes if you want to use remote Linux and Windows machines together. For example, the default Python 3.x executable is called `python3` on Linux and `python` on Windows.
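
To see which interpreter name actually resolves on a given machine, a small check like the one below can be run there. This is only a sketch: the helper name `available_python_commands` is hypothetical and not part of NNI, and it reports what is on `PATH` for the current account, which may differ from the environment an SSH session gets.

```python
import shutil


def available_python_commands(candidates=("python3", "python")):
    """Map each candidate command name to its resolved path, or None if not found."""
    return {name: shutil.which(name) for name in candidates}


# Print one line per candidate so the result can be compared across machines.
for name, path in available_python_commands().items():
    print(f"{name}: {path if path else 'not found'}")
```

If only one of the two names resolves on every machine in the pool, that name is the safer choice for the trial `command`.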

### Linux

* Follow [installation](../Tutorial/InstallationLinux.md) to install NNI on the remote machine.

### Windows

* Follow [installation](../Tutorial/InstallationWin.md) to install NNI on the remote machine.

* Install and start `OpenSSH Server`.

  1. Open the `Settings` app on Windows.

  2. Click `Apps`, then click `Optional features`.

  3. Click `Add a feature`, search and select `OpenSSH Server`, and then click `Install`.

  4. Once it's installed, run the commands below to start the service and set it to start automatically.

  ```bat
  sc config sshd start=auto
  net start sshd
  ```

* Make sure the remote account is an administrator, so that it can stop running trials.

* Make sure there is no welcome message beyond the default one, since extra output causes ssh2 in Node.js to fail. For example, if you're using a Data Science VM on Azure, remove the extra echo commands in `C:\dsvm\tools\setup\welcome.bat`.

  Output like the following is fine when opening a new command window.

  ```text
  Microsoft Windows [Version 10.0.17763.1192]
  (c) 2018 Microsoft Corporation. All rights reserved.

  (py37_default) C:\Users\AzureUser>
  ```

## Run an experiment

For example, suppose there are three machines that can be logged into with a username and password:

| IP       | Username | Password |
| -------- | -------- | -------- |
| 10.1.1.1 | bob      | bob123   |
| 10.1.1.2 | bob      | bob123   |
| 10.1.1.3 | bob      | bob123   |

Install and run NNI on one of these three machines, or on another machine that has network access to them.
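
Before creating the experiment, it can help to confirm that the machine running `nnictl` can reach the SSH port on each remote machine. The following is a minimal sketch, assuming the default SSH port 22; the helpers `ssh_port_open` and `check_machines` are hypothetical names, not part of NNI. It only tests that the TCP port accepts connections, not that the username and password are accepted.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_machines(hosts, port=22, timeout=3.0):
    """Map each host to whether its SSH port accepted a connection."""
    return {host: ssh_port_open(host, port, timeout) for host in hosts}
```

For the three example machines, `check_machines(["10.1.1.1", "10.1.1.2", "10.1.1.3"])` returns a dictionary that maps each IP to `True` or `False`.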

Use `examples/trials/mnist-annotation` as the example. Below is the content of `examples/trials/mnist-annotation/config_remote.yml`:

```yaml
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
# search space file
searchSpacePath: search_space.json
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using the default SSH port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
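
When the machine pool is larger than three entries, writing `machineList` by hand gets repetitive. The sketch below generates that section from a list of `(ip, username, passwd)` tuples using only the fields shown in the config above. The function name `machine_list_yaml` is hypothetical, and it assumes the values are plain strings that need no YAML quoting or escaping.

```python
def machine_list_yaml(machines, port=None):
    """Render a machineList section from (ip, username, passwd) tuples.

    If port is given, a `port` line is added to every entry; otherwise it is
    omitted so the default SSH port is used.
    """
    lines = ["machineList:"]
    for ip, username, passwd in machines:
        lines.append(f"  - ip: {ip}")
        lines.append(f"    username: {username}")
        lines.append(f"    passwd: {passwd}")
        if port is not None:
            lines.append(f"    port: {port}")
    return "\n".join(lines)


# The three example machines from the table above.
example = [(f"10.1.1.{i}", "bob", "bob123") for i in (1, 2, 3)]
print(machine_list_yaml(example))
```

The printed text can be pasted into the config file in place of the hand-written `machineList` section.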

Files in `codeDir` are uploaded to the remote machines automatically. You can run the command below on Windows, Linux, or macOS to spawn trials on the remote Linux machines:

```bash
nnictl create --config examples/trials/mnist-annotation/config_remote.yml
```