NNI can run one experiment on multiple remote machines through SSH; this is called `remote` mode. It works like a lightweight training platform: NNI is started from your computer and dispatches trials to the remote machines in parallel.
## Remote machine requirements
Remote machines can run `Linux`, `Windows 10`, or `Windows Server 2019`.
* Only Linux is supported on remote machines, and the [Linux part of the system specification](../Tutorial/InstallationLinux.md) is the same as for NNI local mode.
## Requirements
* Follow [installation](../Tutorial/InstallationLinux.md) to install NNI on each machine.
* Make sure the default environment of the remote machines meets the requirements of your trial code. If it does not, a setup script can be added to the `command` field of the NNI config.
* Make sure the remote machines can be accessed through SSH from the machine that runs the `nnictl` command. Both password and key authentication are supported. For advanced usage, refer to the [machineList part of configuration](../Tutorial/ExperimentConfig.md).
* Make sure the NNI version on each machine is consistent.
* If you want to use remote Linux and Windows machines together, make sure the trial command is compatible with every remote OS. For example, the default Python 3.x executable is called `python3` on Linux and `python` on Windows.
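The last point can be handled by a small launcher script that runs on each remote machine. The sketch below is only an illustration and not part of NNI: `python_executable` and `build_trial_command` are hypothetical helper names, and `mnist.py` is a placeholder trial script. It assumes a default installation exposes `python` on Windows and `python3` elsewhere, and picks the name from the OS reported by the standard library's `platform.system()`.

```python
import platform


def python_executable(system_name=None):
    """Return the usual Python 3 launcher name for the given OS.

    Assumption: a default installation exposes `python` on Windows
    and `python3` on other systems.
    """
    name = system_name if system_name is not None else platform.system()
    return "python" if name == "Windows" else "python3"


def build_trial_command(script, system_name=None):
    """Build a trial command string for a (placeholder) trial script."""
    return f"{python_executable(system_name)} {script}"


# Show the command that would suit the machine this runs on.
print(build_trial_command("mnist.py"))
```

Passing `system_name` explicitly makes the helper easy to check for both operating systems from a single machine.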
### Linux
* Follow [installation](../Tutorial/InstallationLinux.md) to install NNI on the remote machine.
### Windows
* Follow [installation](../Tutorial/InstallationWin.md) to install NNI on the remote machine.
* Install and start `OpenSSH Server`.
1. Open `Settings` app on Windows.
2. Click `Apps`, then click `Optional features`.
3. Click `Add a feature`, search and select `OpenSSH Server`, and then click `Install`.
4. Once it's installed, run the commands below to start the service and set it to start automatically.
```bat
sc config sshd start=auto
net start sshd
```
* Make sure the remote account is an administrator, so that it can stop running trials.
* Make sure no welcome message beyond the default is printed, since extra output causes ssh2 in Node.js to fail. For example, if you're using a Data Science VM on Azure, remove the extra echo commands in `C:\dsvm\tools\setup\welcome.bat`.
Output like the following is fine when opening a new command window.
```text
Microsoft Windows [Version 10.0.17763.1192]
(c) 2018 Microsoft Corporation. All rights reserved.
(py37_default) C:\Users\AzureUser>
```
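Once OpenSSH Server is running on the remote machine, it can be useful to confirm that the SSH port is reachable from the machine that runs `nnictl`. The following is a minimal sketch using only the Python standard library; `ssh_port_open` is a hypothetical helper name, and port 22 is assumed to be the SSH port. It only checks that something accepts TCP connections on that port and does not attempt SSH authentication.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    This checks reachability only; it does not log in over SSH.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # name-resolution errors) when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `ssh_port_open("10.1.1.1")` (a placeholder address) returns `False` if the host is unreachable or nothing is listening on port 22.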
## Run an experiment
For example, suppose there are three machines that can be logged in to with a username and password.
Anaconda or Miniconda is highly recommended to manage multiple Python environments.
* Python 3.5 (or above) 64-bit. [Anaconda](https://www.anaconda.com/products/individual) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) is highly recommended to manage multiple Python environments on Windows.
### Install NNI through pip
* If this is a newly installed Python environment, install [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) so that NNI dependencies such as `scikit-learn` can be built.
Prerequisites: `python 64-bit >= 3.5`
```bat
pip install cython wheel
```
```bash
python -m pip install --upgrade nni
```
* git for verifying installation.
### Install NNI through source code
## Install NNI
If you are interested in special or the latest code versions, you can install NNI through source code.
In most cases, you can install and upgrade NNI from pip package. It's easy and fast.
Note: for other examples, change the trial command from `python3` to `python` in each example YAML if Python 3 is invoked as `python` on your machine.
Note: If you are familiar with other frameworks, you can choose the corresponding example under `examples\trials`. Change the trial command from `python3` to `python` in each example YAML, since the default installation provides `python.exe`, not `python3.exe`.
* Wait for the message `INFO: Successfully started experiment!` in the command line. This message indicates that your experiment has been successfully started. You can explore the experiment using the `Web UI url`.
...
...
@@ -112,18 +122,20 @@ If there is a stderr file, please check it. Two possible cases are:
* forgetting to install experiment dependencies such as TensorFlow, Keras and so on.
### Fail to use BOHB on Windows
Make sure a C++ 14.0 compiler is installed when trying to run `nnictl package install --name=BOHB` to install the dependencies.
### Not supported tuner on Windows
SMAC is not supported currently; for the specific reason refer to this [GitHub issue](https://github.com/automl/SMAC3/issues/483).
### Use a Windows server as a remote worker
Currently, you can't.
### Use Windows as a remote worker
Note:
Refer to [Remote Machine mode](../TrainingService/RemoteMachineMode.md).
* If an error like `Segmentation fault` is encountered, please refer to the [FAQ](FAQ.md).
### Segmentation fault (core dumped) when installing