"vscode:/vscode.git/clone" did not exist on "3ec57e8df9263de6fa897e33d2d91bc5d0849ef3"
Unverified Commit 035d58bc authored by SparkSnail, committed by GitHub

Merge pull request #121 from Microsoft/master

merge master
parents b633c265 8e732f2c
**Tutorial: Run an experiment on multiple machines**
===
NNI supports running an experiment on multiple machines through an SSH channel, called `remote` mode. NNI assumes that you have access to those machines and have already set up the environment for running deep learning training code.
For example, say you have three machines that you can log into with the account `bob` (note: the account does not have to be the same on different machines):
| IP | Username| Password |
| -------- |---------|-------|
| 10.1.1.1 | bob | bob123 |
| 10.1.1.2 | bob | bob123 |
| 10.1.1.3 | bob | bob123 |
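Since NNI reaches these machines over SSH with the credentials above, it can help to confirm up front that you can log in to each of them from the machine where you plan to run `nnictl`, for example:
```
ssh bob@10.1.1.1
ssh bob@10.1.1.2
ssh bob@10.1.1.3
```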
## Setup NNI environment
Install NNI on each of your machines following the install guide [here](GetStarted.md).
For remote machines that are only used to run trials (not `nnictl`), you can install just the Python SDK:
* __Install python SDK through pip__

      python3 -m pip install --user --upgrade nni-sdk
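To sanity-check the SDK on a remote machine, you can try importing it; this assumes the `nni-sdk` package installs the `nni` Python module that trial code imports:
```
python3 -c "import nni"
```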
## Run an experiment
Install NNI on another machine that has network access to the three machines above, or just use any of the machines above to run the `nnictl` command line tool.
We use `examples/trials/mnist-annotation` as an example here. Run `cat ~/nni/examples/trials/mnist-annotation/config_remote.yml` to see the detailed configuration file:
```
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: remote
#choice: true, false
useAnnotation: true
tuner:
  #choice: TPE, Random, Anneal, Evolution, BatchTuner
  #SMAC (SMAC should be installed through nnictl)
  builtinTunerName: TPE
  classArgs:
    #choice: maximize, minimize
    optimize_mode: maximize
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
#machineList can be empty if the platform is local
machineList:
  - ip: 10.1.1.1
    username: bob
    passwd: bob123
    #port can be skipped if using default ssh port 22
    #port: 22
  - ip: 10.1.1.2
    username: bob
    passwd: bob123
  - ip: 10.1.1.3
    username: bob
    passwd: bob123
```
Simply fill in the `machineList` section and then run:
```
nnictl create --config ~/nni/examples/trials/mnist-annotation/config_remote.yml
```
to start the experiment.
# Tutorial - Try different Tuners and Assessors
NNI provides an easy-to-adopt approach to setting up parameter tuning algorithms as well as early stopping policies; we call them **Tuners** and **Assessors**.
**Tuner** specifies the algorithm you use to generate a hyperparameter set for each trial. In NNI, we support two approaches to setting up the tuner.
1. Directly use a tuner provided by the NNI SDK

   Required fields: builtinTunerName and classArgs.

2. Customize your own tuner file (see the sketch after this list)

   Required fields: codeDir, classFileName, className and classArgs.
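For instance, a customized tuner section in the experiment configuration might look like the following minimal sketch; the directory, file, and class names here are hypothetical placeholders:
```
tuner:
  #directory containing your tuner implementation (hypothetical path)
  codeDir: /home/bob/mytuner
  classFileName: my_tuner.py
  className: MyTuner
  classArgs:
    optimize_mode: maximize
```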
### **Learn More about tuners**
* For the detailed definition and usage of the required fields, please refer to [Config an experiment](ExperimentConfig.md)
* [Tuners in the latest NNI release](HowToChooseTuner.md)
* [How to implement your own tuner](howto_2_CustomizedTuner.md)
**Assessor** specifies the algorithm you use to apply an early stopping policy. In NNI, there are two approaches to setting up the assessor.
1. Directly use an assessor provided by the NNI SDK (see the sketch after this list)

   Required fields: builtinAssessorName and classArgs.

2. Customize your own assessor file

   Required fields: codeDir, classFileName, className and classArgs.
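As a minimal illustration of the first approach, an `assessor` section could look like the sketch below; it assumes the `Medianstop` built-in assessor is available in your NNI version (see [enable assessor](EnableAssessor.md) for the supported options):
```
assessor:
  #built-in assessor name; assumes Medianstop is available
  builtinAssessorName: Medianstop
  classArgs:
    optimize_mode: maximize
```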
### **Learn More about assessors**
* For the detailed definition and usage of the required fields, please refer to [Config an experiment](ExperimentConfig.md)
* Find more detailed instructions in [enable assessor](EnableAssessor.md)
* [How to implement your own assessor](../examples/assessors/README.md)
## **Learn More**
* [How to run an experiment on local (with multiple GPUs)?](tutorial_1_CR_exp_local_api.md)
* [How to run an experiment on multiple machines?](tutorial_2_RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PAIMode.md)
...@@ -88,7 +88,7 @@ nnictl create --config ~/nni/examples/trials/ga_squad/config.yml
Due to the memory limitation of upload, we only upload the source code and complete the data download and training on OpenPAI. This experiment requires sufficient memory (`memoryMB >= 32G`), and the training may last for several hours.
### Update configuration
Modify `nni/examples/trials/ga_squad/config_pai.yml`; here is the default configuration:
```
authorName: default
...@@ -114,18 +114,18 @@ trial:
  gpuNum: 0
  cpuNum: 1
  memoryMB: 32869
  #The docker image to run NNI job on OpenPAI
  image: msranni/nni:latest
  #The hdfs directory to store data on OpenPAI, format 'hdfs://host:port/directory'
  dataDir: hdfs://10.10.10.10:9000/username/nni
  #The hdfs directory to store output data generated by NNI, format 'hdfs://host:port/directory'
  outputDir: hdfs://10.10.10.10:9000/username/nni
paiConfig:
  #The username to login OpenPAI
  userName: username
  #The password to login OpenPAI
  passWord: password
  #The host of restful server of OpenPAI
  host: 10.10.10.10
```
......
**Run ENAS in NNI**
===
Now we have an ENAS example, [enas-nni](https://github.com/countif/enas_nni), that runs in NNI, from our contributors.
Thanks to our lovely contributors.
And welcome more and more people to join us!
...@@ -138,7 +138,7 @@ class Logger {
  private log(level: string, param: any[]): void {
    const buffer: WritableStreamBuffer = new WritableStreamBuffer();
    buffer.write(`[${(new Date()).toISOString()}] ${level} `);
    buffer.write(format(param));
    buffer.write('\n');
    buffer.end();
    this.bufferSerialEmitter.feed(buffer.getContents());
......
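A brief note on the one-line change above: `format` here is presumably Node's `util.format`, and calling it with a leading `null` argument makes the literal text `null` appear at the start of every log message; passing the parameters directly avoids that. A minimal sketch of the difference, assuming `util.format`:
```
import { format } from 'util';

// With a spurious leading null, "null" is printed as part of the message:
console.log(format(null, ['trial started', 42]));  // null [ 'trial started', 42 ]

// Passing the parameters directly formats only the intended content:
console.log(format(['trial started', 42]));         // [ 'trial started', 42 ]
```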