"src/vscode:/vscode.git/clone" did not exist on "3d0b3eb5774419b75fad4197eb796ce7f352c09e"
Unverified commit ccb2211e authored by chicm-ms, committed by GitHub

Merge pull request #17 from microsoft/master

pull code
parents 58fd0c84 31dc58e9
@@ -169,7 +169,7 @@ machineList:
* __remote__ submits trial jobs to remote Ubuntu machines, and the __machineList__ field should be filled in to set up the SSH connection to the remote machine.
* __pai__ submits trial jobs to [OpenPAI](https://github.com/Microsoft/pai) of Microsoft. For more details of PAI configuration, please refer to [PAIModeDoc](./PaiMode.md)
* __kubeflow__ submits trial jobs to [kubeflow](https://www.kubeflow.org/docs/about/kubeflow/). NNI supports kubeflow based on normal Kubernetes and [Azure Kubernetes](https://azure.microsoft.com/en-us/services/kubernetes-service/).
@@ -2,14 +2,13 @@
This page is for frequently asked questions and answers.
### tmp folder full
nnictl uses the tmp folder as a temporary folder to copy files under codeDir when executing experiment creation.
When you meet errors like the one below, try to clean up the **tmp** folder first.
> OSError: [Errno 28] No space left on device
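If you are not sure what is filling the space, here is a minimal sketch for checking and cleaning up, assuming a Linux machine where `/tmp` is the temporary folder in question; the leftover folder name is only a placeholder.

```bash
# See how full the filesystem that holds /tmp is, and what is using the space.
df -h /tmp
du -sh /tmp/* 2>/dev/null | sort -h | tail
# Delete only entries you recognize as leftover NNI copies; the name below is a placeholder.
rm -rf /tmp/<leftover-nni-folder>
```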
### Cannot get trials' metrics in OpenPAI mode
In OpenPAI training mode, we start a REST server in the NNI Manager that listens on port 51189 to receive metrics reported from trials running in the OpenPAI cluster. If you don't see any metrics from the WebUI in OpenPAI mode, check the machine where the NNI Manager runs and make sure port 51189 is allowed in the firewall rules.
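A quick way to verify this, as a sketch; the `ufw` command assumes an Ubuntu machine that uses ufw as its firewall, so adjust it to your own firewall tooling.

```bash
# Check whether anything is listening on port 51189 on the NNI Manager machine.
ss -tlnp | grep 51189
# Allow inbound traffic on that port (Ubuntu/ufw example).
sudo ufw allow 51189/tcp
```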
### Segmentation Fault (core dumped) when installing
> make: *** [install-XXX] Segmentation fault (core dumped)
@@ -19,7 +18,7 @@ Please try the following solutions in turn:
* Install NNI with the `--no-cache-dir` flag, like `python3 -m pip install nni --no-cache-dir`
### Job management error: getIPV4Address() failed because os.networkInterfaces().eth0 is undefined.
Your machine doesn't have an eth0 device. Please set [nniManagerIp](ExperimentConfig.md) in your config file manually.
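For example, a sketch for finding the address to put there; the interface names and the IP shown are machine-specific assumptions.

```bash
# List the machine's IPv4 addresses and pick the one your trial machines can reach.
ip -4 addr show | grep inet
# Then add a top-level line such as `nniManagerIp: 10.0.0.15` to your experiment config file.
```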
### Exceeded MaxDuration but didn't stop
When the duration of an experiment reaches the maximum duration, nniManager will not create new trials, but existing trials will continue to run unless the user manually stops the experiment.
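If you want the remaining trials to stop as well, stop the experiment manually, for example:

```bash
# Stop the running experiment (pass the experiment id if you have more than one running).
nnictl stop
```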
@@ -28,7 +27,14 @@ When the duration of experiment reaches the maximum duration, nniManager will no
If you upgrade NNI or delete some NNI config files while an experiment is running, this kind of issue may happen because the config files are lost. You can use `ps -ef | grep node` to find the PID of your experiment, and use `kill -9 {pid}` to kill it manually.
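For example (the second grep pattern is only an assumption to narrow the output; check the full listing if it matches nothing):

```bash
# Find the node process that belongs to the orphaned NNI experiment, then kill it by PID.
ps -ef | grep node
kill -9 <pid>   # replace <pid> with the process id printed by the previous command
```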
### Could not get `default metric` in WebUI of virtual machines
Configure the network mode to bridge mode or another mode that makes the virtual machine's host accessible from external machines, and make sure the virtual machine's port is not blocked by the firewall.
### Could not open WebUI link
There may be several reasons why you cannot open the WebUI:
* http://127.0.0.1, http://172.17.0.1 and http://10.0.0.15 refer to localhost. If you start your experiment on a server or remote machine, replace the IP with your server's IP to view the WebUI, like http://[your_server_ip]:8080
* If you still can't see the WebUI after using the server IP, check the proxy and firewall settings of your machine, or use a browser on the machine where you started your NNI experiment.
* Another reason may be that your experiment failed and NNI could not get the experiment information. You can check the NNIManager log in the following directory: ~/nni/experiment/[your_experiment_id]/log/nnimanager.log (a quick check is sketched below).
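A minimal reachability check, assuming the default WebUI port 8080 and the experiment directory layout mentioned above:

```bash
# An HTTP response here means the WebUI port is reachable from your machine.
curl -I http://<your_server_ip>:8080
# If it is not, look at the NNI manager log for errors.
tail -n 50 ~/nni/experiment/<your_experiment_id>/log/nnimanager.log
```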
### Windows local mode problems
Please refer to [NNI Windows local mode](WindowsLocalMode.md)
@@ -100,4 +100,4 @@ Trial configuration in frameworkcontroller mode have the following configuration
After you prepare a config file, you can run your experiment with nnictl. The way to start an experiment on frameworkcontroller is similar to kubeflow; please refer to the [document](./KubeflowMode.md) for more information.
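For example, a launch sketch; the config path is an assumption, so point it at your own frameworkcontroller config file.

```bash
# Create the experiment from a frameworkcontroller configuration file (path is illustrative).
nnictl create --config <path-to-your-frameworkcontroller-config>.yml
```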
## version check
NNI has supported the version check feature since version 0.6; please refer to [PaiMode.md](PaiMode.md).
@@ -21,7 +21,7 @@ There are three kinds of log in NNI. When creating a new experiment, you can spe
All possible errors that happen when launching an NNI experiment can be found here.
You can use `nnictl log stderr` to find error information. For more options, please refer to [NNICTL](Nnictl.md).
### Experiment Root Directory
@@ -7,7 +7,7 @@ TrainingService is a module related to platform management and job schedule in N
## System architecture
![](../img/NNIDesign.jpg)
The brief system architecture of NNI is shown in the picture. NNIManager is the core management module of the system, in charge of calling TrainingService to manage trial jobs and of the communication between different modules. Dispatcher is a message processing center responsible for message dispatch. TrainingService is a module that manages trial jobs; it communicates with the NNIManager module and has a different instance for each training platform. For the time being, NNI supports the local platform, the [remote platform](RemoteMachineMode.md), the [PAI platform](PaiMode.md), the [kubeflow platform](KubeflowMode.md) and the [FrameworkController platform](FrameworkController.md).
In this document, we introduce the brief design of TrainingService. If users want to add a new TrainingService instance, they just need to implement a child class of TrainingService; they don't need to understand the code details of NNIManager, Dispatcher or other modules.
## Folder structure of code
@@ -146,4 +146,4 @@ When users submit a trial job to cloud platform, they should wrap their trial co
## Reference
For more information about how to debug, please refer to [HowToDebug.md](HowToDebug.md).
For guidelines on how to contribute, please refer to [Contributing.md](Contributing.md).
@@ -25,22 +25,28 @@ Currently we support installation on Linux, Mac and Windows(local mode).
You can also install NNI in a Docker image. Please follow the instructions [here](https://github.com/Microsoft/nni/tree/master/deployment/docker/README.md) to build the NNI Docker image. The NNI Docker image can also be retrieved from Docker Hub through the command `docker pull msranni/nni:latest`.
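For instance, a sketch of running the image; the port mapping assumes the default WebUI port 8080.

```bash
# Pull the published NNI image and start an interactive container,
# exposing the WebUI port so it can be reached from the host.
docker pull msranni/nni:latest
docker run -it -p 8080:8080 msranni/nni:latest
```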
## **Installation on Windows**
When you use PowerShell to run a script for the first time, you need to **run PowerShell as administrator** with this command:
```bash
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
```
Anaconda or Miniconda is highly recommended.
* __Install NNI through pip__
Prerequisite: `python (64-bit) >= 3.5`
```bash
python -m pip install --upgrade nni
```
* __Install NNI through source code__
Prerequisite: `python >=3.5`, `git`, `PowerShell`.
You can install NNI as administrator or as the current user as follows:
```bash
git clone -b v0.7 https://github.com/Microsoft/nni.git
cd nni
```
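After the installation finishes, a quick sanity check (assuming `nnictl` ended up on your PATH):

```bash
# Print the installed NNI version to confirm the command line tool works.
nnictl --version
```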
@@ -88,12 +94,12 @@ Below are the minimum system requirements for NNI on Windows, Windows 10.1809 is
## Further reading
* [Overview](Overview.md)
* [Use command line tool nnictl](Nnictl.md)
* [Use NNIBoard](WebUI.md)
* [Define search space](SearchSpaceSpec.md)
* [Config an experiment](ExperimentConfig.md)
* [How to run an experiment on local (with multiple GPUs)?](LocalMode.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PaiMode.md)
* [How to run an experiment on Kubernetes through Kubeflow?](KubeflowMode.md)
* [How to run an experiment on Kubernetes through FrameworkController?](FrameworkControllerMode.md)
@@ -197,6 +197,6 @@ Notice: In kubeflow mode, NNIManager will start a rest server and listen on a po
Once a trial job is completed, you can go to the NNI WebUI's overview page (like http://localhost:8080/oview) to check the trial's information.
## version check
NNI has supported the version check feature since version 0.6; please refer to [PaiMode.md](PaiMode.md).
For any problems when using NNI in kubeflow mode, please create an issue on the [NNI GitHub repo](https://github.com/Microsoft/nni).
@@ -85,14 +85,14 @@ Let's use a simple trial example, e.g. mnist, provided by NNI. After you install
This command will be filled into the YAML configuration file below. Please refer to [here](Trials.md) for how to write your own trial.
**Prepare tuner**: NNI supports several popular AutoML algorithms, including Random Search, Tree of Parzen Estimators (TPE), Evolution algorithm, etc. Users can write their own tuner (refer to [here](CustomizeTuner.md)), but for simplicity, here we choose a tuner provided by NNI as below:
    tuner:
      builtinTunerName: TPE
      classArgs:
        optimize_mode: maximize
*builtinTunerName* is used to specify a tuner in NNI, *classArgs* are the arguments passed to the tuner (the spec of builtin tuners can be found [here](BuiltinTuner.md)), and *optimize_mode* indicates whether you want to maximize or minimize your trial's result.
**Prepare configuration file**: Since you already know which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configuration file. NNI provides a demo configuration file for each trial example; run `cat ~/nni/examples/trials/mnist-annotation/config.yml` to see it. Its content is basically shown below:
@@ -130,7 +130,7 @@ With all these steps done, we can run the experiment with the following command:
    nnictl create --config ~/nni/examples/trials/mnist-annotation/config.yml
You can refer to [here](Nnictl.md) for a more detailed usage guide of the *nnictl* command line tool.
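For instance, a few commonly used commands while an experiment is running; see `nnictl --help` for the full list.

```bash
# Inspect the running experiment, list its trials, and stop it when you are done.
nnictl experiment show
nnictl trial ls
nnictl stop
```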
## View experiment results
The experiment is running now. Other than *nnictl*, NNI also provides a WebUI for you to view experiment progress, control your experiment, and enjoy some other appealing features.
@@ -453,7 +453,7 @@ Debug mode will disable version check function in Trialkeeper.
> import data to a running experiment
```bash
nnictl experiment import [experiment_id] -f experiment_data.json
```
<a name="config"></a>
@@ -49,11 +49,11 @@ More details about how to run an experiment, please refer to [Get Started](Quick
## Learn More
* [Get started](QuickStart.md)
* [How to adapt your trial code on NNI?](Trials.md)
* [What are tuners supported by NNI?](BuiltinTuner.md)
* [How to customize your own tuner?](CustomizeTuner.md)
* [What are assessors supported by NNI?](BuiltinAssessors.md)
* [How to customize your own assessor?](CustomizeAssessor.md)
* [How to run an experiment on local?](LocalMode.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to run an experiment on OpenPAI?](PaiMode.md)
* [Examples](MnistExamples.md)