"testing/vscode:/vscode.git/clone" did not exist on "b10ef75fc31deea3684256bb77bdcea27aede0ff"
Commit 252f36f8 authored by Deshui Yu's avatar Deshui Yu
Browse files

NNI dogfood version 1

parent 781cea26
*.ts text eol=lf
*.py text eol=lf
*.sh text eol=lf
BIN_PATH ?= /usr/bin
NODE_PATH ?= /usr/share
EXAMPLE_PATH ?= /usr/share/nni/examples
SRC_DIR := ${PWD}
.PHONY: build install uninstall
build:
### Building NNI Manager ###
cd src/nni_manager && yarn && yarn build
### Building Web UI ###
cd src/webui && yarn && yarn build
### Building Python SDK ###
cd src/sdk/pynni && python3 setup.py build
### Building nnictl ###
cd tools && python3 setup.py build
install:
mkdir -p $(NODE_PATH)/nni
mkdir -p $(EXAMPLE_PATH)
### Installing NNI Manager ###
cp -rT src/nni_manager/dist $(NODE_PATH)/nni/nni_manager
cp -rT src/nni_manager/node_modules $(NODE_PATH)/nni/nni_manager/node_modules
### Installing Web UI ###
cp -rT src/webui/build $(NODE_PATH)/nni/webui
ln -sf $(NODE_PATH)/nni/nni_manager/node_modules/serve/bin/serve.js $(BIN_PATH)/serve
### Installing Python SDK dependencies ###
pip3 install -r src/sdk/pynni/requirements.txt
### Installing Python SDK ###
cd src/sdk/pynni && python3 setup.py install
### Installing nnictl ###
cd tools && python3 setup.py install
echo '#!/bin/sh' > $(BIN_PATH)/nnimanager
echo 'cd $(NODE_PATH)/nni/nni_manager && node main.js $$@' >> $(BIN_PATH)/nnimanager
chmod +x $(BIN_PATH)/nnimanager
install -m 755 tools/nnictl $(BIN_PATH)/nnictl
### Installing examples ###
cp -rT examples $(EXAMPLE_PATH)
dev-install:
### Installing Python SDK dependencies ###
pip3 install --user -r src/sdk/pynni/requirements.txt
### Installing Python SDK ###
cd src/sdk/pynni && pip3 install --user -e .
### Installing nnictl ###
cd tools && pip3 install --user -e .
uninstall:
-rm -r $(EXAMPLE_PATH)
-rm -r $(NODE_PATH)/nni
-pip3 uninstall -y nnictl
-pip3 uninstall -y nni
-rm $(BIN_PATH)/nnictl
-rm $(BIN_PATH)/nnimanager
-rm $(BIN_PATH)/serve
# Introduction
Neural Network Intelligence (NNI) is a lightweight package for hyper-parameter tuning and neural architecture search.
It can easily run in different environments, such as a local machine, remote machines, or the cloud.
It offers a new annotation language that lets users conveniently design search spaces.
Users can also write their code in any language and with any machine learning framework.
# Contributing
# Getting Started
TODO: Guide users through getting your code up and running on their own system. In this section you can talk about:
1. Installation process
2. Software dependencies
3. Latest releases
4. API references
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
# Build and Test
TODO: Describe and show how to build your code and run the tests.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
# Contribute
TODO: Explain how other users and developers can contribute to make your code better.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
# Privacy Statement
The [Microsoft Enterprise and Developer Privacy Statement](https://privacy.microsoft.com/en-us/privacystatement) describes the privacy policy that applies to this software.
**Enable Assessor in your experiment**
===
The Assessor module assesses running trials. One common use case is early stopping, which terminates unpromising trial jobs based on their intermediate results.
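To make the early-stopping idea concrete, here is a minimal, illustrative sketch of such a rule in plain Python. It is not the built-in `Medianstop` implementation; the `should_stop` helper and the comparison against the median of completed trials are only an example, assuming larger results are better:
```python
import statistics

def should_stop(trial_history, completed_trial_bests):
    """Illustrative early-stopping rule: stop a trial whose best intermediate
    result so far is worse than the median of the best results achieved by
    already-completed trials (assuming larger values are better)."""
    if not completed_trial_bests:      # nothing to compare against yet
        return False
    best_so_far = max(trial_history)
    return best_so_far < statistics.median(completed_trial_bests)

# The running trial peaked at 0.62 while completed trials reached
# 0.70, 0.80 and 0.75, so it would be terminated early.
print(should_stop([0.50, 0.60, 0.62], [0.70, 0.80, 0.75]))  # True
```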
## Using NNI built-in Assessor
Here we use the same example, `examples/trials/mnist-annotation`, with the `Medianstop` assessor. The YAML configuration file is shown below:
```
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
tunerName: TPE
optimizationMode: Maximize
assessor:
assessorName: Medianstop
optimizationMode: Maximize
trial:
trialCommand: python mnist.py
trialCodeDir: /usr/share/nni/examples/trials/mnist-annotation
trialGpuNum: 0
```
For our built-in assessors, you need to fill in two fields: `assessorName`, which selects one of the assessors provided by NNI (refer to [here]() for the built-in assessors), and `optimizationMode`, which is either Maximize or Minimize (whether you want to maximize or minimize your trial result).
## Using user customized Assessor
You can also write your own assessor following the guidance [here](). Suppose, for example, you wrote an assessor for `examples/trials/mnist-annotation`. You should then prepare the YAML configuration below:
```
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
tunerName: TPE
optimizationMode: Maximize
assessor:
assessorCommand: your_command
assessorCodeDir: /path/of/your/assessor
assessorGpuNum: 0
trial:
trialCommand: python mnist.py
trialCodeDir: /usr/share/nni/examples/trials/mnist-annotation
trialGpuNum: 0
```
You only need to fill in three fields: `assessorCommand`, `assessorCodeDir`, and `assessorGpuNum`.
Experiment config reference
===
To create a new NNI experiment, you need to prepare a configuration file on your local machine and provide its path to nnictl.
The configuration file is written in YAML format and must be well-formed.
This document describes the rules for writing a configuration file and provides some examples and templates.
## Template
* __Lightweight (without Annotation and Assessor)__
```
authorName:
experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
trainingServicePlatform:
searchSpacePath:
#choice: true, false
useAnnotation:
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName:
#choice: Maximize, Minimize
optimizationMode:
tunerGpuNum:
trial:
trialCommand:
trialCodeDir:
trialGpuNum:
#machineList can be empty if the platform is local
machineList:
- ip:
port:
username:
passwd:
```
* __Use Assessor__
```
authorName:
experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
trainingServicePlatform:
searchSpacePath:
#choice: true, false
useAnnotation:
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName:
#choice: Maximize, Minimize
optimizationMode:
tunerGpuNum:
assessor:
#choice: Medianstop
assessorName:
#choice: Maximize, Minimize
optimizationMode:
assessorGpuNum:
trial:
trialCommand:
trialCodeDir:
trialGpuNum:
#machineList can be empty if the platform is local
machineList:
- ip:
port:
username:
passwd:
```
* __Use Annotation__
```
authorName:
experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
trainingServicePlatform:
#choice: true, false
useAnnotation:
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName:
#choice: Maximize, Minimize
optimizationMode:
tunerGpuNum:
assessor:
#choice: Medianstop
assessorName:
#choice: Maximize, Minimize
optimizationMode:
assessorGpuNum:
trial:
trialCommand:
trialCodeDir:
trialGpuNum:
#machineList can be empty if the platform is local
machineList:
- ip:
port:
username:
passwd:
```
## Configuration
* __authorName__
* Description
__authorName__ is the name of the author who creates the experiment.
* __experimentName__
* Description
__experimentName__ is the name of the experiment you created.
* __trialConcurrency__
* Description
__trialConcurrency__ specifies the maximum number of trial jobs that run simultaneously.
Note: if trialGpuNum is larger than the number of free GPUs on your machine, and the number of simultaneously running trial jobs therefore cannot reach trialConcurrency, some trial jobs will be queued to wait for GPU allocation.
* __maxExecDuration__
* Description
__maxExecDuration__ specifies the maximum duration of an experiment. The unit of the time is {__s__, __m__, __h__, __d__}, which means {_seconds_, _minutes_, _hours_, _days_}.
* __maxTrialNum__
* Description
__maxTrialNum__ specifies the maximum number of trial jobs created by NNI, including succeeded and failed jobs.
* __trainingServicePlatform__
* Description
__trainingServicePlatform__ specifies the platform to run the experiment, one of {__local__, __remote__}.
* __local__ mode means you run the experiment on your local Linux machine.
* __remote__ mode means you submit trial jobs to remote Linux machines. If you set the platform to remote, you must also complete the __machineList__ field.
* __searchSpacePath__
* Description
__searchSpacePath__ specifies the path of the search space file you want to use, which should be a valid path on your local Linux machine.
Note: if you set useAnnotation=True, you should remove the searchSpacePath field or leave it empty.
* __useAnnotation__
* Description
__useAnnotation__ specifies whether to use annotation to analyze your code and generate the search space.
Note: if you set useAnnotation=True, you should not set searchSpacePath.
* __tuner__
* Description
__tuner__ specifies the tuner algorithm used to run the experiment. There are two ways to set the tuner. One way is to use a tuner provided by the NNI SDK; you only need to set __tunerName__ and __optimizationMode__. The other way is to use your own tuner file; in that case you need to set __tunerCommand__ and __tunerCwd__.
* __tunerName__ and __optimizationMode__
* __tunerName__
__tunerName__ specifies the name of the built-in tuner you want to use; the NNI SDK provides four tuners: {__TPE__, __Random__, __Anneal__, __Evolution__}.
* __optimizationMode__
__optimizationMode__ specifies the optimization mode of the tuner algorithm: {__Maximize__, __Minimize__}.
* __tunerCommand__ and __tunerCwd__
* __tunerCommand__
__tunerCommand__ specifies the command used to run your own tuner file, for example `python3 mytuner.py`.
* __tunerCwd__
__tunerCwd__ specifies the working directory of your own tuner file, i.e. the directory that contains the tuner file.
* __tunerGpuNum__
__tunerGpuNum__ specifies the number of GPUs used to run the tuner process. The value of this field should be a non-negative integer.
Note: you can only use one way to set the tuner; set either {tunerName, optimizationMode} or {tunerCommand, tunerCwd}, but not both.
* __assessor__
* Description
__assessor__ specifies the assessor algorithm used to run the experiment. There are two ways to set the assessor. One way is to use an assessor provided by the NNI SDK; you only need to set __assessorName__ and __optimizationMode__. The other way is to use your own assessor file; in that case you need to set __assessorCommand__ and __assessorCwd__.
* __assessorName__ and __optimizationMode__
* __assessorName__
__assessorName__ specifies the name of the built-in assessor you want to use; the NNI SDK currently provides one assessor: {__Medianstop__}.
* __optimizationMode__
__optimizationMode__ specifies the optimization mode of the assessor algorithm: {__Maximize__, __Minimize__}.
* __assessorCommand__ and __assessorCwd__
* __assessorCommand__
__assessorCommand__ specifies the command used to run your own assessor file, for example `python3 myassessor.py`.
* __assessorCwd__
__assessorCwd__ specifies the working directory of your own assessor file, i.e. the directory that contains the assessor file.
* __assessorGpuNum__
__assessorGpuNum__ specifies the number of GPUs used to run the assessor process. The value of this field should be a non-negative integer.
Note: you can only use one way to set the assessor; set either {assessorName, optimizationMode} or {assessorCommand, assessorCwd}, but not both. If you do not want to use an assessor, leave the assessor field empty or remove it from your config file.
* __trial__
* __trialCommand__
__trialCommand__ specifies the command to run trial process.
* __trialCodeDir__
__trialCodeDir__ specifies the directory of your own trial file.
* __trialGpuNum__
__trialGpuNum__ specifies the number of GPUs used to run your trial process.
* __machineList__
__machineList__ should be set if __trainingServicePlatform__ is remote; otherwise it can be left empty.
* __ip__
__ip__ is the IP address of your remote machine.
* __port__
__port__ is the SSH port used to connect to the machine.
Note: if you leave port empty, the default value 22 will be used.
* __username__
__username__ is the account used to log in to the remote machine.
* __passwd__
__passwd__ specifies the password of your account.
## Examples
* __local mode__
If you want to run your trial jobs on your local machine and use annotation to generate the search space, you can use the following configuration:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName: TPE
#choice: Maximize, Minimize
optimizationMode: Maximize
tunerGpuNum: 0
trial:
trialCommand: python3 mnist.py
trialCodeDir: /nni/mnist
trialGpuNum: 0
```
If you want to use an assessor, you can add the assessor configuration to your file.
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName: TPE
#choice: Maximize, Minimize
optimizationMode: Maximize
tunerGpuNum: 0
assessor:
#choice: Medianstop
assessorName: Medianstop
#choice: Maximize, Minimize
optimizationMode: Maximize
assessorGpuNum: 0
trial:
trialCommand: python3 mnist.py
trialCodeDir: /nni/mnist
trialGpuNum: 0
```
Or you can specify your own tuner and assessor files as follows:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
tunerCommand: python3 mytuner.py
tunerCwd: /nni/tuner
tunerGpuNum: 0
assessor:
assessorCommand: python3 myassessor.py
assessorCwd: /nni/assessor
assessorGpuNum: 0
trial:
trialCommand: python3 mnist.py
trialCodeDir: /nni/mnist
trialGpuNum: 0
```
* __remote mode__
If you want to run trial jobs on remote machines, specify the remote machine information in the following format:
```
authorName: test
experimentName: test_experiment
trialConcurrency: 3
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote
trainingServicePlatform: remote
searchSpacePath: /nni/search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName: TPE
#choice: Maximize, Minimize
optimizationMode: Maximize
tunerGpuNum: 0
trial:
trialCommand: python3 mnist.py
trialCodeDir: /nni/mnist
trialGpuNum: 0
#machineList can be empty if the platform is local
machineList:
- ip: 10.10.10.10
port: 22
username: test
passwd: test
- ip: 10.10.10.11
port: 22
username: test
passwd: test
- ip: 10.10.10.12
port: 22
username: test
passwd: test
```
**Getting Started with NNI**
===
NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning experiments.
The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyper-parameters in different environments (e.g. local machine, remote servers, cloud).
```
AutoML experiment Training Services
┌────────┐ ┌────────────────────────┐ ┌────────────────┐
│ nnictl │ ─────> │ nni_manager │ │ Local Machine │
└────────┘ │ sdk/tuner │ └────────────────┘
│ hyperopt_tuner │
│ evlution_tuner │ trail jobs ┌────────────────┐
│ ... │ ────────> │ Remote Servers │
├────────────────────────┤ └────────────────┘
│ trail job source code │
│ sdk/annotation │ ┌────────────────┐
├────────────────────────┤ │ Yarn,K8s, │
│ nni_board │ │ ... │
└────────────────────────┘ └────────────────┘
```
## **Who should consider using NNI**
* You want to try different AutoML algorithms for your training code (model) locally
* You want to run AutoML trial jobs in different environments to speed up the search (e.g. remote servers, cloud)
* As a researcher or data scientist, you want to implement your own AutoML algorithms and compare them with other algorithms
* As an ML platform owner, you want to support AutoML in your platform
## **Setup**
* install using deb file
TBD
* install from source code
```
### Prepare Node.js 10.8.0 or above
wget https://nodejs.org/dist/v10.8.0/node-v10.8.0-linux-x64.tar.xz
tar xf node-v10.8.0-linux-x64.tar.xz
mkdir -p /usr/local/node && mv node-v10.8.0-linux-x64/* /usr/local/node/
### Prepare Yarn 1.6.0 or above
wget https://github.com/yarnpkg/yarn/releases/download/v1.6.0/yarn-v1.6.0.tar.gz
tar xf yarn-v1.6.0.tar.gz
mkdir -p /usr/local/yarn && mv yarn-v1.6.0/* /usr/local/yarn/
### Add Node.js and Yarn in PATH
export PATH=/usr/local/node/bin:/usr/local/yarn/bin:$PATH
### clone nni source code
git clone ...
### build and install nni
make build
sudo make install
```
This documentation assumes you have set up one or more [training services]().
## **Quick start: run an experiment locally**
Requirements:
* local environment setup [TODO]
Run the following command to create an experiment for [mnist]:
```bash
nnictl create --config /usr/share/nni/examples/trials/mnist-annotation/config.yaml
```
This command will start the experiment and the WebUI. The WebUI endpoint will be shown in the output of this command (for example, `http://localhost:8080`). Open this URL in your browser. You can analyze your experiment through the WebUI, or open the trials' TensorBoard.
## **Quick start: run a customized experiment**
An experiment runs multiple trial jobs; each trial job tries a configuration that includes a specific neural architecture (or model) and hyper-parameter values. To run an experiment through NNI, you should:
* Provide a runnable trial
* Provide or choose a tuner
* Provide a YAML experiment configuration file
* (optional) Provide or choose an assessor
**Prepare trial**: Let's use a simple trial example, e.g. mnist, provided by NNI. After you have installed NNI, the NNI examples are located in /usr/share/nni/examples; run `ls /usr/share/nni/examples/trials` to see all the trial examples. You can simply execute the following command to run the NNI mnist example:
python /usr/share/nni/examples/trials/mnist-annotation/mnist.py
This command will be filled into the YAML configuration file below. Please refer to [here]() for how to write your own trial.
**Prepare tuner**: NNI supports several popular AutoML algorithms, including Random Search, Tree of Parzen Estimators (TPE), Bayesian optimization, etc. Users can write their own tuner (refer to [here]()), but for simplicity, here we choose a tuner provided by NNI as below:
tunerName: TPE
optimizationMode: Maximize
*tunerName* specifies a tuner in NNI; *optimizationMode* indicates whether you want to maximize or minimize your trial's result.
**Prepare the configuration file**: Now that you know which trial code you are going to run and which tuner you are going to use, it is time to prepare the YAML configuration file. NNI provides a demo configuration file for each trial example; run `cat /usr/share/nni/examples/trials/mnist-annotation/config.yaml` to see it. Its content is basically as shown below:
```
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: local
# choice: true, false
useAnnotation: true
tuner:
tunerName: TPE
optimizationMode: Maximize
trial:
trialCommand: python mnist.py
trialCodeDir: /usr/share/nni/examples/trials/mnist-annotation
trialGpuNum: 0
```
Here *useAnnotation* is true because this trial example uses our Python annotation (refer to [here]() for details; a short sketch of what the annotation looks like is shown below). For the trial, we provide *trialCommand*, the command that runs the trial, and *trialCodeDir*, the directory where the trial code is located; the command will be executed in this directory. We also specify how many GPUs a trial requires.
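For a flavor of what the annotation looks like (the exact lines in `mnist.py` may differ; the values below are placeholders), an annotated assignment and a final report in the style of the mnist-annotation example are:
```python
# The annotations are plain string literals, so this snippet still runs without NNI.
"""@nni.variable(nni.choice(50, 250, 500), name=batch_size)"""
batch_size = 128

# ... build and train the model here ...
test_acc = 0.9  # placeholder value for illustration

"""@nni.report_final_result(test_acc)"""
print(batch_size, test_acc)
```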
With all these steps done, we can run the experiment with the following command:
nnictl create --config /usr/share/nni/examples/trials/mnist-annotation/config.yaml
You can refer to [here](NNICTLDOC.md) for more usage guide of *nnictl* command line tool.
## View experiment results
The experiment is now running. NNI provides a WebUI for you to view the experiment's progress, control your experiment, and use some other appealing features. The WebUI is opened by default by `nnictl create`.
## Further reading
* [How to write a trial running on NNI (Mnist as an example)?](WriteYourTrial.md)
* [Tutorial of NNI python annotation.](../tools/annotation/README.md)
* [Tuners supported by NNI.](../src/sdk/pynni/nni/README.md)
* [How to enable early stop (i.e. assessor) in an experiment?](EnableAssessor.md)
* [How to run an experiment on multiple machines?](RemoteMachineMode.md)
* [How to write a customized tuner?](../examples/tuners/README.md)
* [How to write a customized assessor?](../examples/assessors/README.md)
* [How to resume an experiment?]()
* [Tutorial of the command tool *nnictl*.](NNICTLDOC.md)
* [How to use *nnictl* to control multiple experiments?]()
## How to contribute
TBD
nnictl
===
## Introduction
__nnictl__ is a command line tool, which can be used to control experiments, such as start/stop/resume an experiment, start/stop NNIBoard, etc.
## Commands
nnictl supports the following commands:
```
nnictl create
nnictl stop
nnictl update
nnictl resume
nnictl trial
nnictl webui
nnictl rest
nnictl experiment
nnictl config
nnictl log
```
### Manage an experiment
* __nnictl create__
* Description
You can use this command to create a new experiment, using the configuration specified in the config file.
After this command completes successfully, the context will be set to this experiment,
which means subsequent commands you issue are associated with this experiment,
unless you explicitly change the context (not supported yet).
* Usage
nnictl create [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --config, -c| True| |YAML configuration file of the experiment|
| --webuiport, -w| False| 8080|assign a port for webui|
* __nnictl resume__
* Description
You can use this command to resume a stopped experiment.
* Usage
nnictl resume [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --experiment, -e| False| |ID of the experiment you want to resume|
* __nnictl stop__
* Description
You can use this command to stop a running experiment.
* Usage
nnictl stop
* __nnictl update__
* __nnictl update searchspace__
* Description
You can use this command to update an experiment's searchspace.
* Usage
nnictl update searchspace [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --filename, -f| True| |the file storing your new search space|
* __nnictl update concurrency__
* Description
You can use this command to update an experiment's concurrency.
* Usage
nnictl update concurrency [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the number of allowed concurrent trials|
* __nnictl update duration__
* Description
You can use this command to update an experiment's duration.
* Usage
nnictl update duration [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --value, -v| True| |the new experiment duration: a NUMBER with an optional SUFFIX, where SUFFIX may be 's' for seconds (the default), 'm' for minutes, 'h' for hours, or 'd' for days|
* __nnictl trial__
* __nnictl trial ls__
* Description
You can use this command to show information about trials.
* Usage
nnictl trial ls
* __nnictl trial kill__
* Description
You can use this command to kill a trial job.
* Usage
nnictl trial kill [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --trialid, -t| True| |ID of the trial you want to kill.|
### Manage WebUI
* __nnictl webui start__
* Description
Start the web UI for NNI. You will get a list of URLs; you can open any of them to see the NNI web page.
* Usage
nnictl webui start [OPTIONS]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --port, -p| False| 8080|assign a port for webui|
* __nnictl webui stop__
* Description
Stop the web UI and release the occupied URL. If you want to start it again, use the 'nnictl webui start' command.
* Usage
nnictl webui stop
* __nnictl webui url__
* Description
Show the URLs of the web UI.
* Usage
nnictl webui url
### Manage experiment information
* __nnictl experiment ls__
* Description
Show the information of the experiment.
* Usage
nnictl experiment ls
* __nnictl config ls__
* Description
Display the current context information.
* Usage
nnictl config ls
### Manage restful server
* __nnictl rest check__
* Description
Check the status of the RESTful server.
* Usage
nnictl rest check
### Manage log
* __nnictl log stdout__
* Description
Show the stdout log content.
* Usage
nnictl log stdout [options]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --head, -h| False| |show head lines of stdout|
| --tail, -t| False| |show tail lines of stdout|
| --path, -p| False| |show the path of stdout file|
* __nnictl log stderr__
* Description
Show the stderr log content.
* Usage
nnictl log stderr [options]
Options:
| Name, shorthand | Required|Default | Description |
| ------ | ------ | ------ |------ |
| --head, -h| False| |show head lines of stderr|
| --tail, -t| False| |show tail lines of stderr|
| --path, -p| False| |show the path of stderr file|
**Run an Experiment on Multiple Machines**
===
NNI supports running an experiment on multiple machines; this is called remote machine mode. Let's say you have multiple machines that can be accessed with the account `bob` (Note: the account does not have to be the same on all machines):
| IP | Username | Password |
| --------|---------|-------|
| 10.1.1.1 | bob | bob123 |
| 10.1.1.2 | bob | bob123 |
| 10.1.1.3 | bob | bob123 |
## Setup environment
Install NNI on each of your machines following the install guide [here](GetStarted.md).
## Run an experiment
We still use `examples/trials/mnist-annotation` as the example here. The YAML file you need is shown below:
```
authorName: your_name
experimentName: auto_mnist
# how many trials could be concurrently running
trialConcurrency: 2
# maximum experiment running duration
maxExecDuration: 3h
# empty means never stop
maxTrialNum: 100
# choice: local, remote
trainingServicePlatform: remote
# choice: true, false
useAnnotation: true
tuner:
tunerName: TPE
optimizationMode: Maximize
trial:
trialCommand: python mnist.py
trialCodeDir: /usr/share/nni/examples/trials/mnist-annotation
trialGpuNum: 0
#machineList can be empty if the platform is local
machineList:
- ip: 10.1.1.1
username: bob
passwd: bob123
- ip: 10.1.1.2
username: bob
passwd: bob123
- ip: 10.1.1.3
username: bob
passwd: bob123
```
Simply fill in the `machineList` section. Save this YAML file as `exp_remote.yaml`, then run:
```
nnictl create --config exp_remote.yaml
```
to start the experiment. This command can be executed on one of the three machines above, or on another machine that has NNI installed and network access to those three machines.
**Write a Trial which can Run on NNI**
===
Only a few changes to your existing trial (model) code are needed to make it runnable on NNI. We provide two approaches for modifying your code: `Python annotation` and `NNI APIs for trial`.
## Python annotation
We designed a new syntax for users to annotate which variables they want to tune and in what ranges to tune them. Users can also annotate which variable they want to report as an intermediate result to the `assessor`, and which variable to report as the final result (e.g. model accuracy) to the `tuner`. A really appealing feature of our Python annotation is that it exists as comments in your code, which means you can run your code as before without NNI. Let's look at an example; below is a piece of TensorFlow code:
```
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
batch_size = 128
for i in range(10000):
batch = mnist.train.next_batch(batch_size)
dropout_rate = 0.5
mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
mnist_network.labels: batch[1],
mnist_network.keep_prob: dropout_rate})
if i % 100 == 0:
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
```
Let's say you want to tune batch\_size and dropout\_rate, report test\_acc every 100 steps, and finally report test\_acc as the final result. With our Python annotation, your code would look like the following:
```
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
"""@nni.variable(nni.choice(50, 250, 500), name=batch_size)"""
batch_size = 128
for i in range(10000):
batch = mnist.train.next_batch(batch_size)
"""@nni.variable(nni.choice(1, 5), name=dropout_rate)"""
dropout_rate = 0.5
mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
mnist_network.labels: batch[1],
mnist_network.keep_prob: dropout_rate})
if i % 100 == 0:
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
"""@nni.report_intermediate_result(test_acc)"""
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
"""@nni.report_final_result(test_acc)"""
```
Simply adding four lines makes your code runnable on NNI, and you can still run your code independently. `@nni.variable` applies to the assignment on the following line, and `@nni.report_intermediate_result`/`@nni.report_final_result` send the data to the assessor/tuner at that line. Please refer to [here](../tools/annotation/README.md) for more annotation syntax and more powerful usage. In the YAML configuration file, you need one line to enable Python annotation:
```
useAnnotation: true
```
## NNI APIs for trial
We also support NNI APIs in trial code. To use this approach, first prepare a search space file. An example is shown below:
```
{
"dropout_rate":{"_type":"uniform","_value":[0.1,0.5]},
"conv_size":{"_type":"choice","_value":[2,3,5,7]},
"hidden_size":{"_type":"choice","_value":[124, 512, 1024]},
"learning_rate":{"_type":"uniform","_value":[0.0001, 0.1]}
}
```
You can refer to [here]() for the search space tutorial.
Then add `import nni` to your trial code to use the APIs. Use the line:
```
RECEIVED_PARAMS = nni.get_parameters()
```
to get the hyper-parameter values assigned by the tuner. `RECEIVED_PARAMS` is a JSON object, for example:
```
{'conv_size': 2, 'hidden_size': 124, 'learning_rate': 0.0307, 'dropout_rate': 0.2029}
```
On the other hand, you can use the API `nni.report_intermediate_result(accuracy)` to send `accuracy` to the assessor, and `nni.report_final_result(accuracy)` to send `accuracy` to the tuner. Here `accuracy` can be any Python data type, but **note that if you use a built-in tuner/assessor, `accuracy` should be a number (e.g. float, int)**.
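Putting these three APIs together, a minimal trial sketch could look like the following. The `train_and_evaluate` function and the default parameter values are made up for illustration; only `nni.get_parameters`, `nni.report_intermediate_result`, and `nni.report_final_result` come from the NNI SDK:
```python
import nni

def train_and_evaluate(params, epoch):
    """Stand-in for real training; returns a fake accuracy for illustration."""
    return min(0.5 + 0.05 * epoch + params['dropout_rate'] * 0.1, 0.99)

if __name__ == '__main__':
    params = {'learning_rate': 0.01, 'dropout_rate': 0.3}  # default values
    params.update(nni.get_parameters())                    # values assigned by the tuner

    accuracy = 0.0
    for epoch in range(10):
        accuracy = train_and_evaluate(params, epoch)
        nni.report_intermediate_result(accuracy)           # goes to the assessor

    nni.report_final_result(accuracy)                      # goes to the tuner
```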
In the YAML configuration file, you need two lines to enable NNI APIs:
```
useAnnotation: false
searchSpacePath: /path/to/your/search_space.json
```
You can refer to [here](../examples/trials/README.md) for more information about how to write trial code using NNI APIs.
# Customized Assessor for Experts
*The Assessor receives intermediate results from a Trial and decides whether the Trial should be killed. Once a Trial meets the early-stop conditions, the Assessor kills it.*
So, to implement a customized Assessor, a user only needs to:
**1) Inherit from the base Assessor class**
```python
from nni.assessor import Assessor
class CustomizedAssessor(Assessor):
def __init__(self, ...):
...
```
**2) Implement the assess_trial function**
```python
from nni.assessor import Assessor, AssessResult
class CustomizedAssessor(Assessor):
def __init__(self, ...):
...
def assess_trial(self, trial_history):
"""
Determines whether a trial should be killed. Must override.
trial_history: a list of intermediate result objects.
Returns AssessResult.Good or AssessResult.Bad.
"""
# you code implement here.
...
```
**3) Write a script to run the Assessor**
```python
import argparse
import CustomizedAssessor

def main():
    parser = argparse.ArgumentParser(description='parse command line parameters.')
    # parse your assessor args here.
    ...
    FLAGS, unparsed = parser.parse_known_args()
    assessor = CustomizedAssessor(...)
    assessor.run()

main()
```
Please note, in 2), that the object ```trial_history``` is exactly the object that the Trial sends to the Assessor via the SDK ```report_intermediate_result``` function.
Also, users can override the ```run``` function of the Assessor to control the processing logic.
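As an illustration of both points, here is a minimal sketch of a customized assessor built on the `Assessor`/`AssessResult` API shown above. The no-improvement stopping rule and the `patience` parameter are made up for this example and assume larger intermediate results are better:
```python
from nni.assessor import Assessor, AssessResult

class NoImprovementAssessor(Assessor):
    """Illustrative assessor: mark a trial Bad when its latest intermediate
    result is no better than the result `patience` steps earlier."""

    def __init__(self, patience=3):
        self.patience = patience

    def assess_trial(self, trial_history):
        if len(trial_history) <= self.patience:
            return AssessResult.Good
        # Assumes larger intermediate results are better (Maximize mode).
        if trial_history[-1] <= trial_history[-1 - self.patience]:
            return AssessResult.Bad
        return AssessResult.Good
```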
For a more detailed example, see:
> * [Base-Assessor](https://msrasrg.visualstudio.com/NeuralNetworkIntelligenceOpenSource/_git/Default?_a=contents&path=%2Fsrc%2Fsdk%2Fpynni%2Fnni%2Fassessor.py&version=GBadd_readme)
authorName:
experimentName:
trialConcurrency:
maxExecDuration:
maxTrialNum:
#choice: local, remote
trainingServicePlatform:
searchSpacePath:
#choice: true, false
useAnnotation:
tuner:
tunerCommand:
tunerCwd:
tunerGpuNum:
assessor:
assessorCommand:
assessorCwd:
assessorGpuNum:
trial:
trialCommand:
trialCodeDir:
trialGpuNum:
#machineList can be empty if the platform is local
machineList:
- ip:
port:
username:
passwd:
# How to write a Trial running on NNI?
*A Trial receives the hyper-parameter/architecture configuration from the Tuner, and sends intermediate results to the Assessor and the final result to the Tuner.*
So when a user wants to write a Trial running on NNI, she/he should:
**1) Have an original Trial that can run**
The Trial's code can be any machine learning code that runs locally. Here we use ```mnist-keras.py``` as an example:
```python
import argparse
import logging
import keras
import numpy as np
from keras import backend as K
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
K.set_image_data_format('channels_last')
H, W = 28, 28
NUM_CLASSES = 10
def create_mnist_model(hyper_params, input_shape=(H, W, 1), num_classes=NUM_CLASSES):
layers = [
Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Flatten(),
Dense(100, activation='relu'),
Dense(num_classes, activation='softmax')
]
model = Sequential(layers)
if hyper_params['optimizer'] == 'Adam':
optimizer = keras.optimizers.Adam(lr=hyper_params['learning_rate'])
else:
optimizer = keras.optimizers.SGD(lr=hyper_params['learning_rate'], momentum=0.9)
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=optimizer, metrics=['accuracy'])
return model
def load_mnist_data(args):
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = (np.expand_dims(x_train, -1).astype(np.float) / 255.)[:args.num_train]
x_test = (np.expand_dims(x_test, -1).astype(np.float) / 255.)[:args.num_test]
y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)[:args.num_train]
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)[:args.num_test]
return x_train, y_train, x_test, y_test
class SendMetrics(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
pass
def train(args, params):
x_train, y_train, x_test, y_test = load_mnist_data(args)
model = create_mnist_model(params)
model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.epochs, verbose=1,
validation_data=(x_test, y_test), callbacks=[SendMetrics()])
_, acc = model.evaluate(x_test, y_test, verbose=0)
def generate_default_params():
return {
'optimizer': 'Adam',
'learning_rate': 0.001
}
if __name__ == '__main__':
PARSER = argparse.ArgumentParser()
PARSER.add_argument("--batch_size", type=int, default=200, help="batch size", required=False)
PARSER.add_argument("--epochs", type=int, default=10, help="Train epochs", required=False)
PARSER.add_argument("--num_train", type=int, default=1000, help="Number of train samples to be used, maximum 60000", required=False)
PARSER.add_argument("--num_test", type=int, default=1000, help="Number of test samples to be used, maximum 10000", required=False)
ARGS, UNKNOWN = PARSER.parse_known_args()
PARAMS = generate_default_params()
train(ARGS, PARAMS)
```
**2) Get the configuration from the Tuner**
Import ```nni``` and use ```nni.get_parameters()``` to receive the configuration. Please note lines **10**, **24**, and **25** in the following code.
```python
import argparse
import logging
import keras
import numpy as np
from keras import backend as K
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
import nni
...
if __name__ == '__main__':
PARSER = argparse.ArgumentParser()
PARSER.add_argument("--batch_size", type=int, default=200, help="batch size", required=False)
PARSER.add_argument("--epochs", type=int, default=10, help="Train epochs", required=False)
PARSER.add_argument("--num_train", type=int, default=1000, help="Number of train samples to be used, maximum 60000", required=False)
PARSER.add_argument("--num_test", type=int, default=1000, help="Number of test samples to be used, maximum 10000", required=False)
ARGS, UNKNOWN = PARSER.parse_known_args()
PARAMS = generate_default_params()
RECEIVED_PARAMS = nni.get_parameters()
PARAMS.update(RECEIVED_PARAMS)
train(ARGS, PARAMS)
```
**3) Send intermediate result**
Use ```nni.report_intermediate_result``` to send intermediate results to the Assessor. Please note line **5** in the following code.
```python
...
class SendMetrics(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
nni.report_intermediate_result(logs)
def train(args, params):
x_train, y_train, x_test, y_test = load_mnist_data(args)
model = create_mnist_model(params)
model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.epochs, verbose=1,
validation_data=(x_test, y_test), callbacks=[SendMetrics()])
_, acc = model.evaluate(x_test, y_test, verbose=0)
...
```
**4) Send final result**
Use ```nni.report_final_result``` to send the final result to the Tuner. Please note line **15** in the following code.
```python
...
class SendMetrics(keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
nni.report_intermediate_result(logs)
def train(args, params):
x_train, y_train, x_test, y_test = load_mnist_data(args)
model = create_mnist_model(params)
model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.epochs, verbose=1,
validation_data=(x_test, y_test), callbacks=[SendMetrics()])
_, acc = model.evaluate(x_test, y_test, verbose=0)
nni.report_final_result(acc)
...
```
Here is the complete example:
```python
import argparse
import logging
import keras
import numpy as np
from keras import backend as K
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
import nni
LOG = logging.getLogger('mnist_keras')
K.set_image_data_format('channels_last')
H, W = 28, 28
NUM_CLASSES = 10
def create_mnist_model(hyper_params, input_shape=(H, W, 1), num_classes=NUM_CLASSES):
'''
Create simple convolutional model
'''
layers = [
Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
Conv2D(64, (3, 3), activation='relu'),
MaxPooling2D(pool_size=(2, 2)),
Flatten(),
Dense(100, activation='relu'),
Dense(num_classes, activation='softmax')
]
model = Sequential(layers)
if hyper_params['optimizer'] == 'Adam':
optimizer = keras.optimizers.Adam(lr=hyper_params['learning_rate'])
else:
optimizer = keras.optimizers.SGD(lr=hyper_params['learning_rate'], momentum=0.9)
model.compile(loss=keras.losses.categorical_crossentropy, optimizer=optimizer, metrics=['accuracy'])
return model
def load_mnist_data(args):
'''
Load MNIST dataset
'''
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = (np.expand_dims(x_train, -1).astype(np.float) / 255.)[:args.num_train]
x_test = (np.expand_dims(x_test, -1).astype(np.float) / 255.)[:args.num_test]
y_train = keras.utils.to_categorical(y_train, NUM_CLASSES)[:args.num_train]
y_test = keras.utils.to_categorical(y_test, NUM_CLASSES)[:args.num_test]
LOG.debug('x_train shape: %s', (x_train.shape,))
LOG.debug('x_test shape: %s', (x_test.shape,))
return x_train, y_train, x_test, y_test
class SendMetrics(keras.callbacks.Callback):
'''
Keras callback to send metrics to NNI framework
'''
def on_epoch_end(self, epoch, logs={}):
'''
Run on end of each epoch
'''
LOG.debug(logs)
nni.report_intermediate_result(logs)
def train(args, params):
'''
Train model
'''
x_train, y_train, x_test, y_test = load_mnist_data(args)
model = create_mnist_model(params)
model.fit(x_train, y_train, batch_size=args.batch_size, epochs=args.epochs, verbose=1,
validation_data=(x_test, y_test), callbacks=[SendMetrics()])
_, acc = model.evaluate(x_test, y_test, verbose=0)
LOG.debug('Final result is: %d', acc)
nni.report_final_result(acc)
def generate_default_params():
'''
Generate default hyper parameters
'''
return {
'optimizer': 'Adam',
'learning_rate': 0.001
}
if __name__ == '__main__':
PARSER = argparse.ArgumentParser()
PARSER.add_argument("--batch_size", type=int, default=200, help="batch size", required=False)
PARSER.add_argument("--epochs", type=int, default=10, help="Train epochs", required=False)
PARSER.add_argument("--num_train", type=int, default=1000, help="Number of train samples to be used, maximum 60000", required=False)
PARSER.add_argument("--num_test", type=int, default=1000, help="Number of test samples to be used, maximum 10000", required=False)
ARGS, UNKNOWN = PARSER.parse_known_args()
try:
# get parameters from tuner
RECEIVED_PARAMS = nni.get_parameters()
LOG.debug(RECEIVED_PARAMS)
PARAMS = generate_default_params()
PARAMS.update(RECEIVED_PARAMS)
# train
train(ARGS, PARAMS)
except Exception as e:
LOG.exception(e)
raise
```
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""Builds the CIFAR-10 network.
Summary of available functions:
# Compute input images and labels for training. If you would like to run
# evaluations, use inputs() instead.
inputs, labels = distorted_inputs()
# Compute inference on the model inputs to make a prediction.
predictions = inference(inputs)
# Compute the total loss of the prediction with respect to the labels.
loss = loss(predictions, labels)
# Create a graph to run one step of training with respect to the loss.
train_op = train(loss, global_step)
"""
import argparse
import os
import re
import tarfile
import urllib
import tensorflow as tf
import cifar10_input
parser = argparse.ArgumentParser()
# Basic model parameters.
parser.add_argument('--batch_size', type=int, default=512,
help='Number of images to process in a batch.')
parser.add_argument('--data_dir', type=str, default='/tmp/cifar10_data',
help='Path to the CIFAR-10 data directory.')
parser.add_argument('--use_fp16', type=bool, default=False,
help='Train the model using fp16.')
FLAGS = parser.parse_args()
# Global constants describing the CIFAR-10 data set.
IMAGE_SIZE = cifar10_input.IMAGE_SIZE
NUM_CLASSES = cifar10_input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
# Constants describing the training process.
MOVING_AVERAGE_DECAY = 0.9999 # The decay to use for the moving average.
NUM_EPOCHS_PER_DECAY = 350.0 # Epochs after which learning rate decays.
LEARNING_RATE_DECAY_FACTOR = 0.1 # Learning rate decay factor.
INITIAL_LEARNING_RATE = 0.1 # Initial learning rate.
# If a model is trained with multiple GPUs, prefix all Op names with tower_name
# to differentiate the operations. Note that this prefix is removed from the
# names of the summaries when visualizing a model.
TOWER_NAME = 'tower'
DATA_URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'
def _activation_summary(x_input):
"""Helper to create summaries for activations.
Creates a summary that provides a histogram of activations.
Creates a summary that measures the sparsity of activations.
Args:
x_input: Tensor
Returns:
nothing
"""
# Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
# session. This helps the clarity of presentation on tensorboard.
tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x_input.op.name)
tf.summary.histogram(tensor_name + '/activations', x_input)
tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x_input))
def _variable_on_cpu(name, shape, initializer):
"""Helper to create a Variable stored on CPU memory.
Args:
name: name of the variable
shape: list of ints
initializer: initializer for Variable
Returns:
Variable Tensor
"""
with tf.device('/cpu:0'):
dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
var = tf.get_variable(
name, shape, initializer=initializer, dtype=dtype)
return var
def _variable_with_weight_decay(name, shape, stddev, l2loss_wd):
"""Helper to create an initialized Variable with weight decay.
Note that the Variable is initialized with a truncated normal distribution.
A weight decay is added only if one is specified.
Args:
name: name of the variable
shape: list of ints
stddev: standard deviation of a truncated Gaussian
l2loss_wd: add L2Loss weight decay multiplied by this float. If None, weight
decay is not added for this Variable.
Returns:
Variable Tensor
"""
dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
var = _variable_on_cpu(name, shape,
tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
if l2loss_wd is not None:
weight_decay = tf.multiply(tf.nn.l2_loss(var), l2loss_wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
return var
def distorted_inputs():
"""Construct distorted input for CIFAR training using the Reader ops.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
Raises:
ValueError: If no data_dir
"""
if not FLAGS.data_dir:
raise ValueError('Please supply a data_dir')
FLAGS.data_dir = './'
print(FLAGS.data_dir)
data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
images, labels = cifar10_input.distorted_inputs(data_dir=data_dir,
batch_size=FLAGS.batch_size)
if FLAGS.use_fp16:
images = tf.cast(images, tf.float16)
labels = tf.cast(labels, tf.float16)
return images, labels
def inputs(eval_data):
"""Construct input for CIFAR evaluation using the Reader ops.
Args:
eval_data: bool, indicating if one should use the train or eval data set.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
Raises:
ValueError: If no data_dir
"""
FLAGS.data_dir = './'
if not FLAGS.data_dir:
raise ValueError('Please supply a data_dir')
data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
if eval_data is None:
images, labels = cifar10_input.distorted_inputs(data_dir=data_dir,
batch_size=FLAGS.batch_size)
else:
images, labels = cifar10_input.inputs(eval_data=eval_data,
data_dir=data_dir,
batch_size=FLAGS.batch_size)
if FLAGS.use_fp16:
images = tf.cast(images, tf.float16)
labels = tf.cast(labels, tf.float16)
return images, labels
def maybe_download_and_extract():
"""
Download and extract the tarball from Alex's website.
"""
FLAGS.data_dir = './'
dest_directory = FLAGS.data_dir
print(dest_directory)
if not os.path.exists(dest_directory):
os.makedirs(dest_directory)
filename = DATA_URL.split('/')[-1]
filepath = os.path.join(dest_directory, filename)
if not os.path.exists(filepath):
def _progress(count, block_size, total_size):
print('\r>> Downloading %s %.1f%%' % (filename, float(
count * block_size) / float(total_size) * 100.0))
filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
print()
statinfo = os.stat(filepath)
print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
extracted_dir_path = os.path.join(dest_directory, 'cifar-10-batches-bin')
if not os.path.exists(extracted_dir_path):
tarfile.open(filepath, 'r:gz').extractall(dest_directory)
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""Routine for decoding the CIFAR-10 binary file format."""
import os
import types
import tensorflow as tf
# Process images of this size. Note that this differs from the original CIFAR
# image size of 32 x 32. If one alters this number, then the entire model
# architecture will change and any model would need to be retrained.
IMAGE_SIZE = 24
# Global constants describing the CIFAR-10 data set.
NUM_CLASSES = 10
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000
def read_cifar10(filename_queue):
"""
Reads and parses examples from CIFAR10 data files.
Recommendation: if you want N-way read parallelism, call this function N times.
This will give you N independent Readers reading different
files & positions within those files, which will give better mixing of
examples.
Args:
filename_queue: A queue of strings with the filenames to read from.
Returns:
An object representing a single example, with the following fields:
height: number of rows in the result (32)
width: number of columns in the result (32)
depth: number of color channels in the result (3)
key: a scalar string Tensor describing the filename & record number
for this example.
label: an int32 Tensor with the label in the range 0..9.
uint8image: a [height, width, depth] uint8 Tensor with the image data
"""
result = types.SimpleNamespace()
# Dimensions of the images in the CIFAR-10 dataset.
# See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
# input format.
label_bytes = 1 # 2 for CIFAR-100
result.height = 32
result.width = 32
result.depth = 3
image_bytes = result.height * result.width * result.depth
# Every record consists of a label followed by the image, with a
# fixed number of bytes for each.
record_bytes = label_bytes + image_bytes
# Read a record, getting filenames from the filename_queue. No
# header or footer in the CIFAR-10 format, so we leave header_bytes
# and footer_bytes at their default of 0.
reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
result.key, value = reader.read(filename_queue)
# Convert from a string to a vector of uint8 that is record_bytes long.
record_bytes = tf.decode_raw(value, tf.uint8)
# The first bytes represent the label, which we convert from uint8->int32.
result.label = tf.cast(tf.strided_slice(
record_bytes, [0], [label_bytes]), tf.int32)
# The remaining bytes after the label represent the image, which we reshape
# from [depth * height * width] to [depth, height, width].
depth_major = tf.reshape(tf.strided_slice(record_bytes,
[label_bytes],
[label_bytes + image_bytes]),
[result.depth, result.height, result.width])
# Convert from [depth, height, width] to [height, width, depth].
result.uint8image = tf.transpose(depth_major, [1, 2, 0])
return result
def _generate_image_and_label_batch(image, label, min_queue_examples,
batch_size, shuffle):
"""
Construct a queued batch of images and labels.
Args:
image: 3-D Tensor of [height, width, 3] of type.float32.
label: 1-D Tensor of type.int32
min_queue_examples: int32, minimum number of samples to retain
in the queue that provides of batches of examples.
batch_size: Number of images per batch.
shuffle: boolean indicating whether to use a shuffling queue.
Returns:
images: Images. 4D tensor of [batch_size, height, width, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
# Create a queue that shuffles the examples, and then
# read 'batch_size' images + labels from the example queue.
num_preprocess_threads = 16
if shuffle:
images, label_batch = tf.train.shuffle_batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size,
min_after_dequeue=min_queue_examples)
else:
images, label_batch = tf.train.batch(
[image, label],
batch_size=batch_size,
num_threads=num_preprocess_threads,
capacity=min_queue_examples + 3 * batch_size)
return images, tf.reshape(label_batch, [batch_size])
def distorted_inputs(data_dir, batch_size):
"""Construct distorted input for CIFAR training using the Reader ops.
Args:
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
for i in range(1, 6)]
for file in filenames:
if not tf.gfile.Exists(file):
raise ValueError('Failed to find file: ' + file)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for training the network. Note the many random
# distortions applied to the image.
# Randomly crop a [height, width] section of the image.
distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
# Randomly flip the image horizontally.
distorted_image = tf.image.random_flip_left_right(distorted_image)
# Because these operations are not commutative, consider randomizing
# the order of their operation.
# NOTE: since per_image_standardization zeros the mean and makes
# the stddev unit, this likely has no effect; see tensorflow#1458.
distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
distorted_image = tf.image.random_contrast(
distorted_image, lower=0.2, upper=1.8)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_standardization(distorted_image)
# Set the shapes of tensors.
float_image.set_shape([height, width, 3])
read_input.label.set_shape([1])
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples = 0.4
min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
min_fraction_of_examples)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=True)
def inputs(eval_data, data_dir, batch_size):
"""Construct input for CIFAR evaluation using the Reader ops.
Args:
eval_data: bool, indicating if one should use the train or eval data set.
data_dir: Path to the CIFAR-10 data directory.
batch_size: Number of images per batch.
Returns:
images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
labels: Labels. 1D tensor of [batch_size] size.
"""
if not eval_data:
filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
for i in range(1, 6)]
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
else:
filenames = [os.path.join(data_dir, 'test_batch.bin')]
num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
for file in filenames:
if not tf.gfile.Exists(file):
raise ValueError('Failed to find file: ' + file)
# Create a queue that produces the filenames to read.
filename_queue = tf.train.string_input_producer(filenames)
# Read examples from files in the filename queue.
read_input = read_cifar10(filename_queue)
reshaped_image = tf.cast(read_input.uint8image, tf.float32)
height = IMAGE_SIZE
width = IMAGE_SIZE
# Image processing for evaluation.
# Crop the central [height, width] of the image.
resized_image = tf.image.resize_image_with_crop_or_pad(
reshaped_image, height, width)
# Subtract off the mean and divide by the variance of the pixels.
float_image = tf.image.per_image_standardization(resized_image)
# Set the shapes of tensors.
float_image.set_shape([height, width, 3])
read_input.label.set_shape([1])
# Ensure that the random shuffling has good mixing properties.
min_fraction_of_examples = 0.4
min_queue_examples = int(num_examples_per_epoch *
min_fraction_of_examples)
# Generate a batch of images and labels by building up a queue of examples.
return _generate_image_and_label_batch(float_image, read_input.label,
min_queue_examples, batch_size,
shuffle=False)
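For orientation, both readers above are consumed through TensorFlow's queue runners; a minimal sketch of that wiring follows, assuming the CIFAR-10 binaries already sit in a local directory (the path below is illustrative only):

import tensorflow as tf

images, labels = distorted_inputs('/tmp/cifar-10-batches-bin', batch_size=128)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # Each run draws one shuffled batch: images [128, IMAGE_SIZE, IMAGE_SIZE, 3], labels [128].
    image_batch, label_batch = sess.run([images, labels])
    coord.request_stop()
    coord.join(threads)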
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
'''
Build the CIFAR-10 network, train it, and report results to NNI.
'''
import logging
import tensorflow as tf
import nni
import cifar10
_logger = logging.getLogger("cifar10_automl")
NUM_CLASS = 10
MAX_BATCH_NUM = 5000
#MAX_BATCH_NUM = 50
def activation_functions(act):
'''
Choose activation function by index
'''
if act == 1:
return tf.nn.softmax
if act == 2:
return tf.nn.tanh
if act == 3:
return tf.nn.relu
if act == 4:
# NOTE: indices 3 and 4 both return relu in the original code; index 4 may have
# been intended as a different activation, but relu is kept here.
return tf.nn.relu
if act == 5:
return tf.nn.elu
if act == 6:
return tf.nn.leaky_relu
return None
def get_optimizer(opt):
'''
Return optimizer by index
'''
if opt == 1:
return tf.train.GradientDescentOptimizer
if opt == 2:
return tf.train.RMSPropOptimizer
if opt == 3:
return tf.train.AdagradOptimizer
if opt == 4:
return tf.train.AdadeltaOptimizer
if opt == 5:
return tf.train.AdamOptimizer
assert False
return None
class Cifar10(object):
'''
Class Cifar10 builds and runs the network for CIFAR-10.
'''
def __init__(self):
# Place holder
self.is_train = tf.placeholder('int32')
self.keep_prob1 = tf.placeholder('float', name='xa')
self.keep_prob2 = tf.placeholder('float', name='xb')
self.accuracy = None
self.train_op = None
def build_network(self, config):
"""
Build the network for CIFAR-10.
"""
num_classes = NUM_CLASS
batch_size = config['batch_size']
num_units = config['conv_units_size']
conv_size = config['conv_size']
num_blocks = config['num_blocks']
initial_method = config['initial_method']
act_notlast = config['act_notlast']
pool_size = config['pool_size']
hidden_size = config['hidden_size']
act = config['act']
learning_rate = config['learning_rate']
opt = get_optimizer(config['optimizer'])
is_train = self.is_train
keep_prob1 = self.keep_prob1
keep_prob2 = self.keep_prob2
# Get images and labels for CIFAR-10.
with tf.device('/cpu:0'):
images, labels = cifar10.distorted_inputs()
test_images, test_labels = cifar10.inputs('test')
# Choose test set or train set by is_train
images = images * tf.cast(is_train, tf.float32) + \
(1-tf.cast(is_train, tf.float32)) * test_images
labels = labels * is_train + (1 - is_train) * test_labels
input_vec = tf.slice(images, [0, 0, 0, 0], [batch_size, 24, 24, 3])
output = tf.slice(labels, [0], [batch_size])
output = tf.one_hot(output, num_classes)
input_units = 3
for num in range(num_blocks):
if initial_method == 1:
conv_layer = tf.Variable(tf.truncated_normal(shape=[conv_size, conv_size,
input_units, num_units],
stddev=1.0 / num_units))
else:
conv_layer = tf.Variable(tf.random_uniform(shape=[conv_size, conv_size,
input_units, num_units],
minval=-0.05, maxval=0.05))
input_units = num_units
input_vec = tf.nn.conv2d(input_vec, conv_layer, strides=[1, 1, 1, 1], padding='SAME')
act_no_f = activation_functions(act_notlast)
input_vec = act_no_f(input_vec)
input_vec = tf.layers.batch_normalization(input_vec)
input_vec = tf.nn.dropout(input_vec, keep_prob=keep_prob1)
if num >= num_blocks - 2:
input_vec = tf.nn.max_pool(input_vec, ksize=[1, pool_size, pool_size, 1],
strides=[1, 2, 2, 1], padding='SAME')
num_units = num_units * 2
input_vec = tf.contrib.layers.flatten(input_vec)
input_vec = tf.layers.dense(
input_vec, hidden_size, activation=activation_functions(act))
input_vec = tf.layers.batch_normalization(input_vec)
input_vec = tf.nn.dropout(input_vec, keep_prob=keep_prob2)
input_vec = tf.layers.dense(input_vec, num_classes)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
logits=input_vec, labels=output)
loss = tf.reduce_mean(cross_entropy)
accuracy = tf.equal(tf.argmax(input_vec, 1), tf.argmax(output, 1))
self.accuracy = tf.reduce_mean(
tf.cast(accuracy, "float")) # add a reduce_mean
self.train_op = opt(learning_rate=learning_rate).minimize(loss)
def train(self, config):
"""
Train the CIFAR-10 network and report accuracy to NNI.
"""
_logger.debug('Config is: %s', str(config))
assert config['batch_size']
assert config['conv_units_size']
assert config['conv_size']
assert config['num_blocks']
assert config['initial_method']
assert config['act_notlast']
assert config['pool_size']
assert config['hidden_size']
assert config['act']
assert config['dropout']
assert config['learning_rate']
assert config['optimizer']
self.build_network(config)
with tf.Session() as sess:
# Initialize variables
tf.global_variables_initializer().run()  # initialize_all_variables is deprecated
_logger.debug('Initialize all variables done.')
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess, coord)
cnt = 0
acc = 0.0  # ensure acc is defined even if no evaluation step runs before the loop ends
for cnt in range(MAX_BATCH_NUM):
cnt = cnt + 1
if cnt % 2000 == 0:
_logger.debug('Running batch %s', str(cnt))
acc = sess.run(self.accuracy, feed_dict={self.is_train: 0,
self.keep_prob1: 1.0,
self.keep_prob2: 1.0})
# Send intermediate result
nni.report_intermediate_result(acc)
_logger.debug('Report intermediate result done.')
sess.run(self.train_op, feed_dict={self.is_train: 1,
self.keep_prob1: 1 - config['dropout'],
self.keep_prob2: config['dropout']})
coord.request_stop()
coord.join(threads)
# Send final result
nni.report_final_result(acc)
_logger.debug('Training cifar10 done.')
def get_default_params():
'''
Return default parameters.
'''
config = {}
config['learning_rate'] = 0.1
config['batch_size'] = 512
config['num_epochs'] = 100
config['dropout'] = 0.5
config['hidden_size'] = 1682
config['conv_size'] = 5
config['num_blocks'] = 3
config['conv_units_size'] = 32
config['pool_size'] = 3
config['act_notlast'] = 5
config['act'] = 2
config['optimizer'] = 5
config['initial_method'] = 2
return config
if __name__ == '__main__':
try:
RCV_CONFIG = nni.get_parameters()
_logger.debug(RCV_CONFIG)
cifar10.maybe_download_and_extract()
train_cifar10 = Cifar10()
params = get_default_params()
params.update(RCV_CONFIG)
train_cifar10.train(params)
except Exception as exception:
_logger.exception(exception)
raise
authorName: default
experimentName: example_cifar10
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote
trainingServicePlatform: local
searchSpacePath: /usr/share/nni/examples/trials/cifar10/search_space.json
#choice: true, false
useAnnotation: false
tuner:
#choice: TPE, Random, Anneal, Evolution
tunerName: TPE
#choice: Maximize, Minimize
optimizationMode: Maximize
trial:
trialCommand: python3 cifar10.py
trialCodeDir: /usr/share/nni/examples/trials/cifar10
trialGpuNum: 0
{
"dropout":{"_type":"uniform","_value":[0, 1]},
"dropout_notlast":{"_type":"uniform","_value":[0, 1]},
"learning_rate":{"_type":"uniform", "_value":[0.0001, 1]},
"batch_size":{"_type":"choice", "_value":[50, 100, 200, 300, 400, 500]},
"hidden_size":{"_type":"choice", "_value":[100, 200, 500, 1000, 2000]},
"conv_size":{"_type":"choice", "_value":[1, 3, 5, 7]},
"conv_units_size":{"_type":"choice", "_value":[16, 32, 64]},
"num_blocks":{"_type":"choice", "_value":[1, 2, 3, 4, 5, 6, 7]},
"act_notlast":{"_type":"choice", "_value":[1, 2, 3, 4, 5, 6]},
"act":{"_type":"choice", "_value":[1, 2, 3, 4, 5, 6]},
"optimizer":{"_type":"choice", "_value": [1, 2, 3, 4, 5]},
"initial_method":{"_type":"choice", "_value":[1, 2]}
}
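For orientation, one possible sample from the search space above, roughly the kind of dict the trial's nni.get_parameters() call is expected to return (values are illustrative: `choice` picks a listed value, `uniform` draws from the given range):

sampled_params = {
    "dropout": 0.37,
    "dropout_notlast": 0.55,
    "learning_rate": 0.021,
    "batch_size": 100,
    "hidden_size": 500,
    "conv_size": 5,
    "conv_units_size": 32,
    "num_blocks": 3,
    "act_notlast": 3,
    "act": 2,
    "optimizer": 5,
    "initial_method": 2,
}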
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import math
import tensorflow as tf
from tensorflow.python.ops.rnn_cell_impl import RNNCell
def _get_variable(variable_dict, name, shape, initializer=None, dtype=tf.float32):
if name not in variable_dict:
variable_dict[name] = tf.get_variable(
name=name, shape=shape, initializer=initializer, dtype=dtype)
return variable_dict[name]
def batch_linear_layer(matrix_a, matrix_b):
'''
matrix_a has shape [*, batch, dima] and matrix_b has shape [batch, dima, dimb];
the result has shape [*, batch, dimb]. For each batch, apply the linear map
matrix_b to the last dimension of matrix_a.
'''
matrix_a = tf.expand_dims(matrix_a, -1)
while len(list(matrix_b.shape)) < len(list(matrix_a.shape)):
matrix_b = tf.expand_dims(matrix_b, 0)
return tf.reduce_sum(matrix_a * matrix_b, -2)
def split_last_dim(x, factor):
shape = tf.shape(x)
last_dim = int(x.shape[-1])
assert last_dim % factor == 0, \
"last dim %d is not divisible by factor %d" % (last_dim, factor)
new_shape = tf.concat(
[shape[:-1], tf.constant([factor, last_dim // factor])], axis=0)
return tf.reshape(x, new_shape)
def merge_last2_dim(x):
shape = tf.shape(x)
last_dim = int(x.shape[-1]) * int(x.shape[-2])
new_shape = tf.concat([shape[:-2], tf.constant([last_dim])], axis=0)
return tf.reshape(x, new_shape)
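# Shape examples (illustrative): split_last_dim(x, 4) reshapes [..., 128] into
# [..., 4, 32]; merge_last2_dim reverses that, folding [..., 4, 32] back into [..., 128].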
class DotAttention:
'''
DotAttention
'''
def __init__(self, name,
hidden_dim,
is_vanilla=True,
is_identity_transform=False,
need_padding=False):
self._name = '/'.join([name, 'dot_att'])
self._hidden_dim = hidden_dim
self._is_identity_transform = is_identity_transform
self._need_padding = need_padding
self._is_vanilla = is_vanilla
self._var = {}
@property
def is_identity_transform(self):
return self._is_identity_transform
@property
def is_vanilla(self):
return self._is_vanilla
@property
def need_padding(self):
return self._need_padding
@property
def hidden_dim(self):
return self._hidden_dim
@property
def name(self):
return self._name
@property
def var(self):
return self._var
def _get_var(self, name, shape, initializer=None):
with tf.variable_scope(self.name):
return _get_variable(self.var, name, shape, initializer)
def _define_params(self, src_dim, tgt_dim):
hidden_dim = self.hidden_dim
self._get_var('W', [src_dim, hidden_dim])
if not self.is_vanilla:
self._get_var('V', [src_dim, hidden_dim])
if self.need_padding:
self._get_var('V_s', [src_dim, src_dim])
self._get_var('V_t', [tgt_dim, tgt_dim])
if not self.is_identity_transform:
self._get_var('T', [tgt_dim, src_dim])
self._get_var('U', [tgt_dim, hidden_dim])
self._get_var('b', [1, hidden_dim])
self._get_var('v', [hidden_dim, 1])
def get_pre_compute(self, s):
'''
:param s: [src_sequence_length, batch_size, src_dim]
:return: [src_sequence_length, batch_size, hidden_dim]
'''
hidden_dim = self.hidden_dim
src_dim = s.get_shape().as_list()[-1]
assert src_dim is not None, 'src dim must be defined'
W = self._get_var('W', shape=[src_dim, hidden_dim])
b = self._get_var('b', shape=[1, hidden_dim])
return tf.tensordot(s, W, [[2], [0]]) + b
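# The vanilla score in get_prob below follows the additive-attention form
# e = v^T tanh(W*s + U*h + b); get_pre_compute caches the W*s + b term so it can
# be reused across decoder steps instead of being recomputed for every target.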
def get_prob(self, src, tgt, mask, pre_compute, return_logits=False):
'''
:param src: [src_sequence_length, batch_size, src_dim]
:param tgt: [batch_size, tgt_dim] or [tgt_sequence_length, batch_size, tgt_dim]
:param mask: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
:param pre_compute: [src_sequence_length, batch_size, hidden_dim]
:return: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
'''
s_shape = src.get_shape().as_list()
h_shape = tgt.get_shape().as_list()
src_dim = s_shape[-1]
tgt_dim = h_shape[-1]
assert src_dim is not None, 'src dimension must be defined'
assert tgt_dim is not None, 'tgt dimension must be defined'
self._define_params(src_dim, tgt_dim)
if len(h_shape) == 2:
tgt = tf.expand_dims(tgt, 0)
if pre_compute is None:
pre_compute = self.get_pre_compute(src)
buf0 = pre_compute
buf1 = tf.tensordot(tgt, self.var['U'], axes=[[2], [0]])
buf2 = tf.tanh(tf.expand_dims(buf0, 0) + tf.expand_dims(buf1, 1))
if not self.is_vanilla:
xh1 = tgt
xh2 = tgt
s1 = src
if self.need_padding:
xh1 = tf.tensordot(xh1, self.var['V_t'], 1)
xh2 = tf.tensordot(xh2, self.var['V_t'], 1)  # original referenced 'S_t', which is never defined; 'V_t' matches the target-side shape
s1 = tf.tensordot(s1, self.var['V_s'], 1)
if not self.is_identity_transform:
xh1 = tf.tensordot(xh1, self.var['T'], 1)
xh2 = tf.tensordot(xh2, self.var['T'], 1)
buf3 = tf.expand_dims(s1, 0) * tf.expand_dims(xh1, 1)
buf3 = tf.tanh(tf.tensordot(buf3, self.var['V'], axes=[[3], [0]]))
buf = tf.reshape(tf.tanh(buf2 + buf3), shape=tf.shape(buf3))
else:
buf = buf2
v = self.var['v']
e = tf.tensordot(buf, v, [[3], [0]])
e = tf.squeeze(e, axis=[3])
tmp = tf.reshape(e + (mask - 1) * 10000.0, shape=tf.shape(e))
prob = tf.nn.softmax(tmp, 1)
if len(h_shape) == 2:
prob = tf.squeeze(prob, axis=[0])
tmp = tf.squeeze(tmp, axis=[0])
if return_logits:
return prob, tmp
return prob
def get_att(self, s, prob):
'''
:param s: [src_sequence_length, batch_size, src_dim]
:param prob: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
:return: [batch_size, src_dim] or [tgt_sequence_length, batch_size, src_dim]
'''
buf = s * tf.expand_dims(prob, axis=-1)
att = tf.reduce_sum(buf, axis=-3)
return att
class MultiHeadAttention:
'''
MultiHeadAttention.
'''
def __init__(self, name, hidden_dim, head, add=True, dot=True, divide=True):
self._name = '/'.join([name, 'dot_att'])
self._head = head
self._head_dim = hidden_dim // head
self._hidden_dim = self._head_dim * head
self._add = add
self._dot = dot
assert add or dot, "you must at least choose one between add and dot"
self._div = 1.0
if divide:
self._div = math.sqrt(self._head_dim)
self._var = {}
@property
def hidden_dim(self):
return self._head_dim * self._head
@property
def name(self):
return self._name
@property
def var(self):
return self._var
def _get_var(self, name, shape, initializer=None):
with tf.variable_scope(self.name):
return _get_variable(self.var, name, shape, initializer)
def _define_params(self, tgt_dim):
self._get_var('tgt_project', [tgt_dim, self._hidden_dim])
self._get_var('tgt_bias', [1, self._hidden_dim])
self._get_var('v', [self._head, self._head_dim, 1])
def get_pre_compute(self, src):
s_shape = src.get_shape().as_list()
src_dim = s_shape[-1]
src_project = self._get_var('src_project', [src_dim, self._hidden_dim])
src_bias = self._get_var('src_bias', [1, self._hidden_dim])
src = split_last_dim(tf.tensordot(src, src_project,
[[2], [0]]) + src_bias, self._head)
return src
def get_prob(self, src, tgt, mask, pre_compute):
'''
:param src: [src_sequence_length, batch_size, src_dim]
:param tgt: [batch_size, tgt_dim] or [tgt_sequence_length, batch_size, tgt_dim]
:param mask: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
:param pre_compute: [src_sequence_length, batch_size, hidden_dim]
:return: [src_sequence_length, batch_size]\
or [tgt_sequence_length, src_sequence_length, batch_size]
'''
s_shape = src.get_shape().as_list()
h_shape = tgt.get_shape().as_list()
src_dim = s_shape[-1]
tgt_dim = h_shape[-1]
print('src tgt dim: ', src_dim, tgt_dim)
assert src_dim is not None, 'src dimension must be defined'
assert tgt_dim is not None, 'tgt dimension must be defined'
self._define_params(tgt_dim)
if len(h_shape) == 2:
tgt = tf.expand_dims(tgt, 0)
tgt_project = self._var['tgt_project']
tgt_bias = self._var['tgt_bias']
if pre_compute is None:
pre_compute = self.get_pre_compute(src)
src = pre_compute
tgt = split_last_dim(tf.tensordot(tgt, tgt_project,
[[2], [0]]) + tgt_bias, self._head)
add_attention = 0
dot_attention = 0
if self._add:
buf = tf.tanh(tf.expand_dims(src, 0) + tf.expand_dims(tgt, 1))
v = self.var['v']
add_attention = tf.squeeze(batch_linear_layer(buf, v), -1)
if self._dot:
dot_attention = tf.reduce_sum(tf.expand_dims(
src, 0) * tf.expand_dims(tgt, 1), -1)
dot_attention /= self._div
attention = add_attention + dot_attention
mask = tf.expand_dims(mask, -1)
logits = attention + (mask - 1) * 10000.0
prob = tf.nn.softmax(logits, 1)
if len(h_shape) == 2:
prob = tf.squeeze(prob, axis=[0])
return prob
def map_target(self, tgt):
tgt_project = self._var['tgt_project']
tgt_bias = self._var['tgt_bias']
tgt = tf.tensordot(tgt, tgt_project, [[1], [0]]) + tgt_bias
return tgt
def get_att(self, src, prob):
'''
:param src: [src_sequence_length, batch_size, head, head_dim]
:param prob: [src_sequence_length, batch_size, head]\
or [tgt_sequence_length, src_sequence_length, batch_size, head]
:return: [batch_size, src_dim] or [tgt_sequence_length, batch_size, src_dim]
'''
buf = merge_last2_dim(tf.reduce_sum(
src * tf.expand_dims(prob, axis=-1), axis=-4))
return buf
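# Shape walk-through (illustrative): with hidden_dim=128 and head=4, head_dim is 32;
# projections are split into [..., 4, 32] per head, attention probabilities are
# computed per head, and get_att folds the heads back into a single [..., 128] vector.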
class DotAttentionWrapper(RNNCell):
'''
A wrapper for DotAttention or MultiHeadAttention.
'''
def __init__(self, cell, attention,
src, mask, is_gated,
reuse=None, dropout=None,
keep_input=True, map_target=False):
super().__init__(_reuse=reuse)
assert isinstance(attention, (DotAttention, MultiHeadAttention)), \
'type of attention is not supported'
assert isinstance(cell, RNNCell), 'type of cell must be RNNCell'
self._attention = attention
self._src = src
self._mask = mask
self._pre_computed = None
self._is_gated = is_gated
self._cell = cell
self._dropout = dropout
self._keep_input = keep_input
self._map_target = map_target
@property
def state_size(self):
return self._cell.state_size
@property
def output_size(self):
return self._cell.output_size
def call(self, inputs, state):
if self._pre_computed is None:
self._pre_computed = self._attention.get_pre_compute(self._src)
att_prob = self._attention.get_prob(
src=self._src,
tgt=tf.concat([inputs, state], axis=1),
mask=self._mask,
pre_compute=self._pre_computed)
if isinstance(self._attention, DotAttention):
att = self._attention.get_att(self._src, att_prob)
else:
att = self._attention.get_att(self._pre_computed, att_prob)
x_list = [att]
if self._keep_input:
x_list.append(inputs)
if inputs.shape[1] == att.shape[1]:
x_list.append(inputs - att)
x_list.append(inputs * att)
if self._map_target and isinstance(self._attention, MultiHeadAttention):
tgt = self._attention.map_target(
tf.concat([inputs, state], axis=1))
x_list += [tgt, att-tgt, att*tgt]
x = tf.concat(x_list, axis=1)
dim = x.get_shape().as_list()[1]
assert dim is not None, 'dim must be defined'
if self._is_gated:
g = tf.get_variable('att_gate',
shape=[dim, dim],
dtype=tf.float32,
initializer=None)
bias_g = tf.get_variable(
'bias_gate', shape=[1, dim], dtype=tf.float32)
gate = tf.sigmoid(tf.matmul(x, g) + bias_g)
x = x * gate
if self._dropout is not None:
x = self._dropout(x)
return self._cell.call(x, state)
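A rough sketch of how DotAttentionWrapper might wrap a standard RNN cell; the time-major shapes follow the conventions of the classes above, and the sizes are assumptions for illustration:

import tensorflow as tf

passage = tf.placeholder(tf.float32, [None, None, 128])    # [src_len, batch, 128]
question = tf.placeholder(tf.float32, [None, None, 128])   # [tgt_len, batch, 128]
passage_mask = tf.placeholder(tf.float32, [None, None])    # [src_len, batch]

attention = DotAttention('match', hidden_dim=75)
cell = DotAttentionWrapper(tf.nn.rnn_cell.GRUCell(75), attention,
                           src=passage, mask=passage_mask, is_gated=True)
outputs, _ = tf.nn.dynamic_rnn(cell, question, dtype=tf.float32, time_major=True)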
authorName: default
experimentName: example_ga_squad
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 1
#choice: local, remote
trainingServicePlatform: local
#choice: true, false
useAnnotation: false
tuner:
tunerCommand: python3 __main__.py
tunerCwd: /usr/share/nni/examples/tuners/ga_customer_tuner
trial:
trialCommand: python3 trial.py
trialCodeDir: /usr/share/nni/examples/trials/ga_squad
trialGpuNum: 0
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge,
# to any person obtaining a copy of this software and associated
# documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import csv
import json
from random import shuffle
import numpy as np
class WhitespaceTokenizer:
'''
Tokenizer for whitespace
'''
def tokenize(self, text):
'''
tokenize function in Tokenizer.
'''
start = -1
tokens = []
for i, character in enumerate(text):
if character == ' ' or character == '\t':
if start >= 0:
word = text[start:i]
tokens.append({
'word': word,
'original_text': word,
'char_begin': start,
'char_end': i})
start = -1
else:
if start < 0:
start = i
if start >= 0:
tokens.append({
'word': text[start:len(text)],
'original_text': text[start:len(text)],
'char_begin': start,
'char_end': len(text)
})
return tokens
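# Example (illustrative): tokenize('the cat sat') returns three tokens:
# {'word': 'the', 'original_text': 'the', 'char_begin': 0, 'char_end': 3},
# then 'cat' with (4, 7) and 'sat' with (8, 11).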
def load_from_file(path, fmt=None, is_training=True):
'''
load data from file
'''
if fmt is None:
fmt = 'squad'
assert fmt in ['squad', 'csv'], 'input format must be squad or csv'
qp_pairs = []
if fmt == 'squad':
with open(path) as data_file:
data = json.load(data_file)['data']
for doc in data:
for paragraph in doc['paragraphs']:
passage = paragraph['context']
for qa in paragraph['qas']:
question = qa['question']
id = qa['id']
if not is_training:
qp_pairs.append(
{'passage': passage, 'question': question, 'id': id})
else:
for answer in qa['answers']:
answer_begin = int(answer['answer_start'])
answer_end = answer_begin + len(answer['text'])
qp_pairs.append({'passage': passage,
'question': question,
'id': id,
'answer_begin': answer_begin,
'answer_end': answer_end})
else:
with open(path, newline='') as csvfile:
reader = csv.reader(csvfile, delimiter='\t')
line_num = 0
for row in reader:
qp_pairs.append(
{'passage': row[1], 'question': row[0], 'id': line_num})
line_num += 1
return qp_pairs
def tokenize(qp_pair, tokenizer=None, is_training=False):
'''
Tokenize a question/passage pair in place, adding 'question_tokens' and 'passage_tokens'.
'''
question_tokens = tokenizer.tokenize(qp_pair['question'])
passage_tokens = tokenizer.tokenize(qp_pair['passage'])
if is_training:
question_tokens = question_tokens[:300]
passage_tokens = passage_tokens[:300]
passage_tokens.insert(
0, {'word': '<BOS>', 'original_text': '<BOS>', 'char_begin': 0, 'char_end': 0})
passage_tokens.append(
{'word': '<EOS>', 'original_text': '<EOS>', 'char_begin': 0, 'char_end': 0})
qp_pair['question_tokens'] = question_tokens
qp_pair['passage_tokens'] = passage_tokens
def collect_vocab(qp_pairs):
'''
Build the vocab from corpus.
'''
vocab = set()
for qp in qp_pairs:
for word in qp['question_tokens']:
vocab.add(word['word'])
for word in qp['passage_tokens']:
vocab.add(word['word'])
return vocab
def shuffle_step(entries, step):
'''
Shuffle the given list within consecutive chunks of size `step`.
'''
answer = []
for i in range(0, len(entries), step):
sub = entries[i:i+step]
shuffle(sub)
answer += sub
return answer
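# Example (illustrative): shuffle_step([1, 2, 3, 4, 5, 6], 3) shuffles within
# [1, 2, 3] and within [4, 5, 6], but never moves an element across the two chunks.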
def get_batches(qp_pairs, batch_size, need_sort=True):
'''
Group qp_pairs into batches and shuffle the batch order.
'''
if need_sort:
qp_pairs = sorted(qp_pairs, key=lambda qp: (
len(qp['passage_tokens']), qp['id']), reverse=True)
batches = [{'qp_pairs': qp_pairs[i:(i + batch_size)]}
for i in range(0, len(qp_pairs), batch_size)]
shuffle(batches)
return batches
def get_char_input(data, char_dict, max_char_length):
'''
Get char input.
'''
batch_size = len(data)
sequence_length = max(len(d) for d in data)
char_id = np.zeros((max_char_length, sequence_length,
batch_size), dtype=np.int32)
char_lengths = np.zeros((sequence_length, batch_size), dtype=np.float32)
for b in range(0, min(len(data), batch_size)):
d = data[b]
for s in range(0, min(len(d), sequence_length)):
word = d[s]['word']
char_lengths[s, b] = min(len(word), max_char_length)
for i in range(0, min(len(word), max_char_length)):
char_id[i, s, b] = get_id(char_dict, word[i])
return char_id, char_lengths
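# Returned shapes: char_id is [max_char_length, sequence_length, batch_size] with
# zero padding, and char_lengths is [sequence_length, batch_size] holding each
# word's length clipped to max_char_length.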
def get_word_input(data, word_dict, embed, embed_dim):
'''
Get word input.
'''
batch_size = len(data)
max_sequence_length = max(len(d) for d in data)
sequence_length = max_sequence_length
t = np.zeros((max_sequence_length, batch_size,
embed_dim), dtype=np.float32)
ids = np.zeros((sequence_length, batch_size), dtype=np.int32)
masks = np.zeros((sequence_length, batch_size), dtype=np.float32)
lengths = np.zeros([batch_size], dtype=np.int32)
for b in range(0, min(len(data), batch_size)):
d = data[b]
lengths[b] = len(d)
for s in range(0, min(len(d), sequence_length)):
word = d[s]['word'].lower()
if word in word_dict.keys():
t[s, b] = embed[word_dict[word]]
ids[s, b] = word_dict[word]
masks[s, b] = 1
t = np.reshape(t, (-1, embed_dim))
return t, ids, masks, lengths
def get_word_index(tokens, char_index):
'''
Given a character index, return the index of the word that contains it.
'''
for (i, token) in enumerate(tokens):
if token['char_end'] == 0:
continue
if token['char_begin'] <= char_index <= token['char_end']:
return i
return 0
def get_answer_begin_end(data):
'''
Get the begin and end word indices of the answer.
'''
begin = []
end = []
for qa_pair in data:
tokens = qa_pair['passage_tokens']
char_begin = qa_pair['answer_begin']
char_end = qa_pair['answer_end']
word_begin = get_word_index(tokens, char_begin)
word_end = get_word_index(tokens, char_end)
begin.append(word_begin)
end.append(word_end)
return np.asarray(begin), np.asarray(end)
def get_id(word_dict, word):
'''
Given word, return word id.
'''
if word in word_dict.keys():
return word_dict[word]
return word_dict['<unk>']
def get_buckets(min_length, max_length, bucket_count):
'''
Compute bucket boundaries between min_length and max_length.
'''
if bucket_count <= 0:
return [max_length]
unit_length = int((max_length - min_length) // (bucket_count))
buckets = [min_length + unit_length *
(i + 1) for i in range(0, bucket_count)]
buckets[-1] = max_length
return buckets
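# Example: get_buckets(0, 100, 4) returns [25, 50, 75, 100]; find_bucket (below)
# would then map a passage of length 60 to the 75 bucket.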
def find_bucket(length, buckets):
'''
Find bucket.
'''
for bucket in buckets:
if length <= bucket:
return bucket
return buckets[-1]
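Putting the helpers above together, a typical (illustrative) preprocessing flow might look like the following; the SQuAD file path is an assumption:

tokenizer = WhitespaceTokenizer()
qp_pairs = load_from_file('train-v1.1.json', fmt='squad', is_training=True)
for pair in qp_pairs:
    tokenize(pair, tokenizer, is_training=True)
vocab = collect_vocab(qp_pairs)
word_dict = {word: idx for idx, word in enumerate(sorted(vocab))}
batches = get_batches(qp_pairs, batch_size=32)
answer_begin, answer_end = get_answer_begin_end(batches[0]['qp_pairs'])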