"examples/git@developer.sourcefind.cn:OpenDAS/nni.git" did not exist on "fc7ddcd0c83febfbbae76bc5065e1e9d6cd8f8c3"
Commit b4773e1e authored by SparkSnail, committed by GitHub

Merge pull request #240 from microsoft/master

Merge master
parents 6c3148c7 d2c57770
# TextNAS
## Introduction
This is the implementation of the TextNAS algorithm proposed in the paper [TextNAS: A Neural Architecture Search Space tailored for Text Representation](https://arxiv.org/pdf/1912.10729.pdf). TextNAS is a neural architecture search algorithm tailored for text representation. More specifically, it is built on a novel search space consisting of operators widely adopted to solve various NLP tasks, and it supports multi-path ensemble within a single network to balance the width and depth of the architecture.
The search space of TextNAS contains:
* 1-D convolutional operator with filter sizes 1, 3, 5, 7
* recurrent operator (bi-directional GRU)
* self-attention operator
* pooling operator (max/average)
Following the ENAS algorithm, TextNAS also utilizes parameter sharing to accelerate the search and adopts a reinforcement-learning controller for architecture sampling and generation. Please refer to the paper for more details about TextNAS.
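To make the search space concrete, below is a minimal sketch of one searchable layer expressed with NNI's `LayerChoice` mutable (NNI 1.x API). The operator pool is illustrative: it covers the convolution and pooling candidates from the list above, while the GRU and self-attention operators (implemented in `ops.py`) are omitted to keep the example self-contained.
```python
import torch.nn as nn
from nni.nas.pytorch import mutables


class ExampleTextNASLayer(nn.Module):
    """One searchable layer: the controller picks a single operator per layer."""

    def __init__(self, hidden_size=256):
        super().__init__()
        # Subset of the TextNAS operator pool; GRU/attention candidates omitted here.
        self.op = mutables.LayerChoice([
            nn.Conv1d(hidden_size, hidden_size, kernel_size=1, padding=0),
            nn.Conv1d(hidden_size, hidden_size, kernel_size=3, padding=1),
            nn.Conv1d(hidden_size, hidden_size, kernel_size=5, padding=2),
            nn.Conv1d(hidden_size, hidden_size, kernel_size=7, padding=3),
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),
            nn.AvgPool1d(kernel_size=3, stride=1, padding=1),
        ])

    def forward(self, x):
        # x: (batch, hidden_size, seq_len); padding keeps seq_len unchanged.
        return self.op(x)
```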
## Preparation
Prepare the word vectors and the SST dataset, and organize them in the `data` directory as shown below:
```
textnas
├── data
│ ├── sst
│ │ └── trees
│ │ ├── dev.txt
│ │ ├── test.txt
│ │ └── train.txt
│ └── glove.840B.300d.txt
├── dataloader.py
├── model.py
├── ops.py
├── README.md
├── search.py
└── utils.py
```
The following links might be helpful for finding and downloading the corresponding datasets (a download sketch follows the list):
* [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/projects/glove/)
* [glove.840B.300d.txt](http://nlp.stanford.edu/data/glove.840B.300d.zip)
* [Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank](https://nlp.stanford.edu/sentiment/)
* [trainDevTestTrees_PTB.zip](https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip)
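A minimal download helper is sketched below (not part of the repository); it fetches both archives from the links above and unpacks them into the layout shown in the tree. Note that `glove.840B.300d.zip` is roughly 2 GB.
```python
import os
import urllib.request
import zipfile

DATA_DIR = "data"
# (archive name, download URL, extraction target)
ARCHIVES = [
    ("glove.840B.300d.zip", "http://nlp.stanford.edu/data/glove.840B.300d.zip", DATA_DIR),
    ("trainDevTestTrees_PTB.zip", "https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip",
     os.path.join(DATA_DIR, "sst")),
]

for name, url, target in ARCHIVES:
    os.makedirs(target, exist_ok=True)
    archive = os.path.join(DATA_DIR, name)
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)  # glove -> data/, SST trees -> data/sst/trees/
```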
## Examples
### Search Space
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/textnas)
```bash
# In case the NNI code is not cloned yet. If it is, skip this line and enter the code folder.
git clone https://github.com/Microsoft/nni.git
cd examples/nas/textnas
# search for the best architecture
python3 search.py
# view more options for search
python3 search.py -h
```
After each search epoch, 10 sampled architectures will be tested directly. Their performance is expected to be 40%–42% after 10 epochs.
By default, 20 sampled architectures will be exported into the `checkpoints` directory for the next step (see the loading sketch below).
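For retraining, a fixed architecture can be applied to the model before training. A minimal sketch, assuming the NNI 1.x fixed-architecture API (the `Model` import and constructor are illustrative; see `model.py` and `retrain.py` for the real wiring):
```python
from nni.nas.pytorch.fixed import apply_fixed_architecture

from model import Model  # hypothetical import; the TextNAS child network lives in model.py

model = Model()  # constructor arguments omitted for brevity
# Freeze every LayerChoice/InputChoice according to the exported mask.
apply_fixed_architecture(model, "checkpoints/architecture_00.json")
```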
### Retrain
```bash
# In case the NNI code is not cloned yet. If it is, skip this line and enter the code folder.
git clone https://github.com/Microsoft/nni.git
cd examples/nas/textnas
# by default, retrain on SST-2
sh run_retrain.sh
```
## Reference
TextNAS directly uses `EnasTrainer`; please refer to [ENAS](./ENAS.md) for the trainer APIs.
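For orientation, here is a small, self-contained sketch of the `EnasTrainer` call pattern. The argument names follow the NNI 1.x ENAS API as best recalled; the toy model and random data stand in for the real TextNAS model and SST loader used in `search.py`.
```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset
from nni.nas.pytorch import mutables
from nni.nas.pytorch.enas import EnasTrainer


class ToyNet(nn.Module):
    """Stand-in for the TextNAS child model: one searchable layer plus a classifier."""

    def __init__(self):
        super().__init__()
        self.choice = mutables.LayerChoice([
            nn.Linear(8, 8),
            nn.Sequential(nn.Linear(8, 8), nn.ReLU()),
        ])
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(self.choice(x))


def accuracy(output, target):
    return {"acc": (output.argmax(dim=1) == target).float().mean().item()}


model = ToyNet()
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
trainer = EnasTrainer(
    model,
    loss=nn.CrossEntropyLoss(),
    metrics=accuracy,
    reward_function=lambda output, target: accuracy(output, target)["acc"],
    optimizer=torch.optim.SGD(model.parameters(), lr=0.01),
    batch_size=16,
    num_epochs=1,
    dataset_train=dataset,
    dataset_valid=dataset,
)
trainer.train()  # alternates child-model training and controller updates
```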
@@ -26,5 +26,6 @@ For details, please refer to the following tutorials:
     SPOS <NAS/SPOS>
     CDARTS <NAS/CDARTS>
     ProxylessNAS <NAS/Proxylessnas>
+    TextNAS <NAS/TextNAS>
     Customize a NAS Algorithm <NAS/Advanced>
     API Reference <NAS/NasReference>
@@ -42,4 +42,8 @@ By default, 20 sampled architectures will be exported into `checkpoints` directory for next step.

 ## Retrain
-Not ready.
+```
+sh run_retrain.sh
+```
+By default, the script will retrain the architecture provided by the author on the SST-2 dataset.
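The new file below (presumably the `arc/final_arc.json` referenced by the updated retrain script) is the exported architecture: each `LayerChoice` entry is a one-hot mask over the eight candidate operators, and each `InputChoice` entry is a boolean mask over that layer's candidate inputs (multiple `true` values reflect the multi-path ensemble). A decoding sketch follows the file.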
{
"LayerChoice1": [
false, false, false, false, false, true, false, false
],
"InputChoice2": [
true
],
"LayerChoice3": [
false, false, false, false, false, false, false, true
],
"InputChoice4": [
false
],
"InputChoice5": [
true, false
],
"LayerChoice6": [
false, false, false, true, false, false, false, false
],
"InputChoice7": [
false, false
],
"InputChoice8": [
false, false, true
],
"LayerChoice9": [
false, false, false, false, false, false, true, false
],
"InputChoice10": [
false, true, true
],
"InputChoice11": [
false, false, true, false
],
"LayerChoice12": [
false, true, false, false, false, false, false, false
],
"InputChoice13": [
false, true, false, false
],
"InputChoice14": [
false, false, false, false, true
],
"LayerChoice15": [
false, true, false, false, false, false, false, false
],
"InputChoice16": [
false, false, true, false, true
],
"InputChoice17": [
false, false, false, false, true
],
"LayerChoice18": [
true, false, false, false, false, false, false, false
],
"InputChoice19": [
false, false, true, true, true, true
],
"InputChoice20": [
true, false, false, false, false
],
"LayerChoice21": [
false, false, false, false, false, false, true, false
],
"InputChoice22": [
false, true, true, false, false, false, false
],
"InputChoice23": [
false, true, false, false, false
],
"LayerChoice24": [
false, false, false, false, false, true, false, false
],
"InputChoice25": [
false, true, false, true, true, false, true, true
],
"InputChoice26": [
false, false, true, false, false
],
"LayerChoice27": [
false, false, false, false, false, true, false, false
],
"InputChoice28": [
false, false, false, false, false, true, false, true, true
],
"InputChoice29": [
true, false, false, false, false
],
"LayerChoice30": [
false, false, false, false, false, false, false, true
],
"InputChoice31": [
true, true, false, false, true, false, false, true, true, false
],
"InputChoice32": [
true, false, false, false, false
],
"LayerChoice33": [
false, false, false, false, true, false, false, false
],
"InputChoice34": [
true, false, false, true, true, true, true, false, false, false, false
],
"InputChoice35": [
false, false, false, true, false
],
"LayerChoice36": [
false, true, false, false, false, false, false, false
],
"InputChoice37": [
true, true, false, true, false, true, false, false, true, false, false, false
],
"InputChoice38": [
false, false, false, true, false
],
"LayerChoice39": [
false, false, true, false, false, false, false, false
],
"InputChoice40": [
true, true, false, false, false, false, true, false, false, true, true, false, true
],
"InputChoice41": [
false, false, false, true, false
],
"LayerChoice42": [
true, false, false, false, false, false, false, false
],
"InputChoice43": [
false, false, true, false, false, false, true, true, true, false, true, true, false, false
],
"InputChoice44": [
false, false, false, false, true
],
"LayerChoice45": [
false, false, false, true, false, false, false, false
],
"InputChoice46": [
true, false, false, false, false, false, true, false, false, false, true, true, false, false, true
],
"InputChoice47": [
false, false, false, true, false
],
"LayerChoice48": [
false, false, true, false, false, false, false, false
],
"InputChoice49": [
false, false, false, false, false, false, false, false, false, true, true, false, true, false, true, false
],
"InputChoice50": [
false, false, false, false, true
],
"LayerChoice51": [
false, false, false, false, true, false, false, false
],
"InputChoice52": [
false, true, true, true, true, false, false, true, false, true, false, false, false, false, true, false, false
],
"InputChoice53": [
false, false, true, false, false
],
"LayerChoice54": [
false, false, false, true, false, false, false, false
],
"InputChoice55": [
false, false, false, false, false, true, false, false, false, false, false, false, false, true, true, true, false, true
],
"InputChoice56": [
false, false, true, false, false
],
"LayerChoice57": [
false, false, false, true, false, false, false, false
],
"InputChoice58": [
false, false, false, true, false, false, false, false, false, false, true, false, false, false, true, false, false, false, false
],
"InputChoice59": [
false, true, false, false, false
],
"LayerChoice60": [
false, false, false, false, false, true, false, false
],
"InputChoice61": [
true, true, false, false, false, false, false, false, false, false, true, true, false, false, true, true, true, true, false, false
],
"InputChoice62": [
true, false, false, false, false
],
"LayerChoice63": [
false, false, false, false, false, false, false, true
],
"InputChoice64": [
false, true, true, true, false, false, false, true, false, true, true, true, true, false, true, false, false, false, false, false, false
],
"InputChoice65": [
false, false, false, false, true
],
"LayerChoice66": [
false, false, false, false, false, false, false, true
],
"InputChoice67": [
false, false, true, true, true, true, false, true, false, true, true, false, false, false, false, true, false, false, false, false, false, true
],
"InputChoice68": [
false, false, false, true, false
],
"LayerChoice69": [
false, false, false, true, false, false, false, false
],
"InputChoice70": [
true, false, false, true, false, false, false, true, false, false, false, false, true, false, false, false, true, false, false, false, false, false, false
]
}
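As a minimal sketch, the one-hot layer masks can be decoded back to operator names. The operator order in `OPS` is an assumption based on the search-space list above; the authoritative order is defined by the model code in `ops.py`/`model.py`.
```python
import json

# Assumed operator order (not confirmed by the JSON itself); see ops.py/model.py.
OPS = ["conv1", "conv3", "conv5", "conv7", "gru", "attention", "maxpool", "avgpool"]

with open("arc/final_arc.json") as f:
    arc = json.load(f)

for name, mask in arc.items():
    if name.startswith("LayerChoice"):
        print(name, "->", OPS[mask.index(True)])
```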
@@ -4,7 +4,7 @@

 export PYTHONPATH="$(pwd)"
 export CUDA_VISIBLE_DEVICES=0
-python -u retrain.py \
+python3 -u retrain.py \
   --train_ratio=1.0 \
   --valid_ratio=1.0 \
   --min_count=1 \
@@ -36,6 +36,6 @@ python -u retrain.py \
   --child_lr_T_0=10 \
   --child_lr_T_mul=2 \
   --multi_path=True \
-  --child_fixed_arc="./checkpoints/architecture_00.json" \
+  --child_fixed_arc="./arc/final_arc.json" \
   --fixed_seed=True \
   "$@"
@@ -155,8 +155,8 @@ def get_params():
                         help='learning rate (default: 0.01)')
     parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                         help='SGD momentum (default: 0.5)')
-    parser.add_argument('--epochs', type=int, default=10, metavar='N',
-                        help='number of epochs to train (default: 10)')
+    parser.add_argument('--epochs', type=int, default=1, metavar='N',
+                        help='number of epochs to train (default: 1)')
     parser.add_argument('--seed', type=int, default=1, metavar='S',
                         help='random seed (default: 1)')
     parser.add_argument('--no_cuda', action='store_true', default=False,
...
@@ -4,9 +4,11 @@
 import copy
 import logging
 import os
+import random

 import numpy as np

 import nni
+import nni.parameter_expressions
 from nni.tuner import Tuner
 from nni.utils import OptimizeMode, extract_scalar_reward, split_index, json2parameter, json2space
@@ -14,7 +16,42 @@ from nni.utils import OptimizeMode, extract_scalar_reward, split_index, json2parameter, json2space
 logger = logging.getLogger('pbt_tuner_AutoML')


+def perturbation(hyperparameter_type, value, resample_probability, uv, ub, lv, lb, random_state):
+    """
+    Perturbation for hyperparameters
+
+    Parameters
+    ----------
+    hyperparameter_type : str
+        type of hyperparameter
+    value : list
+        parameters for sampling hyperparameter
+    resample_probability : float
+        probability for resampling
+    uv : float/int
+        upper value after perturbation
+    ub : float/int
+        upper bound
+    lv : float/int
+        lower value after perturbation
+    lb : float/int
+        lower bound
+    random_state : RandomState
+        random state
+    """
+    if random.random() < resample_probability:
+        if hyperparameter_type == "choice":
+            return value.index(nni.parameter_expressions.choice(value, random_state))
+        else:
+            return getattr(nni.parameter_expressions, hyperparameter_type)(*(value + [random_state]))
+    else:
+        if random.random() > 0.5:
+            return min(uv, ub)
+        else:
+            return max(lv, lb)
+
+
-def exploit_and_explore(bot_trial_info, top_trial_info, factors, epoch, search_space):
+def exploit_and_explore(bot_trial_info, top_trial_info, factor, resample_probability, epoch, search_space):
     """
     Replace checkpoint of bot_trial with top, and perturb hyperparameters
@@ -24,8 +61,10 @@ def exploit_and_explore(bot_trial_info, top_trial_info, factors, epoch, search_space):
         bottom model whose parameters should be replaced
     top_trial_info : TrialInfo
         better model
-    factors : float
-        factors for perturbation
+    factor : float
+        factor for perturbation
+    resample_probability : float
+        probability for resampling
     epoch : int
         step of PBTTuner
     search_space : dict
@@ -34,21 +73,72 @@ def exploit_and_explore(bot_trial_info, top_trial_info, factors, epoch, search_space):
     bot_checkpoint_dir = bot_trial_info.checkpoint_dir
     top_hyper_parameters = top_trial_info.hyper_parameters
     hyper_parameters = copy.deepcopy(top_hyper_parameters)
-    # TODO think about different type of hyperparameters for 1.perturbation 2.within search space
+    random_state = np.random.RandomState()
     for key in hyper_parameters.keys():
+        hyper_parameter = hyper_parameters[key]
         if key == 'load_checkpoint_dir':
             hyper_parameters[key] = hyper_parameters['save_checkpoint_dir']
+            continue
         elif key == 'save_checkpoint_dir':
             hyper_parameters[key] = os.path.join(bot_checkpoint_dir, str(epoch))
-        elif isinstance(hyper_parameters[key], float):
-            perturb = np.random.choice(factors)
-            val = hyper_parameters[key] * perturb
-            lb, ub = search_space[key]["_value"][:2]
-            if search_space[key]["_type"] in ("uniform", "normal"):
-                val = np.clip(val, lb, ub).item()
-            hyper_parameters[key] = val
+            continue
+        elif search_space[key]["_type"] == "choice":
+            choices = search_space[key]["_value"]
+            ub, uv = len(choices) - 1, choices.index(hyper_parameter["_value"]) + 1
+            lb, lv = 0, choices.index(hyper_parameter["_value"]) - 1
+        elif search_space[key]["_type"] == "randint":
+            lb, ub = search_space[key]["_value"][:2]
+            ub -= 1
+            uv = hyper_parameter + 1
+            lv = hyper_parameter - 1
+        elif search_space[key]["_type"] == "uniform":
+            lb, ub = search_space[key]["_value"][:2]
+            perturb = (ub - lb) * factor
+            uv = hyper_parameter + perturb
+            lv = hyper_parameter - perturb
+        elif search_space[key]["_type"] == "quniform":
+            lb, ub, q = search_space[key]["_value"][:3]
+            multi = round(hyper_parameter / q)
+            uv = (multi + 1) * q
+            lv = (multi - 1) * q
+        elif search_space[key]["_type"] == "loguniform":
+            lb, ub = search_space[key]["_value"][:2]
+            perturb = (np.log(ub) - np.log(lb)) * factor
+            uv = np.exp(min(np.log(hyper_parameter) + perturb, np.log(ub)))
+            lv = np.exp(max(np.log(hyper_parameter) - perturb, np.log(lb)))
+        elif search_space[key]["_type"] == "qloguniform":
+            lb, ub, q = search_space[key]["_value"][:3]
+            multi = round(hyper_parameter / q)
+            uv = (multi + 1) * q
+            lv = (multi - 1) * q
+        elif search_space[key]["_type"] == "normal":
+            sigma = search_space[key]["_value"][1]
+            perturb = sigma * factor
+            uv = ub = hyper_parameter + perturb
+            lv = lb = hyper_parameter - perturb
+        elif search_space[key]["_type"] == "qnormal":
+            q = search_space[key]["_value"][2]
+            uv = ub = hyper_parameter + q
+            lv = lb = hyper_parameter - q
+        elif search_space[key]["_type"] == "lognormal":
+            sigma = search_space[key]["_value"][1]
+            perturb = sigma * factor
+            uv = ub = np.exp(np.log(hyper_parameter) + perturb)
+            lv = lb = np.exp(np.log(hyper_parameter) - perturb)
+        elif search_space[key]["_type"] == "qlognormal":
+            q = search_space[key]["_value"][2]
+            uv = ub = hyper_parameter + q
+            lv, lb = hyper_parameter - q, 1E-10
         else:
+            logger.warning("Illegal type to perturb: %s", search_space[key]["_type"])
             continue
+        if search_space[key]["_type"] == "choice":
+            idx = perturbation(search_space[key]["_type"], search_space[key]["_value"],
+                               resample_probability, uv, ub, lv, lb, random_state)
+            hyper_parameters[key] = {'_index': idx, '_value': choices[idx]}
+        else:
+            hyper_parameters[key] = perturbation(search_space[key]["_type"], search_space[key]["_value"],
+                                                 resample_probability, uv, ub, lv, lb, random_state)
     bot_trial_info.hyper_parameters = hyper_parameters
     bot_trial_info.clean_id()
@@ -70,7 +160,8 @@ class TrialInfo:

 class PBTTuner(Tuner):
-    def __init__(self, optimize_mode="maximize", all_checkpoint_dir=None, population_size=10, factors=(1.2, 0.8), fraction=0.2):
+    def __init__(self, optimize_mode="maximize", all_checkpoint_dir=None, population_size=10, factor=0.2,
+                 resample_probability=0.25, fraction=0.2):
         """
         Initialization

@@ -82,8 +173,10 @@ class PBTTuner(Tuner):
             directory to store training model checkpoint
         population_size : int
             number of trials for each epoch
-        factors : tuple
-            factors for perturbation
+        factor : float
+            factor for perturbation
+        resample_probability : float
+            probability for resampling
         fraction : float
             fraction for selecting bottom and top trials
         """
@@ -93,7 +186,8 @@ class PBTTuner(Tuner):
         logger.info("Checkpoint dir is set to %s by default.", all_checkpoint_dir)
         self.all_checkpoint_dir = all_checkpoint_dir
         self.population_size = population_size
-        self.factors = factors
+        self.factor = factor
+        self.resample_probability = resample_probability
         self.fraction = fraction
         # defined in trial code
         #self.perturbation_interval = perturbation_interval
@@ -237,7 +331,7 @@ class PBTTuner(Tuner):
         bottoms = self.finished[self.finished_trials - cutoff:]
         for bottom in bottoms:
             top = np.random.choice(tops)
-            exploit_and_explore(bottom, top, self.factors, self.epoch, self.searchspace_json)
+            exploit_and_explore(bottom, top, self.factor, self.resample_probability, self.epoch, self.searchspace_json)
         for trial in self.finished:
             if trial not in bottoms:
                 trial.clean_id()
...
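To make the new exploit-and-explore logic concrete, here is a small worked example (plain Python, no NNI required) of the perturbation rule for a `uniform` hyperparameter with the new default `factor=0.2`. With probability `resample_probability` the tuner instead resamples a fresh value from the search space; otherwise it nudges the value up or down by `factor` times the range, clipped to the bounds.
```python
import random

# Illustrative bounds and value for a "uniform" hyperparameter.
lb, ub = 0.0, 1.0        # search-space bounds
factor = 0.2             # new default perturbation factor
value = 0.5              # value copied from the top trial

perturb = (ub - lb) * factor                 # 0.2
uv, lv = value + perturb, value - perturb    # candidate up/down values

# Move up with probability 0.5, otherwise down, clipped to the bounds.
new_value = min(uv, ub) if random.random() > 0.5 else max(lv, lb)
print(new_value)  # 0.7 or 0.3
```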