Unverified commit 2a0fdd3d authored by fishyds, committed by GitHub

Merge pull request #1154 from xuehui1991/merge_v0.8

Merge v0.8 back to master
parents 4afe1670 bb5daabe
......@@ -17,7 +17,7 @@
NNI (Neural Network Intelligence) is a toolkit to help users run automated machine learning (AutoML) experiments.
The tool dispatches and runs trial jobs generated by tuning algorithms to search for the best neural architecture and/or hyper-parameters in different environments such as local machines, remote servers, and the cloud.
### **NNI [v0.7](https://github.com/Microsoft/nni/releases) has been released!**
### **NNI [v0.8](https://github.com/Microsoft/nni/releases) has been released!**
<p align="center">
<a href="#nni-v05-has-been-released"><img src="docs/img/overview.svg" /></a>
</p>
......
# General Programming Interface for Neural Architecture Search
# General Programming Interface for Neural Architecture Search (experimental feature)
_*This is an experimental feature. Currently, we have only implemented the general NAS programming interface; weight sharing and one-shot NAS based on this interface will be supported in the following releases._
Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has demonstrated the feasibility of automatic NAS and has produced models that beat manually designed and tuned ones. Representative works include [NASNet][2], [ENAS][1], [DARTS][3], [Network Morphism][4], and [Evolution][5], and new innovations keep emerging. However, it takes great effort to implement these algorithms, and it is hard to reuse the code base of one algorithm to implement another.
......@@ -24,6 +26,8 @@ When designing the following model there might be several choices in the fourth
There are two ways to write the annotation for this example. For the upper one, the input to the function calls is `[[],[out3]]`; for the bottom one, it is `[[out3],[]]`.
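In the compiled code, each chosen function receives both groups as a single `inputs` argument of the form `[fixed_inputs, chosen_optional_inputs]` (see `mutable_layer` further down in this diff). Below is a minimal, hypothetical sketch of a user-defined op handling both cases; the name `my_conv` is illustrative, and the example's `operators.py` aligns mismatched shapes with pooling and a 1x1 convolution rather than a plain sum:

```python
import tensorflow as tf

def my_conv(inputs, in_ch=1, out_ch=32):
    """Illustrative op: `inputs` is [fixed_inputs, chosen_optional_inputs]."""
    if not inputs[1]:                 # the tuner chose no optional input
        x_input = inputs[0][0]        # use only the fixed input
    else:
        # Combine fixed and chosen inputs; assumes matching shapes for brevity.
        x_input = tf.add_n(inputs[0] + inputs[1])
    w = tf.Variable(tf.truncated_normal([3, 3, in_ch, out_ch], stddev=0.1))
    return tf.nn.conv2d(x_input, w, strides=[1, 1, 1, 1], padding='SAME')
```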
__Debugging__: We provide an `nnictl trial codegen` command to help debug your NAS code on NNI. If the trial with trial_id `XXX` in experiment `YYY` fails, you can run `nnictl trial codegen YYY --trial_id XXX` to generate executable code for that trial under your current directory. With this code, you can run the trial command directly, without NNI, to find out why the trial failed. Essentially, this command compiles your trial code and replaces the NNI NAS annotations with the actually chosen layers and inputs.
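For example, with the placeholder ids `YYY` (experiment) and `XXX` (trial), the debugging workflow would roughly be:

```bash
# Generate standalone code for the failed trial into the current directory
nnictl trial codegen YYY --trial_id XXX

# Run the generated code directly, without NNI, to reproduce the failure
# (python3 mnist.py is the trial command used by the mnist-nas example below)
python3 mnist.py
```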
### Example: choose input connections for a layer
Designing the connections between layers is critical for building a high-performance model. With the provided interface, users can annotate which connections a layer takes as inputs, choosing several of them from a set of candidate connections. Below is an example that chooses two inputs from three candidate inputs for `concat`. Here `concat` always takes the output of its previous layer through `fixed_inputs`.
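The annotation for such a choice could look roughly like the sketch below, using the `@nni.mutable_layers` syntax from the mnist-nas example later in this diff; `prev_out`, `out1`, `out2`, `out3`, `my_concat`, and `concat_out` are placeholder names:

```python
"""@nni.mutable_layers(
{
    layer_choice: [my_concat()],
    fixed_inputs: [prev_out],              # always include the previous layer's output
    optional_inputs: [out1, out2, out3],   # candidate connections
    optional_input_size: 2,                # choose exactly two of the candidates
    layer_output: concat_out
}
)"""
```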
......@@ -89,11 +93,11 @@ NNI's annotation compiler transforms the annotated trial code to the code that c
![](../img/nas_on_nni.png)
The above figure shows how the trial code runs on NNI. `nnictl` processes user trial code to generate a search space file and compiled trial code. The former is fed to the tuner, and the latter is used to run trials.
[__TODO__] Simple example of NAS on NNI.
[Simple example of NAS on NNI](https://github.com/microsoft/nni/tree/v0.8/examples/trials/mnist-nas).
### Weight sharing
### [__TODO__] Weight sharing
Sharing weights among chosen architectures (i.e., trials) can speed up model search. For example, properly inheriting the weights of completed trials can speed up the convergence of new trials. One-Shot NAS (e.g., ENAS, DARTS) is more aggressive: the training of different architectures (i.e., subgraphs) shares the same copy of the weights in the full graph.
......@@ -101,9 +105,9 @@ Sharing weights among chosen architectures (i.e., trials) could speedup model se
We believe weight sharing (transferring) plays a key role in speeding up NAS, while finding efficient ways to share weights is still a hot research topic. We provide a key-value store for users to store and load weights; tuners and trials use a provided KV client library to access the storage.
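Since this part is still marked as TODO, the storage API is not finalized. Purely as an illustration of the idea (every name below is invented for this sketch, not an existing NNI API), a key-value weight store could look like this:

```python
class WeightStore:
    """Toy in-memory key-value store; a real one would be a shared service."""

    def __init__(self):
        self._store = {}

    def save(self, key, weights):
        # key: e.g. a trial or architecture id; weights: e.g. {layer_name: array}
        self._store[key] = weights

    def load(self, key):
        # Returns None if no weights were saved for this key.
        return self._store.get(key)


# Trial side (sketch): warm-start from a completed parent trial if available.
store = WeightStore()
parent_weights = store.load("parent_trial_id")
if parent_weights is not None:
    pass  # assign these weights to the matching layers before training
```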
[__TODO__] Example of weight sharing on NNI.
Example of weight sharing on NNI.
### Support of One-Shot NAS
### [__TODO__] Support of One-Shot NAS
One-Shot NAS is a popular approach for finding a good neural architecture within a limited time and resource budget. Basically, it builds a full graph based on the search space and uses gradient descent to eventually find the best subgraph. There are different training approaches, such as [training subgraphs (per mini-batch)][1], [training the full graph through dropout][6], and [training with architecture weights (regularization)][3]. Here we focus on the first approach, i.e., training subgraphs (ENAS).
......@@ -113,18 +117,18 @@ With the same annotated trial code, users could choose One-Shot NAS as execution
The design of One-Shot NAS on NNI is shown in the figure above. One-Shot NAS usually has only one trial job with the full graph. NNI supports running multiple such trial jobs, each of which runs independently. As One-Shot NAS is not stable, running multiple instances helps find a better model. Moreover, trial jobs can synchronize weights while running (i.e., there is only one copy of the weights, like asynchronous parameter-server mode), which may speed up convergence.
[__TODO__] Example of One-Shot NAS on NNI.
Example of One-Shot NAS on NNI.
## General tuning algorithms for NAS
## [__TODO__] General tuning algorithms for NAS
As with hyperparameter tuning, a relatively general algorithm for NAS is needed. The general programming interface makes this task somewhat easier. We have an RL-based tuner algorithm for NAS from our contributors, and we expect the community to design and implement better NAS algorithms.
[__TODO__] More tuning algorithms for NAS.
More tuning algorithms for NAS.
## Export best neural architecture and code
## [__TODO__] Export best neural architecture and code
[__TODO__] After the NNI experiment is done, users could run `nnictl experiment export --code` to export the trial code with the best neural architecture.
After the NNI experiment is done, users can run `nnictl experiment export --code` to export the trial code with the best neural architecture.
## Conclusion and Future work
......
# ChangeLog
# Release 0.8 - 6/4/2019
## Major Features
* [Support NNI on Windows for PAI/Remote mode]
* NNI running on Windows for remote mode
* NNI running on Windows for PAI mode
* [Advanced features for using GPU]
* Run multiple trial jobs on the same GPU for local and remote mode
* Run trial jobs on GPUs that are already running non-NNI jobs
* [Kubeflow v1beta2 operator]
* Support Kubeflow TFJob/PyTorchJob v1beta2
* [General NAS programming interface](./GeneralNasInterfaces.md)
* Provide NAS programming interface for users to easily express their neural architecture search space through NNI annotation
* Provide a new command `nnictl trial codegen` for debugging the NAS code
* Tutorial of NAS programming interface, example of NAS on mnist, customized random tuner for NAS
* [Support resume tuner/advisor's state for experiment resume]
* For experiment resume, tuner/advisor will be resumed by replaying finished trial data
* [Web Portal]
* Improve the design of copying trial's parameters
* Support 'randint' type in hyper-parameter graph
* Use shouldComponentUpdate to avoid unnecessary renders
## Bug fix and other changes
* [Bug fix for inconsistent `nnictl update` command styles]
* [Support import data for SMAC tuner]
* [Bug fix for experiment state transition from ERROR back to RUNNING]
* [Bug fix for table entries]
* [Nested search space refinement]
* [Refine 'randint' type and support lower bound]
* [Comparison of different hyper-parameter tuning algorithms](./CommunitySharings/HpoComparision.md)
* [Comparison of NAS algorithms](./CommunitySharings/NasComparision.md)
* [NNI practice on Recommenders](./CommunitySharings/NniPracticeSharing/RecommendersSvd.md)
## Release 0.7 - 4/29/2019
### Major Features
......
......@@ -3,4 +3,5 @@ Advanced Features
.. toctree::
MultiPhase<MultiPhase>
AdvancedNas<AdvancedNas>
\ No newline at end of file
AdvancedNas<AdvancedNas>
NAS Programming Interface<GeneralNasInterfaces>
\ No newline at end of file
authorName: default
experimentName: example_mnist
trialConcurrency: 1
maxExecDuration: 1h
maxTrialNum: 10
#choice: local, remote, pai
trainingServicePlatform: local
#choice: true, false
useAnnotation: true
tuner:
#choice: TPE, Random, Anneal, Evolution, BatchTuner, MetisTuner
#SMAC (SMAC should be installed through nnictl)
#codeDir: ~/nni/nni/examples/tuners/random_nas_tuner
codeDir: ../../tuners/random_nas_tuner
classFileName: random_nas_tuner.py
className: RandomNASTuner
trial:
command: python3 mnist.py
codeDir: .
gpuNum: 0
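Assuming the file above is saved as `config.yaml` in the example directory (the filename is a placeholder), the experiment is typically launched with `nnictl create`:

```bash
# Launch the NAS experiment described by the config above
# (run from the directory containing the config file)
nnictl create --config config.yaml
```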
"""A deep MNIST classifier using convolutional layers."""
import argparse
import logging
import math
import tempfile
import time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import operators as op
FLAGS = None
logger = logging.getLogger('mnist_AutoML')
class MnistNetwork(object):
'''
MnistNetwork is for initializing and building basic network for mnist.
'''
def __init__(self,
channel_1_num,
channel_2_num,
conv_size,
hidden_size,
pool_size,
learning_rate,
x_dim=784,
y_dim=10):
self.channel_1_num = channel_1_num
self.channel_2_num = channel_2_num
self.conv_size = conv_size
self.hidden_size = hidden_size
self.pool_size = pool_size
self.learning_rate = learning_rate
self.x_dim = x_dim
self.y_dim = y_dim
self.images = tf.placeholder(tf.float32, [None, self.x_dim], name='input_x')
self.labels = tf.placeholder(tf.float32, [None, self.y_dim], name='input_y')
self.keep_prob = tf.placeholder(tf.float32, name='keep_prob')
self.train_step = None
self.accuracy = None
def build_network(self):
'''
Building network for mnist, meanwhile specifying its neural architecture search space
'''
# Reshape to use within a convolutional neural net.
# Last dimension is for "features" - there is only one here, since images are
# grayscale -- it would be 3 for an RGB image, 4 for RGBA, etc.
with tf.name_scope('reshape'):
try:
input_dim = int(math.sqrt(self.x_dim))
except:
print(
'input dim cannot be sqrt and reshape. input dim: ' + str(self.x_dim))
logger.debug(
'input dim cannot be sqrt and reshape. input dim: %s', str(self.x_dim))
raise
x_image = tf.reshape(self.images, [-1, input_dim, input_dim, 1])
"""@nni.mutable_layers(
{
layer_choice: [op.conv2d(size=1, in_ch=1, out_ch=self.channel_1_num),
op.conv2d(size=3, in_ch=1, out_ch=self.channel_1_num),
op.twice_conv2d(size=3, in_ch=1, out_ch=self.channel_1_num),
op.twice_conv2d(size=7, in_ch=1, out_ch=self.channel_1_num),
op.dilated_conv(in_ch=1, out_ch=self.channel_1_num),
op.separable_conv(size=3, in_ch=1, out_ch=self.channel_1_num),
op.separable_conv(size=5, in_ch=1, out_ch=self.channel_1_num),
op.separable_conv(size=7, in_ch=1, out_ch=self.channel_1_num)],
fixed_inputs: [x_image],
layer_output: conv1_out
},
{
layer_choice: [op.post_process(ch_size=self.channel_1_num)],
fixed_inputs: [conv1_out],
layer_output: post1_out
},
{
layer_choice: [op.max_pool(size=3),
op.max_pool(size=5),
op.max_pool(size=7),
op.avg_pool(size=3),
op.avg_pool(size=5),
op.avg_pool(size=7)],
fixed_inputs: [post1_out],
layer_output: pool1_out
},
{
layer_choice: [op.conv2d(size=1, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.conv2d(size=3, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.twice_conv2d(size=3, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.twice_conv2d(size=7, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.dilated_conv(in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.separable_conv(size=3, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.separable_conv(size=5, in_ch=self.channel_1_num, out_ch=self.channel_2_num),
op.separable_conv(size=7, in_ch=self.channel_1_num, out_ch=self.channel_2_num)],
fixed_inputs: [pool1_out],
optional_inputs: [post1_out],
optional_input_size: [0, 1],
layer_output: conv2_out
},
{
layer_choice: [op.post_process(ch_size=self.channel_2_num)],
fixed_inputs: [conv2_out],
layer_output: post2_out
},
{
layer_choice: [op.max_pool(size=3),
op.max_pool(size=5),
op.max_pool(size=7),
op.avg_pool(size=3),
op.avg_pool(size=5),
op.avg_pool(size=7)],
fixed_inputs: [post2_out],
optional_inputs: [post1_out, pool1_out],
optional_input_size: [0, 1],
layer_output: pool2_out
}
)"""
# Fully connected layer 1 -- after 2 round of downsampling, our 28x28 image
# is down to 7x7x64 feature maps -- maps this to 1024 features.
last_dim_list = pool2_out.get_shape().as_list()
assert(last_dim_list[1] == last_dim_list[2])
last_dim = last_dim_list[1]
with tf.name_scope('fc1'):
w_fc1 = op.weight_variable(
[last_dim * last_dim * self.channel_2_num, self.hidden_size])
b_fc1 = op.bias_variable([self.hidden_size])
h_pool2_flat = tf.reshape(
pool2_out, [-1, last_dim * last_dim * self.channel_2_num])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
# Dropout - controls the complexity of the model, prevents co-adaptation of features.
with tf.name_scope('dropout'):
h_fc1_drop = tf.nn.dropout(h_fc1, self.keep_prob)
# Map the 1024 features to 10 classes, one for each digit
with tf.name_scope('fc2'):
w_fc2 = op.weight_variable([self.hidden_size, self.y_dim])
b_fc2 = op.bias_variable([self.y_dim])
y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2
with tf.name_scope('loss'):
cross_entropy = tf.reduce_mean(
tf.nn.softmax_cross_entropy_with_logits(labels=self.labels, logits=y_conv))
with tf.name_scope('adam_optimizer'):
self.train_step = tf.train.AdamOptimizer(
self.learning_rate).minimize(cross_entropy)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(
tf.argmax(y_conv, 1), tf.argmax(self.labels, 1))
self.accuracy = tf.reduce_mean(
tf.cast(correct_prediction, tf.float32))
def download_mnist_retry(data_dir, max_num_retries=20):
"""Try to download mnist dataset and avoid errors"""
for _ in range(max_num_retries):
try:
return input_data.read_data_sets(data_dir, one_hot=True)
except tf.errors.AlreadyExistsError:
time.sleep(1)
raise Exception("Failed to download MNIST.")
def main(params):
'''
Main function, build mnist network, run and send result to NNI.
'''
# Import data
mnist = download_mnist_retry(params['data_dir'])
print('Mnist download data done.')
logger.debug('Mnist download data done.')
# Create the model
# Build the graph for the deep net
mnist_network = MnistNetwork(channel_1_num=params['channel_1_num'],
channel_2_num=params['channel_2_num'],
conv_size=params['conv_size'],
hidden_size=params['hidden_size'],
pool_size=params['pool_size'],
learning_rate=params['learning_rate'])
mnist_network.build_network()
logger.debug('Mnist build network done.')
# Write log
graph_location = tempfile.mkdtemp()
logger.debug('Saving graph to: %s', graph_location)
train_writer = tf.summary.FileWriter(graph_location)
train_writer.add_graph(tf.get_default_graph())
test_acc = 0.0
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for i in range(params['batch_num']):
batch = mnist.train.next_batch(params['batch_size'])
mnist_network.train_step.run(feed_dict={mnist_network.images: batch[0],
mnist_network.labels: batch[1],
mnist_network.keep_prob: 1 - params['dropout_rate']}
)
if i % 100 == 0:
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
"""@nni.report_intermediate_result(test_acc)"""
logger.debug('test accuracy %g', test_acc)
logger.debug('Pipe send intermediate result done.')
test_acc = mnist_network.accuracy.eval(
feed_dict={mnist_network.images: mnist.test.images,
mnist_network.labels: mnist.test.labels,
mnist_network.keep_prob: 1.0})
"""@nni.report_final_result(test_acc)"""
logger.debug('Final result is %g', test_acc)
logger.debug('Send final result done.')
def get_params():
''' Get parameters from command line '''
parser = argparse.ArgumentParser()
parser.add_argument("--data_dir", type=str, default='/tmp/tensorflow/mnist/input_data', help="data directory")
parser.add_argument("--dropout_rate", type=float, default=0.5, help="dropout rate")
parser.add_argument("--channel_1_num", type=int, default=32)
parser.add_argument("--channel_2_num", type=int, default=64)
parser.add_argument("--conv_size", type=int, default=5)
parser.add_argument("--pool_size", type=int, default=2)
parser.add_argument("--hidden_size", type=int, default=1024)
parser.add_argument("--learning_rate", type=float, default=1e-4)
parser.add_argument("--batch_num", type=int, default=2000)
parser.add_argument("--batch_size", type=int, default=32)
args, _ = parser.parse_known_args()
return args
if __name__ == '__main__':
try:
params = vars(get_params())
main(params)
except Exception as exception:
logger.exception(exception)
raise
import tensorflow as tf
import math
def weight_variable(shape):
"""weight_variable generates a weight variable of a given shape."""
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variable(shape):
"""bias_variable generates a bias variable of a given shape."""
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
def sum_op(inputs):
"""sum_op"""
fixed_input = inputs[0][0]
optional_input = inputs[1][0]
fixed_shape = fixed_input.get_shape().as_list()
optional_shape = optional_input.get_shape().as_list()
assert fixed_shape[1] == fixed_shape[2]
assert optional_shape[1] == optional_shape[2]
pool_size = math.ceil(optional_shape[1] / fixed_shape[1])
pool_out = tf.nn.avg_pool(optional_input, ksize=[1, pool_size, pool_size, 1], strides=[1, pool_size, pool_size, 1], padding='SAME')
conv_matrix = weight_variable([1, 1, optional_shape[3], fixed_shape[3]])
conv_out = tf.nn.conv2d(pool_out, conv_matrix, strides=[1, 1, 1, 1], padding='SAME')
return fixed_input + conv_out
def conv2d(inputs, size=-1, in_ch=-1, out_ch=-1):
"""conv2d returns a 2d convolution layer with full stride."""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size in [1, 3]:
w_matrix = weight_variable([size, size, in_ch, out_ch])
return tf.nn.conv2d(x_input, w_matrix, strides=[1, 1, 1, 1], padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def twice_conv2d(inputs, size=-1, in_ch=-1, out_ch=-1):
"""twice_conv2d"""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size in [3, 7]:
w_matrix1 = weight_variable([1, size, in_ch, int(out_ch/2)])
out = tf.nn.conv2d(x_input, w_matrix1, strides=[1, 1, 1, 1], padding='SAME')
w_matrix2 = weight_variable([size, 1, int(out_ch/2), out_ch])
return tf.nn.conv2d(out, w_matrix2, strides=[1, 1, 1, 1], padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def dilated_conv(inputs, size=3, in_ch=-1, out_ch=-1):
"""dilated_conv"""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size == 3:
w_matrix = weight_variable([size, size, in_ch, out_ch])
return tf.nn.atrous_conv2d(x_input, w_matrix, rate=2, padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def separable_conv(inputs, size=-1, in_ch=-1, out_ch=-1):
"""separable_conv"""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size in [3, 5, 7]:
depth_matrix = weight_variable([size, size, in_ch, 1])
point_matrix = weight_variable([1, 1, 1*in_ch, out_ch])
return tf.nn.separable_conv2d(x_input, depth_matrix, point_matrix, strides=[1, 1, 1, 1], padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def avg_pool(inputs, size=-1):
"""avg_pool downsamples a feature map."""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size in [3, 5, 7]:
return tf.nn.avg_pool(x_input, ksize=[1, size, size, 1], strides=[1, size, size, 1], padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def max_pool(inputs, size=-1):
"""max_pool downsamples a feature map."""
if not inputs[1]:
x_input = inputs[0][0]
else:
x_input = sum_op(inputs)
if size in [3, 5, 7]:
return tf.nn.max_pool(x_input, ksize=[1, size, size, 1], strides=[1, size, size, 1], padding='SAME')
else:
raise Exception("Unknown filter size: %d." % size)
def post_process(inputs, ch_size=-1):
"""post_process"""
x_input = inputs[0][0]
bias_matrix = bias_variable([ch_size])
return tf.nn.relu(x_input + bias_matrix)
**Run Neural Network Architecture Search in NNI**
===
Now we have a NAS example, [NNI-NAS-Example](https://github.com/Crysple/NNI-NAS-Example), that runs in NNI using the NAS interface, contributed by our community.
Thanks to our lovely contributors.
And welcome more people to join us!
import numpy as np
from nni.tuner import Tuner
def random_archi_generator(nas_ss, random_state):
'''random
'''
chosen_archi = {}
print("zql: nas search space: ", nas_ss)
for block_name, block in nas_ss.items():
tmp_block = {}
for layer_name, layer in block.items():
tmp_layer = {}
for key, value in layer.items():
if key == 'layer_choice':
index = random_state.randint(len(value))
tmp_layer['chosen_layer'] = value[index]
elif key == 'optional_inputs':
tmp_layer['chosen_inputs'] = []
print("zql: optional_inputs", layer['optional_inputs'])
if layer['optional_inputs']:
if isinstance(layer['optional_input_size'], int):
choice_num = layer['optional_input_size']
else:
choice_range = layer['optional_input_size']
choice_num = random_state.randint(choice_range[0], choice_range[1]+1)
for _ in range(choice_num):
index = random_state.randint(len(layer['optional_inputs']))
tmp_layer['chosen_inputs'].append(layer['optional_inputs'][index])
elif key == 'optional_input_size':
pass
else:
raise ValueError('Unknown field %s in layer %s of block %s' % (key, layer_name, block_name))
tmp_block[layer_name] = tmp_layer
chosen_archi[block_name] = tmp_block
return chosen_archi
class RandomNASTuner(Tuner):
'''RandomNASTuner
'''
def __init__(self):
self.searchspace_json = None
self.random_state = None
def update_search_space(self, search_space):
'''update
'''
self.searchspace_json = search_space
self.random_state = np.random.RandomState()
def generate_parameters(self, parameter_id):
'''generate
'''
return random_archi_generator(self.searchspace_json, self.random_state)
def receive_trial_result(self, parameter_id, parameters, value):
'''receive
'''
pass
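To make the structures this tuner walks over concrete, here is a minimal sketch of driving it standalone; the search-space content is made up for illustration but uses the keys `random_archi_generator` expects (`layer_choice`, `optional_inputs`, `optional_input_size`):

```python
# Illustrative only: exercise RandomNASTuner with a hand-written search space.
example_search_space = {
    "mutable_block_1": {
        "mutable_layer_0": {
            "layer_choice": ["op.conv2d(size=3)", "op.max_pool(size=3)"],
            "optional_inputs": ["out1", "out2", "out3"],
            "optional_input_size": [0, 1],   # choose between zero and one optional input
        }
    }
}

tuner = RandomNASTuner()
tuner.update_search_space(example_search_space)
print(tuner.generate_parameters(parameter_id=0))
# e.g. {'mutable_block_1': {'mutable_layer_0': {'chosen_layer': 'op.max_pool(size=3)',
#       'chosen_inputs': ['out2']}}}
```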
......@@ -203,7 +203,7 @@ class NNIRestHandler {
res.send();
} catch (err) {
// setClusterMetata is a step of initialization, so any exception thrown is a fatal
this.handle_error(err, res, true);
this.handle_error(NNIError.FromError(err), res, true);
}
});
}
......
......@@ -29,31 +29,18 @@ import { GPUSummary, GPUInfo } from '../common/gpuData';
* Metadata of remote machine for configuration and status query
*/
export class RemoteMachineMeta {
public readonly ip : string;
public readonly port : number;
public readonly username : string;
public readonly passwd?: string;
public readonly ip : string = '';
public readonly port : number = 22;
public readonly username : string = '';
public readonly passwd: string = '';
public readonly sshKeyPath?: string;
public readonly passphrase?: string;
public gpuSummary : GPUSummary | undefined;
public readonly gpuIndices?: string;
public readonly maxTrialNumPerGpu?: number;
public occupiedGpuIndexMap: Map<number, number>;
//TODO: initialize variable in constructor
public occupiedGpuIndexMap?: Map<number, number>;
public readonly useActiveGpu?: boolean = false;
constructor(ip : string, port : number, username : string, passwd : string,
sshKeyPath: string, passphrase : string, gpuIndices?: string, maxTrialNumPerGpu?: number, useActiveGpu?: boolean) {
this.ip = ip;
this.port = port;
this.username = username;
this.passwd = passwd;
this.sshKeyPath = sshKeyPath;
this.passphrase = passphrase;
this.gpuIndices = gpuIndices;
this.maxTrialNumPerGpu = maxTrialNumPerGpu;
this.occupiedGpuIndexMap = new Map<number, number>();
this.useActiveGpu = useActiveGpu;
}
}
export function parseGpuIndices(gpuIndices?: string): Set<number> | undefined {
......
......@@ -466,6 +466,7 @@ class RemoteMachineTrainingService implements TrainingService {
let connectedRMNum: number = 0;
rmMetaList.forEach(async (rmMeta: RemoteMachineMeta) => {
rmMeta.occupiedGpuIndexMap = new Map<number, number>();
let sshClientManager: SSHClientManager = new SSHClientManager([], this.MAX_TRIAL_NUMBER_PER_SSHCONNECTION, rmMeta);
let sshClient: Client = await sshClientManager.getAvailableSSHClient();
this.machineSSHClientMap.set(rmMeta, sshClientManager);
......
......@@ -124,7 +124,7 @@ else:
funcs_args,
fixed_inputs,
optional_inputs,
optional_input_size=0):
optional_input_size):
'''execute the chosen function and inputs.
Below is an example of chosen function and inputs:
{
......@@ -149,7 +149,7 @@ else:
chosen_layer = mutable_block[mutable_layer_id]["chosen_layer"]
chosen_inputs = mutable_block[mutable_layer_id]["chosen_inputs"]
real_chosen_inputs = [optional_inputs[input_name] for input_name in chosen_inputs]
layer_out = funcs[chosen_layer]([fixed_inputs, real_chosen_inputs], *funcs_args[chosen_layer])
layer_out = funcs[chosen_layer]([fixed_inputs, real_chosen_inputs], **funcs_args[chosen_layer])
return layer_out
......
......@@ -92,7 +92,7 @@ class SlideBar extends React.Component<{}, SliderState> {
const aTag = document.createElement('a');
const isEdge = navigator.userAgent.indexOf('Edge') !== -1 ? true : false;
const file = new Blob([nniLogfile], { type: 'application/json' });
aTag.download = 'nnimanagerLog.json';
aTag.download = 'nnimanager.log';
aTag.href = URL.createObjectURL(file);
aTag.click();
if (!isEdge) {
......@@ -101,7 +101,7 @@ class SlideBar extends React.Component<{}, SliderState> {
if (navigator.userAgent.indexOf('Firefox') > -1) {
const downTag = document.createElement('a');
downTag.addEventListener('click', function () {
downTag.download = 'nnimanagerLog.json';
downTag.download = 'nnimanager.log';
downTag.href = URL.createObjectURL(file);
});
let eventMouse = document.createEvent('MouseEvents');
......@@ -122,7 +122,7 @@ class SlideBar extends React.Component<{}, SliderState> {
const aTag = document.createElement('a');
const isEdge = navigator.userAgent.indexOf('Edge') !== -1 ? true : false;
const file = new Blob([dispatchLogfile], { type: 'application/json' });
aTag.download = 'dispatcherLog.json';
aTag.download = 'dispatcher.log';
aTag.href = URL.createObjectURL(file);
aTag.click();
if (!isEdge) {
......@@ -131,7 +131,7 @@ class SlideBar extends React.Component<{}, SliderState> {
if (navigator.userAgent.indexOf('Firefox') > -1) {
const downTag = document.createElement('a');
downTag.addEventListener('click', function () {
downTag.download = 'dispatcherLog.json';
downTag.download = 'dispatcher.log';
downTag.href = URL.createObjectURL(file);
});
let eventMouse = document.createEvent('MouseEvents');
......
......@@ -27,7 +27,6 @@ interface TrialDetailState {
entriesInSelect: string;
searchSpace: string;
isMultiPhase: boolean;
isTableLoading: boolean;
whichGraph: string;
hyperCounts: number; // user click the hyper-parameter counts
durationCounts: number;
......@@ -79,7 +78,6 @@ class TrialsDetail extends React.Component<{}, TrialDetailState> {
whichGraph: '1',
isHasSearch: false,
isMultiPhase: false,
isTableLoading: false,
hyperCounts: 0,
durationCounts: 0,
intermediateCounts: 0
......@@ -95,9 +93,6 @@ class TrialsDetail extends React.Component<{}, TrialDetailState> {
])
.then(axios.spread((res, res1) => {
if (res.status === 200 && res1.status === 200) {
if (this._isMounted === true) {
this.setState(() => ({ isTableLoading: true }));
}
const trialJobs = res.data;
const metricSource = res1.data;
const trialTable: Array<TableObj> = [];
......@@ -187,10 +182,7 @@ class TrialsDetail extends React.Component<{}, TrialDetailState> {
}
}
if (this._isMounted) {
this.setState(() => ({
isTableLoading: false,
tableListSource: trialTable
}));
this.setState(() => ({ tableListSource: trialTable }));
}
if (entriesInSelect === 'all' && this._isMounted) {
this.setState(() => ({
......@@ -330,7 +322,7 @@ class TrialsDetail extends React.Component<{}, TrialDetailState> {
const {
tableListSource, searchResultSource, isHasSearch, isMultiPhase,
entriesTable, experimentPlatform, searchSpace, experimentLogCollection,
whichGraph, isTableLoading
whichGraph
} = this.state;
const source = isHasSearch ? searchResultSource : tableListSource;
return (
......@@ -407,7 +399,6 @@ class TrialsDetail extends React.Component<{}, TrialDetailState> {
<TableList
entries={entriesTable}
tableSource={source}
isTableLoading={isTableLoading}
isMultiPhase={isMultiPhase}
platform={experimentPlatform}
updateList={this.getDetailSource}
......
......@@ -123,7 +123,7 @@ class OpenRow extends React.Component<OpenRowProps, OpenRowState> {
<Button
onClick={this.showFormatModal.bind(this, record)}
>
Copy as python
Copy as json
</Button>
</Row>
</Row>
......
......@@ -115,13 +115,13 @@ class Intermediate extends React.Component<IntermediateProps, IntermediateState>
},
xAxis: {
type: 'category',
name: 'Scape',
name: 'Step',
boundaryGap: false,
data: xAxis
},
yAxis: {
type: 'value',
name: 'Intermediate'
name: 'metric'
},
series: trialIntermediate
};
......
......@@ -22,6 +22,7 @@ interface ParaState {
max: number; // graph color bar limit
min: number;
sutrialCount: number; // succeed trial numbers for SUC
succeedRenderCount: number; // all succeed trials number
clickCounts: number;
isLoadConfirm: boolean;
}
......@@ -68,6 +69,7 @@ class Para extends React.Component<ParaProps, ParaState> {
min: 0,
max: 1,
sutrialCount: 10000000,
succeedRenderCount: 10000000,
clickCounts: 1,
isLoadConfirm: false
};
......@@ -76,7 +78,8 @@ class Para extends React.Component<ParaProps, ParaState> {
getParallelAxis =
(
dimName: Array<string>, parallelAxis: Array<Dimobj>,
accPara: Array<number>, eachTrialParams: Array<string>
accPara: Array<number>, eachTrialParams: Array<string>,
lengthofTrials: number
) => {
// get data for every lines. if dim is choice type, number -> toString()
const paraYdata: number[][] = [];
......@@ -120,7 +123,7 @@ class Para extends React.Component<ParaProps, ParaState> {
if (swapAxisArr.length >= 2) {
this.swapGraph(paraData, swapAxisArr);
}
this.getOption(paraData);
this.getOption(paraData, lengthofTrials);
if (this._isMounted === true) {
this.setState(() => ({ paraBack: paraData }));
}
......@@ -159,8 +162,8 @@ class Para extends React.Component<ParaProps, ParaState> {
parallelAxis.push({
dim: i,
name: dimName[i],
max: searchKey._value[0] - 1,
min: 0
min: searchKey._value[0],
max: searchKey._value[1],
});
break;
......@@ -248,7 +251,8 @@ class Para extends React.Component<ParaProps, ParaState> {
this.setState({
paraNodata: 'No data',
option: optionOfNull,
sutrialCount: 0
sutrialCount: 0,
succeedRenderCount: 0
});
}
} else {
......@@ -265,7 +269,7 @@ class Para extends React.Component<ParaProps, ParaState> {
});
if (this._isMounted) {
this.setState({ max: Math.max(...accPara), min: Math.min(...accPara) }, () => {
this.getParallelAxis(dimName, parallelAxis, accPara, eachTrialParams);
this.getParallelAxis(dimName, parallelAxis, accPara, eachTrialParams, lenOfDataSource);
});
}
}
......@@ -283,7 +287,7 @@ class Para extends React.Component<ParaProps, ParaState> {
}
// deal with response data into pic data
getOption = (dataObj: ParaObj) => {
getOption = (dataObj: ParaObj, lengthofTrials: number) => {
// dataObj [[y1], [y2]... [default metric]]
const { max, min } = this.state;
const parallelAxis = dataObj.parallelAxis;
......@@ -348,6 +352,7 @@ class Para extends React.Component<ParaProps, ParaState> {
this.setState(() => ({
option: optionown,
paraNodata: '',
succeedRenderCount: lengthofTrials,
sutrialCount: paralleData.length
}));
}
......@@ -367,7 +372,7 @@ class Para extends React.Component<ParaProps, ParaState> {
}
swapReInit = () => {
const { clickCounts } = this.state;
const { clickCounts, succeedRenderCount } = this.state;
const val = clickCounts + 1;
if (this._isMounted) {
this.setState({ isLoadConfirm: true, clickCounts: val, });
......@@ -419,7 +424,7 @@ class Para extends React.Component<ParaProps, ParaState> {
paraData[paraItem][dim1] = paraData[paraItem][dim2];
paraData[paraItem][dim2] = temp;
});
this.getOption(paraBack);
this.getOption(paraBack, succeedRenderCount);
// please wait the data
if (this._isMounted) {
this.setState(() => ({
......@@ -503,12 +508,16 @@ class Para extends React.Component<ParaProps, ParaState> {
return true;
}
const { sutrialCount, clickCounts } = nextState;
const { sutrialCount, clickCounts, succeedRenderCount } = nextState;
const beforeCount = this.state.sutrialCount;
const beforeClickCount = this.state.clickCounts;
const beforeRealRenderCount = this.state.succeedRenderCount;
if (sutrialCount !== beforeCount) {
return true;
}
if (succeedRenderCount !== beforeRealRenderCount) {
return true;
}
if (clickCounts !== beforeClickCount) {
return true;
......
......@@ -30,7 +30,6 @@ interface TableListProps {
platform: string;
logCollection: boolean;
isMultiPhase: boolean;
isTableLoading: boolean;
}
interface TableListState {
......@@ -195,7 +194,7 @@ class TableList extends React.Component<TableListProps, TableListState> {
render() {
const { entries, tableSource, updateList, isTableLoading } = this.props;
const { entries, tableSource, updateList } = this.props;
const { intermediateOption, modalVisible, isShowColumn, columnSelected } = this.state;
let showTitle = COLUMN;
let bgColor = '';
......@@ -264,8 +263,12 @@ class TableList extends React.Component<TableListProps, TableListState> {
sorter: (a: TableObj, b: TableObj) => (a.duration as number) - (b.duration as number),
render: (text: string, record: TableObj) => {
let duration;
if (record.duration !== undefined && record.duration > 0) {
duration = convertDuration(record.duration);
if (record.duration !== undefined) {
if (record.duration > 0 && record.duration < 1) {
duration = `${record.duration}s`;
} else {
duration = convertDuration(record.duration);
}
} else {
duration = 0;
}
......@@ -418,7 +421,6 @@ class TableList extends React.Component<TableListProps, TableListState> {
dataSource={tableSource}
className="commonTableStyle"
pagination={{ pageSize: entries }}
loading={isTableLoading}
/>
{/* Intermediate Result Modal */}
<Modal
......
authorName: nni
experimentName: default_test
maxExecDuration: 5m
maxTrialNum: 4
trialConcurrency: 2
tuner:
codeDir: ../../../examples/tuners/random_nas_tuner
classFileName: random_nas_tuner.py
className: RandomNASTuner
trial:
codeDir: ../../../examples/trials/mnist-nas
command: python3 mnist.py --batch_num 100
gpuNum: 0
useAnnotation: true
multiPhase: false
multiThread: false
trainingServicePlatform: local