"docs/vscode:/vscode.git/clone" did not exist on "0b6336b583c0d0208acf49011ccfbe8a2327338c"
Unverified Commit d90433da authored by SparkSnail's avatar SparkSnail Committed by GitHub
Browse files

Merge pull request #246 from microsoft/master

merge master
parents 1e511829 bf7daa8f
# Multi-phase

## What is a multi-phase experiment

Typically, each trial job gets a single configuration (e.g., hyperparameters) from the tuner, tries this configuration, reports its result, and exits. But sometimes a trial job may want to request multiple configurations from the tuner. We find this a very compelling feature. For example:

1. Job launch takes tens of seconds on some training platforms. If a configuration takes only around a minute to finish, running only one configuration per trial job would be very inefficient. An appealing alternative is for a trial job to request a configuration, finish it, then request and run another. In the extreme case, a trial job can run an unlimited number of configurations. If you set concurrency to, for example, 6, there would be 6 __long-running__ jobs continuously trying different configurations.

2. Some types of models have to be trained phase by phase, where the configuration of the next phase depends on the results of the previous phase(s). For example, to find the best quantization for a model, the training procedure often goes as follows: the auto-quantization algorithm (i.e., the tuner in NNI) chooses a bit width (e.g., 16 bits), a trial job gets this configuration, trains the model for some epochs, and reports the result (e.g., accuracy). The algorithm receives this result and decides whether to change 16 bits to 8 bits, or back to 32 bits. This process is repeated a configured number of times.

The above cases can be supported by the same feature: multi-phase execution. To support them, a trial job needs to be able to request multiple configurations from the tuner, and the tuner must be aware of whether two configuration requests come from the same trial job or from different ones. In multi-phase mode, a trial job can also report multiple final results.
## Create a multi-phase experiment

### Write trial code that leverages multi-phase:

__1. Update trial code__

Using multi-phase in trial code is straightforward; an example is shown below:
```python
import nni

# ...
for i in range(5):
    # get a parameter from the tuner
    tuner_param = nni.get_next_parameter()
    # nni.get_next_parameter returns None if the tuner cannot generate any more hyperparameters
    if tuner_param is None:
        break
    # consume the params
    # ...
    # report the final result (e.g., accuracy computed by the elided code above)
    # for the parameter retrieved above
    nni.report_final_result(accuracy)
    # ...
# ...
```
In a multi-phase experiment, each time the API `nni.get_next_parameter()` is called, it returns a new hyperparameter configuration generated by the tuner; the trial code then consumes this configuration and reports its final result. `nni.get_next_parameter()` and `nni.report_final_result()` should be called in alternation: __call the former, then the latter, and repeat this pattern__. If `nni.get_next_parameter()` is called multiple times consecutively and `nni.report_final_result()` is then called once, the result is associated only with the last configuration, i.e., the one retrieved by the last `get_next_parameter` call. The earlier `get_next_parameter` calls have no associated result, which may break some multi-phase algorithms.

Note that `nni.get_next_parameter` returns None if the tuner cannot generate any more hyperparameters.
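To make the sequencing rule concrete, here is a minimal sketch of the anti-pattern described above (`train_and_evaluate` is a hypothetical helper standing in for the trial's own training code):

```python
p1 = nni.get_next_parameter()
p2 = nni.get_next_parameter()      # p1 is now orphaned
metric = train_and_evaluate(p2)    # hypothetical helper
nni.report_final_result(metric)    # associated with p2 only; p1 never gets a result
```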
__2. Experiment configuration__

To enable multi-phase, you should also add `multiPhase: true` to your experiment YAML configuration file. If this line is not added, `nni.get_next_parameter()` would always return the same configuration.

Multi-phase experiment configuration example:
```yaml
authorName: default
experimentName: multiphase experiment
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 8
trainingServicePlatform: local
searchSpacePath: search_space.json
multiPhase: true
useAnnotation: false
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mytrial.py
  codeDir: .
  gpuNum: 0
```
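With the configuration saved as, say, `config.yml`, the experiment is started as usual with `nnictl create --config config.yml`; only the `multiPhase` flag distinguishes it from a single-phase experiment.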
### Write a tuner that leverages multi-phase:

Before writing a multi-phase tuner, we highly recommend going through [Customize Tuner](https://nni.readthedocs.io/en/latest/Tuner/CustomizeTuner.html). As with a normal tuner, your tuner needs to inherit from the `Tuner` class. When you enable multi-phase through the configuration (set `multiPhase` to true), your tuner will receive an additional `trial_job_id` parameter in the following methods:
```text
generate_parameters
generate_multiple_parameters
receive_trial_result
receive_customized_trial_result
trial_end
```
With this information, the tuner knows which trial is requesting a configuration and which trial is reporting results, which provides enough flexibility to deal with different trials and different phases. For example, you may use the `trial_job_id` parameter of `generate_parameters` to generate hyperparameters for a specific trial job.
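Below is a minimal sketch of such a tuner. The method signatures come from the `Tuner` base class; the per-job history bookkeeping and the way parameters are chosen are illustrative assumptions, not requirements of the NNI API:

```python
import random

from nni.tuner import Tuner


class MyMultiPhaseTuner(Tuner):
    """Toy tuner that tracks each trial job's phases separately."""

    def __init__(self):
        self.search_space = None
        self.history = {}  # trial_job_id -> list of [parameters, result] pairs

    def update_search_space(self, search_space):
        self.search_space = search_space

    def generate_parameters(self, parameter_id, trial_job_id=None, **kwargs):
        # The history of this particular trial job could be used here to
        # decide the next phase; this sketch just samples randomly.
        past = self.history.setdefault(trial_job_id, [])
        params = {"lr": random.choice([0.1, 0.01, 0.001])}
        past.append([params, None])
        return params

    def receive_trial_result(self, parameter_id, parameters, value, trial_job_id=None, **kwargs):
        # Attach the result to the latest phase of the reporting trial job.
        self.history[trial_job_id][-1][1] = value
```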
### Tuners that support multi-phase experiments:
[TPE](../Tuner/HyperoptTuner.md), [Random](../Tuner/HyperoptTuner.md), [Anneal](../Tuner/HyperoptTuner.md), [Evolution](../Tuner/EvolutionTuner.md), [SMAC](../Tuner/SmacTuner.md), [NetworkMorphism](../Tuner/NetworkmorphismTuner.md), [MetisTuner](../Tuner/MetisTuner.md), [BOHB](../Tuner/BohbAdvisor.md), [Hyperband](../Tuner/HyperbandAdvisor.md).
### Training services that support multi-phase experiments:
[Local Machine](../TrainingService/LocalMode.md), [Remote Servers](../TrainingService/RemoteMachineMode.md), [OpenPAI](../TrainingService/PaiMode.md)
@@ -156,12 +156,23 @@ model = Net()
 apply_fixed_architecture(model, "model_dir/final_architecture.json")
 ```
-The JSON is simply a mapping from mutable keys to one-hot or multi-hot representation of choices. For example
+The JSON is simply a mapping from mutable keys to choices. Choices can be expressed as:
+
+* A string: select the candidate with the corresponding name.
+* A number: select the candidate with the corresponding index.
+* A list of strings: select the candidates with the corresponding names.
+* A list of numbers: select the candidates with the corresponding indices.
+* A list of boolean values: a multi-hot array.
+
+For example,
 ```json
 {
-  "LayerChoice1": [false, true, false, false],
-  "InputChoice2": [true, true, false]
+  "LayerChoice1": "conv5x5",
+  "LayerChoice2": 6,
+  "InputChoice3": ["layer1", "layer3"],
+  "InputChoice4": [1, 2],
+  "InputChoice5": [false, true, false, false, true]
 }
 ```
......
@@ -206,7 +206,7 @@
 * Documentation
     - Update the docs structure -Issue #1231
-    - [Multi phase document improvement](AdvancedFeature/MultiPhase.md) -Issue #1233 -PR #1242
+    - (deprecated) Multi phase document improvement -Issue #1233 -PR #1242
         + Add configuration example
     - [WebUI description improvement](Tutorial/WebUI.md) -PR #1419
@@ -234,12 +234,10 @@
 * Add `enas-mode` and `oneshot-mode` for NAS interface: [PR #1201](https://github.com/microsoft/nni/pull/1201#issue-291094510)
 * [Gaussian Process Tuner with Matern kernel](Tuner/GPTuner.md)
-* Multiphase experiment supports
+* (deprecated) Multiphase experiment supports
     * Added new training service support for multiphase experiment: PAI mode supports multiphase experiment since v0.9.
     * Added multiphase capability for the following builtin tuners:
         * TPE, Random Search, Anneal, Naïve Evolution, SMAC, Network Morphism, Metis Tuner.
-    For details, please refer to [Write a tuner that leverages multi-phase](AdvancedFeature/MultiPhase.md)
 * Web Portal
     * Enable trial comparison in Web Portal. For details, refer to [View trials status](Tutorial/WebUI.md)
@@ -549,4 +547,3 @@ Initial release of Neural Network Intelligence (NNI).
 * Support CI by providing out-of-box integration with [travis-ci](https://github.com/travis-ci) on ubuntu
 * Others
     * Support simple GPU job scheduling
@@ -26,7 +26,7 @@ NNI supports running experiment using [FrameworkController](https://github.com/M
 ## Setup FrameworkController
-Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run) to set up FrameworkController in the Kubernetes cluster, NNI supports FrameworkController by the stateful set mode.
+Follow the [guideline](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run) to set up FrameworkController in the Kubernetes cluster. NNI supports FrameworkController in stateful set mode. If your cluster enforces authorization, you need to create a service account with granted permission for FrameworkController, and then pass the name of the FrameworkController service account to the NNI experiment config ([reference](https://github.com/Microsoft/frameworkcontroller/tree/master/example/run#run-by-kubernetes-statefulset)).
 ## Design
@@ -83,6 +83,7 @@ If you use Azure Kubernetes Service, you should set `frameworkcontrollerConfig`
 ```yaml
 frameworkcontrollerConfig:
   storage: azureStorage
+  serviceAccountName: {your_frameworkcontroller_service_account_name}
   keyVault:
     vaultName: {your_vault_name}
     name: {your_secret_name}
......
@@ -68,10 +68,6 @@ tuner:
 Random search is suggested when each trial does not take very long (e.g., each trial can be completed very quickly, or early stopped by the assessor), and you have enough computational resources. It's also useful if you want to uniformly explore the search space. Random Search can be considered a baseline search algorithm. [Detailed Description](./HyperoptTuner.md)
-**classArgs Requirements:**
-
-* **optimize_mode** (*maximize or minimize, optional, default = maximize*) - If 'maximize', the tuner will try to maximize metrics. If 'minimize', the tuner will try to minimize metrics.
 **Example Configuration:**
 ```yaml
......
@@ -51,7 +51,7 @@ class CustomizedTuner(Tuner):
         ...
 ```
-`receive_trial_result` will receive the `parameter_id, parameters, value` as parameters input. Also, Tuner will receive the `value` object are exactly same value that Trial send. If `multiPhase` is set to `true` in the experiment configuration file, an additional `trial_job_id` parameter is passed to `receive_trial_result` and `generate_parameters` through the `**kwargs` parameter.
+`receive_trial_result` receives `parameter_id, parameters, value` as input. The `value` object the tuner receives is exactly the same value that the trial sends.
 The `your_parameters` returned from the `generate_parameters` function will be packaged as a JSON object by the NNI SDK. The SDK then unpacks this JSON object, so the trial receives exactly the same `your_parameters` from the tuner.
@@ -109,4 +109,4 @@ More detail example you could see:
 ### Write a more advanced automl algorithm
 The methods above are usually enough to write a general tuner. However, users may also want more methods, for example, intermediate results and trials' state (e.g., the methods in assessor), in order to implement a more powerful automl algorithm. Therefore, we have another concept called `advisor`, which directly inherits from `MsgDispatcherBase` in [`src/sdk/pynni/nni/msg_dispatcher_base.py`](https://github.com/Microsoft/nni/tree/master/src/sdk/pynni/nni/msg_dispatcher_base.py). Please refer to [here](CustomizeAdvisor.md) for how to write a customized advisor.
\ No newline at end of file
@@ -17,7 +17,6 @@ This document describes the rules to write the config file, and provides some ex
 + [trainingServicePlatform](#trainingserviceplatform)
 + [searchSpacePath](#searchspacepath)
 + [useAnnotation](#useannotation)
-+ [multiPhase](#multiphase)
 + [multiThread](#multithread)
 + [nniManagerIp](#nnimanagerip)
 + [logDir](#logdir)
@@ -94,8 +93,6 @@ searchSpacePath:
 #choice: true, false, default: false
 useAnnotation:
 #choice: true, false, default: false
-multiPhase:
-#choice: true, false, default: false
 multiThread:
 tuner:
   #choice: TPE, Random, Anneal, Evolution
@@ -130,8 +127,6 @@ searchSpacePath:
 #choice: true, false, default: false
 useAnnotation:
 #choice: true, false, default: false
-multiPhase:
-#choice: true, false, default: false
 multiThread:
 tuner:
   #choice: TPE, Random, Anneal, Evolution
@@ -171,8 +166,6 @@ trainingServicePlatform:
 #choice: true, false, default: false
 useAnnotation:
 #choice: true, false, default: false
-multiPhase:
-#choice: true, false, default: false
 multiThread:
 tuner:
   #choice: TPE, Random, Anneal, Evolution
@@ -283,12 +276,6 @@ Use annotation to analysis trial code and generate search space.
 Note: if __useAnnotation__ is true, the searchSpacePath field should be removed.
-### multiPhase
-
-Optional. Bool. Default: false.
-
-Enable [multi-phase experiment](../AdvancedFeature/MultiPhase.md).
-
 ### multiThread
 Optional. Bool. Default: false.
......
@@ -67,10 +67,8 @@ It doesn't need to redeploy, but the nnictl may need to be restarted.
 #### TypeScript
-* If `src/nni_manager` will be changed, run `yarn watch` continually under this folder. It will rebuild code instantly.
-* If `src/webui` or `src/nasui` is changed, use **step 3** to rebuild code.
-
-The nnictl may need to be restarted.
+* If `src/nni_manager` is changed, run `yarn watch` continually under this folder. It will rebuild code instantly. The nnictl may need to be restarted to reload NNI manager.
+* If `src/webui` or `src/nasui` are changed, run `yarn start` under the corresponding folder. The web UI will refresh automatically if code is changed.
 ---
......
@@ -4,7 +4,6 @@ Advanced Features
 .. toctree::
     :maxdepth: 2
-    Enable Multi-phase <AdvancedFeature/MultiPhase>
     Write a New Tuner <Tuner/CustomizeTuner>
     Write a New Assessor <Assessor/CustomizeAssessor>
     Write a New Advisor <Tuner/CustomizeAdvisor>
......
# Multi-phase

## Multi-phase experiments

Typically, each trial job gets a single configuration (e.g., hyperparameters) from the tuner, runs with this configuration, reports the result, and exits. But sometimes a trial job may need to request multiple configurations from the tuner. This is a very useful feature. For example:

1. On some training platforms, launching a job takes tens of seconds. If a configuration takes only about a minute to finish, running just one configuration per trial job would be very inefficient. In this case, the same trial job can finish one configuration, then request and run another. In the extreme case, a single trial job can run an unlimited number of configurations. If concurrency is set to, for example, 6, there would be 6 **long-running** jobs continuously trying different configurations.

2. Some types of models have to be trained in multiple phases, where the configuration of the next phase depends on the results of the previous phase(s). For example, to find the best quantization of a model, the training procedure usually goes as follows: the auto-quantization algorithm (i.e., the tuner in NNI) chooses a bit width (e.g., 16 bits), the trial job gets this configuration, trains the model for several epochs, and returns the result (e.g., accuracy). The algorithm receives the result and decides whether to change 16 bits to 8 bits, or back to 32 bits. This process is repeated several times.

All the above cases can be supported by the multi-phase execution feature. To support them, a trial job needs to be able to request multiple configurations from the tuner, and the tuner needs to know whether two configuration requests come from the same trial job. Also, a multi-phase trial job can return multiple final results.

## Create a multi-phase experiment

### Write trial code that uses multi-phase:

**1. Update trial code**

Using multi-phase in trial code is very easy, for example:

```python
import nni

# ...
for i in range(5):
    # get a parameter from the tuner
    tuner_param = nni.get_next_parameter()
    # nni.get_next_parameter returns None if the tuner cannot generate any more hyperparameters
    if tuner_param is None:
        break
    # use the parameter
    # ...
    # return the final result (e.g., accuracy computed by the elided code above)
    nni.report_final_result(accuracy)
    # ...
# ...
```

In a multi-phase experiment, each time the API `nni.get_next_parameter()` is called, it returns a new set of hyperparameters generated by the tuner; the trial code then uses these new hyperparameters and returns their final result. `nni.get_next_parameter()` and `nni.report_final_result()` need to be called in turn: **call the former, then the latter, and repeat in this order**. If `nni.get_next_parameter()` is called several times in a row and `nni.report_final_result()` is then called once, the final result is only associated with the last configuration returned by get_next_parameter. The earlier get_next_parameter calls therefore have no associated result, which may break some multi-phase algorithms.

Note that `nni.get_next_parameter` returns None when the tuner cannot generate any more hyperparameters.

**2. Experiment configuration**

To enable multi-phase, add `multiPhase: true` to the experiment YAML configuration file. Without this parameter, `nni.get_next_parameter()` would always return the same configuration.

Multi-phase experiment configuration example:

```yaml
authorName: default
experimentName: multiphase experiment
trialConcurrency: 2
maxExecDuration: 1h
maxTrialNum: 8
trainingServicePlatform: local
searchSpacePath: search_space.json
multiPhase: true
useAnnotation: false
tuner:
  builtinTunerName: TPE
  classArgs:
    optimize_mode: maximize
trial:
  command: python3 mytrial.py
  codeDir: .
  gpuNum: 0
```

### Write a tuner that uses multi-phase:

It is strongly recommended to read [Customize Tuner](https://nni.readthedocs.io/zh/latest/Tuner/CustomizeTuner.html) before implementing a multi-phase tuner. As with a normal tuner, your tuner needs to inherit from the `Tuner` class. When multi-phase is enabled in the configuration (`multiPhase` set to true), the tuner receives an additional `trial_job_id` parameter through the following methods:

```text
generate_parameters
generate_multiple_parameters
receive_trial_result
receive_customized_trial_result
trial_end
```

With this information, the tuner knows which trial is requesting a configuration and which trial is returning a result, and can therefore implement flexible behavior for different trials and their phases. For example, the trial_job_id parameter can be used in the generate_parameters method to generate hyperparameters for a specific trial job.

### Tuners that support multi-phase experiments:

[TPE](../Tuner/HyperoptTuner.md), [Random](../Tuner/HyperoptTuner.md), [Anneal](../Tuner/HyperoptTuner.md), [Evolution](../Tuner/EvolutionTuner.md), [SMAC](../Tuner/SmacTuner.md), [NetworkMorphism](../Tuner/NetworkmorphismTuner.md), [MetisTuner](../Tuner/MetisTuner.md), [BOHB](../Tuner/BohbAdvisor.md), [Hyperband](../Tuner/HyperbandAdvisor.md).

### Training services that support multi-phase experiments:

[Local Machine](../TrainingService/LocalMode.md), [Remote Servers](../TrainingService/RemoteMachineMode.md), [OpenPAI](../TrainingService/PaiMode.md)
@@ -147,8 +147,10 @@ class ShuffleNetV2OneShot(nn.Module):
 def load_and_parse_state_dict(filepath="./data/checkpoint-150000.pth.tar"):
     checkpoint = torch.load(filepath, map_location=torch.device("cpu"))
+    if "state_dict" in checkpoint:
+        checkpoint = checkpoint["state_dict"]
     result = dict()
-    for k, v in checkpoint["state_dict"].items():
+    for k, v in checkpoint.items():
         if k.startswith("module."):
             k = k[len("module."):]
         result[k] = v
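For context, a usage sketch of the patched loader (the constructor call and its arguments are assumptions, not taken from this diff):

```python
state_dict = load_and_parse_state_dict("./data/checkpoint-150000.pth.tar")
model = ShuffleNetV2OneShot()  # constructor arguments omitted
model.load_state_dict(state_dict)  # works whether or not the file wraps weights in "state_dict"
```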
......
@@ -283,6 +283,9 @@ class PAIYarnTrainingService extends PAITrainingService {
         };
         request(submitJobRequest, (error: Error, response: request.Response, _body: any) => {
             if ((error !== undefined && error !== null) || response.statusCode >= 400) {
+                const errorMessage: string = (error !== undefined && error !== null) ? error.message :
+                    `Submit trial ${trialJobId} failed, http code:${response.statusCode}, http body: ${response.body.message}`;
+                this.log.error(errorMessage);
                 trialJobDetail.status = 'FAILED';
                 deferred.resolve(true);
             } else {
......
@@ -3,10 +3,9 @@
 import json
-import torch
-
-from nni.nas.pytorch.mutables import MutableScope
-from nni.nas.pytorch.mutator import Mutator
+from .mutables import InputChoice, LayerChoice, MutableScope
+from .mutator import Mutator
+from .utils import to_list
 class FixedArchitecture(Mutator):
@@ -17,8 +16,8 @@ class FixedArchitecture(Mutator):
     ----------
     model : nn.Module
         A mutable network.
-    fixed_arc : str or dict
-        Path to the architecture checkpoint (a string), or preloaded architecture object (a dict).
+    fixed_arc : dict
+        Preloaded architecture object.
     strict : bool
         Force everything that appears in ``fixed_arc`` to be used at least once.
     """
@@ -33,6 +32,34 @@ class FixedArchitecture(Mutator):
             raise RuntimeError("Unexpected keys found in fixed architecture: {}.".format(fixed_arc_keys - mutable_keys))
         if mutable_keys - fixed_arc_keys:
             raise RuntimeError("Missing keys in fixed architecture: {}.".format(mutable_keys - fixed_arc_keys))
+        self._fixed_arc = self._from_human_readable_architecture(self._fixed_arc)
+
+    def _from_human_readable_architecture(self, human_arc):
+        # convert from an exported architecture
+        result_arc = {k: to_list(v) for k, v in human_arc.items()}  # there could be tensors, numpy arrays, etc.
+        # First, convert non-lists to lists, because there could be {"op1": 0} or {"op1": "conv"},
+        # which mean {"op1": [0, ]} or {"op1": ["conv", ]}.
+        result_arc = {k: v if isinstance(v, list) else [v] for k, v in result_arc.items()}
+        # Second, infer which ones are multi-hot arrays and which ones are in human-readable format.
+        # This is non-trivial: if an array is [0, 1], we cannot know for sure whether it means [false, true] or [true, true].
+        # Here, we assume a multi-hot array has to be a boolean array or a float array that matches the length.
+        for mutable in self.mutables:
+            if mutable.key not in result_arc:
+                continue  # skip silently
+            choice_arr = result_arc[mutable.key]
+            if all(isinstance(v, bool) for v in choice_arr) or all(isinstance(v, float) for v in choice_arr):
+                if (isinstance(mutable, LayerChoice) and len(mutable) == len(choice_arr)) or \
+                        (isinstance(mutable, InputChoice) and mutable.n_candidates == len(choice_arr)):
+                    # multi-hot, do nothing
+                    continue
+            if isinstance(mutable, LayerChoice):
+                choice_arr = [mutable.names.index(val) if isinstance(val, str) else val for val in choice_arr]
+                choice_arr = [i in choice_arr for i in range(len(mutable))]
+            elif isinstance(mutable, InputChoice):
+                choice_arr = [mutable.choose_from.index(val) if isinstance(val, str) else val for val in choice_arr]
+                choice_arr = [i in choice_arr for i in range(mutable.n_candidates)]
+            result_arc[mutable.key] = choice_arr
+        return result_arc
 
     def sample_search(self):
         """
@@ -47,17 +74,6 @@
         return self._fixed_arc
 
-def _encode_tensor(data):
-    if isinstance(data, list):
-        if all(map(lambda o: isinstance(o, bool), data)):
-            return torch.tensor(data, dtype=torch.bool)  # pylint: disable=not-callable
-        else:
-            return torch.tensor(data, dtype=torch.float)  # pylint: disable=not-callable
-    if isinstance(data, dict):
-        return {k: _encode_tensor(v) for k, v in data.items()}
-    return data
-
 def apply_fixed_architecture(model, fixed_arc):
     """
     Load architecture from `fixed_arc` and apply to model.
@@ -78,7 +94,6 @@ def apply_fixed_architecture(model, fixed_arc):
     if isinstance(fixed_arc, str):
         with open(fixed_arc) as f:
             fixed_arc = json.load(f)
-    fixed_arc = _encode_tensor(fixed_arc)
     architecture = FixedArchitecture(model, fixed_arc)
     architecture.reset()
     return architecture
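To make the two representations concrete, here is a standalone sketch of the name/index-to-multi-hot conversion performed by `_from_human_readable_architecture` above (the candidate names are made up):

```python
def to_multihot(choices, candidates):
    # normalize a scalar such as "conv5x5" or 2 into a list
    if not isinstance(choices, list):
        choices = [choices]
    # map names to indices; numeric entries are already indices
    indices = [candidates.index(c) if isinstance(c, str) else c for c in choices]
    return [i in indices for i in range(len(candidates))]

assert to_multihot("conv5x5", ["conv3x3", "conv5x5", "maxpool"]) == [False, True, False]
assert to_multihot([0, 2], ["conv3x3", "conv5x5", "maxpool"]) == [True, False, True]
```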
@@ -7,7 +7,9 @@ from collections import defaultdict
 import numpy as np
 import torch
-from nni.nas.pytorch.base_mutator import BaseMutator
+from .base_mutator import BaseMutator
+from .mutables import LayerChoice, InputChoice
+from .utils import to_list
 logger = logging.getLogger(__name__)
@@ -58,7 +60,16 @@
         dict
             A mapping from key of mutables to decisions.
         """
-        return self.sample_final()
+        sampled = self.sample_final()
+        result = dict()
+        for mutable in self.mutables:
+            if not isinstance(mutable, (LayerChoice, InputChoice)):
+                # not supported as built-in
+                continue
+            result[mutable.key] = self._convert_mutable_decision_to_human_readable(mutable, sampled.pop(mutable.key))
+        if sampled:
+            raise ValueError("Unexpected keys returned from 'sample_final()': %s" % list(sampled.keys()))
+        return result
 
     def status(self):
         """
@@ -159,7 +170,7 @@
         mask = self._get_decision(mutable)
         assert len(mask) == len(mutable), \
             "Invalid mask, expected {} to be of length {}.".format(mask, len(mutable))
-        out = self._select_with_mask(_map_fn, [(choice, args, kwargs) for choice in mutable], mask)
+        out, mask = self._select_with_mask(_map_fn, [(choice, args, kwargs) for choice in mutable], mask)
         return self._tensor_reduction(mutable.reduction, out), mask
 
     def on_forward_input_choice(self, mutable, tensor_list):
@@ -185,17 +196,41 @@
         mask = self._get_decision(mutable)
         assert len(mask) == mutable.n_candidates, \
             "Invalid mask, expected {} to be of length {}.".format(mask, mutable.n_candidates)
-        out = self._select_with_mask(lambda x: x, [(t,) for t in tensor_list], mask)
+        out, mask = self._select_with_mask(lambda x: x, [(t,) for t in tensor_list], mask)
         return self._tensor_reduction(mutable.reduction, out), mask
 
     def _select_with_mask(self, map_fn, candidates, mask):
-        if "BoolTensor" in mask.type():
+        """
+        Select masked tensors and return a list of tensors.
+
+        Parameters
+        ----------
+        map_fn : function
+            Convert candidates to target candidates. Can be simply identity.
+        candidates : list of torch.Tensor
+            Tensor list to apply the decision on.
+        mask : list-like object
+            Can be a list, a numpy array or a tensor (recommended). Needs to
+            have the same length as ``candidates``.
+
+        Returns
+        -------
+        tuple of list of torch.Tensor and torch.Tensor
+            Output and mask.
+        """
+        if (isinstance(mask, list) and len(mask) >= 1 and isinstance(mask[0], bool)) or \
+                (isinstance(mask, np.ndarray) and mask.dtype == np.bool) or \
+                "BoolTensor" in mask.type():
             out = [map_fn(*cand) for cand, m in zip(candidates, mask) if m]
-        elif "FloatTensor" in mask.type():
+        elif (isinstance(mask, list) and len(mask) >= 1 and isinstance(mask[0], (float, int))) or \
+                (isinstance(mask, np.ndarray) and mask.dtype in (np.float32, np.float64, np.int32, np.int64)) or \
+                "FloatTensor" in mask.type():
            out = [map_fn(*cand) * m for cand, m in zip(candidates, mask) if m]
         else:
-            raise ValueError("Unrecognized mask")
-        return out
+            raise ValueError("Unrecognized mask '%s'" % mask)
+        if not torch.is_tensor(mask):
+            mask = torch.tensor(mask)  # pylint: disable=not-callable
+        return out, mask
 
     def _tensor_reduction(self, reduction_type, tensor_list):
         if reduction_type == "none":
@@ -237,3 +272,37 @@
         result = self._cache[mutable.key]
         logger.debug("Decision %s: %s", mutable.key, result)
         return result
+
+    def _convert_mutable_decision_to_human_readable(self, mutable, sampled):
+        # Assert the existence of mutable.key in returned architecture.
+        # Also check if there is anything extra.
+        multihot_list = to_list(sampled)
+        converted = None
+        # If it's a boolean array, we can do optimization.
+        if all([t == 0 or t == 1 for t in multihot_list]):
+            if isinstance(mutable, LayerChoice):
+                assert len(multihot_list) == len(mutable), \
+                    "Results returned from 'sample_final()' (%s: %s) either too short or too long." \
+                    % (mutable.key, multihot_list)
+                # check if all modules have different names and they indeed have names
+                if len(set(mutable.names)) == len(mutable) and not all(d.isdigit() for d in mutable.names):
+                    converted = [name for i, name in enumerate(mutable.names) if multihot_list[i]]
+                else:
+                    converted = [i for i in range(len(multihot_list)) if multihot_list[i]]
+            if isinstance(mutable, InputChoice):
+                assert len(multihot_list) == mutable.n_candidates, \
+                    "Results returned from 'sample_final()' (%s: %s) either too short or too long." \
+                    % (mutable.key, multihot_list)
+                # check if all input candidates have different names
+                if len(set(mutable.choose_from)) == mutable.n_candidates:
+                    converted = [name for i, name in enumerate(mutable.choose_from) if multihot_list[i]]
+                else:
+                    converted = [i for i in range(len(multihot_list)) if multihot_list[i]]
+        if converted is not None:
+            # if only one element, then remove the bracket
+            if len(converted) == 1:
+                converted = converted[0]
+        else:
+            # do nothing
+            converted = multihot_list
+        return converted
@@ -4,6 +4,7 @@
 import logging
 from collections import OrderedDict
+import numpy as np
 import torch
 _counter = 0
@@ -45,6 +46,16 @@ def to_device(obj, device):
     raise ValueError("'%s' has unsupported type '%s'" % (obj, type(obj)))
+
+def to_list(arr):
+    if torch.is_tensor(arr):
+        return arr.cpu().numpy().tolist()
+    if isinstance(arr, np.ndarray):
+        return arr.tolist()
+    if isinstance(arr, (list, tuple)):
+        return list(arr)
+    return arr
+
 class AverageMeterGroup:
     """
     Average meter group for multiple average meters.
......
@@ -74,18 +74,16 @@ def exploit_and_explore(bot_trial_info, top_trial_info, factor, resample_probabi
     top_hyper_parameters = top_trial_info.hyper_parameters
     hyper_parameters = copy.deepcopy(top_hyper_parameters)
     random_state = np.random.RandomState()
+    hyper_parameters['load_checkpoint_dir'] = hyper_parameters['save_checkpoint_dir']
+    hyper_parameters['save_checkpoint_dir'] = os.path.join(bot_checkpoint_dir, str(epoch))
     for key in hyper_parameters.keys():
         hyper_parameter = hyper_parameters[key]
-        if key == 'load_checkpoint_dir':
-            hyper_parameters[key] = hyper_parameters['save_checkpoint_dir']
-            continue
-        elif key == 'save_checkpoint_dir':
-            hyper_parameters[key] = os.path.join(bot_checkpoint_dir, str(epoch))
+        if key == 'load_checkpoint_dir' or key == 'save_checkpoint_dir':
             continue
         elif search_space[key]["_type"] == "choice":
             choices = search_space[key]["_value"]
-            ub, uv = len(choices) - 1, choices.index(hyper_parameter["_value"]) + 1
-            lb, lv = 0, choices.index(hyper_parameter["_value"]) - 1
+            ub, uv = len(choices) - 1, choices.index(hyper_parameter) + 1
+            lb, lv = 0, choices.index(hyper_parameter) - 1
         elif search_space[key]["_type"] == "randint":
             lb, ub = search_space[key]["_value"][:2]
             ub -= 1
@@ -132,10 +130,11 @@ def exploit_and_explore(bot_trial_info, top_trial_info, factor, resample_probabi
         else:
             logger.warning("Illegal type to perturb: %s", search_space[key]["_type"])
            continue
+
         if search_space[key]["_type"] == "choice":
             idx = perturbation(search_space[key]["_type"], search_space[key]["_value"],
                                resample_probability, uv, ub, lv, lb, random_state)
-            hyper_parameters[key] = {'_index': idx, '_value': choices[idx]}
+            hyper_parameters[key] = choices[idx]
         else:
             hyper_parameters[key] = perturbation(search_space[key]["_type"], search_space[key]["_value"],
                                                  resample_probability, uv, ub, lv, lb, random_state)
@@ -231,6 +230,7 @@ class PBTTuner(Tuner):
         for i in range(self.population_size):
             hyper_parameters = json2parameter(
                 self.searchspace_json, is_rand, self.random_state)
+            hyper_parameters = split_index(hyper_parameters)
             checkpoint_dir = os.path.join(self.all_checkpoint_dir, str(i))
             hyper_parameters['load_checkpoint_dir'] = os.path.join(checkpoint_dir, str(self.epoch))
             hyper_parameters['save_checkpoint_dir'] = os.path.join(checkpoint_dir, str(self.epoch))
@@ -294,7 +294,42 @@
         trial_info.parameter_id = parameter_id
         self.running[parameter_id] = trial_info
         logger.info('Generate parameter : %s', trial_info.hyper_parameters)
-        return split_index(trial_info.hyper_parameters)
+        return trial_info.hyper_parameters
+
+    def _proceed_next_epoch(self):
+        """
+        Clear the current population, then exploit and explore to start the next epoch.
+        """
+        logger.info('Proceeding to next epoch')
+        self.epoch += 1
+        self.population = []
+        self.pos = -1
+        self.running = {}
+        # exploit and explore
+        reverse = True if self.optimize_mode == OptimizeMode.Maximize else False
+        self.finished = sorted(self.finished, key=lambda x: x.score, reverse=reverse)
+        cutoff = int(np.ceil(self.fraction * len(self.finished)))
+        tops = self.finished[:cutoff]
+        bottoms = self.finished[self.finished_trials - cutoff:]
+        for bottom in bottoms:
+            top = np.random.choice(tops)
+            exploit_and_explore(bottom, top, self.factor, self.resample_probability, self.epoch, self.searchspace_json)
+        for trial in self.finished:
+            if trial not in bottoms:
+                trial.clean_id()
+                trial.hyper_parameters['load_checkpoint_dir'] = trial.hyper_parameters['save_checkpoint_dir']
+                trial.hyper_parameters['save_checkpoint_dir'] = os.path.join(trial.checkpoint_dir, str(self.epoch))
+        self.finished_trials = 0
+        for _ in range(self.population_size):
+            trial_info = self.finished.pop()
+            self.population.append(trial_info)
+        while self.credit > 0 and self.pos + 1 < len(self.population):
+            self.credit -= 1
+            self.pos += 1
+            parameter_id = self.param_ids.pop()
+            trial_info = self.population[self.pos]
+            trial_info.parameter_id = parameter_id
+            self.running[parameter_id] = trial_info
+            self.send_trial_callback(parameter_id, trial_info.hyper_parameters)
 
     def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
         """
@@ -312,43 +347,99 @@
         """
         logger.info('Get one trial result, id = %d, value = %s', parameter_id, value)
         value = extract_scalar_reward(value)
-        if self.optimize_mode == OptimizeMode.Minimize:
-            value = -value
         trial_info = self.running.pop(parameter_id, None)
         trial_info.score = value
         self.finished.append(trial_info)
         self.finished_trials += 1
         if self.finished_trials == self.population_size:
-            logger.info('Proceeding to next epoch')
-            self.epoch += 1
-            self.population = []
-            self.pos = -1
-            self.running = {}
-            # exploit and explore
-            self.finished = sorted(self.finished, key=lambda x: x.score, reverse=True)
-            cutoff = int(np.ceil(self.fraction * len(self.finished)))
-            tops = self.finished[:cutoff]
-            bottoms = self.finished[self.finished_trials - cutoff:]
-            for bottom in bottoms:
-                top = np.random.choice(tops)
-                exploit_and_explore(bottom, top, self.factor, self.resample_probability, self.epoch, self.searchspace_json)
-            for trial in self.finished:
-                if trial not in bottoms:
-                    trial.clean_id()
-                    trial.hyper_parameters['load_checkpoint_dir'] = trial.hyper_parameters['save_checkpoint_dir']
-                    trial.hyper_parameters['save_checkpoint_dir'] = os.path.join(trial.checkpoint_dir, str(self.epoch))
-            self.finished_trials = 0
-            for _ in range(self.population_size):
-                trial_info = self.finished.pop()
-                self.population.append(trial_info)
-            while self.credit > 0 and self.pos + 1 < len(self.population):
-                self.credit -= 1
-                self.pos += 1
-                parameter_id = self.param_ids.pop()
-                trial_info = self.population[self.pos]
-                trial_info.parameter_id = parameter_id
-                self.running[parameter_id] = trial_info
-                self.send_trial_callback(parameter_id, split_index(trial_info.hyper_parameters))
+            self._proceed_next_epoch()
+
+    def trial_end(self, parameter_id, success, **kwargs):
+        """
+        Deal with trial failure.
+
+        Parameters
+        ----------
+        parameter_id : int
+            Unique identifier for hyper-parameters used by this trial.
+        success : bool
+            True if the trial successfully completed; False if failed or terminated.
+        **kwargs
+            Unstable parameters which should be ignored by normal users.
+        """
+        if success:
+            return
+        if self.optimize_mode == OptimizeMode.Minimize:
+            value = float('inf')
+        else:
+            value = float('-inf')
+        trial_info = self.running.pop(parameter_id, None)
+        trial_info.score = value
+        self.finished.append(trial_info)
+        self.finished_trials += 1
+        if self.finished_trials == self.population_size:
+            self._proceed_next_epoch()
 
     def import_data(self, data):
-        pass
+        """
+        Parameters
+        ----------
+        data : json obj
+            imported data records
+
+        Returns
+        -------
+        int
+            the start epoch number after data is imported, only used for unittest
+        """
+        if self.running:
+            logger.warning("Do not support importing data in the middle of experiment")
+            return
+        # the following is for experiment resume
+        _completed_num = 0
+        epoch_data_dict = {}
+        for trial_info in data:
+            logger.info("Process data record %s / %s", _completed_num, len(data))
+            _completed_num += 1
+            # simply validate data format
+            _params = trial_info["parameter"]
+            _value = trial_info['value']
+            # assign fake value for failed trials
+            if not _value:
+                logger.info("Useless trial data, value is %s, skip this trial data.", _value)
+                _value = float('inf') if self.optimize_mode == OptimizeMode.Minimize else float('-inf')
+            _value = extract_scalar_reward(_value)
+            if 'save_checkpoint_dir' not in _params:
+                logger.warning("Invalid data record: save_checkpoint_dir is missing, abandon data import.")
+                return
+            epoch_num = int(os.path.basename(_params['save_checkpoint_dir']))
+            if epoch_num not in epoch_data_dict:
+                epoch_data_dict[epoch_num] = []
+            epoch_data_dict[epoch_num].append((_params, _value))
+        if not epoch_data_dict:
+            logger.warning("No valid epochs, abandon data import.")
+            return
+        # figure out start epoch for resume
+        max_epoch_num = max(epoch_data_dict, key=int)
+        if len(epoch_data_dict[max_epoch_num]) < self.population_size:
+            max_epoch_num -= 1
+        # if there is not a single complete epoch, there is no data to import; start from scratch
+        if max_epoch_num < 0:
+            logger.warning("No completed epoch, abandon data import.")
+            return
+        assert len(epoch_data_dict[max_epoch_num]) == self.population_size
+        # check existence of trial save checkpoint dir
+        for params, _ in epoch_data_dict[max_epoch_num]:
+            if not os.path.isdir(params['save_checkpoint_dir']):
+                logger.warning("save_checkpoint_dir %s does not exist, data will not be resumed", params['save_checkpoint_dir'])
+                return
+        # resume data
+        self.epoch = max_epoch_num
+        self.finished_trials = self.population_size
+        for params, value in epoch_data_dict[max_epoch_num]:
+            checkpoint_dir = os.path.dirname(params['save_checkpoint_dir'])
+            self.finished.append(TrialInfo(checkpoint_dir=checkpoint_dir, hyper_parameters=params, score=value))
+        self._proceed_next_epoch()
+        logger.info("Successfully import data to PBT tuner, total data: %d, imported data: %d.", len(data), self.population_size)
+        logger.info("Start from epoch %d ...", self.epoch)
+        return self.epoch  # return for test
@@ -159,6 +159,62 @@ class BuiltinTunersTestCase(TestCase):
         logger.info("Full supported search space: %s", full_supported_search_space)
         self.search_space_test_one(tuner_factory, full_supported_search_space)
 
+    def import_data_test_for_pbt(self):
+        """
+        test1: import data with a complete epoch
+        test2: import data with an incomplete epoch
+        """
+        search_space = {
+            "choice_str": {
+                "_type": "choice",
+                "_value": ["cat", "dog", "elephant", "cow", "sheep", "panda"]
+            }
+        }
+        all_checkpoint_dir = os.path.expanduser("~/nni/checkpoint/test/")
+        population_size = 4
+        # ===import data at the beginning===
+        tuner = PBTTuner(
+            all_checkpoint_dir=all_checkpoint_dir,
+            population_size=population_size
+        )
+        self.assertIsInstance(tuner, Tuner)
+        tuner.update_search_space(search_space)
+        save_dirs = [os.path.join(all_checkpoint_dir, str(i), str(0)) for i in range(population_size)]
+        # create save checkpoint directories
+        for save_dir in save_dirs:
+            os.makedirs(save_dir, exist_ok=True)
+        # for simplicity, omit "load_checkpoint_dir"
+        data = [{"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[0]}, "value": 1.1},
+                {"parameter": {"choice_str": "dog", "save_checkpoint_dir": save_dirs[1]}, "value": {"default": 1.2, "tmp": 2}},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[2]}, "value": 11},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[3]}, "value": 7}]
+        epoch = tuner.import_data(data)
+        self.assertEqual(epoch, 1)
+        logger.info("Imported data successfully at the beginning")
+        shutil.rmtree(all_checkpoint_dir)
+        # ===import more data at the beginning, covering the case of an incomplete epoch===
+        tuner = PBTTuner(
+            all_checkpoint_dir=all_checkpoint_dir,
+            population_size=population_size
+        )
+        self.assertIsInstance(tuner, Tuner)
+        tuner.update_search_space(search_space)
+        for i in range(population_size - 1):
+            save_dirs.append(os.path.join(all_checkpoint_dir, str(i), str(1)))
+        for save_dir in save_dirs:
+            os.makedirs(save_dir, exist_ok=True)
+        data = [{"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[0]}, "value": 1.1},
+                {"parameter": {"choice_str": "dog", "save_checkpoint_dir": save_dirs[1]}, "value": {"default": 1.2, "tmp": 2}},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[2]}, "value": 11},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[3]}, "value": 7},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[4]}, "value": 1.1},
+                {"parameter": {"choice_str": "dog", "save_checkpoint_dir": save_dirs[5]}, "value": {"default": 1.2, "tmp": 2}},
+                {"parameter": {"choice_str": "cat", "save_checkpoint_dir": save_dirs[6]}, "value": 11}]
+        epoch = tuner.import_data(data)
+        self.assertEqual(epoch, 1)
+        logger.info("Imported data successfully at the beginning with an incomplete epoch")
+        shutil.rmtree(all_checkpoint_dir)
+
     def import_data_test(self, tuner_factory, stype="choice_str"):
         """
         import data at the beginning with number value and dict value
@@ -297,6 +353,7 @@
             all_checkpoint_dir=os.path.expanduser("~/nni/checkpoint/test/"),
             population_size=100
         ))
+        self.import_data_test_for_pbt()
 
     def tearDown(self):
         file_list = glob.glob("smac3*") + ["param_config_space.pcs", "scenario.txt", "model_path"]
......
@@ -29,7 +29,18 @@
     margin: 0 auto;
     margin-top: 74px;
     margin-bottom: 30px;
-    background: #fff;
+}
+
+.bottomDiv{
+    margin-bottom: 10px;
+}
+
+.bgNNI{
+    background-color: #fff;
+}
+
+.borderRight{
+    margin-right: 10px;
 }
 
 /* office-fabric-ui */
......
@@ -14,6 +14,7 @@ interface AppState {
     metricGraphMode: 'max' | 'min'; // tuner's optimize_mode field
     isillegalFinal: boolean;
     expWarningMessage: string;
+    bestTrialEntries: string; // for overview page: best trial entries
 }
 
 class App extends React.Component<{}, AppState> {
@@ -30,7 +31,8 @@ class App extends React.Component<{}, AppState> {
         trialsUpdateBroadcast: 0,
         metricGraphMode: 'max',
         isillegalFinal: false,
-        expWarningMessage: ''
+        expWarningMessage: '',
+        bestTrialEntries: '10'
     };
 }
@@ -92,9 +94,14 @@ class App extends React.Component<{}, AppState> {
         this.setState({ metricGraphMode: val });
     }
 
+    // overview best trial module
+    changeEntries = (entries: string): void => {
+        this.setState({ bestTrialEntries: entries });
+    }
+
     render(): React.ReactNode {
         const { interval, columnList, experimentUpdateBroadcast, trialsUpdateBroadcast,
-            metricGraphMode, isillegalFinal, expWarningMessage
+            metricGraphMode, isillegalFinal, expWarningMessage, bestTrialEntries
         } = this.state;
         if (experimentUpdateBroadcast === 0 || trialsUpdateBroadcast === 0) {
             return null; // TODO: render a loading page
@@ -106,7 +113,8 @@ class App extends React.Component<{}, AppState> {
                 columnList, changeColumn: this.changeColumn,
                 experimentUpdateBroadcast,
                 trialsUpdateBroadcast,
-                metricGraphMode, changeMetricGraphMode: this.changeMetricGraphMode
+                metricGraphMode, changeMetricGraphMode: this.changeMetricGraphMode,
+                bestTrialEntries, changeEntries: this.changeEntries
             })
         );
......
@@ -7,6 +7,7 @@ import {
 import { MANAGER_IP, DRAWEROPTION } from '../../static/const';
 import MonacoEditor from 'react-monaco-editor';
 import '../../static/style/logDrawer.scss';
+import { TrialManager } from '../../static/model/trialmanager';
 
 interface ExpDrawerProps {
     isVisble: boolean;
@@ -37,27 +38,27 @@ class ExperimentDrawer extends React.Component<ExpDrawerProps, ExpDrawerState> {
             axios.get(`${MANAGER_IP}/trial-jobs`),
             axios.get(`${MANAGER_IP}/metric-data`)
         ])
-            .then(axios.spread((res, res1, res2) => {
-                if (res.status === 200 && res1.status === 200 && res2.status === 200) {
-                    if (res.data.params.searchSpace) {
-                        res.data.params.searchSpace = JSON.parse(res.data.params.searchSpace);
+            .then(axios.spread((resExperiment, resTrialJobs, resMetricData) => {
+                if (resExperiment.status === 200 && resTrialJobs.status === 200 && resMetricData.status === 200) {
+                    if (resExperiment.data.params.searchSpace) {
+                        resExperiment.data.params.searchSpace = JSON.parse(resExperiment.data.params.searchSpace);
                     }
-                    const trialMessagesArr = res1.data;
-                    const interResultList = res2.data;
+                    const trialMessagesArr = TrialManager.expandJobsToTrials(resTrialJobs.data);
+                    const interResultList = resMetricData.data;
                     Object.keys(trialMessagesArr).map(item => {
                         // not deal with trial's hyperParameters
                         const trialId = trialMessagesArr[item].id;
                         // add intermediate result message
                         trialMessagesArr[item].intermediate = [];
                         Object.keys(interResultList).map(key => {
-                            const interId = interResultList[key].trialJobId;
+                            const interId = `${interResultList[key].trialJobId}-${interResultList[key].parameterId}`;
                             if (trialId === interId) {
                                 trialMessagesArr[item].intermediate.push(interResultList[key]);
                             }
                         });
                     });
                     const result = {
-                        experimentParameters: res.data,
+                        experimentParameters: resExperiment.data,
                         trialMessage: trialMessagesArr
                     };
                     if (this._isCompareMount === true) {
......