Unverified Commit 742c26eb authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub

Add NAS Visualization Documentation (#2257)

* nas ui docs

* add link in overview

* update

* Update Visualization.md
parent 4dc9eb93
...@@ -119,7 +119,9 @@ trainer.export(file="model_dir/final_architecture.json") # export the final arc
Users can directly run their training file through `python3 train.py` without `nnictl`. After training, users can export the best of the found models through `trainer.export()`.
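Conceptually, `trainer.export()` serializes the chosen architecture to a JSON file. A minimal sketch of that idea, using a made-up choice dict (the real keys and values depend on your search space and trainer, so treat every name below as an assumption):

```python
import json
import os
import tempfile

# Hypothetical architecture choices; the actual content of
# final_architecture.json depends on the search space (illustrative only).
final_architecture = {
    "LayerChoice_0": "conv3x3",
    "LayerChoice_1": "maxpool",
    "InputChoice_0": [0, 2],
}

# Mimic what an export step boils down to: dumping the choices to JSON.
model_dir = tempfile.mkdtemp()
export_path = os.path.join(model_dir, "final_architecture.json")
with open(export_path, "w") as f:
    json.dump(final_architecture, f)

# The exported file can later be reloaded to fix the architecture.
with open(export_path) as f:
    reloaded = json.load(f)
print(reloaded == final_architecture)  # → True
```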
Normally, the trainer exposes a few arguments that you can customize, for example, the loss function, the metrics function, the optimizer, and the datasets. These should satisfy most usage needs, and we do our best to make sure our built-in trainers work on as many models, tasks, and datasets as possible. But there is no guarantee. For example, some trainers assume the task is a classification task; some trainers might have a different definition of "epoch" (e.g., an ENAS epoch = some child steps + some controller steps); and most trainers do not support distributed training: they won't wrap your model with `DataParallel` or `DistributedDataParallel`. So after a few tryouts, if you want to use the trainers on your very customized applications, you might need to [customize your trainer](./Advanced.md#extend-the-ability-of-one-shot-trainers).
Furthermore, one-shot NAS can be visualized with our NAS UI. [See more details.](./Visualization.md)
### Distributed NAS
......
...@@ -25,11 +25,12 @@ One-shot algorithms run **standalone without nnictl**. Only the PyTorch version
Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
* NNI 1.2+
* tensorboard
* PyTorch 1.2+
* git
One-shot NAS can be visualized with our visualization tool. Learn more details [here](./Visualization.md).
## Supported Distributed NAS Algorithms
|Name|Brief Introduction of Algorithm|
...@@ -49,6 +50,10 @@ The programming interface of designing and searching a model is often demanded i
[Here](./NasGuide.md) is the user guide to get started with using NAS on NNI.
## NAS Visualization
To help users track the process and status of how a model is searched under a specified search space, we developed a visualization tool. It visualizes the search space as a super-net and shows the importance of subnets and layers/operations, as well as how this importance changes along with the search process. Please refer to [the document of NAS visualization](./Visualization.md) for how to use it.
## Reference and Feedback
[1]: https://arxiv.org/abs/1802.03268
......
# NAS Visualization (Experimental)
## Built-in Trainers Support
Currently, only ENAS and DARTS support visualization. The examples of [ENAS](./ENAS.md) and [DARTS](./DARTS.md) have demonstrated how to enable visualization in your code, namely, by adding this before `trainer.train()`:
```python
trainer.enable_visualization()
```
This will create a directory `logs/<current_time_stamp>` in your working folder, in which you will find two files: `graph.json` and `log`.
You don't have to wait until your program finishes to launch NAS UI, but it's important that these two files have already been created. Launch NAS UI with
```bash
nnictl webui nas --logdir logs/<current_time_stamp> --port <port>
```
## Visualize a Customized Trainer
If you are interested in how to customize a trainer, please read this [doc](./Advanced.md#extend-the-ability-of-one-shot-trainers).
You need to make two modifications to an existing trainer to enable visualization:
1. Export your graph before training, with
```python
import json  # at the top of your module

vis_graph = self.mutator.graph(inputs)
# `inputs` is a dummy input to your model. For example, torch.randn((1, 3, 32, 32)).cuda()
# If your model has multiple inputs, it should be a tuple.
with open("/path/to/your/logdir/graph.json", "w") as f:
    json.dump(vis_graph, f)
```
2. Log the choices you've made. You can do it once per epoch, once per mini-batch, or at whatever frequency you'd like.
```python
def __init__(self):
    # ...
    self.status_writer = open("/path/to/your/logdir/log", "w")  # create a writer

def train(self):
    # ...
    print(json.dumps(self.mutator.status()), file=self.status_writer, flush=True)  # dump a record of status
```
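Putting the two modifications together, here is a self-contained toy sketch of the logging pattern. The `FakeMutator` and `MyTrainer` classes are made up for illustration (a real NNI mutator's `graph()` and `status()` return richer structures, and a real trainer has an actual training loop):

```python
import json
import os
import tempfile

class FakeMutator:
    """Stand-in for a real mutator; returns plain dicts for illustration."""
    def graph(self, inputs):
        return {"nodes": ["conv", "pool"], "edges": [[0, 1]], "input_shape": list(inputs)}

    def status(self):
        return {"LayerChoice_0": [0.7, 0.3]}

class MyTrainer:
    def __init__(self, logdir):
        self.mutator = FakeMutator()
        self.logdir = logdir
        os.makedirs(logdir, exist_ok=True)
        # Modification 2 (setup): open a writer for the status log.
        self.status_writer = open(os.path.join(logdir, "log"), "w")

    def train(self, num_epochs=3):
        # Modification 1: dump the graph once before training.
        with open(os.path.join(self.logdir, "graph.json"), "w") as f:
            json.dump(self.mutator.graph((1, 3, 32, 32)), f)
        for _ in range(num_epochs):
            # ... a real training step would go here ...
            # Modification 2: append one JSON status record per epoch.
            print(json.dumps(self.mutator.status()), file=self.status_writer, flush=True)

logdir = tempfile.mkdtemp()
trainer = MyTrainer(logdir)
trainer.train()
trainer.status_writer.close()

# The `log` file now holds one JSON record per line, one per epoch.
with open(os.path.join(logdir, "log")) as f:
    records = [json.loads(line) for line in f]
print(len(records))  # → 3
```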
If you are implementing a customized trainer that inherits `Trainer`, we have provided `enable_visualization()` and `_write_graph_status()` for ease of use. All you need to do is call `trainer.enable_visualization()` before training starts, and `trainer._write_graph_status()` each time you want to log. But remember that both of these APIs are experimental and subject to change in the future.
Last but not least, invoke NAS UI with
```bash
nnictl webui nas --logdir /path/to/your/logdir
```
## NAS UI Preview
![](../../img/nasui-1.png)
![](../../img/nasui-2.png)
## Limitations
* NAS visualization only works with PyTorch >=1.4. We've tested it on PyTorch 1.3.1 and it doesn't work.
* We rely on PyTorch support for tensorboard for graph export, which relies on `torch.jit`. It will not work if your model doesn't support `jit`.
* There are known performance issues when loading a moderate-size graph with many op choices (like DARTS search space).
## Feedback
NAS UI is currently experimental. We welcome your feedback. [Here](https://github.com/microsoft/nni/pull/2085) we have listed all the future to-do items of NAS UI. Feel free to comment (or [submit a new issue](https://github.com/microsoft/nni/issues/new?template=enhancement.md)) if you have other suggestions.
...@@ -27,4 +27,5 @@ For details, please refer to the following tutorials:
CDARTS <NAS/CDARTS>
ProxylessNAS <NAS/Proxylessnas>
Customize a NAS Algorithm <NAS/Advanced>
NAS Visualization <NAS/Visualization>
API Reference <NAS/NasReference>