Unverified Commit a911b856 authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub
Browse files

Resolve conflicts for #4760 (#4762)

parent 14d2966b
0e49e3aef98633744807b814786f6b31
\ No newline at end of file
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hello_nas.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hello_nas.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hello_nas.py:
Hello, NAS!
===========
This is the 101 tutorial of Neural Architecture Search (NAS) on NNI.
In this tutorial, we will search for a neural architecture on the MNIST dataset with the help of the NAS framework of NNI, i.e., *Retiarii*.
We use multi-trial NAS as an example to show how to construct and explore a model space.
There are mainly three crucial components for a neural architecture search task, namely,
* Model search space that defines a set of models to explore.
* A proper strategy as the method to explore this model space.
* A model evaluator that reports the performance of every model in the space.
Currently, PyTorch is the only framework supported by Retiarii, and we have only tested it with **PyTorch 1.7 to 1.10**.
This tutorial assumes a PyTorch context, but the concepts should also apply to other frameworks, which we plan to support in the future.
Define your Model Space
-----------------------
A model space is defined by users to express the set of models they want to explore, which contains potentially good-performing models.
In this framework, a model space is defined with two parts: a base model and possible mutations on the base model.
.. GENERATED FROM PYTHON SOURCE LINES 26-34
Define Base Model
^^^^^^^^^^^^^^^^^
Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
Usually, you only need to replace the code ``import torch.nn as nn`` with
``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.
Below is a very simple example of defining a base model.
.. GENERATED FROM PYTHON SOURCE LINES 35-61
.. code-block:: default
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
    @model_wrapper # this decorator should be put on the outermost module
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout(0.25)
self.dropout2 = nn.Dropout(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
.. GENERATED FROM PYTHON SOURCE LINES 62-104
.. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
Many mistakes are a result of forgetting one of those.
Also, please use ``torch.nn`` for submodules of ``nn.init``, e.g., ``torch.nn.init`` instead of ``nn.init``.
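For instance, a minimal sketch of this import pattern (the ``TinyNet`` module below is purely illustrative and not part of the tutorial) might look like:

.. code-block:: python

    import torch
    import nni.retiarii.nn.pytorch as nn       # wrapped modules for building the space
    from nni.retiarii import model_wrapper

    @model_wrapper                              # decorate only the outermost module
    class TinyNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(16, 4)          # layers come from the wrapped namespace
            torch.nn.init.zeros_(self.fc.bias)  # initializers come from torch.nn.init

        def forward(self, x):
            return self.fc(x)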
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model, not a model space. We provide :doc:`API and Primitives </nas/construct_space>`
for users to express how the base model can be mutated, that is, to build a model space which includes many models.
Based on the above base model, we can define a model space as below.
.. code-block:: diff
@model_wrapper
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
- self.conv2 = nn.Conv2d(32, 64, 3, 1)
+ self.conv2 = nn.LayerChoice([
+ nn.Conv2d(32, 64, 3, 1),
+ DepthwiseSeparableConv(32, 64)
+ ])
- self.dropout1 = nn.Dropout(0.25)
+ self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
self.dropout2 = nn.Dropout(0.5)
- self.fc1 = nn.Linear(9216, 128)
- self.fc2 = nn.Linear(128, 10)
+ feature = nn.ValueChoice([64, 128, 256])
+ self.fc1 = nn.Linear(9216, feature)
+ self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
This results in the following code:
.. GENERATED FROM PYTHON SOURCE LINES 104-147
.. code-block:: default
class DepthwiseSeparableConv(nn.Module):
def __init__(self, in_ch, out_ch):
super().__init__()
self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
def forward(self, x):
return self.pointwise(self.depthwise(x))
@model_wrapper
class ModelSpace(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
# LayerChoice is used to select a layer between Conv2d and DwConv.
self.conv2 = nn.LayerChoice([
nn.Conv2d(32, 64, 3, 1),
DepthwiseSeparableConv(32, 64)
])
# ValueChoice is used to select a dropout rate.
# ValueChoice can be used as parameter of modules wrapped in `nni.retiarii.nn.pytorch`
# or customized modules wrapped with `@basic_unit`.
self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75])) # choose dropout rate from 0.25, 0.5 and 0.75
self.dropout2 = nn.Dropout(0.5)
feature = nn.ValueChoice([64, 128, 256])
self.fc1 = nn.Linear(9216, feature)
self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
model_space = ModelSpace()
model_space
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
ModelSpace(
(conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
(conv2): LayerChoice([Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1)), DepthwiseSeparableConv(
(depthwise): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32)
(pointwise): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
)], label='model_1')
(dropout1): Dropout(p=0.25, inplace=False)
(dropout2): Dropout(p=0.5, inplace=False)
(fc1): Linear(in_features=9216, out_features=64, bias=True)
(fc2): Linear(in_features=64, out_features=10, bias=True)
)
.. GENERATED FROM PYTHON SOURCE LINES 148-182
This example uses two mutation APIs,
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` and
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>`.
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>`
takes a list of candidate modules (two in this example), one of which will be chosen for each sampled model.
It can be used like a normal PyTorch module.
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>` takes a list of candidate values,
one of which will be chosen to take effect for each sampled model.
More detailed API description and usage can be found :doc:`here </nas/construct_space>`.
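As a further illustration (a hypothetical snippet, not part of this tutorial's model space), a ``ValueChoice`` can also parameterize arguments of wrapped modules, such as a convolution's kernel size:

.. code-block:: python

    import nni.retiarii.nn.pytorch as nn
    from nni.retiarii import model_wrapper

    @model_wrapper
    class TinySpace(nn.Module):
        def __init__(self):
            super().__init__()
            # pick one of two candidate layers for each sampled model
            self.act = nn.LayerChoice([nn.ReLU(), nn.Sigmoid()])
            # pick the kernel size of a wrapped Conv2d for each sampled model
            self.conv = nn.Conv2d(3, 8, kernel_size=nn.ValueChoice([3, 5, 7]))

        def forward(self, x):
            return self.act(self.conv(x))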
.. note::
We are actively enriching the mutation APIs, to facilitate easy construction of model space.
If the currently supported mutation APIs cannot express your model space,
please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
Explore the Defined Model Space
-------------------------------
There are basically two exploration approaches: (1) search by evaluating each sampled model independently,
which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`
and (2) one-shot weight-sharing based search, which is used in one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
Simply choose (i.e., instantiate) an exploration strategy as below.
.. GENERATED FROM PYTHON SOURCE LINES 182-186
.. code-block:: default
import nni.retiarii.strategy as strategy
search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is not wanted
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
/home/yugzhan/miniconda3/envs/cu102/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
warnings.warn(
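Other strategies can be instantiated the same way and swapped in as drop-in replacements. A hedged sketch (strategy availability and default arguments may differ across NNI versions, so check the exploration strategy reference):

.. code-block:: python

    import nni.retiarii.strategy as strategy

    # exhaustively enumerate the space instead of sampling randomly
    search_strategy = strategy.GridSearch()
    # or use an evolutionary search
    search_strategy = strategy.RegularizedEvolution()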
.. GENERATED FROM PYTHON SOURCE LINES 187-200
Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is for training
and validating each generated model to obtain the model's performance.
The performance is sent to the exploration strategy for the strategy to generate better models.
Retiarii has provided :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
it is recommended to use :class:`FunctionalEvaluator <nni.retiarii.evaluator.FunctionalEvaluator>`,
that is, to wrap your own training and evaluation code with a single function.
This function should receive a single model class and use :func:`nni.report_final_result` to report the final score of this model.
The example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its validation accuracy.
.. GENERATED FROM PYTHON SOURCE LINES 200-268
.. code-block:: default
import nni
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
def train_epoch(model, device, train_loader, optimizer, epoch):
loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test_epoch(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
accuracy = 100. * correct / len(test_loader.dataset)
print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
correct, len(test_loader.dataset), accuracy))
return accuracy
def evaluate_model(model_cls):
# "model_cls" is a class, need to instantiate
model = model_cls()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)
for epoch in range(3):
# train the model for one epoch
train_epoch(model, device, train_loader, optimizer, epoch)
# test the model for one epoch
accuracy = test_epoch(model, device, test_loader)
# call report intermediate result. Result can be float or dict
nni.report_intermediate_result(accuracy)
# report final test result
nni.report_final_result(accuracy)
.. GENERATED FROM PYTHON SOURCE LINES 269-270
Create the evaluator
.. GENERATED FROM PYTHON SOURCE LINES 270-274
.. code-block:: default
from nni.retiarii.evaluator import FunctionalEvaluator
evaluator = FunctionalEvaluator(evaluate_model)
.. GENERATED FROM PYTHON SOURCE LINES 275-286
The ``train_epoch`` and ``test_epoch`` here can be any customized functions,
where users can write their own training recipe.
It is recommended that ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
However, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need them.
In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
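For illustration of the additional-argument case, the sketch below assumes that ``FunctionalEvaluator`` forwards extra keyword arguments to the wrapped function; verify this against the advanced tutorial and your NNI version before relying on it:

.. code-block:: python

    from nni.retiarii.evaluator import FunctionalEvaluator

    def evaluate_model_with_lr(model_cls, learning_rate):
        # hypothetical variant of evaluate_model with a configurable learning rate
        model = model_cls()
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
        # ... same training / reporting logic as evaluate_model above ...

    # assumption: keyword arguments are forwarded to the function at trial time
    evaluator = FunctionalEvaluator(evaluate_model_with_lr, learning_rate=1e-3)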
Launch an Experiment
--------------------
After all the above are prepared, it is time to start an experiment to do the model search. An example is shown below.
.. GENERATED FROM PYTHON SOURCE LINES 287-293
.. code-block:: default
from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig
exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
exp_config = RetiariiExeConfig('local')
exp_config.experiment_name = 'mnist_search'
.. GENERATED FROM PYTHON SOURCE LINES 294-295
The following configurations are useful to control how many trials to run at most / at the same time.
.. GENERATED FROM PYTHON SOURCE LINES 295-299
.. code-block:: default
exp_config.max_trial_number = 4 # spawn 4 trials at most
exp_config.trial_concurrency = 2 # will run two trials concurrently
.. GENERATED FROM PYTHON SOURCE LINES 300-302
Remember to set the following config if you want to use GPU.
``use_active_gpu`` should be set to true if you wish to use an occupied GPU (e.g., one that is also running a GUI).
.. GENERATED FROM PYTHON SOURCE LINES 302-306
.. code-block:: default
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = True
.. GENERATED FROM PYTHON SOURCE LINES 307-308
Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
.. GENERATED FROM PYTHON SOURCE LINES 308-311
.. code-block:: default
exp.run(exp_config, 8081)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO:nni.experiment:Creating experiment, Experiment ID: z8ns5fv7
INFO:nni.experiment:Connecting IPC pipe...
INFO:nni.experiment:Starting web server...
INFO:nni.experiment:Setting up...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher started
INFO:nni.retiarii.experiment.pytorch:Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
INFO:nni.retiarii.experiment.pytorch:Start strategy...
INFO:root:Successfully update searchSpace.
INFO:nni.retiarii.strategy.bruteforce:Random search running in fixed size mode. Dedup: on.
INFO:nni.retiarii.experiment.pytorch:Stopping experiment, please wait...
INFO:nni.retiarii.experiment.pytorch:Strategy exit
INFO:nni.retiarii.experiment.pytorch:Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher exiting...
INFO:nni.retiarii.experiment.pytorch:Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 312-330
Users can also run a Retiarii experiment with :doc:`different training services </experiment/training_service/overview>`
besides the ``local`` training service.
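For example, the training service is selected by the platform string passed to ``RetiariiExeConfig``; a rough sketch (the platform name ``'remote'`` is used here only for illustration, and each platform has its own additional fields documented in the training service reference):

.. code-block:: python

    # 'local' was used above; other platforms are configured the same way,
    # plus platform-specific fields on exp_config.training_service
    exp_config = RetiariiExeConfig('remote')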
Visualize the Experiment
------------------------
Users can visualize their experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
Please refer to :doc:`here </experiment/web_portal/web_portal>` for details.
We support visualizing models with 3rd-party visualization engines (like `Netron <https://netron.app/>`__).
This can be used by clicking ``Visualization`` in the detail panel of each trial.
Note that the current visualization is based on `onnx <https://onnx.ai/>`__,
so visualization is not feasible if the model cannot be exported into onnx.
Built-in evaluators (e.g., Classification) will automatically export the model into a file.
For your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
For instance,
.. GENERATED FROM PYTHON SOURCE LINES 330-344
.. code-block:: default
import os
from pathlib import Path
def evaluate_model_with_visualization(model_cls):
model = model_cls()
# dump the model into an onnx
if 'NNI_OUTPUT_DIR' in os.environ:
dummy_input = torch.zeros(1, 3, 32, 32)
torch.onnx.export(model, (dummy_input, ),
Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
evaluate_model(model_cls)
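To use this evaluator, wrap the new function with ``FunctionalEvaluator`` exactly as before:

.. code-block:: python

    evaluator = FunctionalEvaluator(evaluate_model_with_visualization)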
.. GENERATED FROM PYTHON SOURCE LINES 345-353
Relaunch the experiment, and a button will be shown on the web portal.
.. image:: ../../img/netron_entrance_webui.png
Export Top Models
-----------------
Users can export top models after the exploration is done using ``export_top_models``.
.. GENERATED FROM PYTHON SOURCE LINES 353-357
.. code-block:: default
for model_dict in exp.export_top_models(formatter='dict'):
print(model_dict)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'model_1': '0', 'model_2': 0.25, 'model_3': 64}
.. GENERATED FROM PYTHON SOURCE LINES 358-362
The output is a JSON object which records the mutation actions of the top model.
If users want to output the source code of the top model,
they can use the :ref:`graph-based execution engine <graph-based-execution-engine>` for the experiment,
by simply adding the following two lines.
.. GENERATED FROM PYTHON SOURCE LINES 362-365
.. code-block:: default
exp_config.execution_engine = 'base'
export_formatter = 'code'
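With the graph-based engine enabled, the formatter can then be passed to the export call shown earlier (a sketch; rerun the experiment with the new config before exporting):

.. code-block:: python

    for model_code in exp.export_top_models(formatter=export_formatter):
        print(model_code)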
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 2 minutes 4.499 seconds)
.. _sphx_glr_download_tutorials_hello_nas.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: hello_nas.py <hello_nas.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: hello_nas.ipynb <hello_nas.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. 8a873f2c9cb0e8e3ed2d66b9d16c330f
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hello_nas.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hello_nas.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hello_nas.py:
Getting Started with Neural Architecture Search
================================================
This is the beginner tutorial of Neural Architecture Search (NAS) on NNI.
In this tutorial, we will search for a neural architecture on the MNIST dataset with the help of the NAS framework of NNI, i.e., *Retiarii*.
We use multi-trial NAS as an example to show how to construct and explore a model space.
There are mainly three crucial components for a neural architecture search task, namely,
* A model search space that defines the set of models to explore.
* A proper strategy as the method to explore this model space.
* A model evaluator that reports the performance of every model in the space.
Currently, Retiarii only supports PyTorch, and we have tested it against **PyTorch 1.7 to 1.10**.
This tutorial therefore assumes that you use PyTorch as your deep learning framework. We will support more frameworks in the future.
Define your Model Space
-----------------------
The model space is defined by users to express the set of models that they want to explore, which contains potentially good-performing models.
In the NNI framework, a model space is defined with two parts: a base model and possible mutations on the base model.
.. GENERATED FROM PYTHON SOURCE LINES 26-34
Define Base Model
^^^^^^^^^^^^^^^^^
Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
Usually, you only need to replace the code ``import torch.nn as nn`` with
``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.
Below is a very simple example of defining a base model.
.. GENERATED FROM PYTHON SOURCE LINES 35-61
.. code-block:: default
import torch
import torch.nn.functional as F
import nni.retiarii.nn.pytorch as nn
from nni.retiarii import model_wrapper
    @model_wrapper # this decorator should be put on the outermost module
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout(0.25)
self.dropout2 = nn.Dropout(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
.. GENERATED FROM PYTHON SOURCE LINES 62-104
.. tip:: Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
    Many mistakes are a result of forgetting one of those.
    Also, use ``torch.nn`` for submodules such as ``init``, e.g., ``torch.nn.init`` instead of ``nn.init``.
Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^
A base model is only one concrete model, not a model space. We provide :doc:`mutation APIs </nas/construct_space>`
for users to express how the base model can be mutated, that is, to build a model space which includes many models.
Based on the above base model, we can define a model space as below.
.. code-block:: diff
@model_wrapper
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
- self.conv2 = nn.Conv2d(32, 64, 3, 1)
+ self.conv2 = nn.LayerChoice([
+ nn.Conv2d(32, 64, 3, 1),
+ DepthwiseSeparableConv(32, 64)
+ ])
- self.dropout1 = nn.Dropout(0.25)
+ self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
self.dropout2 = nn.Dropout(0.5)
- self.fc1 = nn.Linear(9216, 128)
- self.fc2 = nn.Linear(128, 10)
+ feature = nn.ValueChoice([64, 128, 256])
+ self.fc1 = nn.Linear(9216, feature)
+ self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
This results in the following code:
.. GENERATED FROM PYTHON SOURCE LINES 104-147
.. code-block:: default
class DepthwiseSeparableConv(nn.Module):
def __init__(self, in_ch, out_ch):
super().__init__()
self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
def forward(self, x):
return self.pointwise(self.depthwise(x))
@model_wrapper
class ModelSpace(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
# LayerChoice is used to select a layer between Conv2d and DwConv.
self.conv2 = nn.LayerChoice([
nn.Conv2d(32, 64, 3, 1),
DepthwiseSeparableConv(32, 64)
])
# ValueChoice is used to select a dropout rate.
# ValueChoice can be used as parameter of modules wrapped in `nni.retiarii.nn.pytorch`
# or customized modules wrapped with `@basic_unit`.
self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75])) # choose dropout rate from 0.25, 0.5 and 0.75
self.dropout2 = nn.Dropout(0.5)
feature = nn.ValueChoice([64, 128, 256])
self.fc1 = nn.Linear(9216, feature)
self.fc2 = nn.Linear(feature, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(self.conv2(x), 2)
x = torch.flatten(self.dropout1(x), 1)
x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
output = F.log_softmax(x, dim=1)
return output
model_space = ModelSpace()
model_space
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
ModelSpace(
(conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
(conv2): LayerChoice([Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1)), DepthwiseSeparableConv(
(depthwise): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32)
(pointwise): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
)], label='model_1')
(dropout1): Dropout(p=0.25, inplace=False)
(dropout2): Dropout(p=0.5, inplace=False)
(fc1): Linear(in_features=9216, out_features=64, bias=True)
(fc2): Linear(in_features=64, out_features=10, bias=True)
)
.. GENERATED FROM PYTHON SOURCE LINES 148-182
This example uses two mutation APIs,
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` and
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>`.
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` takes a list of candidate modules (two in this example), one of which is chosen for each sampled model.
It can be used like a normal PyTorch module.
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>` takes a list of candidate values, one of which is chosen to take effect for each sampled model.
More detailed API descriptions and usages can be found :doc:`here </nas/construct_space>`.
.. note::
    We are actively enriching the mutation APIs to facilitate easy construction of model spaces.
    If the currently supported mutation APIs cannot express your model space,
    please refer to :doc:`this doc </nas/mutator>` for customizing mutators.
Explore the Defined Model Space
-------------------------------------------
Basically, there are two exploration approaches:
(1) evaluating each sampled model independently, which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`, and
(2) one-shot weight-sharing based search, which is called one-shot NAS.
We demonstrate the first approach in this tutorial. Users can refer to :ref:`here <one-shot-nas>` for the second approach.
First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.
Pick an exploration strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.
Simply choose (i.e., instantiate) an exploration strategy, as demonstrated in the code below:
.. GENERATED FROM PYTHON SOURCE LINES 182-186
.. code-block:: default
import nni.retiarii.strategy as strategy
search_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is not wanted
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
/home/yugzhan/miniconda3/envs/cu102/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
warnings.warn(
.. GENERATED FROM PYTHON SOURCE LINES 187-200
Pick or customize a model evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In the exploration process, the exploration strategy repeatedly generates new models. A model evaluator is responsible for training
and validating each generated model to obtain its performance.
The performance is sent to the exploration strategy as the model's score to help it generate better models.
Retiarii provides :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
it is recommended to use :class:`FunctionalEvaluator <nni.retiarii.evaluator.FunctionalEvaluator>`, that is, to wrap your own training and evaluation code with a single function.
This function should receive a single model class and use :func:`nni.report_final_result` to report the final score of this model.
The example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its validation accuracy.
.. GENERATED FROM PYTHON SOURCE LINES 200-268
.. code-block:: default
import nni
from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
def train_epoch(model, device, train_loader, optimizer, epoch):
loss_fn = torch.nn.CrossEntropyLoss()
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = loss_fn(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test_epoch(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
accuracy = 100. * correct / len(test_loader.dataset)
print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
correct, len(test_loader.dataset), accuracy))
return accuracy
def evaluate_model(model_cls):
# "model_cls" is a class, need to instantiate
model = model_cls()
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)
for epoch in range(3):
# train the model for one epoch
train_epoch(model, device, train_loader, optimizer, epoch)
# test the model for one epoch
accuracy = test_epoch(model, device, test_loader)
# call report intermediate result. Result can be float or dict
nni.report_intermediate_result(accuracy)
# report final test result
nni.report_final_result(accuracy)
.. GENERATED FROM PYTHON SOURCE LINES 269-270
Create the evaluator
.. GENERATED FROM PYTHON SOURCE LINES 270-274
.. code-block:: default
from nni.retiarii.evaluator import FunctionalEvaluator
evaluator = FunctionalEvaluator(evaluate_model)
.. GENERATED FROM PYTHON SOURCE LINES 275-286
The ``train_epoch`` and ``test_epoch`` here can be any customized functions, where users can write their own training logic.
It is recommended that ``evaluate_model`` here accepts no additional arguments other than ``model_cls``.
However, in the :doc:`advanced tutorial </nas/evaluator>`, we will show how to use additional arguments in case you actually need them.
In the future, we will support mutation on the arguments of evaluators, which is commonly called "hyper-parameter tuning".
Launch an Experiment
--------------------
After all the above are prepared, it is time to launch an experiment to do the model search. An example is shown below.
.. GENERATED FROM PYTHON SOURCE LINES 287-293
.. code-block:: default
from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig
exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
exp_config = RetiariiExeConfig('local')
exp_config.experiment_name = 'mnist_search'
.. GENERATED FROM PYTHON SOURCE LINES 294-295
The following configurations can be used to control how many trials to run at most / at the same time.
.. GENERATED FROM PYTHON SOURCE LINES 295-299
.. code-block:: default
    exp_config.max_trial_number = 4   # spawn 4 trials at most
    exp_config.trial_concurrency = 2  # will run two trials concurrently
.. GENERATED FROM PYTHON SOURCE LINES 300-302
Remember to set the following config if you want to use GPU.
``use_active_gpu`` should be set to true if you wish to use an occupied GPU (e.g., one that is also running a GUI).
.. GENERATED FROM PYTHON SOURCE LINES 302-306
.. code-block:: default
exp_config.trial_gpu_number = 1
exp_config.training_service.use_active_gpu = True
.. GENERATED FROM PYTHON SOURCE LINES 307-308
Launch the experiment. The experiment should take several minutes to finish on a workstation with 2 GPUs.
.. GENERATED FROM PYTHON SOURCE LINES 308-311
.. code-block:: default
exp.run(exp_config, 8081)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO:nni.experiment:Creating experiment, Experiment ID: z8ns5fv7
INFO:nni.experiment:Connecting IPC pipe...
INFO:nni.experiment:Starting web server...
INFO:nni.experiment:Setting up...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher started
INFO:nni.retiarii.experiment.pytorch:Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
INFO:nni.retiarii.experiment.pytorch:Start strategy...
INFO:root:Successfully update searchSpace.
INFO:nni.retiarii.strategy.bruteforce:Random search running in fixed size mode. Dedup: on.
INFO:nni.retiarii.experiment.pytorch:Stopping experiment, please wait...
INFO:nni.retiarii.experiment.pytorch:Strategy exit
INFO:nni.retiarii.experiment.pytorch:Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher exiting...
INFO:nni.retiarii.experiment.pytorch:Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 312-330
Besides the ``local`` training service, users can also run Retiarii experiments with :doc:`different training services </experiment/training_service/overview>`.
Visualize the Experiment
------------------------
Users can visualize their architecture search experiment in the same way as visualizing a normal hyper-parameter tuning experiment.
For example, open ``localhost:8081`` in your browser; 8081 is the port that you set in ``exp.run``.
Please refer to :doc:`here </experiment/web_portal/web_portal>` for details.
We support visualizing models with third-party visualization engines (such as `Netron <https://netron.app/>`__).
This can be used by clicking ``Visualization`` in the detail panel of each trial.
Note that the current visualization is based on `onnx <https://onnx.ai/>`__,
so visualization is not feasible if the model cannot be exported into onnx.
Built-in evaluators (e.g., Classification) will automatically export the model into a file.
For your own evaluator, you need to save your model into ``$NNI_OUTPUT_DIR/model.onnx`` to make this work.
For instance,
.. GENERATED FROM PYTHON SOURCE LINES 330-344
.. code-block:: default
import os
from pathlib import Path
def evaluate_model_with_visualization(model_cls):
model = model_cls()
# dump the model into an onnx
if 'NNI_OUTPUT_DIR' in os.environ:
dummy_input = torch.zeros(1, 3, 32, 32)
torch.onnx.export(model, (dummy_input, ),
Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
evaluate_model(model_cls)
.. GENERATED FROM PYTHON SOURCE LINES 345-353
Relaunch the experiment, and a button will be shown on the web portal.
.. image:: ../../img/netron_entrance_webui.png
Export Top Models
-----------------
After the search is done, users can export the top models using ``export_top_models``.
.. GENERATED FROM PYTHON SOURCE LINES 353-357
.. code-block:: default
for model_dict in exp.export_top_models(formatter='dict'):
print(model_dict)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'model_1': '0', 'model_2': 0.25, 'model_3': 64}
.. GENERATED FROM PYTHON SOURCE LINES 358-362
The output is a JSON object which records which candidate was chosen for every choice in the best model.
If users want the source code of the best model, they can use the :ref:`graph-based execution engine <graph-based-execution-engine>` by simply adding the following two lines.
.. GENERATED FROM PYTHON SOURCE LINES 362-365
.. code-block:: default
exp_config.execution_engine = 'base'
export_formatter = 'code'
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 2 minutes 4.499 seconds)
.. _sphx_glr_download_tutorials_hello_nas.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: hello_nas.py <hello_nas.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: hello_nas.ipynb <hello_nas.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Port PyTorch Quickstart to NNI\nThis is a modified version of `PyTorch quickstart`_.\n\nIt can be run directly and will have the exact same result as original version.\n\nFurthermore, it enables the ability of auto tuning with an NNI *experiment*, which will be detailed later.\n\nIt is recommended to run this script directly first to verify the environment.\n\nThere are 2 key differences from the original version:\n\n1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.\n2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import nni\nimport torch\nfrom torch import nn\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision.transforms import ToTensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters to be tuned\nThese are the hyperparameters that will be tuned.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"params = {\n 'features': 512,\n 'lr': 0.001,\n 'momentum': 0,\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get optimized hyperparameters\nIf run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.\nBut with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimized_params = nni.get_next_parameter()\nparams.update(optimized_params)\nprint(params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"training_data = datasets.FashionMNIST(root=\"data\", train=True, download=True, transform=ToTensor())\ntest_data = datasets.FashionMNIST(root=\"data\", train=False, download=True, transform=ToTensor())\n\nbatch_size = 64\n\ntrain_dataloader = DataLoader(training_data, batch_size=batch_size)\ntest_dataloader = DataLoader(test_data, batch_size=batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build model with hyperparameters\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n\nclass NeuralNetwork(nn.Module):\n def __init__(self):\n super(NeuralNetwork, self).__init__()\n self.flatten = nn.Flatten()\n self.linear_relu_stack = nn.Sequential(\n nn.Linear(28*28, params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], 10)\n )\n\n def forward(self, x):\n x = self.flatten(x)\n logits = self.linear_relu_stack(x)\n return logits\n\nmodel = NeuralNetwork().to(device)\n\nloss_fn = nn.CrossEntropyLoss()\noptimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define train and test\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def train(dataloader, model, loss_fn, optimizer):\n size = len(dataloader.dataset)\n model.train()\n for batch, (X, y) in enumerate(dataloader):\n X, y = X.to(device), y.to(device)\n pred = model(X)\n loss = loss_fn(pred, y)\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n\ndef test(dataloader, model, loss_fn):\n size = len(dataloader.dataset)\n num_batches = len(dataloader)\n model.eval()\n test_loss, correct = 0, 0\n with torch.no_grad():\n for X, y in dataloader:\n X, y = X.to(device), y.to(device)\n pred = model(X)\n test_loss += loss_fn(pred, y).item()\n correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n test_loss /= num_batches\n correct /= size\n return correct"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model and report accuracy\nReport accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"epochs = 5\nfor t in range(epochs):\n print(f\"Epoch {t+1}\\n-------------------------------\")\n train(train_dataloader, model, loss_fn, optimizer)\n accuracy = test(test_dataloader, model, loss_fn)\n nni.report_intermediate_result(accuracy)\nnni.report_final_result(accuracy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
ed8bfc27e3d555d842fc4eec2635e619
\ No newline at end of file
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_nnictl/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_nnictl_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_nnictl_model.py:
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
These are the hyperparameters that will be tuned.
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get optimized hyperparameters
-----------------------------
If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load dataset
------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build model with hyperparameters
--------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define train and test
---------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train model and report accuracy
-------------------------------
Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_nnictl_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
Run HPO Experiment with nnictl
==============================
This tutorial has exactly the same effect as :doc:`../hpo_quickstart_pytorch/main`.
Both tutorials optimize the model in `official PyTorch quickstart
<https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning,
while this one manages the experiment with a command line tool and a YAML config file, instead of pure Python code.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Create config file.
4. Run the experiment.
The first two steps are identical to the quickstart.
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
Please understand the model code before continuing to the next step; a condensed sketch follows.
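As a quick orientation, the three calls fit together roughly like this (a stripped-down outline of the ``model.py`` linked above, not a replacement for it):

.. code-block:: python

    import nni

    params = {'features': 512, 'lr': 0.001, 'momentum': 0}
    params.update(nni.get_next_parameter())        # 1. receive tuned hyperparameters

    for epoch in range(5):
        accuracy = 0.9                             # placeholder for real training/evaluation
        nni.report_intermediate_result(accuracy)   # 2. report per-epoch metric
    nni.report_final_result(accuracy)              # 3. report final metric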
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
Assuming we have the following prior knowledge about these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, and it follows exponential distribution.
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names are derived from ``numpy.random``.
For the full specification of the search space, check :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:
.. code-block:: yaml
search_space:
features:
_type: choice
_value: [ 128, 256, 512, 1024 ]
lr:
_type: loguniform
_value: [ 0.0001, 0.1 ]
momentum:
_type: uniform
_value: [ 0, 1 ]
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a YAML file ``config.yaml`` to define the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. code-block:: yaml
trial_command: python model.py
trial_code_directory: .
When ``trial_code_directory`` is a relative path, it is resolved relative to the config file.
So in this case we need to put ``config.yaml`` and ``model.py`` in the same directory.
.. attention::
    The rules for resolving relative paths are different in the YAML config file and the :doc:`Python experiment API </reference/experiment>`.
    In the Python experiment API, relative paths are relative to the current working directory.
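For reference, a short sketch of the equivalent settings in the Python experiment API (borrowed from the quickstart; ``Path(__file__).parent`` keeps the path robust to the working directory):

.. code-block:: python

    from pathlib import Path
    from nni.experiment import Experiment

    experiment = Experiment('local')
    experiment.config.trial_command = 'python model.py'
    # resolved relative to the current working directory, hence the absolute anchor
    experiment.config.trial_code_directory = Path(__file__).parent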
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. code-block:: yaml
max_trial_number: 10
trial_concurrency: 2
You may also set ``max_experiment_duration: 1h`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run forever until you stop it.
.. note::
    ``max_trial_number`` is set to 10 here for a fast example.
    In the real world, it should be set to a larger number.
    With the default config, the TPE tuner requires 20 trials to warm up.
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. code-block:: yaml
name: TPE
class_args:
optimize_mode: maximize
Configure training service
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we use *local* mode,
which means models will be trained on local machine, without using any special training platform.
.. code-block:: yaml
training_service:
platform: local
Wrap up
^^^^^^^
The full content of ``config.yaml`` is as follow:
.. code-block:: yaml
search_space:
features:
_type: choice
_value: [ 128, 256, 512, 1024 ]
lr:
_type: loguniform
_value: [ 0.0001, 0.1 ]
momentum:
_type: uniform
_value: [ 0, 1 ]
trial_command: python model.py
trial_code_directory: .
trial_concurrency: 2
max_trial_number: 10
tuner:
name: TPE
class_args:
optimize_mode: maximize
training_service:
platform: local
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Launch it with the ``nnictl create`` command:
.. code-block:: bash
$ nnictl create --config config.yaml --port 8080
You can use the web portal to view experiment status: http://localhost:8080.
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-01 12:00:00] Creating experiment, Experiment ID: p43ny6ew
[2022-04-01 12:00:00] Starting web server...
[2022-04-01 12:00:01] Setting up...
[2022-04-01 12:00:01] Web portal URLs: http://127.0.0.1:8080 http://192.168.1.1:8080
[2022-04-01 12:00:01] To stop experiment run "nnictl stop p43ny6ew" or "nnictl stop --all"
[2022-04-01 12:00:01] Reference: https://nni.readthedocs.io/en/stable/reference/nnictl.html
When the experiment is done, use the ``nnictl stop`` command to stop it.
.. code-block:: bash
$ nnictl stop p43ny6ew
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO: Stopping experiment 7u8yg9zw
INFO: Stop experiment success.
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# HPO Quickstart with PyTorch\nThis tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Prepare the model\nIn first step, we need to prepare the model to be tuned.\n\nThe model should be put in a separate script.\nIt will be evaluated many times concurrently,\nand possibly will be trained on distributed platforms.\n\nIn this tutorial, the model is defined in :doc:`model.py <model>`.\n\nIn short, it is a PyTorch model with 3 additional API calls:\n\n1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evalutated.\n2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.\n3. Use :func:`nni.report_final_result` to report final accuracy.\n\nPlease understand the model code before continue to next step.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Define search space\nIn model code, we have prepared 3 hyperparameters to be tuned:\n*features*, *lr*, and *momentum*.\n\nHere we need to define their *search space* so the tuning algorithm can sample them in desired range.\n\nAssuming we have following prior knowledge for these hyperparameters:\n\n1. *features* should be one of 128, 256, 512, 1024.\n2. *lr* should be a float between 0.0001 and 0.1, and it follows exponential distribution.\n3. *momentum* should be a float between 0 and 1.\n\nIn NNI, the space of *features* is called ``choice``;\nthe space of *lr* is called ``loguniform``;\nand the space of *momentum* is called ``uniform``.\nYou may have noticed, these names are derived from ``numpy.random``.\n\nFor full specification of search space, check :doc:`the reference </hpo/search_space>`.\n\nNow we can define the search space as follow:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"search_space = {\n 'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},\n 'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},\n 'momentum': {'_type': 'uniform', '_value': [0, 1]},\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Configure the experiment\nNNI uses an *experiment* to manage the HPO process.\nThe *experiment config* defines how to train the models and how to explore the search space.\n\nIn this tutorial we use a *local* mode experiment,\nwhich means models will be trained on local machine, without using any special training platform.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.experiment import Experiment\nexperiment = Experiment('local')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we start to configure the experiment.\n\n### Configure trial code\nIn NNI evaluation of each hyperparameter set is called a *trial*.\nSo the model script is called *trial code*.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.trial_command = 'python model.py'\nexperiment.config.trial_code_directory = '.'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When ``trial_code_directory`` is a relative path, it relates to current working directory.\nTo run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.\n(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__\nis only available in standard Python, not in Jupyter Notebook.)\n\n.. attention::\n\n If you are using Linux system without Conda,\n you may need to change ``\"python model.py\"`` to ``\"python3 model.py\"``.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure search space\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.search_space = search_space"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure tuning algorithm\nHere we use :doc:`TPE tuner </hpo/tuners>`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.tuner.name = 'TPE'\nexperiment.config.tuner.class_args['optimize_mode'] = 'maximize'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure how many trials to run\nHere we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.max_trial_number = 10\nexperiment.config.trial_concurrency = 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Run the experiment\nNow the experiment is ready. Choose a port and launch it. (Here we use port 8080.)\n\nYou can use the web portal to view experiment status: http://localhost:8080.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.run(8080)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## After the experiment is done\nEverything is done and it is safe to exit now. The following are optional.\n\nIf you are using standard Python instead of Jupyter Notebook,\nyou can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,\nallowing you to view the web portal after the experiment is done.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# input('Press enter to quit')\nexperiment.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
"""
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
# Step 1: Prepare the model
# -------------------------
# In the first step, we need to prepare the model to be tuned.
#
# The model should be put in a separate script.
# It will be evaluated many times concurrently,
# and possibly will be trained on distributed platforms.
#
# In this tutorial, the model is defined in :doc:`model.py <model>`.
#
# In short, it is a PyTorch model with 3 additional API calls:
#
# 1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
# 2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
# 3. Use :func:`nni.report_final_result` to report final accuracy.
#
# Please understand the model code before continuing to the next step.
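# %%
# As a minimal, runnable sketch of that calling pattern (this is *not* the actual
# ``model.py``; the constant accuracy below merely stands in for real training and
# validation), the three API calls fit together like this:
#
# .. code-block:: python
#
#     import nni
#
#     params = {'features': 512, 'lr': 0.001, 'momentum': 0}
#     params.update(nni.get_next_parameter())       # 1. fetch hyperparameters
#     print(params)
#
#     for epoch in range(3):
#         accuracy = 0.8                            # stand-in for a real validation metric
#         nni.report_intermediate_result(accuracy)  # 2. per-epoch metric
#     nni.report_final_result(accuracy)             # 3. final metric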
# %%
# Step 2: Define search space
# ---------------------------
# In the model code, we have prepared 3 hyperparameters to be tuned:
# *features*, *lr*, and *momentum*.
#
# Here we need to define their *search space* so the tuning algorithm can sample them in the desired ranges.
#
# Assume we have the following prior knowledge for these hyperparameters:
#
# 1. *features* should be one of 128, 256, 512, 1024.
# 2. *lr* should be a float between 0.0001 and 0.1, following an exponential distribution.
# 3. *momentum* should be a float between 0 and 1.
#
# In NNI, the space of *features* is called ``choice``;
# the space of *lr* is called ``loguniform``;
# and the space of *momentum* is called ``uniform``.
# As you may have noticed, these names are derived from ``numpy.random``.
#
# For the full specification of the search space, check :doc:`the reference </hpo/search_space>`.
#
# Now we can define the search space as follows:
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
# %%
# Step 3: Configure the experiment
# --------------------------------
# NNI uses an *experiment* to manage the HPO process.
# The *experiment config* defines how to train the models and how to explore the search space.
#
# In this tutorial we use a *local* mode experiment,
# which means models will be trained on the local machine, without using any special training platform.
from nni.experiment import Experiment
experiment = Experiment('local')
# %%
# Now we start to configure the experiment.
#
# Configure trial code
# ^^^^^^^^^^^^^^^^^^^^
# In NNI, the evaluation of each hyperparameter set is called a *trial*,
# so the model script is called the *trial code*.
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
# %%
# When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
# To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
# (`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
# is only available in standard Python, not in Jupyter Notebook.)
#
# .. attention::
#
#    If you are using a Linux system without Conda,
# you may need to change ``"python model.py"`` to ``"python3 model.py"``.
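# %%
# For example, when launching the experiment from a different working directory,
# the configuration could look like this (a sketch only; it assumes ``model.py``
# sits next to this ``main.py``):
#
# .. code-block:: python
#
#     from pathlib import Path
#
#     experiment.config.trial_command = 'python model.py'
#     experiment.config.trial_code_directory = Path(__file__).parent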
# %%
# Configure search space
# ^^^^^^^^^^^^^^^^^^^^^^
experiment.config.search_space = search_space
# %%
# Configure tuning algorithm
# ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we use :doc:`TPE tuner </hpo/tuners>`.
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# %%
# Configure how many trials to run
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we evaluate 10 sets of hyperparameters in total, 2 sets at a time.
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit the running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
#    ``max_trial_number`` is set to 10 here for a fast example.
#    In real-world use it should be set to a larger number.
#    With the default config, the TPE tuner requires 20 trials to warm up.
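# %%
# For instance, to additionally cap the experiment at one hour (shown as a sketch
# only, since this tutorial relies on ``max_trial_number``):
#
# .. code-block:: python
#
#     experiment.config.max_experiment_duration = '1h'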
# %%
# Step 4: Run the experiment
# --------------------------
# Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
#
# You can use the web portal to view experiment status: http://localhost:8080.
experiment.run(8080)
# %%
# After the experiment is done
# ----------------------------
# Everything is done and it is safe to exit now. The following are optional.
#
# If you are using standard Python instead of Jupyter Notebook,
# you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
# allowing you to view the web portal after the experiment is done.
# input('Press enter to quit')
experiment.stop()
# %%
# :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
# so it can be omitted in your code.
#
# After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.
#
# .. tip::
#
#    This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
#
#    You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
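# %%
# A minimal sketch of reopening the portal later, assuming ``view`` accepts the
# experiment ID printed at creation time together with a port:
#
# .. code-block:: python
#
#     from nni.experiment import Experiment
#
#     # 'hgkju3iq' is the example ID from the log output above; use your own ID.
#     Experiment.view('hgkju3iq', port=8080)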
e732cee426a4629b71f5fa28ce16fad7
\ No newline at end of file
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them in the desired ranges.
Assume we have the following prior knowledge for these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, following an exponential distribution.
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
As you may have noticed, these names are derived from ``numpy.random``.
For the full specification of the search space, check :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment,
which means models will be trained on the local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*,
so the model script is called the *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in standard Python, not in Jupyter Notebook.)
.. attention::
If you are using a Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run forever until you press Ctrl-C.
.. note::
``max_trial_number`` is set to 10 here for a fast example.
In real-world use it should be set to a larger number.
With the default config, the TPE tuner requires 20 trials to warm up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Everything is done and it is safe to exit now. The following are optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
allowing you to view the web portal after the experiment is done.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
so it can be omitted in your code.
After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.
.. tip::
This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. a395c59bf5359c3583b7a0a3ab66d705
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Quickstart (PyTorch)
========================
This tutorial applies hyperparameter tuning to the model in the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
The tutorial consists of 4 steps:
1. Modify the model code for tuning.
2. Define the hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
First, we need to prepare the model to be tuned.
Because the tuned model is run independently many times,
and may even be uploaded to the cloud when a specific training platform is used,
the model code should be put in a separate .py file.
In this tutorial, the model is defined in :doc:`model.py <model>`.
On top of an ordinary PyTorch model, the model code adds 3 API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report the intermediate result of each epoch.
3. Use :func:`nni.report_final_result` to report the final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define the search space
-------------------------------
In the model code, we have prepared 3 hyperparameters to be tuned: *features*, *lr*, and *momentum*.
Now we need to define their *search space*, i.e., their value ranges and distributions.
Assume we have the following prior knowledge for these hyperparameters:
1. *features* can be 128, 256, 512, or 1024.
2. *lr* should be between 0.0001 and 0.1, following an exponential distribution.
3. *momentum* should be between 0 and 1.
In NNI, the value range of *features* is called ``choice``,
that of *lr* is called ``loguniform``,
and that of *momentum* is called ``uniform``.
As you may have noticed, these names match the function names in ``numpy.random``.
Full search space documentation: :doc:`/hpo/search_space`.
We define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process; the *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment, which means the experiment runs only on the local machine, without any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure the trial
^^^^^^^^^^^^^^^^^^^
In NNI, evaluating one set of hyperparameters is called a *trial*, and the model code above is called the *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
If ``trial_code_directory`` is a relative path, it is interpreted relative to the current working directory.
If you want to run this file ``main.py`` from another path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in a plain .py file, not in Jupyter Notebook.)
.. attention::
If you are using a Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure the search space
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure the tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use the :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we try 10 sets of hyperparameters in total, evaluating 2 sets concurrently.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set, the experiment will run forever until you press Ctrl-C.
.. note::
``max_trial_number`` is set to 10 here so that the tutorial finishes quickly;
in real use it should be set to a larger number. With default settings, the TPE tuner needs to evaluate 20 sets of hyperparameters to finish warming up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is fully configured. Pick a port and run it; in this tutorial we use port 8080.
You can check the experiment status on the web portal: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Simply waiting for the call to return ends the experiment normally; everything below is optional.
If you are using plain Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` at the end of the code to keep the Python interpreter from exiting,
so that you can continue using the web portal.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is invoked automatically before Python exits, so you can omit it from your own code.
After the experiment has fully stopped, you can use :meth:`nni.experiment.Experiment.view` to restart the web portal.
.. tip::
This tutorial uses the :doc:`Python API </reference/experiment>` to create the experiment;
alternatively, you can use the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Port PyTorch Quickstart to NNI\nThis is a modified version of `PyTorch quickstart`_.\n\nIt can be run directly and will have the exact same result as original version.\n\nFurthermore, it enables the ability of auto tuning with an NNI *experiment*, which will be detailed later.\n\nIt is recommended to run this script directly first to verify the environment.\n\nThere are 2 key differences from the original version:\n\n1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.\n2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import nni\nimport torch\nfrom torch import nn\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision.transforms import ToTensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters to be tuned\nThese are the hyperparameters that will be tuned.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"params = {\n 'features': 512,\n 'lr': 0.001,\n 'momentum': 0,\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get optimized hyperparameters\nIf run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.\nBut with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimized_params = nni.get_next_parameter()\nparams.update(optimized_params)\nprint(params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"training_data = datasets.FashionMNIST(root=\"data\", train=True, download=True, transform=ToTensor())\ntest_data = datasets.FashionMNIST(root=\"data\", train=False, download=True, transform=ToTensor())\n\nbatch_size = 64\n\ntrain_dataloader = DataLoader(training_data, batch_size=batch_size)\ntest_dataloader = DataLoader(test_data, batch_size=batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build model with hyperparameters\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n\nclass NeuralNetwork(nn.Module):\n def __init__(self):\n super(NeuralNetwork, self).__init__()\n self.flatten = nn.Flatten()\n self.linear_relu_stack = nn.Sequential(\n nn.Linear(28*28, params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], 10)\n )\n\n def forward(self, x):\n x = self.flatten(x)\n logits = self.linear_relu_stack(x)\n return logits\n\nmodel = NeuralNetwork().to(device)\n\nloss_fn = nn.CrossEntropyLoss()\noptimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define train and test\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def train(dataloader, model, loss_fn, optimizer):\n size = len(dataloader.dataset)\n model.train()\n for batch, (X, y) in enumerate(dataloader):\n X, y = X.to(device), y.to(device)\n pred = model(X)\n loss = loss_fn(pred, y)\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n\ndef test(dataloader, model, loss_fn):\n size = len(dataloader.dataset)\n num_batches = len(dataloader)\n model.eval()\n test_loss, correct = 0, 0\n with torch.no_grad():\n for X, y in dataloader:\n X, y = X.to(device), y.to(device)\n pred = model(X)\n test_loss += loss_fn(pred, y).item()\n correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n test_loss /= num_batches\n correct /= size\n return correct"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model and report accuracy\nReport accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"epochs = 5\nfor t in range(epochs):\n print(f\"Epoch {t+1}\\n-------------------------------\")\n train(train_dataloader, model, loss_fn, optimizer)\n accuracy = test(test_dataloader, model, loss_fn)\n nni.report_intermediate_result(accuracy)\nnni.report_final_result(accuracy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
\ No newline at end of file
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from the tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)