Commit f5b89bb6 authored by J-shang, committed by GitHub (unverified)

Merge pull request #4776 from microsoft/v2.7

parents 7aa44612 1546962f
......@@ -8,7 +8,7 @@ Overview
The performance of RocksDB is highly contingent on its tuning. However, because of the complexity of its underlying technology and the large number of configurable parameters, a good configuration is sometimes hard to obtain. NNI can help address this issue: it supports many tuning algorithms for searching for the best RocksDB configuration, and many environments, such as a local machine, remote servers and the cloud.
-This example illustrates how to use NNI to search the best configuration of RocksDB for a ``fillrandom`` benchmark supported by a benchmark tool ``db_bench``\ , which is an official benchmark tool provided by RocksDB itself. Therefore, before running this example, please make sure NNI is installed and `db_bench <https://github.com/facebook/rocksdb/wiki/Benchmarking-tools>`__ is in your ``PATH``. Please refer to `here <../Tutorial/QuickStart.rst>`__ for detailed information about installation and preparing of NNI environment, and `here <https://github.com/facebook/rocksdb/blob/master/INSTALL.md>`__ for compiling RocksDB as well as ``db_bench``.
+This example illustrates how to use NNI to search for the best configuration of RocksDB for the ``fillrandom`` benchmark of ``db_bench``\ , the official benchmark tool provided by RocksDB. Before running this example, make sure NNI is installed and `db_bench <https://github.com/facebook/rocksdb/wiki/Benchmarking-tools>`__ is in your ``PATH``. Please refer to :doc:`here </installation>` for details on installing NNI and preparing its environment, and to `here <https://github.com/facebook/rocksdb/blob/master/INSTALL.md>`__ for compiling RocksDB and ``db_bench``.
We also provide a simple script, :githublink:`db_bench_installation.sh <examples/trials/systems_auto_tuning/rocksdb-fillrandom/db_bench_installation.sh>`, that helps compile and install ``db_bench`` and its dependencies on Ubuntu. RocksDB can be installed on other systems by following the same procedure.
......@@ -24,7 +24,7 @@ Search Space
For simplicity, this example tunes three parameters, ``write_buffer_size``\ , ``min_write_buffer_num`` and ``level0_file_num_compaction_trigger``\ , for randomly writing 16M keys with 20-byte keys and 100-byte values, using write operations per second (OPS) as the metric. ``write_buffer_size`` sets the size of a single memtable; once a memtable exceeds this size, it is marked immutable and a new one is created. ``min_write_buffer_num`` is the minimum number of memtables to be merged before flushing to storage. Once the number of files in level 0 reaches ``level0_file_num_compaction_trigger``\ , a level-0 to level-1 compaction is triggered.
-In this example, the search space is specified by a ``search_space.json`` file as shown below. Detailed explanation of search space could be found `here <../Tutorial/SearchSpaceSpec.rst>`__.
+In this example, the search space is specified by a ``search_space.json`` file, as shown below. A detailed explanation of the search space format can be found :doc:`here </hpo/search_space>`.
.. code-block:: json
......@@ -48,8 +48,7 @@ In this example, the search space is specified by a ``search_space.json`` file a
Benchmark code
^^^^^^^^^^^^^^
-Benchmark code should receive a configuration from NNI manager, and report the corresponding benchmark result back. Following NNI APIs are designed for this purpose. In this example, writing operations per second (OPS) is used as a performance metric. Please refer to `here <Trials.rst>`__ for detailed information.
+The benchmark code should receive a configuration from the NNI manager and report the corresponding benchmark result back. The following NNI APIs are designed for this purpose. In this example, write operations per second (OPS) is used as the performance metric.

* Use ``nni.get_next_parameter()`` to get the next system configuration.
* Use ``nni.report_final_result(metric)`` to report the benchmark result.
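
As a rough sketch only, a trial script built on these two calls might look like the following. It assumes that ``db_bench`` accepts the flags used below and prints a ``fillrandom`` summary line containing ``<N> ops/sec``. The mapping of ``min_write_buffer_num`` onto ``--min_write_buffer_number_to_merge`` and the helper names ``build_db_bench_args``, ``parse_ops`` and ``run_trial`` are hypothetical. Check them against ``db_bench --help`` and the example's own code before relying on them.

.. code-block:: python

    import re
    import subprocess

    # Fixed workload described in the text: random writes of 16M keys,
    # 20-byte keys and 100-byte values. Treating "16M" as 16,000,000 is an assumption.
    FIXED_ARGS = {'benchmarks': 'fillrandom', 'num': 16_000_000, 'key_size': 20, 'value_size': 100}

    # Assumed mapping from search-space parameter names to db_bench flag names.
    FLAG_NAMES = {
        'write_buffer_size': 'write_buffer_size',
        'min_write_buffer_num': 'min_write_buffer_number_to_merge',
        'level0_file_num_compaction_trigger': 'level0_file_num_compaction_trigger',
    }


    def build_db_bench_args(params):
        """Turn one NNI configuration (a dict) into a db_bench command line."""
        args = ['db_bench'] + [f'--{flag}={value}' for flag, value in FIXED_ARGS.items()]
        args += [f'--{FLAG_NAMES[name]}={value}' for name, value in params.items()]
        return args


    def parse_ops(output):
        """Return the integer ops/sec figure from db_bench's fillrandom summary line."""
        match = re.search(r'fillrandom\s*:.*?(\d+) ops/sec', output)
        if match is None:
            raise ValueError('no fillrandom result found in db_bench output')
        return int(match.group(1))


    def run_trial():
        """Hypothetical trial entry point: get a config, benchmark it, report OPS."""
        import nni  # imported lazily so the helpers above work without NNI installed
        params = nni.get_next_parameter()
        result = subprocess.run(build_db_bench_args(params),
                                capture_output=True, text=True, check=True)
        nni.report_final_result(parse_ops(result.stdout))

In this sketch, ``run_trial()`` would be called from the script that the experiment's trial command runs. Keeping command construction and output parsing in separate pure functions makes them easy to check without launching ``db_bench``.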
......@@ -59,7 +58,7 @@ Benchmark code should receive a configuration from NNI manager, and report the c
Config file
^^^^^^^^^^^
-One could start a NNI experiment with a config file. A config file for NNI is a ``yaml`` file usually including experiment settings (\ ``trialConcurrency``\ , ``trialGpuNumber``\ , etc.), platform settings (\ ``trainingService``\ ), path settings (\ ``searchSpaceFile``\ , ``trialCodeDirectory``\ , etc.) and tuner settings (\ ``tuner``\ , ``tuner optimize_mode``\ , etc.). Please refer to `here <../Tutorial/QuickStart.rst>`__ for more information.
+An NNI experiment can be started with a config file. A config file for NNI is a ``yaml`` file that usually includes experiment settings (\ ``trialConcurrency``\ , ``trialGpuNumber``\ , etc.), platform settings (\ ``trainingService``\ ), path settings (\ ``searchSpaceFile``\ , ``trialCodeDirectory``\ , etc.) and tuner settings (\ ``tuner``\ , ``tuner optimize_mode``\ , etc.). Please refer to :doc:`/reference/experiment_config` for more information.
Here is an example of tuning RocksDB with the SMAC algorithm:
......@@ -69,7 +68,7 @@ Here is an example of tuning RocksDB with TPE algorithm:
:githublink:`code directory <examples/trials/systems_auto_tuning/rocksdb-fillrandom/config_tpe.yml>`
-Other tuners can be easily adopted in the same way. Please refer to `here <../Tuner/BuiltinTuner.rst>`__ for more information.
+Other tuners can be adopted in the same way. Please refer to :doc:`here </hpo/tuners>` for more information.
Finally, enter the example folder and start the experiment with the following commands:
......
......@@ -213,7 +213,25 @@
},
"outputs": [],
"source": [
"for model_dict in exp.export_top_models(formatter='dict'):\n print(model_dict)\n\n# The output is `json` object which records the mutation actions of the top model.\n# If users want to output source code of the top model, they can use graph-based execution engine for the experiment,\n# by simply adding the following two lines.\n#\n# .. code-block:: python\n#\n# exp_config.execution_engine = 'base'\n# export_formatter = 'code'"
"for model_dict in exp.export_top_models(formatter='dict'):\n print(model_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output is ``json`` object which records the mutation actions of the top model.\nIf users want to output source code of the top model,\nthey can use `graph-based execution engine <graph-based-execution-engine>` for the experiment,\nby simply adding the following two lines.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"exp_config.execution_engine = 'base'\nexport_formatter = 'code'"
]
}
],
......
......@@ -354,11 +354,11 @@ def evaluate_model_with_visualization(model_cls):
for model_dict in exp.export_top_models(formatter='dict'):
    print(model_dict)
-# The output is `json` object which records the mutation actions of the top model.
-# If users want to output source code of the top model, they can use graph-based execution engine for the experiment,
+# %%
+# The output is ``json`` object which records the mutation actions of the top model.
+# If users want to output source code of the top model,
+# they can use :ref:`graph-based execution engine <graph-based-execution-engine>` for the experiment,
# by simply adding the following two lines.
-#
-# .. code-block:: python
-#
-#   exp_config.execution_engine = 'base'
-#   export_formatter = 'code'
+exp_config.execution_engine = 'base'
+export_formatter = 'code'
......@@ -466,6 +466,27 @@ Launch the experiment. The experiment should take several minutes to finish on a
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO:nni.experiment:Creating experiment, Experiment ID: z8ns5fv7
INFO:nni.experiment:Connecting IPC pipe...
INFO:nni.experiment:Starting web server...
INFO:nni.experiment:Setting up...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher started
INFO:nni.retiarii.experiment.pytorch:Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
INFO:nni.retiarii.experiment.pytorch:Start strategy...
INFO:root:Successfully update searchSpace.
INFO:nni.retiarii.strategy.bruteforce:Random search running in fixed size mode. Dedup: on.
INFO:nni.retiarii.experiment.pytorch:Stopping experiment, please wait...
INFO:nni.retiarii.experiment.pytorch:Strategy exit
INFO:nni.retiarii.experiment.pytorch:Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
INFO:nni.runtime.msg_dispatcher_base:Dispatcher exiting...
INFO:nni.retiarii.experiment.pytorch:Experiment stopped
......@@ -526,7 +547,7 @@ Export Top Models
Users can export top models after the exploration is done using ``export_top_models``.
-.. GENERATED FROM PYTHON SOURCE LINES 353-365
+.. GENERATED FROM PYTHON SOURCE LINES 353-357
.. code-block:: default
......@@ -534,14 +555,6 @@ Users can export top models after the exploration is done using ``export_top_mod
for model_dict in exp.export_top_models(formatter='dict'):
print(model_dict)
-# The output is `json` object which records the mutation actions of the top model.
-# If users want to output source code of the top model, they can use graph-based execution engine for the experiment,
-# by simply adding the following two lines.
-#
-# .. code-block:: python
-#
-# exp_config.execution_engine = 'base'
-# export_formatter = 'code'
......@@ -552,7 +565,28 @@ Users can export top models after the exploration is done using ``export_top_mod
.. code-block:: none
{'model_1': '0', 'model_2': 0.75, 'model_3': 128}
{'model_1': '0', 'model_2': 0.25, 'model_3': 64}
+.. GENERATED FROM PYTHON SOURCE LINES 358-362
+The output is ``json`` object which records the mutation actions of the top model.
+If users want to output source code of the top model,
+they can use :ref:`graph-based execution engine <graph-based-execution-engine>` for the experiment,
+by simply adding the following two lines.
+.. GENERATED FROM PYTHON SOURCE LINES 362-365
+.. code-block:: default
+exp_config.execution_engine = 'base'
+export_formatter = 'code'
......@@ -560,7 +594,7 @@ Users can export top models after the exploration is done using ``export_top_mod
.. rst-class:: sphx-glr-timing
-**Total running time of the script:** ( 2 minutes 15.810 seconds)
+**Total running time of the script:** ( 2 minutes 4.499 seconds)
.. _sphx_glr_download_tutorials_hello_nas.py:
......
.. 8a873f2c9cb0e8e3ed2d66b9d16c330f
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hello_nas.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_tutorials_hello_nas.py>`
        to download the full example code
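
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_hello_nas.py:


Getting Started with Architecture Search
========================================

This is the introductory tutorial for neural architecture search (NAS) on NNI.
In this tutorial, we use the NNI NAS framework, *Retiarii*, to search for a network architecture on the MNIST dataset.
We take multi-trial NAS as an example to show how to construct and explore a model space.

A neural architecture search task has three key components:

* A model search space, which defines the set of models to explore.
* A suitable strategy, which is the method used to explore this model space.
* A model evaluator, which assesses the performance of each model in the search space.

Currently, Retiarii supports only PyTorch and has been tested with **PyTorch 1.7 to 1.10**.
This tutorial therefore assumes that you use PyTorch as your deep learning framework. We will support more frameworks in the future.

Define Your Model Space
-----------------------

A model space is defined by the user to express the set of models they want to explore, which contains potentially good models.
In NNI, a model space is defined by two parts: a base model and the possible mutations on the base model.

.. GENERATED FROM PYTHON SOURCE LINES 26-34

Define Base Model
^^^^^^^^^^^^^^^^^

Defining a base model is almost the same as defining a PyTorch (or TensorFlow) model.
Usually, you only need to replace ``import torch.nn as nn`` with
``import nni.retiarii.nn.pytorch as nn`` to use our wrapped PyTorch modules.

Below is a very simple example of defining a base model.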

.. GENERATED FROM PYTHON SOURCE LINES 35-61

.. code-block:: default


    import torch
    import torch.nn.functional as F
    import nni.retiarii.nn.pytorch as nn
    from nni.retiarii import model_wrapper


    @model_wrapper      # this decorator should be put on the outermost module
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout(0.25)
            self.dropout2 = nn.Dropout(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(self.conv2(x), 2)
            x = torch.flatten(self.dropout1(x), 1)
            x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
            output = F.log_softmax(x, dim=1)
            return output
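
The ``9216`` input features of ``fc1`` follow from the layer shapes, assuming single-channel 28x28 MNIST images as input. The pure-Python sketch below reproduces that arithmetic with the standard output-size formula for convolution and pooling layers. ``conv_out`` is a hypothetical helper, not an NNI or PyTorch API.

.. code-block:: python

    def conv_out(size, kernel, stride=1, padding=0):
        """Spatial output size of a conv or pooling layer (standard formula)."""
        return (size + 2 * padding - kernel) // stride + 1


    side = 28                    # MNIST images are 28x28 (assumed input size)
    side = conv_out(side, 3)     # conv1: 3x3 kernel, stride 1 -> 26
    side = conv_out(side, 3)     # conv2: 3x3 kernel, stride 1 -> 24
    side = conv_out(side, 2, 2)  # F.max_pool2d(..., 2): 2x2 window, stride 2 -> 12
    fc1_in = 64 * side * side    # conv2 produces 64 channels, flattened
    print(fc1_in)                # 9216

The same arithmetic explains why the width of ``fc1`` can be mutated freely later on, while its input size stays fixed at 9216.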
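
.. GENERATED FROM PYTHON SOURCE LINES 62-104

.. tip::

   Always keep in mind that you should use ``import nni.retiarii.nn.pytorch as nn`` and :meth:`nni.retiarii.model_wrapper`.
   Many mistakes are the result of forgetting one of them.
   Also, for submodules of ``nn.init``, use ``torch.nn`` instead, e.g., ``torch.nn.init`` rather than ``nn.init``.

Define Model Mutations
^^^^^^^^^^^^^^^^^^^^^^

A base model is only one concrete model, not a model space. We provide :doc:`model mutation APIs </nas/construct_space>`
that let users express how the base model can be mutated, that is, build a search space that contains many models.

Based on the above base model, we can define a model space as follows.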

.. code-block:: diff

      @model_wrapper
      class Net(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(1, 32, 3, 1)
    -         self.conv2 = nn.Conv2d(32, 64, 3, 1)
    +         self.conv2 = nn.LayerChoice([
    +             nn.Conv2d(32, 64, 3, 1),
    +             DepthwiseSeparableConv(32, 64)
    +         ])
    -         self.dropout1 = nn.Dropout(0.25)
    +         self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))
              self.dropout2 = nn.Dropout(0.5)
    -         self.fc1 = nn.Linear(9216, 128)
    -         self.fc2 = nn.Linear(128, 10)
    +         feature = nn.ValueChoice([64, 128, 256])
    +         self.fc1 = nn.Linear(9216, feature)
    +         self.fc2 = nn.Linear(feature, 10)

          def forward(self, x):
              x = F.relu(self.conv1(x))
              x = F.max_pool2d(self.conv2(x), 2)
              x = torch.flatten(self.dropout1(x), 1)
              x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
              output = F.log_softmax(x, dim=1)
              return output

This results in the following code:

.. GENERATED FROM PYTHON SOURCE LINES 104-147

.. code-block:: default


    class DepthwiseSeparableConv(nn.Module):
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))


    @model_wrapper
    class ModelSpace(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            # LayerChoice is used to select a layer between Conv2d and DwConv.
            self.conv2 = nn.LayerChoice([
                nn.Conv2d(32, 64, 3, 1),
                DepthwiseSeparableConv(32, 64)
            ])
            # ValueChoice is used to select a dropout rate.
            # ValueChoice can be used as parameter of modules wrapped in `nni.retiarii.nn.pytorch`
            # or customized modules wrapped with `@basic_unit`.
            self.dropout1 = nn.Dropout(nn.ValueChoice([0.25, 0.5, 0.75]))  # choose dropout rate from 0.25, 0.5 and 0.75
            self.dropout2 = nn.Dropout(0.5)
            feature = nn.ValueChoice([64, 128, 256])
            self.fc1 = nn.Linear(9216, feature)
            self.fc2 = nn.Linear(feature, 10)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.max_pool2d(self.conv2(x), 2)
            x = torch.flatten(self.dropout1(x), 1)
            x = self.fc2(self.dropout2(F.relu(self.fc1(x))))
            output = F.log_softmax(x, dim=1)
            return output


    model_space = ModelSpace()
    model_space

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    ModelSpace(
      (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
      (conv2): LayerChoice([Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1)), DepthwiseSeparableConv(
        (depthwise): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), groups=32)
        (pointwise): Conv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )], label='model_1')
      (dropout1): Dropout(p=0.25, inplace=False)
      (dropout2): Dropout(p=0.5, inplace=False)
      (fc1): Linear(in_features=9216, out_features=64, bias=True)
      (fc2): Linear(in_features=64, out_features=10, bias=True)
    )

.. GENERATED FROM PYTHON SOURCE LINES 148-182

This example uses two mutation APIs, :class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` and :class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>`.
:class:`nn.LayerChoice <nni.retiarii.nn.pytorch.LayerChoice>` takes a list of candidate modules (two in this example), one of which is chosen for each sampled model.
It can be used like an ordinary PyTorch submodule.
:class:`nn.ValueChoice <nni.retiarii.nn.pytorch.ValueChoice>` takes a list of candidate values, one of which is chosen for each sampled model.

More detailed API descriptions and usage can be found :doc:`here </nas/construct_space>`.
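
To make "one choice per sampled model" concrete, the plain-Python sketch below enumerates every model that ``ModelSpace`` describes. It does not use NNI. The keys reuse the labels ``model_1``, ``model_2`` and ``model_3`` that appear in the printed space and in the exported results of this tutorial. Mapping ``model_2`` to the dropout rate and ``model_3`` to the hidden width is an assumption based on declaration order.

.. code-block:: python

    import itertools

    # Candidate values copied from ModelSpace; the label names are assumed.
    space = {
        'model_1': ['0', '1'],          # conv2 LayerChoice: index of the chosen candidate
        'model_2': [0.25, 0.5, 0.75],   # dropout1 ValueChoice
        'model_3': [64, 128, 256],      # ValueChoice for the fc1/fc2 width
    }

    # Each sampled model picks exactly one value for every choice.
    all_models = [dict(zip(space, combo)) for combo in itertools.product(*space.values())]
    print(len(all_models))   # 18
    print(all_models[0])     # {'model_1': '0', 'model_2': 0.25, 'model_3': 64}

The space size is the product of the number of candidates per choice (2 x 3 x 3), which is why even a few choices can produce a large space.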
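
.. note::

    We are actively enriching the mutation APIs to make it easier to construct model spaces.
    If the currently supported mutation APIs cannot express your model space,
    please refer to :doc:`this doc </nas/mutator>` for customizing mutators.

Explore the Defined Model Space
-------------------------------

Simply put, there are two exploration approaches:

(1) Evaluate each sampled model independently, which is the search approach in :ref:`multi-trial NAS <multi-trial-nas>`.
(2) One-shot weight-sharing based search, or one-shot NAS for short.

This tutorial demonstrates the first approach. For the second approach, users can refer to :ref:`here <one-shot-nas>`.

First, users need to pick a proper exploration strategy to explore the defined model space.
Second, users need to pick or customize a model evaluator to evaluate the performance of each explored model.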
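
Pick an Exploration Strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Retiarii supports many :doc:`exploration strategies </nas/exploration_strategy>`.

Simply choose (i.e., instantiate) an exploration strategy, as the code below demonstrates: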

.. GENERATED FROM PYTHON SOURCE LINES 182-186

.. code-block:: default


    import nni.retiarii.strategy as strategy
    search_strategy = strategy.Random(dedup=True)  # dedup=False if deduplication is not wanted

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    /home/yugzhan/miniconda3/envs/cu102/lib/python3.8/site-packages/ray/autoscaler/_private/cli_logger.py:57: FutureWarning: Not all Ray CLI dependencies were found. In Ray 1.4+, the Ray CLI, autoscaler, and dashboard will only be usable via `pip install 'ray[default]'`. Please update your install command.
      warnings.warn(
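
For intuition only, here is a plain-Python sketch of what random sampling with deduplication means. The function draws one value per choice at random and skips any combination that was already tried. This is not NNI's implementation of ``strategy.Random``. The function name ``sample_unique`` and the label-keyed ``space`` dictionary are hypothetical.

.. code-block:: python

    import random


    def sample_unique(space, n, seed=0):
        """Draw up to n distinct random configurations from a dict of candidate lists."""
        rng = random.Random(seed)
        total = 1
        for values in space.values():
            total *= len(values)            # size of the whole space
        seen, samples = set(), []
        while len(samples) < min(n, total):  # never ask for more than the space holds
            config = {label: rng.choice(values) for label, values in space.items()}
            key = tuple(config.values())
            if key in seen:                 # dedup: skip a configuration already sampled
                continue
            seen.add(key)
            samples.append(config)
        return samples


    space = {'model_1': ['0', '1'], 'model_2': [0.25, 0.5, 0.75], 'model_3': [64, 128, 256]}
    print(sample_unique(space, 4))

Without deduplication, small spaces like this one quickly produce repeated trials that waste compute. Capping the target at the space size keeps the loop from running forever.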
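
.. GENERATED FROM PYTHON SOURCE LINES 187-200

Pick or Customize a Model Evaluator
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

During exploration, the exploration strategy repeatedly generates new models. A model evaluator trains and validates each generated model to obtain its performance.
That performance is sent to the exploration strategy as the model's score, which helps the strategy generate better models.

Retiarii provides :doc:`built-in model evaluators </nas/evaluator>`, but to start with,
we recommend using :class:`FunctionalEvaluator <nni.retiarii.evaluator.FunctionalEvaluator>`, that is, wrapping your own training and evaluation code in a single function.
This function should receive a single model class and use :func:`nni.report_final_result` to report the final score of this model.

The example here creates a simple evaluator that runs on the MNIST dataset, trains for 3 epochs, and reports its accuracy on the test set.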

.. GENERATED FROM PYTHON SOURCE LINES 200-268

.. code-block:: default


    import nni

    from torchvision import transforms
    from torchvision.datasets import MNIST
    from torch.utils.data import DataLoader


    def train_epoch(model, device, train_loader, optimizer, epoch):
        loss_fn = torch.nn.CrossEntropyLoss()
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
            if batch_idx % 10 == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(data), len(train_loader.dataset),
                    100. * batch_idx / len(train_loader), loss.item()))


    def test_epoch(model, device, test_loader):
        model.eval()
        test_loss = 0
        correct = 0
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        test_loss /= len(test_loader.dataset)
        accuracy = 100. * correct / len(test_loader.dataset)

        print('\nTest set: Accuracy: {}/{} ({:.0f}%)\n'.format(
              correct, len(test_loader.dataset), accuracy))

        return accuracy


    def evaluate_model(model_cls):
        # "model_cls" is a class, need to instantiate
        model = model_cls()

        device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
        model.to(device)

        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        transf = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
        train_loader = DataLoader(MNIST('data/mnist', download=True, transform=transf), batch_size=64, shuffle=True)
        test_loader = DataLoader(MNIST('data/mnist', download=True, train=False, transform=transf), batch_size=64)

        for epoch in range(3):
            # train the model for one epoch
            train_epoch(model, device, train_loader, optimizer, epoch)
            # test the model for one epoch
            accuracy = test_epoch(model, device, test_loader)
            # call report intermediate result. Result can be float or dict
            nni.report_intermediate_result(accuracy)

        # report final test result
        nni.report_final_result(accuracy)

.. GENERATED FROM PYTHON SOURCE LINES 269-270

Create the evaluator.

.. GENERATED FROM PYTHON SOURCE LINES 270-274

.. code-block:: default


    from nni.retiarii.evaluator import FunctionalEvaluator
    evaluator = FunctionalEvaluator(evaluate_model)
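
.. GENERATED FROM PYTHON SOURCE LINES 275-286

The ``train_epoch`` and ``test_epoch`` here can be any customized functions, in which users write their own training logic.

We recommend that ``evaluate_model`` accept no arguments other than ``model_cls``.
However, the :doc:`advanced tutorial </nas/evaluator>` shows how to use additional arguments in case you actually need them.
In the future, we will support mutating the arguments of evaluators, which is commonly called "hyper-parameter tuning".

Launch an Experiment
--------------------

Everything is ready, so it is time to start the model search experiment, as shown below.

.. GENERATED FROM PYTHON SOURCE LINES 287-293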

.. code-block:: default


    from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig
    exp = RetiariiExperiment(model_space, evaluator, [], search_strategy)
    exp_config = RetiariiExeConfig('local')
    exp_config.experiment_name = 'mnist_search'

.. GENERATED FROM PYTHON SOURCE LINES 294-295

The following configurations control the maximum number of trials to run and how many of them run concurrently.

.. GENERATED FROM PYTHON SOURCE LINES 295-299

.. code-block:: default


    exp_config.max_trial_number = 4   # run at most 4 trials
    exp_config.trial_concurrency = 2  # run at most 2 trials concurrently

.. GENERATED FROM PYTHON SOURCE LINES 300-302

If you want to use a GPU, set the following configurations.
``use_active_gpu`` should be set to true if you wish to use an occupied GPU (for example, one that may be running a GUI).

.. GENERATED FROM PYTHON SOURCE LINES 302-306

.. code-block:: default


    exp_config.trial_gpu_number = 1
    exp_config.training_service.use_active_gpu = True

.. GENERATED FROM PYTHON SOURCE LINES 307-308

Launch the experiment. The whole experiment should take several minutes to finish on a workstation with two GPUs.

.. GENERATED FROM PYTHON SOURCE LINES 308-311

.. code-block:: default


    exp.run(exp_config, 8081)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    INFO:nni.experiment:Creating experiment, Experiment ID: z8ns5fv7
    INFO:nni.experiment:Connecting IPC pipe...
    INFO:nni.experiment:Starting web server...
    INFO:nni.experiment:Setting up...
    INFO:nni.runtime.msg_dispatcher_base:Dispatcher started
    INFO:nni.retiarii.experiment.pytorch:Web UI URLs: http://127.0.0.1:8081 http://10.190.172.35:8081 http://192.168.49.1:8081 http://172.17.0.1:8081
    INFO:nni.retiarii.experiment.pytorch:Start strategy...
    INFO:root:Successfully update searchSpace.
    INFO:nni.retiarii.strategy.bruteforce:Random search running in fixed size mode. Dedup: on.
    INFO:nni.retiarii.experiment.pytorch:Stopping experiment, please wait...
    INFO:nni.retiarii.experiment.pytorch:Strategy exit
    INFO:nni.retiarii.experiment.pytorch:Waiting for experiment to become DONE (you can ctrl+c if there is no running trial jobs)...
    INFO:nni.runtime.msg_dispatcher_base:Dispatcher exiting...
    INFO:nni.retiarii.experiment.pytorch:Experiment stopped
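
.. GENERATED FROM PYTHON SOURCE LINES 312-330

Besides the ``local`` training service, users can also run Retiarii experiments on :doc:`different training services </experiment/training_service/overview>`.

Visualize the Experiment
------------------------

Users can visualize their architecture search experiments in the same way as hyper-parameter tuning experiments.
For example, open ``localhost:8081`` in your browser, where 8081 is the port you set in ``exp.run``.
Please refer to :doc:`here </experiment/web_portal/web_portal>` for details.

We support visualizing models with third-party visualization engines (like `Netron <https://netron.app/>`__).
To use it, click "Visualization" in the detail panel of each trial.
Note that the current visualization is based on `onnx <https://onnx.ai/>`__,
so a model that cannot be exported to onnx cannot be visualized.

Built-in evaluators (e.g., Classification) automatically export the model into a file.
For your own evaluator, you need to save the file to ``$NNI_OUTPUT_DIR/model.onnx``.
For instance,

.. GENERATED FROM PYTHON SOURCE LINES 330-344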

.. code-block:: default


    import os
    from pathlib import Path


    def evaluate_model_with_visualization(model_cls):
        model = model_cls()
        # dump the model into an onnx
        if 'NNI_OUTPUT_DIR' in os.environ:
            # the model takes single-channel 28x28 MNIST images, so the dummy input must match
            dummy_input = torch.zeros(1, 1, 28, 28)
            torch.onnx.export(model, (dummy_input, ),
                              Path(os.environ['NNI_OUTPUT_DIR']) / 'model.onnx')
        evaluate_model(model_cls)

.. GENERATED FROM PYTHON SOURCE LINES 345-353

Relaunch the experiment, and a button will be shown on the web portal.

.. image:: ../../img/netron_entrance_webui.png

Export Top Models
-----------------

After the exploration is done, users can export the top models with ``export_top_models``.

.. GENERATED FROM PYTHON SOURCE LINES 353-357

.. code-block:: default


    for model_dict in exp.export_top_models(formatter='dict'):
        print(model_dict)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'model_1': '0', 'model_2': 0.25, 'model_3': 64}
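
.. GENERATED FROM PYTHON SOURCE LINES 358-362

The output is a JSON object that records which value the best model chose for each choice.
If users want the source code of the searched model, they can use the :ref:`graph-based execution engine <graph-based-execution-engine>` by simply adding the following two lines.

.. GENERATED FROM PYTHON SOURCE LINES 362-365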

.. code-block:: default


    exp_config.execution_engine = 'base'
    export_formatter = 'code'
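
The exported dictionary is keyed by choice labels rather than by layer names. As a sketch, assume the labels follow the declaration order in ``ModelSpace``: ``model_1`` is the ``conv2`` LayerChoice, ``model_2`` is the ``dropout1`` rate, and ``model_3`` is the hidden width. Under that assumption, a few lines of plain Python can turn one exported result into a readable summary. The helper ``describe`` and the ``CONV2_CANDIDATES`` list are hypothetical and not part of NNI.

.. code-block:: python

    # Hypothetical mapping, assuming labels follow declaration order in ModelSpace.
    CONV2_CANDIDATES = ['Conv2d(32, 64, 3, 1)', 'DepthwiseSeparableConv(32, 64)']


    def describe(model_dict):
        """Turn one exported choice dict into a human-readable summary string."""
        # In the output above, the LayerChoice index appears as a string such as '0'.
        conv2 = CONV2_CANDIDATES[int(model_dict['model_1'])]
        return (f"conv2={conv2}, dropout1={model_dict['model_2']}, "
                f"fc1/fc2 width={model_dict['model_3']}")


    print(describe({'model_1': '0', 'model_2': 0.25, 'model_3': 64}))
    # conv2=Conv2d(32, 64, 3, 1), dropout1=0.25, fc1/fc2 width=64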

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 2 minutes 4.499 seconds)


.. _sphx_glr_download_tutorials_hello_nas.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: hello_nas.py <hello_nas.py>`

  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: hello_nas.ipynb <hello_nas.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Port PyTorch Quickstart to NNI\nThis is a modified version of `PyTorch quickstart`_.\n\nIt can be run directly and will have the exact same result as original version.\n\nFurthermore, it enables the ability of auto tuning with an NNI *experiment*, which will be detailed later.\n\nIt is recommended to run this script directly first to verify the environment.\n\nThere are 2 key differences from the original version:\n\n1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.\n2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import nni\nimport torch\nfrom torch import nn\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision.transforms import ToTensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters to be tuned\nThese are the hyperparameters that will be tuned.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"params = {\n 'features': 512,\n 'lr': 0.001,\n 'momentum': 0,\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get optimized hyperparameters\nIf run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.\nBut with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimized_params = nni.get_next_parameter()\nparams.update(optimized_params)\nprint(params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"training_data = datasets.FashionMNIST(root=\"data\", train=True, download=True, transform=ToTensor())\ntest_data = datasets.FashionMNIST(root=\"data\", train=False, download=True, transform=ToTensor())\n\nbatch_size = 64\n\ntrain_dataloader = DataLoader(training_data, batch_size=batch_size)\ntest_dataloader = DataLoader(test_data, batch_size=batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build model with hyperparameters\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n\nclass NeuralNetwork(nn.Module):\n def __init__(self):\n super(NeuralNetwork, self).__init__()\n self.flatten = nn.Flatten()\n self.linear_relu_stack = nn.Sequential(\n nn.Linear(28*28, params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], 10)\n )\n\n def forward(self, x):\n x = self.flatten(x)\n logits = self.linear_relu_stack(x)\n return logits\n\nmodel = NeuralNetwork().to(device)\n\nloss_fn = nn.CrossEntropyLoss()\noptimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define train and test\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def train(dataloader, model, loss_fn, optimizer):\n size = len(dataloader.dataset)\n model.train()\n for batch, (X, y) in enumerate(dataloader):\n X, y = X.to(device), y.to(device)\n pred = model(X)\n loss = loss_fn(pred, y)\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n\ndef test(dataloader, model, loss_fn):\n size = len(dataloader.dataset)\n num_batches = len(dataloader)\n model.eval()\n test_loss, correct = 0, 0\n with torch.no_grad():\n for X, y in dataloader:\n X, y = X.to(device), y.to(device)\n pred = model(X)\n test_loss += loss_fn(pred, y).item()\n correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n test_loss /= num_batches\n correct /= size\n return correct"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model and report accuracy\nReport accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"epochs = 5\nfor t in range(epochs):\n print(f\"Epoch {t+1}\\n-------------------------------\")\n train(train_dataloader, model, loss_fn, optimizer)\n accuracy = test(test_dataloader, model, loss_fn)\n nni.report_intermediate_result(accuracy)\nnni.report_final_result(accuracy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.

It can be run directly and will produce exactly the same result as the original version.

Furthermore, it enables auto tuning with an NNI *experiment*, which will be detailed later.

It is recommended to run this script directly first to verify the environment.

There are 2 key differences from the original version:

1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.

.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
    'features': 512,
    'lr': 0.001,
    'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
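
# %%
# A standalone illustration of the merge above, using hypothetical values:
# keys returned by the tuner override the defaults, and any key the tuner
# omits keeps its default value. When the script runs without NNI, the
# suggestion is an empty dict, so every default is kept.
example_defaults = {'features': 512, 'lr': 0.001, 'momentum': 0}
example_suggestion = {'lr': 0.01, 'momentum': 0.9}  # hypothetical tuner output
example_merged = dict(example_defaults)  # copy so the defaults stay untouched
example_merged.update(example_suggestion)
print(example_merged)  # {'features': 512, 'lr': 0.01, 'momentum': 0.9}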
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    accuracy = test(test_dataloader, model, loss_fn)
    nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_nnictl/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_nnictl_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_nnictl_model.py:
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will produce exactly the same result as the original version.
Furthermore, it can be auto-tuned with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default

    import nni
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets
    from torchvision.transforms import ToTensor

.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
These are the hyperparameters that will be tuned.
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default

    params = {
        'features': 512,
        'lr': 0.001,
        'momentum': 0,
    }

.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get optimized hyperparameters
-----------------------------
If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
But within an NNI *experiment*, it receives optimized hyperparameters from the tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default

    optimized_params = nni.get_next_parameter()
    params.update(optimized_params)
    print(params)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    {'features': 512, 'lr': 0.001, 'momentum': 0}

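The override step above relies on plain ``dict.update`` semantics. Keys present in the tuner's dict replace the defaults. An empty dict, which is what a direct run returns, leaves the defaults untouched. The sketch below reproduces that logic without NNI. ``merge_params`` is a hypothetical helper introduced only for illustration, and the "tuned" values are made up.

```python
def merge_params(defaults: dict, overrides: dict) -> dict:
    """Return a copy of ``defaults`` with ``overrides`` applied on top.

    Hypothetical helper mirroring ``params.update(optimized_params)``;
    copying first keeps the original defaults unmodified.
    """
    merged = dict(defaults)
    merged.update(overrides)
    return merged


defaults = {'features': 512, 'lr': 0.001, 'momentum': 0}

# Direct run: nni.get_next_parameter() returns an empty dict, so nothing changes.
direct = merge_params(defaults, {})

# Inside an experiment: the tuner supplies values (made-up example values here).
tuned = merge_params(defaults, {'features': 256, 'lr': 0.01, 'momentum': 0.9})

print(direct)  # {'features': 512, 'lr': 0.001, 'momentum': 0}
print(tuned)   # {'features': 256, 'lr': 0.01, 'momentum': 0.9}
```

Because every default key is present before the update, a tuner that only searches a subset of the hyperparameters still leaves the script with a complete configuration.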
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load dataset
------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default

    training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
    test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
    batch_size = 64
    train_dataloader = DataLoader(training_data, batch_size=batch_size)
    test_dataloader = DataLoader(test_data, batch_size=batch_size)

.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build model with hyperparameters
--------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")
    class NeuralNetwork(nn.Module):
        def __init__(self):
            super(NeuralNetwork, self).__init__()
            self.flatten = nn.Flatten()
            self.linear_relu_stack = nn.Sequential(
                nn.Linear(28*28, params['features']),
                nn.ReLU(),
                nn.Linear(params['features'], params['features']),
                nn.ReLU(),
                nn.Linear(params['features'], 10)
            )
        def forward(self, x):
            x = self.flatten(x)
            logits = self.linear_relu_stack(x)
            return logits
    model = NeuralNetwork().to(device)
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Using cpu device

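Because ``features`` sets the width of both hidden layers, it controls model size roughly quadratically. The sketch below gives a back-of-the-envelope count for the three ``nn.Linear`` layers above, covering weights plus biases. ``nn.Flatten`` and ``nn.ReLU`` have no parameters. ``count_parameters`` is a hypothetical helper, not part of NNI or PyTorch.

```python
def count_parameters(features: int, in_dim: int = 28 * 28, num_classes: int = 10) -> int:
    """Count weights and biases of the Linear -> ReLU -> Linear -> ReLU -> Linear stack."""
    first = in_dim * features + features          # nn.Linear(28*28, features)
    hidden = features * features + features       # nn.Linear(features, features)
    last = features * num_classes + num_classes   # nn.Linear(features, 10)
    return first + hidden + last


# One line per candidate value of ``features`` from the search space.
for f in (128, 256, 512, 1024):
    print(f, count_parameters(f))
```

With the default ``features=512`` this gives 669,706 parameters. That should equal ``sum(p.numel() for p in model.parameters())`` for the model above. The largest choice (1024) gives 1,863,690, so trials in the same experiment can differ noticeably in cost.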
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define train and test
---------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default

    def train(dataloader, model, loss_fn, optimizer):
        size = len(dataloader.dataset)
        model.train()
        for batch, (X, y) in enumerate(dataloader):
            X, y = X.to(device), y.to(device)
            pred = model(X)
            loss = loss_fn(pred, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    def test(dataloader, model, loss_fn):
        size = len(dataloader.dataset)
        num_batches = len(dataloader)
        model.eval()
        test_loss, correct = 0, 0
        with torch.no_grad():
            for X, y in dataloader:
                X, y = X.to(device), y.to(device)
                pred = model(X)
                test_loss += loss_fn(pred, y).item()
                correct += (pred.argmax(1) == y).type(torch.float).sum().item()
        test_loss /= num_batches
        correct /= size
        return correct

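``test`` returns accuracy: the fraction of samples whose highest-scoring class (``pred.argmax(1)``) equals the label. This is the value later reported to NNI. The sketch below performs the same computation in plain Python on made-up logits. ``accuracy_from_logits`` is hypothetical and only illustrates the arithmetic.

```python
from typing import List


def accuracy_from_logits(logits: List[List[float]], labels: List[int]) -> float:
    """Fraction of rows whose argmax index equals the label (like ``pred.argmax(1) == y``)."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # index of the largest logit
        correct += int(predicted == label)
    return correct / len(labels)


# Four made-up samples with three classes each.
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [2.2, 0.0, 0.5]]
labels = [1, 0, 1, 0]
print(accuracy_from_logits(logits, labels))  # 0.75 (the third sample is misclassified)
```

``test`` divides the summed correct count by ``len(dataloader.dataset)`` rather than by the number of batches. This gives the same per-sample fraction even when the last batch is smaller than ``batch_size``.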
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train model and report accuracy
-------------------------------
Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default

    epochs = 5
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train(train_dataloader, model, loss_fn, optimizer)
        accuracy = test(test_dataloader, model, loss_fn)
        nni.report_intermediate_result(accuracy)
    nni.report_final_result(accuracy)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Epoch 1
    -------------------------------
    [2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
    Epoch 2
    -------------------------------
    [2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
    Epoch 3
    -------------------------------
    [2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
    Epoch 4
    -------------------------------
    [2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
    Epoch 5
    -------------------------------
    [2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
    [2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505

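The loop reports one intermediate result per epoch and a single final result at the end. To check that a trial script follows this pattern without launching an experiment, one option is to swap in a stand-in reporter. ``RecordingReporter`` and ``run_trial`` below are hypothetical test doubles written for this sketch. They only mimic the two function names and do not reproduce NNI's behavior or its log format.

```python
from typing import Callable, List, Optional


class RecordingReporter:
    """Hypothetical stand-in that records what a trial would report to NNI."""

    def __init__(self) -> None:
        self.intermediate: List[float] = []
        self.final: Optional[float] = None

    def report_intermediate_result(self, metric: float) -> None:
        self.intermediate.append(metric)

    def report_final_result(self, metric: float) -> None:
        # The training loop above reports the final result exactly once.
        if self.final is not None:
            raise RuntimeError("final result reported more than once")
        self.final = metric


def run_trial(reporter, epochs: int, evaluate: Callable[[int], float]) -> None:
    """Same control flow as the loop above; train()/test() are replaced by ``evaluate``."""
    accuracy = 0.0  # defined up front so epochs=0 does not raise NameError
    for epoch in range(epochs):
        accuracy = evaluate(epoch)
        reporter.report_intermediate_result(accuracy)
    reporter.report_final_result(accuracy)


# Made-up accuracy curve standing in for real training.
fake_curve = [0.46, 0.55, 0.62, 0.63, 0.65]
reporter = RecordingReporter()
run_trial(reporter, epochs=5, evaluate=lambda epoch: fake_curve[epoch])
print(reporter.intermediate)  # [0.46, 0.55, 0.62, 0.63, 0.65]
print(reporter.final)         # 0.65
```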
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_nnictl_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
Run HPO Experiment with nnictl
==============================
This tutorial has exactly the same effect as :doc:`../hpo_quickstart_pytorch/main`.
Both tutorials optimize the model in `official PyTorch quickstart
<https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning,
while this one manages the experiment with the command line tool and a YAML config file instead of pure Python code.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Create config file.
4. Run the experiment.
The first two steps are identical to the quickstart.
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report the final accuracy.
Please understand the model code before continuing to the next step.
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them within the desired ranges.
Assume we have the following prior knowledge about these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, sampled log-uniformly (its logarithm is uniformly distributed).
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
As you may have noticed, these names are derived from ``numpy.random``.
For the full specification of the search space, see :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:

.. code-block:: yaml

    search_space:
      features:
        _type: choice
        _value: [ 128, 256, 512, 1024 ]
      lr:
        _type: loguniform
        _value: [ 0.0001, 0.1 ]
      momentum:
        _type: uniform
        _value: [ 0, 1 ]

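The sketch below builds intuition for the three space types by drawing samples with Python's standard ``random`` module:

* ``choice`` picks one listed value.
* ``uniform`` draws evenly between the bounds.
* ``loguniform`` draws evenly in log space, so each decade (0.0001 to 0.001, 0.001 to 0.01, 0.01 to 0.1) is about equally likely.

This illustrates the distributions only. It is not how NNI or the TPE tuner actually proposes values. ``sample`` is a hypothetical helper.

```python
import math
import random


def sample(spec: dict, rng: random.Random):
    """Draw one value from a search-space entry such as {'_type': ..., '_value': [...]}."""
    kind, value = spec['_type'], spec['_value']
    if kind == 'choice':
        return rng.choice(value)
    if kind == 'uniform':
        low, high = value
        return rng.uniform(low, high)
    if kind == 'loguniform':
        low, high = value
        # Uniform in log space, then map back: log(result) is uniformly distributed.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unsupported _type: {kind}")


# Python-dict equivalent of the YAML search space above.
search_space = {
    'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
    'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
    'momentum': {'_type': 'uniform', '_value': [0, 1]},
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for _ in range(3):
    print({name: sample(spec, rng) for name, spec in search_space.items()})
```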
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a YAML file ``config.yaml`` to define the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*,
so the model script is called *trial code*.

.. code-block:: yaml

    trial_command: python model.py
    trial_code_directory: .

When ``trial_code_directory`` is a relative path, it is resolved relative to the directory containing the config file.
So in this case ``config.yaml`` and ``model.py`` need to be in the same directory.
.. attention::

    The rules for resolving relative paths are different in the YAML config file and in the :doc:`Python experiment API </reference/experiment>`.
    In the Python experiment API, relative paths are relative to the current working directory.

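The two resolution rules can be made concrete with ``pathlib``. The sketch below is a hypothetical helper that assumes the YAML rule (relative to the config file's directory) and the Python-API rule (relative to the current working directory) described above. It also assumes absolute paths are used unchanged. It uses ``PurePosixPath`` and made-up paths so the result does not depend on the machine it runs on.

```python
from pathlib import PurePosixPath


def resolve_trial_dir(trial_code_directory: str, base: PurePosixPath) -> PurePosixPath:
    """Resolve ``trial_code_directory`` against ``base`` unless it is already absolute."""
    path = PurePosixPath(trial_code_directory)
    return path if path.is_absolute() else base / path


config_file = PurePosixPath('/home/alice/hpo/config.yaml')  # made-up location
cwd = PurePosixPath('/home/alice')                           # made-up working directory

# YAML config (nnictl): relative to the directory that contains config.yaml.
print(resolve_trial_dir('.', config_file.parent))  # /home/alice/hpo
# Python experiment API: relative to the current working directory.
print(resolve_trial_dir('.', cwd))                 # /home/alice
```

The same ``'.'`` points to different directories under the two rules. This is why ``model.py`` must sit next to ``config.yaml`` here, while the Python-API tutorial suggests ``Path(__file__).parent``.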
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. code-block:: yaml

    max_trial_number: 10
    trial_concurrency: 2

You may also set ``max_experiment_duration: 1h`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run forever until you stop it.

.. note::

    ``max_trial_number`` is set to 10 here for a fast example.
    In real-world use it should be set to a larger number.
    With its default configuration, the TPE tuner requires 20 trials to warm up.

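``max_experiment_duration`` takes a string with a unit suffix, such as ``1h``. The sketch below converts such strings to seconds and compares the budget with a rough wall-clock estimate for ``max_trial_number`` and ``trial_concurrency``. It assumes the suffixes ``s``, ``m``, ``h`` and ``d``. That set is an assumption of this sketch, so check the experiment config reference for the exact accepted format. ``parse_duration`` and ``estimated_wall_time`` are hypothetical helpers, and the 25-second trial length is made up.

```python
import math

_UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}  # assumed unit suffixes


def parse_duration(text: str) -> float:
    """Convert strings such as '1h', '30m' or '90s' to seconds (hypothetical helper)."""
    text = text.strip()
    number, unit = text[:-1], text[-1]
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown duration unit in {text!r}")
    return float(number) * _UNIT_SECONDS[unit]


def estimated_wall_time(max_trial_number: int, trial_concurrency: int, trial_seconds: float) -> float:
    """Rough estimate ignoring scheduling overhead: ceil(N / C) waves of equal-length trials."""
    waves = math.ceil(max_trial_number / trial_concurrency)
    return waves * trial_seconds


budget = parse_duration('1h')
needed = estimated_wall_time(10, 2, trial_seconds=25.0)  # 25 s per trial is a made-up figure
print(budget, needed, needed <= budget)  # 3600.0 125.0 True
```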
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. code-block:: yaml

    tuner:
      name: TPE
      class_args:
        optimize_mode: maximize

Configure training service
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we use *local* mode,
which means models will be trained on the local machine, without using any special training platform.

.. code-block:: yaml

    training_service:
      platform: local

Wrap up
^^^^^^^
The full content of ``config.yaml`` is as follows:

.. code-block:: yaml

    search_space:
      features:
        _type: choice
        _value: [ 128, 256, 512, 1024 ]
      lr:
        _type: loguniform
        _value: [ 0.0001, 0.1 ]
      momentum:
        _type: uniform
        _value: [ 0, 1 ]

    trial_command: python model.py
    trial_code_directory: .

    trial_concurrency: 2
    max_trial_number: 10

    tuner:
      name: TPE
      class_args:
        optimize_mode: maximize

    training_service:
      platform: local

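A typo in ``config.yaml`` only surfaces when the experiment starts, so a quick offline sanity check can help. The sketch below assumes the YAML has already been parsed into a Python dict (for example with a YAML library, which is not shown) and checks a few properties of the config above. ``check_config``, its list of required keys, and its per-type value rules are assumptions of this sketch, not NNI's own validation.

```python
def check_config(config: dict) -> list:
    """Return a list of human-readable problems found in an experiment config dict."""
    problems = []
    for key in ('search_space', 'trial_command', 'trial_code_directory', 'tuner', 'training_service'):
        if key not in config:
            problems.append(f"missing key: {key}")
    for name, spec in config.get('search_space', {}).items():
        kind, value = spec.get('_type'), spec.get('_value')
        if kind in ('uniform', 'loguniform'):
            if not (isinstance(value, list) and len(value) == 2 and value[0] < value[1]):
                problems.append(f"{name}: {kind} needs [low, high] with low < high")
            elif kind == 'loguniform' and value[0] <= 0:
                problems.append(f"{name}: loguniform bounds must be positive")
        elif kind == 'choice':
            if not (isinstance(value, list) and value):
                problems.append(f"{name}: choice needs a non-empty list")
        else:
            problems.append(f"{name}: type {kind!r} not handled by this sketch")
    if config.get('trial_concurrency', 1) < 1:
        problems.append("trial_concurrency must be at least 1")
    return problems


# Dict equivalent of the config.yaml shown above.
config = {
    'search_space': {
        'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
        'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
        'momentum': {'_type': 'uniform', '_value': [0, 1]},
    },
    'trial_command': 'python model.py',
    'trial_code_directory': '.',
    'trial_concurrency': 2,
    'max_trial_number': 10,
    'tuner': {'name': 'TPE', 'class_args': {'optimize_mode': 'maximize'}},
    'training_service': {'platform': 'local'},
}
print(check_config(config))  # []
```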
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Launch it with the ``nnictl create`` command:

.. code-block:: bash

    $ nnictl create --config config.yaml --port 8080

You can use the web portal to view experiment status: http://localhost:8080.

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [2022-04-01 12:00:00] Creating experiment, Experiment ID: p43ny6ew
    [2022-04-01 12:00:00] Starting web server...
    [2022-04-01 12:00:01] Setting up...
    [2022-04-01 12:00:01] Web portal URLs: http://127.0.0.1:8080 http://192.168.1.1:8080
    [2022-04-01 12:00:01] To stop experiment run "nnictl stop p43ny6ew" or "nnictl stop --all"
    [2022-04-01 12:00:01] Reference: https://nni.readthedocs.io/en/stable/reference/nnictl.html

When the experiment is done, use the ``nnictl stop`` command to stop it.

.. code-block:: bash

    $ nnictl stop p43ny6ew

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    INFO: Stopping experiment p43ny6ew
    INFO: Stop experiment success.

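If the launch and stop steps are scripted, for example in CI, the commands can be assembled as argument lists and run with ``subprocess``. The sketch below only builds the argument lists for the two commands shown above, using the ``--config`` and ``--port`` options from this tutorial. The helper names are hypothetical, and actually running the commands requires ``nnictl`` on your ``PATH``.

```python
import shlex


def create_command(config_path: str, port: int = 8080) -> list:
    """Argument list for `nnictl create --config <file> --port <port>`."""
    return ['nnictl', 'create', '--config', config_path, '--port', str(port)]


def stop_command(experiment_id: str) -> list:
    """Argument list for `nnictl stop <experiment id>`."""
    return ['nnictl', 'stop', experiment_id]


print(shlex.join(create_command('config.yaml')))  # nnictl create --config config.yaml --port 8080
print(shlex.join(stop_command('p43ny6ew')))       # nnictl stop p43ny6ew

# To actually run them (requires nnictl):
# import subprocess
# subprocess.run(create_command('config.yaml'), check=True)
```

Passing a list instead of a single shell string avoids quoting problems when the config path contains spaces.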
......@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# NNI HPO Quickstart with PyTorch\nThis tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.\n\nThere is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
"\n# HPO Quickstart with PyTorch\nThis tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
]
},
{
......@@ -144,7 +144,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\nYou may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n"
"You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\n"
]
},
{
......@@ -187,7 +187,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.\n\n"
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
]
}
],
......@@ -207,7 +207,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
"version": "3.10.4"
}
},
"nbformat": 4,
......
"""
NNI HPO Quickstart with PyTorch
===============================
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
......@@ -113,16 +111,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
# ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %%
# Step 4: Run the experiment
......@@ -154,4 +152,4 @@ experiment.stop()
#
# This example uses :doc:`Python API </reference/experiment>` to create experiment.
#
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
# You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
......@@ -19,12 +18,10 @@
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
NNI HPO Quickstart with PyTorch
===============================
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
......@@ -34,7 +31,7 @@ The tutorial consists of 4 steps:
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 19-36
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
......@@ -54,7 +51,7 @@ In short, it is a PyTorch model with 3 additional API calls:
Please understand the model code before continue to next step.
.. GENERATED FROM PYTHON SOURCE LINES 38-59
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space
---------------------------
......@@ -78,7 +75,7 @@ For full specification of search space, check :doc:`the reference </hpo/search_s
Now we can define the search space as follow:
.. GENERATED FROM PYTHON SOURCE LINES 59-66
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
......@@ -96,7 +93,7 @@ Now we can define the search space as follow:
.. GENERATED FROM PYTHON SOURCE LINES 67-74
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
......@@ -106,7 +103,7 @@ The *experiment config* defines how to train the models and how to explore the s
In this tutorial we use a *local* mode experiment,
which means models will be trained on local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 74-77
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
......@@ -120,7 +117,7 @@ which means models will be trained on local machine, without using any special t
.. GENERATED FROM PYTHON SOURCE LINES 78-84
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
......@@ -129,7 +126,7 @@ Configure trial code
In NNI evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 84-86
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
......@@ -142,7 +139,7 @@ So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 87-96
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it relates to current working directory.
To run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.
......@@ -154,12 +151,12 @@ is only available in standard Python, not in Jupyter Notebook.)
If you are using Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 100-102
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
......@@ -172,13 +169,13 @@ Configure search space
.. GENERATED FROM PYTHON SOURCE LINES 103-106
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 106-109
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
......@@ -192,13 +189,13 @@ Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 110-113
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 113-115
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
......@@ -211,7 +208,12 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
.. GENERATED FROM PYTHON SOURCE LINES 116-126
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. note::
......@@ -219,12 +221,7 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
In real world it should be set to a larger number.
With default config TPE tuner requires 20 trials to warm up.
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. GENERATED FROM PYTHON SOURCE LINES 128-133
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
......@@ -232,7 +229,7 @@ Now the experiment is ready. Choose a port and launch it. (Here we use port 8080
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 133-135
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
......@@ -248,16 +245,16 @@ You can use the web portal to view experiment status: http://localhost:8080.
.. code-block:: none
[2022-03-20 21:07:36] Creating experiment, Experiment ID: p43ny6ew
[2022-03-20 21:07:36] Starting web server...
[2022-03-20 21:07:37] Setting up...
[2022-03-20 21:07:37] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 136-143
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
......@@ -267,7 +264,7 @@ If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
allowing you to view the web portal after the experiment is done.
.. GENERATED FROM PYTHON SOURCE LINES 143-147
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
......@@ -285,13 +282,13 @@ allowing you to view the web portal after the experiment is done.
.. code-block:: none
[2022-03-20 21:08:57] Stopping experiment, please wait...
[2022-03-20 21:09:00] Experiment stopped
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 148-158
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
so it can be omitted in your code.
......@@ -302,12 +299,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi
This example uses :doc:`Python API </reference/experiment>` to create experiment.
You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.393 seconds)
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
......
.. a395c59bf5359c3583b7a0a3ab66d705
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
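HPO Tutorial (PyTorch Version)
==============================
This tutorial performs hyperparameter tuning on the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
The tutorial consists of 4 steps:

1. Modify the model code for tuning;
2. Define the search space of the hyperparameters;
3. Configure the experiment;
4. Run the experiment.
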
.. GENERATED FROM PYTHON SOURCE LINES 17-34
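Step 1: Prepare the model
-------------------------
First, we need to prepare the model to be tuned.
Because the model being tuned will be run independently many times,
and may be uploaded to the cloud for execution when a specific training platform is used,
we need to write its code in a separate .py file.
The model used in this tutorial is defined in :doc:`model.py <model>`.
The model code is an ordinary PyTorch model with 3 additional API calls:

1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated;
2. Use :func:`nni.report_intermediate_result` to report the intermediate result of each epoch;
3. Use :func:`nni.report_final_result` to report the final accuracy.

Please understand the model code before continuing to the next step.
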
.. GENERATED FROM PYTHON SOURCE LINES 36-57
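Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned: *features*, *lr*, and *momentum*.
Now we need to define their *search space*, which specifies their value ranges and distributions.
Assume we have the following prior knowledge about the 3 hyperparameters:

1. *features* can be 128, 256, 512, or 1024;
2. *lr* is between 0.0001 and 0.1 and is sampled log-uniformly (its logarithm is uniformly distributed);
3. *momentum* is between 0 and 1.

In NNI, the space of *features* is called ``choice``,
the space of *lr* is called ``loguniform``,
and the space of *momentum* is called ``uniform``.
As you may have noticed, these names are derived from ``numpy.random``.
For the full search space documentation, see :doc:`/hpo/search_space`.
Our search space is defined as follows:

.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default

    search_space = {
        'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
        'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
        'momentum': {'_type': 'uniform', '_value': [0, 1]},
    }
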
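Why ``loguniform`` for *lr*? With bounds 0.0001 and 0.1, half of the log-uniform probability mass lies below the geometric mean ``sqrt(0.0001 * 0.1)``, about 0.0032. A plain ``uniform`` over the same range would put only about 3% of samples below that value. The short calculation below checks this using only the standard definitions of the two distributions, without any NNI code. ``loguniform_cdf`` and ``uniform_cdf`` are helpers written for this sketch.

```python
import math

low, high = 0.0001, 0.1


def loguniform_cdf(x: float) -> float:
    """P(X <= x) when log(X) is uniform on [log(low), log(high)]."""
    return (math.log(x) - math.log(low)) / (math.log(high) - math.log(low))


def uniform_cdf(x: float) -> float:
    """P(X <= x) when X is uniform on [low, high]."""
    return (x - low) / (high - low)


median = math.sqrt(low * high)  # geometric mean of the bounds
print(round(median, 5))                  # 0.00316
print(round(loguniform_cdf(median), 3))  # 0.5
print(round(uniform_cdf(median), 3))     # 0.031
print(round(loguniform_cdf(0.001), 3))   # 0.333 (one decade out of three)
```

Learning rates that differ by a factor of 10 often behave very differently, so spreading samples evenly across decades is usually a better prior than spreading them evenly across the raw interval.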
.. GENERATED FROM PYTHON SOURCE LINES 65-72
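Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage hyperparameter tuning. The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment, which means the experiment runs only on the local machine, without using any special training platform.

.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default

    from nni.experiment import Experiment
    experiment = Experiment('local')
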
.. GENERATED FROM PYTHON SOURCE LINES 76-82
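Now we start to configure the experiment.

Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the process of evaluating one set of hyperparameters is called a *trial*, and the model code above is called *trial code*.

.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default

    experiment.config.trial_command = 'python model.py'
    experiment.config.trial_code_directory = '.'
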
.. GENERATED FROM PYTHON SOURCE LINES 85-94
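If ``trial_code_directory`` is a relative path, it is resolved relative to the current working directory.
If you want to run this file, ``main.py``, from a different path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in .py files, not in Jupyter Notebook.)

.. attention::

    If you are using a Linux system without Conda,
    you may need to change ``"python model.py"`` to ``"python3 model.py"``.
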
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^

.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default

    experiment.config.search_space = search_space

.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use the :doc:`TPE tuner </hpo/tuners>`.

.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default

    experiment.config.tuner.name = 'TPE'
    experiment.config.tuner.class_args['optimize_mode'] = 'maximize'

.. GENERATED FROM PYTHON SOURCE LINES 108-111
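Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we try 10 sets of hyperparameters in total, evaluating 2 sets concurrently at a time.

.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default

    experiment.config.max_trial_number = 10
    experiment.config.trial_concurrency = 2
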
.. GENERATED FROM PYTHON SOURCE LINES 114-124
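You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set, the experiment will run forever until you press Ctrl-C.

.. note::

    ``max_trial_number`` is set to 10 here so that the tutorial finishes quickly.
    In real-world use it should be set to a larger number; with its default parameters, the TPE tuner needs to evaluate 20 sets of hyperparameters before it finishes warming up.
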
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is configured. You can specify a port to run it on; this tutorial uses port 8080.
You can use the web portal to view experiment status: http://localhost:8080.

.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default

    experiment.run(8080)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
    [2022-04-13 12:07:29] Starting web server...
    [2022-04-13 12:07:30] Setting up...
    [2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
    True

.. GENERATED FROM PYTHON SOURCE LINES 134-141
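After the experiment is done
----------------------------
You only need to wait for the function to return for the experiment to end normally; the rest of this section is optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` at the end of your code to prevent the Python interpreter from exiting automatically,
so that you can keep using the web portal.

.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default

    # input('Press enter to quit')
    experiment.stop()

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    [2022-04-13 12:08:50] Stopping experiment, please wait...
    [2022-04-13 12:08:53] Experiment stopped
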
.. GENERATED FROM PYTHON SOURCE LINES 146-156
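:meth:`nni.experiment.Experiment.stop` is automatically invoked before Python exits, so you can omit it from your own code.
After the experiment has fully stopped, you can use :meth:`nni.experiment.Experiment.view` to restart the web portal.

.. tip::

    This tutorial uses the :doc:`Python API </reference/experiment>` to create the experiment.
    Alternatively, you can use the :doc:`command line tool <../hpo_nnictl/nnictl>`.
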
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. e083b4dc8e350428ddf680e97b47cc8e
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_model.py:
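Port PyTorch Quickstart to NNI
==============================
This is a modified version of the `official PyTorch quickstart tutorial <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
You can run this file directly, and it produces the same results as the original. You can also use it in an NNI experiment for hyperparameter tuning.
We recommend running this file directly once first, both to get familiar with the code and to check your environment.

Compared with the original, we made two changes:

1. In the *Get optimized hyperparameters* section, the default hyperparameters are replaced with those generated by the tuning algorithm.
2. In the *Train the model and report results* section, the accuracy is reported to NNI.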
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default

    import nni
    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets
    from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
These are the hyperparameters that will be tuned:
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default

    params = {
        'features': 512,
        'lr': 0.001,
        'momentum': 0,
    }
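The tuner draws values for these keys from a search space that is defined in the experiment configuration, not in this file. The sketch below assumes an NNI-style search space with ``_type``/``_value`` entries (the concrete ranges are illustrative assumptions) and mimics what a plain random-search tuner might draw. The ``sample`` helper is hypothetical and is not NNI's tuner implementation:

.. code-block:: python

    import math
    import random

    # Illustrative NNI-style search space; the ranges are assumptions.
    search_space = {
        'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
        'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
        'momentum': {'_type': 'uniform', '_value': [0, 1]},
    }


    def sample(space, rng=random):
        """Hypothetical helper: draw one hyperparameter combination at random."""
        params = {}
        for name, spec in space.items():
            kind, value = spec['_type'], spec['_value']
            if kind == 'choice':
                params[name] = rng.choice(value)
            elif kind == 'uniform':
                params[name] = rng.uniform(value[0], value[1])
            elif kind == 'loguniform':
                # Uniform in log space: each decade of the range is equally likely.
                low, high = math.log(value[0]), math.log(value[1])
                params[name] = math.exp(rng.uniform(low, high))
            else:
                raise ValueError(f'unsupported _type: {kind}')
        return params


    print(sample(search_space, random.Random(0)))

Log-uniform sampling suits the learning rate because its effect spans several orders of magnitude; plain uniform sampling would spend most draws near the top of the range.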
.. GENERATED FROM PYTHON SOURCE LINES 39-43
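Get optimized hyperparameters
-----------------------------
When the file is run directly, :func:`nni.get_next_parameter` returns an empty dict.
When it runs inside an NNI experiment, it returns the hyperparameter combination generated by the tuning algorithm.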
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default

    optimized_params = nni.get_next_parameter()
    params.update(optimized_params)
    print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
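Because the defaults are merged with ``dict.update``, any hyperparameter the tuner does not supply keeps its default value. A stdlib-only sketch of this behaviour; ``fake_get_next_parameter`` is a hypothetical stand-in for :func:`nni.get_next_parameter`, and its tuned values are made up for illustration:

.. code-block:: python

    def fake_get_next_parameter(in_experiment):
        """Hypothetical stand-in for nni.get_next_parameter()."""
        if not in_experiment:
            return {}  # standalone run: nothing to override
        return {'lr': 0.01, 'momentum': 0.9}  # made-up tuner output covering two keys


    def resolve_params(in_experiment):
        params = {'features': 512, 'lr': 0.001, 'momentum': 0}
        params.update(fake_get_next_parameter(in_experiment))
        return params


    print(resolve_params(False))  # {'features': 512, 'lr': 0.001, 'momentum': 0}
    print(resolve_params(True))   # {'features': 512, 'lr': 0.01, 'momentum': 0.9}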
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load dataset
------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default

    training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
    test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())

    batch_size = 64

    train_dataloader = DataLoader(training_data, batch_size=batch_size)
    test_dataloader = DataLoader(test_data, batch_size=batch_size)
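FashionMNIST has 60,000 training and 10,000 test images. With ``batch_size = 64`` and the ``DataLoader`` default ``drop_last=False``, the final partial batch is kept, so the number of batches is the ceiling of the quotient. The ``test`` function later relies on the difference between ``len(dataloader)`` (batches) and ``len(dataloader.dataset)`` (samples). A quick arithmetic check:

.. code-block:: python

    import math

    batch_size = 64
    train_size, test_size = 60_000, 10_000  # FashionMNIST split sizes

    # drop_last=False keeps the last, partial batch, hence the ceiling
    train_batches = math.ceil(train_size / batch_size)
    test_batches = math.ceil(test_size / batch_size)
    print(train_batches, test_batches)  # 938 157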
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build model with hyperparameters
--------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using {device} device")

    class NeuralNetwork(nn.Module):
        def __init__(self):
            super(NeuralNetwork, self).__init__()
            self.flatten = nn.Flatten()
            # The width of both hidden layers is the tuned 'features' hyperparameter.
            self.linear_relu_stack = nn.Sequential(
                nn.Linear(28*28, params['features']),
                nn.ReLU(),
                nn.Linear(params['features'], params['features']),
                nn.ReLU(),
                nn.Linear(params['features'], 10)
            )

        def forward(self, x):
            x = self.flatten(x)
            logits = self.linear_relu_stack(x)
            return logits

    model = NeuralNetwork().to(device)

    loss_fn = nn.CrossEntropyLoss()
    # Learning rate and momentum also come from the tuned hyperparameters.
    optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
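The ``features`` hyperparameter sets the width of both hidden layers, so it largely determines model size. A worked count of trainable parameters (weights plus biases of the three ``nn.Linear`` layers), written in plain Python so it runs without PyTorch; ``count_params`` is a hypothetical helper:

.. code-block:: python

    def count_params(features, in_dim=28 * 28, num_classes=10):
        """Weights + biases of Linear(in_dim, f) -> Linear(f, f) -> Linear(f, num_classes)."""
        first = in_dim * features + features
        hidden = features * features + features
        last = features * num_classes + num_classes
        return first + hidden + last


    print(count_params(512))   # 669706
    print(count_params(1024))  # 1863690

Because the middle layer contributes ``features**2`` weights, doubling ``features`` from 512 to 1024 yields about 2.8 times as many parameters. With PyTorch available, ``sum(p.numel() for p in model.parameters())`` should report the same count for the model above.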
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define training and test functions
----------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default

    def train(dataloader, model, loss_fn, optimizer):
        model.train()  # switch to training mode
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            # Forward pass and loss
            pred = model(X)
            loss = loss_fn(pred, y)

            # Backpropagation and parameter update
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    def test(dataloader, model, loss_fn):
        size = len(dataloader.dataset)  # number of test samples
        num_batches = len(dataloader)   # number of batches
        model.eval()
        test_loss, correct = 0, 0
        with torch.no_grad():  # gradients are not needed for evaluation
            for X, y in dataloader:
                X, y = X.to(device), y.to(device)
                pred = model(X)
                test_loss += loss_fn(pred, y).item()
                correct += (pred.argmax(1) == y).type(torch.float).sum().item()
        test_loss /= num_batches  # average loss per batch
        correct /= size           # accuracy: fraction of correctly classified samples
        return correct
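``test`` averages the loss over batches (``num_batches``) but the accuracy over samples (``size``), so the returned value is the fraction of correctly classified test images. A plain-Python sketch of the same accuracy calculation on toy logits; the data is made up for illustration and ``accuracy`` is a hypothetical helper:

.. code-block:: python

    def accuracy(batches):
        """Fraction of samples whose highest logit matches the label.

        `batches` is a list of (logits, labels) pairs, mirroring the DataLoader loop in test().
        """
        correct, size = 0, 0
        for logits, labels in batches:
            for row, label in zip(logits, labels):
                pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
                correct += int(pred == label)
                size += 1
        return correct / size


    toy_batches = [
        ([[0.1, 2.0, -1.0], [1.5, 0.2, 0.3]], [1, 2]),  # first sample right, second wrong
        ([[0.0, 0.1, 3.0]], [2]),                        # right
    ]
    print(accuracy(toy_batches))  # 0.6666666666666666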
.. GENERATED FROM PYTHON SOURCE LINES 116-119
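Train the model and report results
----------------------------------
Report the accuracy to NNI's tuning algorithm so that it can predict better hyperparameter combinations.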
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default

    epochs = 5
    for t in range(epochs):
        print(f"Epoch {t+1}\n-------------------------------")
        train(train_dataloader, model, loss_fn, optimizer)
        accuracy = test(test_dataloader, model, loss_fn)
        # Report this epoch's accuracy as an intermediate result.
        nni.report_intermediate_result(accuracy)
    # Report the last epoch's accuracy as the trial's final result.
    nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
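The loop reports one intermediate result per epoch and a single final result per trial. A stdlib sketch of that pattern, replaying the accuracies from the log above; ``RecordingReporter`` is a hypothetical stand-in for the two ``nni`` functions and is not part of NNI:

.. code-block:: python

    class RecordingReporter:
        """Hypothetical stand-in that records what would be sent to NNI."""

        def __init__(self):
            self.intermediate = []
            self.final = None

        def report_intermediate_result(self, metric):
            self.intermediate.append(metric)

        def report_final_result(self, metric):
            self.final = metric


    reporter = RecordingReporter()
    for accuracy in [0.461, 0.5529, 0.6155, 0.6345, 0.6505]:  # per-epoch values from the log
        reporter.report_intermediate_result(accuracy)
    reporter.report_final_result(accuracy)  # last epoch's value, as in the tutorial

    print(reporter.intermediate[-1], reporter.final)  # 0.6505 0.6505

Reporting the last epoch's accuracy follows the tutorial; reporting ``max(reporter.intermediate)`` instead would be an alternative when the best epoch matters more than the final one.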
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_