Unverified Commit 4446280d authored by liuzhe-lz, committed by GitHub

update hpo tutorials (#4758)

parent b7dfc7cf
f3498812ae89cde34b6f0f54216012fd e732cee426a4629b71f5fa28ce16fad7
\ No newline at end of file \ No newline at end of file
:orphan:
.. DO NOT EDIT. .. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
...@@ -19,12 +18,10 @@ ...@@ -19,12 +18,10 @@
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py: .. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
NNI HPO Quickstart with PyTorch HPO Quickstart with PyTorch
=============================== ===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning. This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps: The tutorial consists of 4 steps:
1. Modify the model for auto-tuning. 1. Modify the model for auto-tuning.
...@@ -34,7 +31,7 @@ The tutorial consists of 4 steps: ...@@ -34,7 +31,7 @@ The tutorial consists of 4 steps:
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html .. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 19-36 .. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model Step 1: Prepare the model
------------------------- -------------------------
...@@ -54,7 +51,7 @@ In short, it is a PyTorch model with 3 additional API calls: ...@@ -54,7 +51,7 @@ In short, it is a PyTorch model with 3 additional API calls:
Please understand the model code before continuing to the next step. Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 38-59 .. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space Step 2: Define search space
--------------------------- ---------------------------
...@@ -78,7 +75,7 @@ For full specification of search space, check :doc:`the reference </hpo/search_s ...@@ -78,7 +75,7 @@ For full specification of search space, check :doc:`the reference </hpo/search_s
Now we can define the search space as follows: Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 59-66 .. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default .. code-block:: default
...@@ -96,7 +93,7 @@ Now we can define the search space as follow: ...@@ -96,7 +93,7 @@ Now we can define the search space as follow:
.. GENERATED FROM PYTHON SOURCE LINES 67-74 .. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment Step 3: Configure the experiment
-------------------------------- --------------------------------
...@@ -106,7 +103,7 @@ The *experiment config* defines how to train the models and how to explore the s ...@@ -106,7 +103,7 @@ The *experiment config* defines how to train the models and how to explore the s
In this tutorial we use a *local* mode experiment, In this tutorial we use a *local* mode experiment,
which means models will be trained on local machine, without using any special training platform. which means models will be trained on local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 74-77 .. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default .. code-block:: default
...@@ -120,7 +117,7 @@ which means models will be trained on local machine, without using any special t ...@@ -120,7 +117,7 @@ which means models will be trained on local machine, without using any special t
.. GENERATED FROM PYTHON SOURCE LINES 78-84 .. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment. Now we start to configure the experiment.
...@@ -129,7 +126,7 @@ Configure trial code ...@@ -129,7 +126,7 @@ Configure trial code
In NNI evaluation of each hyperparameter set is called a *trial*. In NNI evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*. So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 84-86 .. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default .. code-block:: default
...@@ -142,7 +139,7 @@ So the model script is called *trial code*. ...@@ -142,7 +139,7 @@ So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 87-96 .. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it relates to current working directory. When ``trial_code_directory`` is a relative path, it relates to current working directory.
To run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``. To run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.
...@@ -154,12 +151,12 @@ is only available in standard Python, not in Jupyter Notebook.) ...@@ -154,12 +151,12 @@ is only available in standard Python, not in Jupyter Notebook.)
If you are using Linux system without Conda, If you are using Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``. you may need to change ``"python model.py"`` to ``"python3 model.py"``.
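As a minimal sketch (using only configuration fields already shown in this tutorial), resolving the trial code directory against the script itself rather than the working directory could look like:

.. code-block:: default

    from pathlib import Path
    from nni.experiment import Experiment

    experiment = Experiment('local')
    # 'model.py' sits next to this script, so anchor the directory to the script's location.
    experiment.config.trial_code_directory = Path(__file__).parent
    experiment.config.trial_command = 'python model.py'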
.. GENERATED FROM PYTHON SOURCE LINES 98-100 .. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space Configure search space
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 100-102 .. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default .. code-block:: default
...@@ -172,13 +169,13 @@ Configure search space ...@@ -172,13 +169,13 @@ Configure search space
.. GENERATED FROM PYTHON SOURCE LINES 103-106 .. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`. Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 106-109 .. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default .. code-block:: default
...@@ -192,13 +189,13 @@ Here we use :doc:`TPE tuner </hpo/tuners>`. ...@@ -192,13 +189,13 @@ Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 110-113 .. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time. Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 113-115 .. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default .. code-block:: default
...@@ -211,7 +208,12 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate ...@@ -211,7 +208,12 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
.. GENERATED FROM PYTHON SOURCE LINES 116-126 .. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. note:: .. note::
...@@ -219,12 +221,7 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate ...@@ -219,12 +221,7 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
In real world it should be set to a larger number. In real world it should be set to a larger number.
With default config TPE tuner requires 20 trials to warm up. With default config TPE tuner requires 20 trials to warm up.
You may also set ``max_experiment_duration = '1h'`` to limit running time. .. GENERATED FROM PYTHON SOURCE LINES 126-131
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
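For example, a minimal sketch combining both stopping criteria (the values here are only illustrative):

.. code-block:: default

    # Stop after 10 trials or after 1 hour, whichever comes first.
    experiment.config.max_trial_number = 10
    experiment.config.max_experiment_duration = '1h'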
.. GENERATED FROM PYTHON SOURCE LINES 128-133
Step 4: Run the experiment Step 4: Run the experiment
-------------------------- --------------------------
...@@ -232,7 +229,7 @@ Now the experiment is ready. Choose a port and launch it. (Here we use port 8080 ...@@ -232,7 +229,7 @@ Now the experiment is ready. Choose a port and launch it. (Here we use port 8080
You can use the web portal to view experiment status: http://localhost:8080. You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 133-135 .. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default .. code-block:: default
...@@ -248,16 +245,16 @@ You can use the web portal to view experiment status: http://localhost:8080. ...@@ -248,16 +245,16 @@ You can use the web portal to view experiment status: http://localhost:8080.
.. code-block:: none .. code-block:: none
[2022-03-20 21:07:36] Creating experiment, Experiment ID: p43ny6ew [2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-03-20 21:07:36] Starting web server... [2022-04-13 12:07:29] Starting web server...
[2022-03-20 21:07:37] Setting up... [2022-04-13 12:07:30] Setting up...
[2022-03-20 21:07:37] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080 [2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True True
.. GENERATED FROM PYTHON SOURCE LINES 136-143 .. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done After the experiment is done
---------------------------- ----------------------------
...@@ -267,7 +264,7 @@ If you are using standard Python instead of Jupyter Notebook, ...@@ -267,7 +264,7 @@ If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting, you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
allowing you to view the web portal after the experiment is done. allowing you to view the web portal after the experiment is done.
.. GENERATED FROM PYTHON SOURCE LINES 143-147 .. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default .. code-block:: default
...@@ -285,13 +282,13 @@ allowing you to view the web portal after the experiment is done. ...@@ -285,13 +282,13 @@ allowing you to view the web portal after the experiment is done.
.. code-block:: none .. code-block:: none
[2022-03-20 21:08:57] Stopping experiment, please wait... [2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-03-20 21:09:00] Experiment stopped [2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 148-158 .. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits, :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
so it can be omitted in your code. so it can be omitted in your code.
...@@ -302,12 +299,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi ...@@ -302,12 +299,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi
This example uses :doc:`Python API </reference/experiment>` to create experiment. This example uses :doc:`Python API </reference/experiment>` to create experiment.
You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`. You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
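As a hedged sketch, reopening the web portal of a stopped experiment might look as follows; the experiment ID is a placeholder and the exact ``view`` signature should be confirmed in the :doc:`experiment reference </reference/experiment>`:

.. code-block:: default

    from nni.experiment import Experiment

    # Replace 'hgkju3iq' with the ID printed when the experiment was created.
    Experiment.view('hgkju3iq', port=8080)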
.. rst-class:: sphx-glr-timing .. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.393 seconds) **Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py: .. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
......
.. a395c59bf5359c3583b7a0a3ab66d705
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define the hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
First, we need to prepare the model to be tuned.
Because the tuned model will be run many times independently,
and may be uploaded and executed in the cloud when a training platform is used,
its code needs to live in a separate Python file.
The model used in this tutorial is defined in :doc:`model.py <model>`.
It is an ordinary PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report the intermediate result of each epoch.
3. Use :func:`nni.report_final_result` to report the final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned: *features*, *lr*, and *momentum*.
Now we need to define their *search space*, specifying the range and distribution of their values.
Assume we have the following prior knowledge about these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be between 0.0001 and 0.1, following an exponential distribution.
3. *momentum* should be between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names match the function names in ``numpy.random``.
For the full specification of search spaces, see :doc:`/hpo/search_space`.
Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process; the *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment, which means models will be trained on the local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of one set of hyperparameters is called a *trial*, and the model script above is called the *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it is resolved against the current working directory.
To run this file ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in standard Python files, not in Jupyter Notebook.)
.. attention::
If you are using a Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use the :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set, the experiment will run forever until you press Ctrl-C.
.. note::
``max_trial_number`` is set to 10 here so the tutorial finishes quickly;
in real-world use it should be set to a larger number. With the default config, the TPE tuner requires 20 trials to warm up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Choose a port and launch it; in this tutorial we use port 8080.
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Simply waiting for the function to return ends the experiment normally; everything below is optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` at the end of the script to prevent the Python interpreter from exiting,
so that you can keep using the web portal.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits, so it can be omitted from your own code.
After the experiment is fully stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.
.. tip::
This tutorial uses the :doc:`Python API </reference/experiment>` to create the experiment;
you can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. e083b4dc8e350428ddf680e97b47cc8e
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_model.py:
Port PyTorch Quickstart to NNI
==============================
This file is a modified version of the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
You can run it directly and get exactly the same result as the original; you can also use it in an NNI experiment for hyperparameter tuning.
It is recommended to run this file directly first, to get familiar with the code and to verify the runtime environment.
Compared to the original, two modifications are made:
1. In the `Get optimized hyperparameters`_ part, the default hyperparameters are replaced with values generated by the tuning algorithm.
2. In the `Train model and report accuracy`_ part, accuracy metrics are reported to NNI.
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
These are the hyperparameters that will be tuned:
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get optimized hyperparameters
-----------------------------
When run directly, :func:`nni.get_next_parameter` returns an empty dict;
when used in an NNI experiment, it returns the hyperparameter set generated by the tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load dataset
------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build model with hyperparameters
--------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define train and test
---------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train model and report accuracy
-------------------------------
Report accuracy metrics to NNI so the tuning algorithm can predict better hyperparameter sets.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
...@@ -5,10 +5,10 @@ ...@@ -5,10 +5,10 @@
Computation times Computation times
================= =================
**00:24.441** total execution time for **tutorials_hpo_quickstart_pytorch** files: **01:24.367** total execution time for **tutorials_hpo_quickstart_pytorch** files:
+--------------------------------------------------------------------------+-----------+--------+ +--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py` (``model.py``) | 00:24.441 | 0.0 MB | | :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` (``main.py``) | 01:24.367 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+ +--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` (``main.py``) | 00:00.000 | 0.0 MB | | :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py` (``model.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+ +--------------------------------------------------------------------------+-----------+--------+
...@@ -15,7 +15,7 @@ ...@@ -15,7 +15,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"\n# NNI HPO Quickstart with TensorFlow\nThis tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n" "\n# HPO Quickstart with TensorFlow\nThis tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
] ]
}, },
{ {
...@@ -144,7 +144,7 @@ ...@@ -144,7 +144,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
"<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\nYou may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n" "You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\n"
] ]
}, },
{ {
...@@ -187,7 +187,7 @@ ...@@ -187,7 +187,7 @@
"cell_type": "markdown", "cell_type": "markdown",
"metadata": {}, "metadata": {},
"source": [ "source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.\n\n" ":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
] ]
} }
], ],
...@@ -207,7 +207,7 @@ ...@@ -207,7 +207,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.3" "version": "3.10.4"
} }
}, },
"nbformat": 4, "nbformat": 4,
......
""" """
NNI HPO Quickstart with TensorFlow HPO Quickstart with TensorFlow
================================== ==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning. This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps: The tutorial consists of 4 steps:
...@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize' ...@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10 experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2 experiment.config.trial_concurrency = 2
# %% # %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note:: # .. note::
# #
# ``max_trial_number`` is set to 10 here for a fast example. # ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number. # In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up. # With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %% # %%
# Step 4: Run the experiment # Step 4: Run the experiment
...@@ -154,4 +154,4 @@ experiment.stop() ...@@ -154,4 +154,4 @@ experiment.stop()
# #
# This example uses :doc:`Python API </reference/experiment>` to create experiment. # This example uses :doc:`Python API </reference/experiment>` to create experiment.
# #
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`. # You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
fe5546e4ae3f3dbf5e852af322dae15f b8a9880a36233005ade7a8dae6d428a8
\ No newline at end of file \ No newline at end of file
:orphan:
.. DO NOT EDIT. .. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
...@@ -19,8 +18,8 @@ ...@@ -19,8 +18,8 @@
.. _sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py: .. _sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py:
NNI HPO Quickstart with TensorFlow HPO Quickstart with TensorFlow
================================== ==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning. This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps: The tutorial consists of 4 steps:
...@@ -213,17 +212,17 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate ...@@ -213,17 +212,17 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
.. GENERATED FROM PYTHON SOURCE LINES 116-126 .. GENERATED FROM PYTHON SOURCE LINES 116-126
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. note:: .. note::
``max_trial_number`` is set to 10 here for a fast example. ``max_trial_number`` is set to 10 here for a fast example.
In real world it should be set to a larger number. In real world it should be set to a larger number.
With default config TPE tuner requires 20 trials to warm up. With default config TPE tuner requires 20 trials to warm up.
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. GENERATED FROM PYTHON SOURCE LINES 128-133 .. GENERATED FROM PYTHON SOURCE LINES 128-133
Step 4: Run the experiment Step 4: Run the experiment
...@@ -248,10 +247,10 @@ You can use the web portal to view experiment status: http://localhost:8080. ...@@ -248,10 +247,10 @@ You can use the web portal to view experiment status: http://localhost:8080.
.. code-block:: none .. code-block:: none
[2022-03-20 21:12:19] Creating experiment, Experiment ID: 8raiuoyb [2022-04-13 12:11:34] Creating experiment, Experiment ID: enw27qxj
[2022-03-20 21:12:19] Starting web server... [2022-04-13 12:11:34] Starting web server...
[2022-03-20 21:12:20] Setting up... [2022-04-13 12:11:35] Setting up...
[2022-03-20 21:12:20] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080 [2022-04-13 12:11:35] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True True
...@@ -285,8 +284,8 @@ allowing you to view the web portal after the experiment is done. ...@@ -285,8 +284,8 @@ allowing you to view the web portal after the experiment is done.
.. code-block:: none .. code-block:: none
[2022-03-20 21:13:41] Stopping experiment, please wait... [2022-04-13 12:12:55] Stopping experiment, please wait...
[2022-03-20 21:13:44] Experiment stopped [2022-04-13 12:12:58] Experiment stopped
...@@ -302,12 +301,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi ...@@ -302,12 +301,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi
This example uses :doc:`Python API </reference/experiment>` to create experiment. This example uses :doc:`Python API </reference/experiment>` to create experiment.
You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`. You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing .. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.257 seconds) **Total running time of the script:** ( 1 minutes 24.384 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_tensorflow_main.py: .. _sphx_glr_download_tutorials_hpo_quickstart_tensorflow_main.py:
......
...@@ -5,10 +5,10 @@ ...@@ -5,10 +5,10 @@
Computation times Computation times
================= =================
**02:27.156** total execution time for **tutorials_hpo_quickstart_tensorflow** files: **01:24.384** total execution time for **tutorials_hpo_quickstart_tensorflow** files:
+-----------------------------------------------------------------------------+-----------+--------+ +-----------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py` (``model.py``) | 02:27.156 | 0.0 MB | | :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py` (``main.py``) | 01:24.384 | 0.0 MB |
+-----------------------------------------------------------------------------+-----------+--------+ +-----------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py` (``main.py``) | 00:00.000 | 0.0 MB | | :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py` (``model.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------+-----------+--------+ +-----------------------------------------------------------------------------+-----------+--------+
...@@ -189,12 +189,12 @@ Tutorials ...@@ -189,12 +189,12 @@ Tutorials
.. raw:: html .. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="There is also a TensorFlow version&lt;../hpo_quickstart_tensorflow/main&gt; if you prefer it."> <div class="sphx-glr-thumbcontainer" tooltip="The tutorial consists of 4 steps: ">
.. only:: html .. only:: html
.. figure:: /tutorials/hpo_quickstart_pytorch/images/thumb/sphx_glr_main_thumb.png .. figure:: /tutorials/hpo_quickstart_pytorch/images/thumb/sphx_glr_main_thumb.png
:alt: NNI HPO Quickstart with PyTorch :alt: HPO Quickstart with PyTorch
:ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py`
...@@ -246,7 +246,7 @@ Tutorials ...@@ -246,7 +246,7 @@ Tutorials
.. only:: html .. only:: html
.. figure:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png .. figure:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png
:alt: NNI HPO Quickstart with TensorFlow :alt: HPO Quickstart with TensorFlow
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py` :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py`
......
search_space:
features:
_type: choice
_value: [ 128, 256, 512, 1024 ]
lr:
_type: loguniform
_value: [ 0.0001, 0.1 ]
momentum:
_type: uniform
_value: [ 0, 1 ]
trial_command: python model.py
trial_code_directory: .
trial_concurrency: 2
max_trial_number: 10
tuner:
name: TPE
class_args:
optimize_mode: maximize
training_service:
platform: local
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as original version.
Furthermore, it enables the ability of auto tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
Run HPO Experiment with nnictl
==============================
This tutorial has exactly the same effect as :doc:`PyTorch quickstart <../hpo_quickstart_pytorch/main>`.
Both tutorials optimize the model in `official PyTorch quickstart
<https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning,
while this one manages the experiment with the command line tool and a YAML config file, instead of pure Python code.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Create config file.
4. Run the experiment.
The first two steps are identical to quickstart.
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
Please understand the model code before continuing to the next step.
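As a rough sketch (the real data loading and training logic lives in ``model.py``), the three API calls fit into a trial script like this:

.. code-block:: python

    import nni

    # Default hyperparameters, overridden by the tuner when run inside an experiment.
    params = {'features': 512, 'lr': 0.001, 'momentum': 0}
    params.update(nni.get_next_parameter())        # 1. fetch hyperparameters to evaluate

    for epoch in range(5):
        accuracy = 0.0                             # placeholder: train one epoch and evaluate (see model.py)
        nni.report_intermediate_result(accuracy)   # 2. report per-epoch metric

    nni.report_final_result(accuracy)              # 3. report final metric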
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them in the desired ranges.
Assuming we have the following prior knowledge about these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names are derived from ``numpy.random``.
For full specification of search space, check :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:
.. code-block:: yaml
search_space:
features:
_type: choice
_value: [ 128, 256, 512, 1024 ]
lr:
_type: loguniform
_value: [ 0.0001, 0.1 ]
momentum:
_type: uniform
_value: [ 0, 1 ]
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a YAML file ``config.yaml`` to define the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. code-block:: yaml
trial_command: python model.py
trial_code_directory: .
When ``trial_code_directory`` is a relative path, it is resolved relative to the config file.
So in this case we need to put ``config.yaml`` and ``model.py`` in the same directory.
.. attention::
The rules for resolving relative paths differ between the YAML config file and the :doc:`Python experiment API </reference/experiment>`.
In the Python experiment API, relative paths are resolved against the current working directory.
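For comparison, a minimal sketch of the Python API counterpart, where anchoring the path to the script avoids depending on the working directory:

.. code-block:: python

    from pathlib import Path
    from nni.experiment import Experiment

    experiment = Experiment('local')
    # Resolved against the working directory by default, so make it explicit.
    experiment.config.trial_code_directory = Path(__file__).parent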
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. code-block:: yaml
max_trial_number: 10
trial_concurrency: 2
You may also set ``max_experiment_duration`` (e.g. ``1h``) to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run forever until you stop it.
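A sketch of the corresponding YAML fields (the duration limit is optional and the value is only an example):

.. code-block:: yaml

    max_trial_number: 10
    max_experiment_duration: 1h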
.. note::
``max_trial_number`` is set to 10 here for a fast example.
In the real world it should be set to a larger number.
With the default config, the TPE tuner requires 20 trials to warm up.
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. code-block:: yaml

    tuner:
      name: TPE
      class_args:
        optimize_mode: maximize
Configure training service
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we use *local* mode,
which means models will be trained on local machine, without using any special training platform.
.. code-block:: yaml
training_service:
platform: local
Wrap up
^^^^^^^
The full content of ``config.yaml`` is as follows:
.. code-block:: yaml
search_space:
features:
_type: choice
_value: [ 128, 256, 512, 1024 ]
lr:
_type: loguniform
_value: [ 0.0001, 0.1 ]
momentum:
_type: uniform
_value: [ 0, 1 ]
trial_command: python model.py
trial_code_directory: .
trial_concurrency: 2
max_trial_number: 10
tuner:
name: TPE
class_args:
optimize_mode: maximize
training_service:
platform: local
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Launch it with the ``nnictl create`` command:
.. code-block:: bash
$ nnictl create --config config.yaml --port 8080
You can use the web portal to view experiment status: http://localhost:8080.
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-01 12:00:00] Creating experiment, Experiment ID: p43ny6ew
[2022-04-01 12:00:00] Starting web server...
[2022-04-01 12:00:01] Setting up...
[2022-04-01 12:00:01] Web portal URLs: http://127.0.0.1:8080 http://192.168.1.1:8080
[2022-04-01 12:00:01] To stop experiment run "nnictl stop p43ny6ew" or "nnictl stop --all"
[2022-04-01 12:00:01] Reference: https://nni.readthedocs.io/en/stable/reference/nnictl.html
When the experiment is done, use the ``nnictl stop`` command to stop it.
.. code-block:: bash
$ nnictl stop p43ny6ew
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO: Stopping experiment p43ny6ew
INFO: Stop experiment success.
""" """
NNI HPO Quickstart with PyTorch HPO Quickstart with PyTorch
=============================== ===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning. This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps: The tutorial consists of 4 steps:
1. Modify the model for auto-tuning. 1. Modify the model for auto-tuning.
...@@ -113,16 +111,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize' ...@@ -113,16 +111,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10 experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2 experiment.config.trial_concurrency = 2
# %% # %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note:: # .. note::
# #
# ``max_trial_number`` is set to 10 here for a fast example. # ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number. # In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up. # With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %% # %%
# Step 4: Run the experiment # Step 4: Run the experiment
...@@ -154,4 +152,4 @@ experiment.stop() ...@@ -154,4 +152,4 @@ experiment.stop()
# #
# This example uses :doc:`Python API </reference/experiment>` to create experiment. # This example uses :doc:`Python API </reference/experiment>` to create experiment.
# #
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`. # You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
""" """
NNI HPO Quickstart with TensorFlow HPO Quickstart with TensorFlow
================================== ==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning. This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps: The tutorial consists of 4 steps:
...@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize' ...@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10 experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2 experiment.config.trial_concurrency = 2
# %% # %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note:: # .. note::
# #
# ``max_trial_number`` is set to 10 here for a fast example. # ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number. # In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up. # With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %% # %%
# Step 4: Run the experiment # Step 4: Run the experiment
...@@ -154,4 +154,4 @@ experiment.stop() ...@@ -154,4 +154,4 @@ experiment.stop()
# #
# This example uses :doc:`Python API </reference/experiment>` to create experiment. # This example uses :doc:`Python API </reference/experiment>` to create experiment.
# #
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`. # You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.