Unverified Commit 4446280d authored by liuzhe-lz, committed by GitHub

update hpo tutorials (#4758)

parent b7dfc7cf
f3498812ae89cde34b6f0f54216012fd
\ No newline at end of file
e732cee426a4629b71f5fa28ce16fad7
\ No newline at end of file
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
......@@ -19,12 +18,10 @@
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
NNI HPO Quickstart with PyTorch
===============================
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
......@@ -34,7 +31,7 @@ The tutorial consists of 4 steps:
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 19-36
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
......@@ -54,7 +51,7 @@ In short, it is a PyTorch model with 3 additional API calls:
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 38-59
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space
---------------------------
......@@ -78,7 +75,7 @@ For full specification of search space, check :doc:`the reference </hpo/search_s
Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 59-66
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
......@@ -96,7 +93,7 @@ Now we can define the search space as follow:
.. GENERATED FROM PYTHON SOURCE LINES 67-74
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
......@@ -106,7 +103,7 @@ The *experiment config* defines how to train the models and how to explore the s
In this tutorial we use a *local* mode experiment,
which means models will be trained on the local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 74-77
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
......@@ -120,7 +117,7 @@ which means models will be trained on local machine, without using any special t
.. GENERATED FROM PYTHON SOURCE LINES 78-84
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
......@@ -129,7 +126,7 @@ Configure trial code
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 84-86
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
......@@ -142,7 +139,7 @@ So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 87-96
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
......@@ -154,12 +151,12 @@ is only available in standard Python, not in Jupyter Notebook.)
If you are using a Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
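As a minimal sketch, both of the adjustments mentioned above could look like this (assuming the ``experiment`` object created earlier in this tutorial):

.. code-block:: python

    from pathlib import Path

    # Resolve the trial code directory against this script's location instead of
    # the current working directory (only works in a standard .py file).
    experiment.config.trial_code_directory = Path(__file__).parent

    # On a Linux system without Conda, the interpreter may be named "python3".
    experiment.config.trial_command = 'python3 model.py'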
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 100-102
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
......@@ -172,13 +169,13 @@ Configure search space
.. GENERATED FROM PYTHON SOURCE LINES 103-106
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 106-109
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
......@@ -192,13 +189,13 @@ Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 110-113
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 113-115
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
......@@ -211,7 +208,12 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
.. GENERATED FROM PYTHON SOURCE LINES 116-126
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. note::
......@@ -219,12 +221,7 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
In the real world it should be set to a larger number.
With the default config, the TPE tuner requires 20 trials to warm up.
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
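As a minimal sketch, the optional duration limit mentioned above could be set like this (assuming the same ``experiment`` object):

.. code-block:: python

    # Stop the experiment after at most one hour of running time.
    experiment.config.max_experiment_duration = '1h'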
.. GENERATED FROM PYTHON SOURCE LINES 128-133
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
......@@ -232,7 +229,7 @@ Now the experiment is ready. Choose a port and launch it. (Here we use port 8080
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 133-135
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
......@@ -248,16 +245,16 @@ You can use the web portal to view experiment status: http://localhost:8080.
.. code-block:: none
[2022-03-20 21:07:36] Creating experiment, Experiment ID: p43ny6ew
[2022-03-20 21:07:36] Starting web server...
[2022-03-20 21:07:37] Setting up...
[2022-03-20 21:07:37] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 136-143
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
......@@ -267,7 +264,7 @@ If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
allowing you to view the web portal after the experiment is done.
.. GENERATED FROM PYTHON SOURCE LINES 143-147
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
......@@ -285,13 +282,13 @@ allowing you to view the web portal after the experiment is done.
.. code-block:: none
[2022-03-20 21:08:57] Stopping experiment, please wait...
[2022-03-20 21:09:00] Experiment stopped
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 148-158
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
so it can be omitted in your code.
......@@ -302,12 +299,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi
This example uses :doc:`Python API </reference/experiment>` to create experiment.
You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
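As a rough sketch, restarting the web portal for a stopped experiment could look like this (the experiment ID and port below are placeholders, and the exact arguments are an assumption; see the Experiment API reference):

.. code-block:: python

    from nni.experiment import Experiment

    # Reopen the web portal of an experiment that has already been stopped.
    Experiment.view('p43ny6ew', port=8080)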
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.393 seconds)
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
......
.. a395c59bf5359c3583b7a0a3ab66d705
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Tutorial (PyTorch Version)
==============================
This tutorial performs hyperparameter tuning on the model in the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
The tutorial consists of 4 steps:
1. Modify the model code for tuning.
2. Define the hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
First, we need to prepare the model to be tuned.
Because the tuned model is run many times independently,
and may be uploaded to the cloud when a specific training platform is used,
the model code should be placed in a separate .py file.
The model used in this tutorial is defined in :doc:`model.py <model>`.
On top of an ordinary PyTorch model, the model code adds 3 API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report the intermediate result of each epoch.
3. Use :func:`nni.report_final_result` to report the final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define the search space
-------------------------------
In the model code, we have prepared 3 hyperparameters to be tuned: *features*, *lr*, and *momentum*.
Now we need to define their *search space*, specifying their value ranges and distributions.
Assuming we have the following prior knowledge about these hyperparameters:
1. *features* can be 128, 256, 512, or 1024.
2. *lr* is between 0.0001 and 0.1, and it follows an exponential distribution.
3. *momentum* is between 0 and 1.
In NNI, the space of *features* is called ``choice``,
the space of *lr* is called ``loguniform``,
and the space of *momentum* is called ``uniform``.
You may have noticed that these names match the function names in ``numpy.random``.
For the full search space specification, see :doc:`/hpo/search_space`.
Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage hyperparameter tuning; the *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment, which means the experiment only runs on the local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of one set of hyperparameters is called a *trial*, and the model code above is called the *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
To run this file ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in standard Python files, not in Jupyter Notebook.)
.. attention::
If you are using a Linux system without Conda,
you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure the search space
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure the tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use the :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we evaluate 10 sets of hyperparameters in total, and evaluate 2 sets concurrently at a time.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set, the experiment will run forever until you press Ctrl-C.
.. note::
``max_trial_number`` is set to 10 here so the tutorial finishes quickly.
In real-world use it should be set to a larger number; with the default config, the TPE tuner needs to evaluate 20 sets of hyperparameters to warm up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is fully configured. Choose a port and launch it; in this tutorial we use port 8080.
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Simply waiting for the function to return is enough to finish the experiment normally; the following is optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` at the end of the code to keep the Python interpreter from exiting,
so that you can continue using the web portal.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked before Python exits, so it can be omitted from your own code.
After the experiment is fully stopped, you can use :meth:`nni.experiment.Experiment.view` to relaunch the web portal.
.. tip::
This tutorial uses the :doc:`Python API </reference/experiment>` to create the experiment.
Alternatively, you can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. e083b4dc8e350428ddf680e97b47cc8e
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_model.py:
Port the official PyTorch quickstart to NNI
===========================================
This file is a modified version of the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
You can run this file directly and it produces exactly the same result as the original; you can also use it in an NNI experiment for hyperparameter tuning.
We recommend running this file directly once first, to get familiar with the code and check your environment.
Compared with the original, there are two modifications:
1. In the `Get the optimized hyperparameters`_ part, the default hyperparameters are replaced with those generated by the tuning algorithm.
2. In the `Train the model and report results`_ part, the accuracy metrics are reported to NNI.
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
The following hyperparameters will be tuned:
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get the optimized hyperparameters
---------------------------------
When run directly, :func:`nni.get_next_parameter` returns an empty dict;
when used in an NNI experiment, it returns the set of hyperparameters generated by the tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load the dataset
----------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build the model with the hyperparameters
----------------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define the train and test functions
-----------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    return correct
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train the model and report results
----------------------------------
Report the accuracy metrics to NNI's tuning algorithm so that it can predict better hyperparameter sets.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    accuracy = test(test_dataloader, model, loss_fn)
    nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
......@@ -5,10 +5,10 @@
Computation times
=================
**00:24.441** total execution time for **tutorials_hpo_quickstart_pytorch** files:
**01:24.367** total execution time for **tutorials_hpo_quickstart_pytorch** files:
+--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py` (``model.py``) | 00:24.441 | 0.0 MB |
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` (``main.py``) | 01:24.367 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` (``main.py``) | 00:00.000 | 0.0 MB |
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py` (``model.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+
......@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# NNI HPO Quickstart with TensorFlow\nThis tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
"\n# HPO Quickstart with TensorFlow\nThis tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
]
},
{
......@@ -144,7 +144,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\nYou may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n"
"You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` are set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n In real world it should be set to a larger number.\n With default config TPE tuner requires 20 trials to warm up.</p></div>\n\n"
]
},
{
......@@ -187,7 +187,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.\n\n"
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
]
}
],
......@@ -207,7 +207,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
"version": "3.10.4"
}
},
"nbformat": 4,
......
"""
NNI HPO Quickstart with TensorFlow
==================================
HPO Quickstart with TensorFlow
==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
......@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
# ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %%
# Step 4: Run the experiment
......@@ -154,4 +154,4 @@ experiment.stop()
#
# This example uses :doc:`Python API </reference/experiment>` to create experiment.
#
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
# You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
fe5546e4ae3f3dbf5e852af322dae15f
\ No newline at end of file
b8a9880a36233005ade7a8dae6d428a8
\ No newline at end of file
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
......@@ -19,8 +18,8 @@
.. _sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py:
NNI HPO Quickstart with TensorFlow
==================================
HPO Quickstart with TensorFlow
==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
......@@ -213,17 +212,17 @@ Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate
.. GENERATED FROM PYTHON SOURCE LINES 116-126
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. note::
``max_trial_number`` is set to 10 here for a fast example.
In real world it should be set to a larger number.
With default config TPE tuner requires 20 trials to warm up.
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you press Ctrl-C.
.. GENERATED FROM PYTHON SOURCE LINES 128-133
Step 4: Run the experiment
......@@ -248,10 +247,10 @@ You can use the web portal to view experiment status: http://localhost:8080.
.. code-block:: none
[2022-03-20 21:12:19] Creating experiment, Experiment ID: 8raiuoyb
[2022-03-20 21:12:19] Starting web server...
[2022-03-20 21:12:20] Setting up...
[2022-03-20 21:12:20] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
[2022-04-13 12:11:34] Creating experiment, Experiment ID: enw27qxj
[2022-04-13 12:11:34] Starting web server...
[2022-04-13 12:11:35] Setting up...
[2022-04-13 12:11:35] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
......@@ -285,8 +284,8 @@ allowing you to view the web portal after the experiment is done.
.. code-block:: none
[2022-03-20 21:13:41] Stopping experiment, please wait...
[2022-03-20 21:13:44] Experiment stopped
[2022-04-13 12:12:55] Stopping experiment, please wait...
[2022-04-13 12:12:58] Experiment stopped
......@@ -302,12 +301,12 @@ After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.vi
This example uses :doc:`Python API </reference/experiment>` to create experiment.
You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.257 seconds)
**Total running time of the script:** ( 1 minutes 24.384 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_tensorflow_main.py:
......
......@@ -5,10 +5,10 @@
Computation times
=================
**02:27.156** total execution time for **tutorials_hpo_quickstart_tensorflow** files:
**01:24.384** total execution time for **tutorials_hpo_quickstart_tensorflow** files:
+-----------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py` (``model.py``) | 02:27.156 | 0.0 MB |
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py` (``main.py``) | 01:24.384 | 0.0 MB |
+-----------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py` (``main.py``) | 00:00.000 | 0.0 MB |
| :ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py` (``model.py``) | 00:00.000 | 0.0 MB |
+-----------------------------------------------------------------------------+-----------+--------+
......@@ -189,12 +189,12 @@ Tutorials
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="There is also a TensorFlow version&lt;../hpo_quickstart_tensorflow/main&gt; if you prefer it.">
<div class="sphx-glr-thumbcontainer" tooltip="The tutorial consists of 4 steps: ">
.. only:: html
.. figure:: /tutorials/hpo_quickstart_pytorch/images/thumb/sphx_glr_main_thumb.png
:alt: NNI HPO Quickstart with PyTorch
:alt: HPO Quickstart with PyTorch
:ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py`
......@@ -246,7 +246,7 @@ Tutorials
.. only:: html
.. figure:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png
:alt: NNI HPO Quickstart with TensorFlow
:alt: HPO Quickstart with TensorFlow
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py`
......
search_space:
  features:
    _type: choice
    _value: [ 128, 256, 512, 1024 ]
  lr:
    _type: loguniform
    _value: [ 0.0001, 0.1 ]
  momentum:
    _type: uniform
    _value: [ 0, 1 ]
trial_command: python model.py
trial_code_directory: .
trial_concurrency: 2
max_trial_number: 10
tuner:
  name: TPE
  class_args:
    optimize_mode: maximize
training_service:
  platform: local
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and produces exactly the same result as the original version.
Furthermore, it supports auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], params['features']),
            nn.ReLU(),
            nn.Linear(params['features'], 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)
        pred = model(X)
        loss = loss_fn(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    accuracy = test(test_dataloader, model, loss_fn)
    nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
Run HPO Experiment with nnictl
==============================
This tutorial has exactly the same effect as :doc:`PyTorch quickstart <../hpo_quickstart_pytorch/main>`.
Both tutorials optimize the model in `official PyTorch quickstart
<https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__ with auto-tuning,
while this one manages the experiment with the command line tool and a YAML config file instead of pure Python code.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Create config file.
4. Run the experiment.
The first two steps are identical to the quickstart.
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
Please understand the model code before continuing to the next step.
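As a minimal sketch (not the full ``model.py``), the three calls sit in the trial code roughly like this; the dummy metric below only stands in for the real accuracy:

.. code-block:: python

    import nni

    # 1. Fetch the hyperparameters generated by the tuning algorithm
    #    (an empty dict when the script is run outside an experiment).
    params = {'features': 512, 'lr': 0.001, 'momentum': 0}
    params.update(nni.get_next_parameter())

    epochs = 5
    for epoch in range(epochs):
        # Training and evaluation for one epoch would go here.
        accuracy = (epoch + 1) / epochs
        # 2. Report per-epoch accuracy as an intermediate result.
        nni.report_intermediate_result(accuracy)

    # 3. Report the final accuracy once training is done.
    nni.report_final_result(accuracy)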
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
Assuming we have the following prior knowledge for these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names are derived from ``numpy.random``.
For full specification of search space, check :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:
.. code-block:: yaml
search_space:
  features:
    _type: choice
    _value: [ 128, 256, 512, 1024 ]
  lr:
    _type: loguniform
    _value: [ 0.0001, 0.1 ]
  momentum:
    _type: uniform
    _value: [ 0, 1 ]
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a YAML file ``config.yaml`` to define the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. code-block:: yaml
trial_command: python model.py
trial_code_directory: .
When ``trial_code_directory`` is a relative path, it is relative to the config file.
So in this case we need to put ``config.yaml`` and ``model.py`` in the same directory.
.. attention::
The rules for resolving relative paths differ between the YAML config file and the :doc:`Python experiment API </reference/experiment>`.
In the Python experiment API, relative paths are relative to the current working directory.
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. code-block:: yaml
max_trial_number: 10
trial_concurrency: 2
You may also set ``max_experiment_duration: 1h`` in the config file to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
the experiment will run forever until you stop it.
.. note::
``max_trial_number`` is set to 10 here for a fast example.
In the real world it should be set to a larger number.
With the default config, the TPE tuner requires 20 trials to warm up.
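As a sketch, the optional ``max_experiment_duration`` limit mentioned above is a single extra line in ``config.yaml`` (the one-hour value is just an example):

.. code-block:: yaml

    max_experiment_duration: 1h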
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. code-block:: yaml
tuner:
  name: TPE
  class_args:
    optimize_mode: maximize
Configure training service
^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we use *local* mode,
which means models will be trained on the local machine, without using any special training platform.
.. code-block:: yaml
training_service:
  platform: local
Wrap up
^^^^^^^
The full content of ``config.yaml`` is as follows:
.. code-block:: yaml
search_space:
  features:
    _type: choice
    _value: [ 128, 256, 512, 1024 ]
  lr:
    _type: loguniform
    _value: [ 0.0001, 0.1 ]
  momentum:
    _type: uniform
    _value: [ 0, 1 ]
trial_command: python model.py
trial_code_directory: .
trial_concurrency: 2
max_trial_number: 10
tuner:
  name: TPE
  class_args:
    optimize_mode: maximize
training_service:
  platform: local
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Launch it with the ``nnictl create`` command:
.. code-block:: bash
$ nnictl create --config config.yaml --port 8080
You can use the web portal to view experiment status: http://localhost:8080.
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-01 12:00:00] Creating experiment, Experiment ID: p43ny6ew
[2022-04-01 12:00:00] Starting web server...
[2022-04-01 12:00:01] Setting up...
[2022-04-01 12:00:01] Web portal URLs: http://127.0.0.1:8080 http://192.168.1.1:8080
[2022-04-01 12:00:01] To stop experiment run "nnictl stop p43ny6ew" or "nnictl stop --all"
[2022-04-01 12:00:01] Reference: https://nni.readthedocs.io/en/stable/reference/nnictl.html
When the experiment is done, use the ``nnictl stop`` command to stop it.
.. code-block:: bash
$ nnictl stop p43ny6ew
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
INFO: Stopping experiment p43ny6ew
INFO: Stop experiment success.
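If you want to look at a stopped experiment again, the ``nnictl view`` command can reopen the web portal (the experiment ID and port below are placeholders; check ``nnictl --help`` for the exact options):

.. code-block:: bash

    $ nnictl view p43ny6ew --port 8080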
"""
NNI HPO Quickstart with PyTorch
===============================
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
There is also a :doc:`TensorFlow version<../hpo_quickstart_tensorflow/main>` if you prefer it.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
......@@ -113,16 +111,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
# ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %%
# Step 4: Run the experiment
......@@ -154,4 +152,4 @@ experiment.stop()
#
# This example uses :doc:`Python API </reference/experiment>` to create experiment.
#
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
# You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.
"""
NNI HPO Quickstart with TensorFlow
==================================
HPO Quickstart with TensorFlow
==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
......@@ -113,16 +113,16 @@ experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
# ``max_trial_number`` is set to 10 here for a fast example.
# In real world it should be set to a larger number.
# With default config TPE tuner requires 20 trials to warm up.
#
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` are set,
# the experiment will run forever until you press Ctrl-C.
# %%
# Step 4: Run the experiment
......@@ -154,4 +154,4 @@ experiment.stop()
#
# This example uses :doc:`Python API </reference/experiment>` to create experiment.
#
# You can also create and manage experiments with :doc:`command line tool </reference/nnictl>`.
# You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.