Commit e773dfcc authored by qianyj

create branch for v2.9
.. _sphx_glr_tutorials_hpo_quickstart_pytorch:
.. raw:: html
<div class="sphx-glr-thumbnails">
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="The tutorial consists of 4 steps: ">
.. only:: html
.. image:: /tutorials/hpo_quickstart_pytorch/images/thumb/sphx_glr_main_thumb.png
:alt: HPO Quickstart with PyTorch
:ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py`
.. raw:: html
<div class="sphx-glr-thumbnail-title">HPO Quickstart with PyTorch</div>
</div>
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="It can be run directly and will have the exact same result as original version.">
.. only:: html
.. image:: /tutorials/hpo_quickstart_pytorch/images/thumb/sphx_glr_model_thumb.png
:alt: Port PyTorch Quickstart to NNI
:ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py`
.. raw:: html
<div class="sphx-glr-thumbnail-title">Port PyTorch Quickstart to NNI</div>
</div>
.. raw:: html
</div>
.. toctree::
:hidden:
/tutorials/hpo_quickstart_pytorch/main
/tutorials/hpo_quickstart_pytorch/model
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# HPO Quickstart with PyTorch\nThis tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Prepare the model\nIn the first step, we need to prepare the model to be tuned.\n\nThe model should be put in a separate script.\nIt will be evaluated many times concurrently,\nand possibly will be trained on distributed platforms.\n\nIn this tutorial, the model is defined in :doc:`model.py <model>`.\n\nIn short, it is a PyTorch model with 3 additional API calls:\n\n1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.\n2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.\n3. Use :func:`nni.report_final_result` to report final accuracy.\n\nPlease understand the model code before continuing to the next step.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Define search space\nIn the model code, we have prepared 3 hyperparameters to be tuned:\n*features*, *lr*, and *momentum*.\n\nHere we need to define their *search space* so the tuning algorithm can sample them in the desired range.\n\nAssume we have the following prior knowledge about these hyperparameters:\n\n1. *features* should be one of 128, 256, 512, 1024.\n2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.\n3. *momentum* should be a float between 0 and 1.\n\nIn NNI, the space of *features* is called ``choice``;\nthe space of *lr* is called ``loguniform``;\nand the space of *momentum* is called ``uniform``.\nYou may have noticed that these names are derived from ``numpy.random``.\n\nFor full specification of search space, check :doc:`the reference </hpo/search_space>`.\n\nNow we can define the search space as follows:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"search_space = {\n 'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},\n 'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},\n 'momentum': {'_type': 'uniform', '_value': [0, 1]},\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Configure the experiment\nNNI uses an *experiment* to manage the HPO process.\nThe *experiment config* defines how to train the models and how to explore the search space.\n\nIn this tutorial we use a *local* mode experiment,\nwhich means models will be trained on the local machine, without using any special training platform.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.experiment import Experiment\nexperiment = Experiment('local')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we start to configure the experiment.\n\n### Configure trial code\nIn NNI, the evaluation of each hyperparameter set is called a *trial*.\nSo the model script is called *trial code*.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.trial_command = 'python model.py'\nexperiment.config.trial_code_directory = '.'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When ``trial_code_directory`` is a relative path, it is relative to the current working directory.\nTo run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.\n(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__\nis only available in standard Python, not in Jupyter Notebook.)\n\n.. attention::\n\n    If you are using a Linux system without Conda,\n    you may need to change ``\"python model.py\"`` to ``\"python3 model.py\"``.\n\n"
]
},
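{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you run the tutorial as a standard Python script, a more portable variant (just a sketch, not part of the original tutorial) reuses the current interpreter and anchors the trial code directory to the script location:\n\n```python\nimport sys\nfrom pathlib import Path\n\nexperiment.config.trial_command = f'{sys.executable} model.py'\nexperiment.config.trial_code_directory = Path(__file__).parent\n```\n\nSince ``sys.executable`` points at the interpreter already running the script, this sidesteps the ``python`` vs ``python3`` naming issue.\n"
]
},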
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure search space\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.search_space = search_space"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure tuning algorithm\nHere we use :doc:`TPE tuner </hpo/tuners>`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.tuner.name = 'TPE'\nexperiment.config.tuner.class_args['optimize_mode'] = 'maximize'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure how many trials to run\nHere we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.max_trial_number = 10\nexperiment.config.trial_concurrency = 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` is set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n    In the real world, it should be set to a larger number.\n    With the default config, the TPE tuner requires 20 trials to warm up.</p></div>\n\n"
]
},
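{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, to additionally cap the running time at one hour (optional; this just spells out the setting mentioned above):\n\n```python\nexperiment.config.max_experiment_duration = '1h'\n```\n"
]
},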
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Run the experiment\nNow the experiment is ready. Choose a port and launch it. (Here we use port 8080.)\n\nYou can use the web portal to view experiment status: http://localhost:8080.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.run(8080)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## After the experiment is done\nEverything is done and it is safe to exit now. The following are optional.\n\nIf you are using standard Python instead of Jupyter Notebook,\nyou can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,\nallowing you to view the web portal after the experiment is done.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# input('Press enter to quit')\nexperiment.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.\n\n.. tip::\n\n    This example uses the :doc:`Python API </reference/experiment>` to create the experiment.\n\n    You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
"""
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
# Step 1: Prepare the model
# -------------------------
# In the first step, we need to prepare the model to be tuned.
#
# The model should be put in a separate script.
# It will be evaluated many times concurrently,
# and possibly will be trained on distributed platforms.
#
# In this tutorial, the model is defined in :doc:`model.py <model>`.
#
# In short, it is a PyTorch model with 3 additional API calls:
#
# 1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
# 2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
# 3. Use :func:`nni.report_final_result` to report final accuracy.
#
# Please understand the model code before continuing to the next step.
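#
# For orientation, the relevant calls in ``model.py`` look roughly like this
# (a condensed sketch; see the full file for the real context)::
#
#     params.update(nni.get_next_parameter())      # fetch hyperparameters for this trial
#     ...
#     nni.report_intermediate_result(accuracy)     # once per epoch
#     nni.report_final_result(accuracy)            # once at the end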
# %%
# Step 2: Define search space
# ---------------------------
# In the model code, we have prepared 3 hyperparameters to be tuned:
# *features*, *lr*, and *momentum*.
#
# Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
#
# Assume we have the following prior knowledge about these hyperparameters:
#
# 1. *features* should be one of 128, 256, 512, 1024.
# 2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
# 3. *momentum* should be a float between 0 and 1.
#
# In NNI, the space of *features* is called ``choice``;
# the space of *lr* is called ``loguniform``;
# and the space of *momentum* is called ``uniform``.
# You may have noticed that these names are derived from ``numpy.random``.
#
# For full specification of search space, check :doc:`the reference </hpo/search_space>`.
#
# Now we can define the search space as follows:
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
# %%
# Step 3: Configure the experiment
# --------------------------------
# NNI uses an *experiment* to manage the HPO process.
# The *experiment config* defines how to train the models and how to explore the search space.
#
# In this tutorial we use a *local* mode experiment,
# which means models will be trained on the local machine, without using any special training platform.
from nni.experiment import Experiment
experiment = Experiment('local')
# %%
# Now we start to configure the experiment.
#
# Configure trial code
# ^^^^^^^^^^^^^^^^^^^^
# In NNI, the evaluation of each hyperparameter set is called a *trial*.
# So the model script is called *trial code*.
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
# %%
# When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
# To run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.
# (`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
# is only available in standard Python, not in Jupyter Notebook.)
#
# .. attention::
#
#     If you are using a Linux system without Conda,
#     you may need to change ``"python model.py"`` to ``"python3 model.py"``.
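#
# A more portable variant (just a sketch, not part of the original tutorial) reuses the
# current Python interpreter and anchors the trial code directory to the script location::
#
#     import sys
#     from pathlib import Path
#
#     experiment.config.trial_command = f'{sys.executable} model.py'
#     experiment.config.trial_code_directory = Path(__file__).parent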
# %%
# Configure search space
# ^^^^^^^^^^^^^^^^^^^^^^
experiment.config.search_space = search_space
# %%
# Configure tuning algorithm
# ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we use :doc:`TPE tuner </hpo/tuners>`.
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# %%
# Configure how many trials to run
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit running time.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
# the experiment will run forever until you press Ctrl-C.
#
# .. note::
#
#     ``max_trial_number`` is set to 10 here for a fast example.
#     In the real world, it should be set to a larger number.
#     With the default config, the TPE tuner requires 20 trials to warm up.
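#
# For example, to additionally cap the running time at one hour (optional)::
#
#     experiment.config.max_experiment_duration = '1h'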
# %%
# Step 4: Run the experiment
# --------------------------
# Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
#
# You can use the web portal to view experiment status: http://localhost:8080.
experiment.run(8080)
# %%
# After the experiment is done
# ----------------------------
# Everything is done and it is safe to exit now. The following are optional.
#
# If you are using standard Python instead of Jupyter Notebook,
# you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
# allowing you to view the web portal after the experiment is done.
# input('Press enter to quit')
experiment.stop()
# %%
# :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
# so it can be omitted in your code.
#
# After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.
#
# .. tip::
#
#     This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
#
#     You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
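#
# One possible invocation (just a sketch: it assumes the classmethod accepts the experiment ID
# and a port, so check the API reference for the exact signature; replace the ID with the one
# printed when your experiment was created)::
#
#     from nni.experiment import Experiment
#     Experiment.view('hgkju3iq', port=8080)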
e732cee426a4629b71f5fa28ce16fad7
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Quickstart with PyTorch
===========================
This tutorial optimizes the model in `official PyTorch quickstart`_ with auto-tuning.
The tutorial consists of 4 steps:
1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. _official PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
In the first step, we need to prepare the model to be tuned.
The model should be put in a separate script.
It will be evaluated many times concurrently,
and possibly will be trained on distributed platforms.
In this tutorial, the model is defined in :doc:`model.py <model>`.
In short, it is a PyTorch model with 3 additional API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
3. Use :func:`nni.report_final_result` to report final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define search space
---------------------------
In the model code, we have prepared 3 hyperparameters to be tuned:
*features*, *lr*, and *momentum*.
Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
Assume we have the following prior knowledge about these hyperparameters:
1. *features* should be one of 128, 256, 512, 1024.
2. *lr* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
3. *momentum* should be a float between 0 and 1.
In NNI, the space of *features* is called ``choice``;
the space of *lr* is called ``loguniform``;
and the space of *momentum* is called ``uniform``.
You may have noticed that these names are derived from ``numpy.random``.
For full specification of search space, check :doc:`the reference </hpo/search_space>`.
Now we can define the search space as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process.
The *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment,
which means models will be trained on the local machine, without using any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure trial code
^^^^^^^^^^^^^^^^^^^^
In NNI, the evaluation of each hyperparameter set is called a *trial*.
So the model script is called *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
To run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in standard Python, not in Jupyter Notebook.)
.. attention::
    If you are using a Linux system without Conda,
    you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure search space
^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
the experiment will run forever until you press Ctrl-C.
.. note::
    ``max_trial_number`` is set to 10 here for a fast example.
    In the real world, it should be set to a larger number.
    With the default config, the TPE tuner requires 20 trials to warm up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
You can use the web portal to view experiment status: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Everything is done and it is safe to exit now. The following are optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
allowing you to view the web portal after the experiment is done.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
so it can be omitted in your code.
After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal.
.. tip::
    This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
    You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. a395c59bf5359c3583b7a0a3ab66d705
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/main.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_main.py:
HPO Tutorial (PyTorch Version)
==============================
This tutorial applies hyperparameter tuning to the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
The tutorial consists of four steps:
1. Modify the model code for tuning.
2. Define the hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.
.. GENERATED FROM PYTHON SOURCE LINES 17-34
Step 1: Prepare the model
-------------------------
First, we need to prepare the model to be tuned.
Because the tuned model is run many times independently,
and may be uploaded to the cloud when a training platform is used,
the model code needs to live in a separate Python file.
The model used in this tutorial is defined in :doc:`model.py <model>`.
On top of an ordinary PyTorch model, the model code adds 3 API calls:
1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
2. Use :func:`nni.report_intermediate_result` to report the intermediate result of each epoch.
3. Use :func:`nni.report_final_result` to report the final accuracy.
Please understand the model code before continuing to the next step.
.. GENERATED FROM PYTHON SOURCE LINES 36-57
Step 2: Define the search space
-------------------------------
In the model code, we prepared three hyperparameters to be tuned: features, lr, and momentum.
Now we need to define their "search space", specifying their value ranges and distributions.
Assume we have the following prior knowledge about the three hyperparameters:
1. features can be 128, 256, 512, or 1024;
2. lr lies between 0.0001 and 0.1 and follows an exponential distribution;
3. momentum lies between 0 and 1.
In NNI, the value range of features is called ``choice``,
the range of lr is called ``loguniform``,
and the range of momentum is called ``uniform``.
You may have noticed that these names match the function names in ``numpy.random``.
Full search space reference: :doc:`/hpo/search_space`.
Our search space is defined as follows:
.. GENERATED FROM PYTHON SOURCE LINES 57-64
.. code-block:: default
search_space = {
'features': {'_type': 'choice', '_value': [128, 256, 512, 1024]},
'lr': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
'momentum': {'_type': 'uniform', '_value': [0, 1]},
}
.. GENERATED FROM PYTHON SOURCE LINES 65-72
Step 3: Configure the experiment
--------------------------------
NNI uses an *experiment* to manage the HPO process; the *experiment config* defines how to train the models and how to explore the search space.
In this tutorial we use a *local* mode experiment, which means the experiment runs only on the local machine and does not use any special training platform.
.. GENERATED FROM PYTHON SOURCE LINES 72-75
.. code-block:: default
from nni.experiment import Experiment
experiment = Experiment('local')
.. GENERATED FROM PYTHON SOURCE LINES 76-82
Now we start to configure the experiment.
Configure the trial
^^^^^^^^^^^^^^^^^^^
In NNI, evaluating one set of hyperparameters is called a *trial*, and the model code above is called the *trial code*.
.. GENERATED FROM PYTHON SOURCE LINES 82-84
.. code-block:: default
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
.. GENERATED FROM PYTHON SOURCE LINES 85-94
If ``trial_code_directory`` is a relative path, it is treated as relative to the current working directory.
To run this file ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``.
(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
is only available in plain Python files, not in Jupyter Notebook.)
.. attention::
    If you are using a Linux system without Conda,
    you may need to change ``"python model.py"`` to ``"python3 model.py"``.
.. GENERATED FROM PYTHON SOURCE LINES 96-98
Configure the search space
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. GENERATED FROM PYTHON SOURCE LINES 98-100
.. code-block:: default
experiment.config.search_space = search_space
.. GENERATED FROM PYTHON SOURCE LINES 101-104
Configure the tuning algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Here we use the :doc:`TPE tuner </hpo/tuners>`.
.. GENERATED FROM PYTHON SOURCE LINES 104-107
.. code-block:: default
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
.. GENERATED FROM PYTHON SOURCE LINES 108-111
Configure how many trials to run
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In this tutorial we try 10 sets of hyperparameters in total, evaluating 2 sets concurrently.
.. GENERATED FROM PYTHON SOURCE LINES 111-113
.. code-block:: default
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
.. GENERATED FROM PYTHON SOURCE LINES 114-124
You may also set ``max_experiment_duration = '1h'`` to limit the running time.
If neither ``max_trial_number`` nor ``max_experiment_duration`` is set, the experiment will run forever until you press Ctrl-C.
.. note::
    ``max_trial_number`` is set to 10 here so that the tutorial finishes quickly;
    in real-world usage it should be set to a larger number, as the TPE algorithm with default settings needs to evaluate 20 sets of hyperparameters to finish its warm-up.
.. GENERATED FROM PYTHON SOURCE LINES 126-131
Step 4: Run the experiment
--------------------------
The experiment is now fully configured; pick a port and launch it. In this tutorial we use port 8080.
You can check the experiment status through the web portal: http://localhost:8080.
.. GENERATED FROM PYTHON SOURCE LINES 131-133
.. code-block:: default
experiment.run(8080)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:07:29] Creating experiment, Experiment ID: hgkju3iq
[2022-04-13 12:07:29] Starting web server...
[2022-04-13 12:07:30] Setting up...
[2022-04-13 12:07:30] Web portal URLs: http://127.0.0.1:8080 http://192.168.100.103:8080
True
.. GENERATED FROM PYTHON SOURCE LINES 134-141
After the experiment is done
----------------------------
Simply waiting for the function to return ends the experiment normally; everything below is optional.
If you are using standard Python instead of Jupyter Notebook,
you can add ``input()`` or ``signal.pause()`` at the end of the code to keep the Python interpreter from exiting,
so that you can keep using the web portal.
.. GENERATED FROM PYTHON SOURCE LINES 141-145
.. code-block:: default
# input('Press enter to quit')
experiment.stop()
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
[2022-04-13 12:08:50] Stopping experiment, please wait...
[2022-04-13 12:08:53] Experiment stopped
.. GENERATED FROM PYTHON SOURCE LINES 146-156
:meth:`nni.experiment.Experiment.stop` is invoked automatically before Python exits, so you can omit it from your own code.
After the experiment has fully stopped, you can use :meth:`nni.experiment.Experiment.view` to relaunch the web portal.
.. tip::
    This tutorial creates the experiment with the :doc:`Python API </reference/experiment>`.
    Alternatively, you can also use the :doc:`command line tool <../hpo_nnictl/nnictl>`.
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 24.367 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_main.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: main.py <main.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: main.ipynb <main.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Port PyTorch Quickstart to NNI\nThis is a modified version of `PyTorch quickstart`_.\n\nIt can be run directly and will have the exact same result as the original version.\n\nFurthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.\n\nIt is recommended to run this script directly first to verify the environment.\n\nThere are 2 key differences from the original version:\n\n1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.\n2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import nni\nimport torch\nfrom torch import nn\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision.transforms import ToTensor"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hyperparameters to be tuned\nThese are the hyperparameters that will be tuned.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"params = {\n 'features': 512,\n 'lr': 0.001,\n 'momentum': 0,\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get optimized hyperparameters\nIf run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.\nBut with an NNI *experiment*, it will receive optimized hyperparameters from the tuning algorithm.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimized_params = nni.get_next_parameter()\nparams.update(optimized_params)\nprint(params)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load dataset\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"training_data = datasets.FashionMNIST(root=\"data\", train=True, download=True, transform=ToTensor())\ntest_data = datasets.FashionMNIST(root=\"data\", train=False, download=True, transform=ToTensor())\n\nbatch_size = 64\n\ntrain_dataloader = DataLoader(training_data, batch_size=batch_size)\ntest_dataloader = DataLoader(test_data, batch_size=batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build model with hyperparameters\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\nprint(f\"Using {device} device\")\n\nclass NeuralNetwork(nn.Module):\n def __init__(self):\n super(NeuralNetwork, self).__init__()\n self.flatten = nn.Flatten()\n self.linear_relu_stack = nn.Sequential(\n nn.Linear(28*28, params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], params['features']),\n nn.ReLU(),\n nn.Linear(params['features'], 10)\n )\n\n def forward(self, x):\n x = self.flatten(x)\n logits = self.linear_relu_stack(x)\n return logits\n\nmodel = NeuralNetwork().to(device)\n\nloss_fn = nn.CrossEntropyLoss()\noptimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define train and test\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def train(dataloader, model, loss_fn, optimizer):\n size = len(dataloader.dataset)\n model.train()\n for batch, (X, y) in enumerate(dataloader):\n X, y = X.to(device), y.to(device)\n pred = model(X)\n loss = loss_fn(pred, y)\n optimizer.zero_grad()\n loss.backward()\n optimizer.step()\n\ndef test(dataloader, model, loss_fn):\n size = len(dataloader.dataset)\n num_batches = len(dataloader)\n model.eval()\n test_loss, correct = 0, 0\n with torch.no_grad():\n for X, y in dataloader:\n X, y = X.to(device), y.to(device)\n pred = model(X)\n test_loss += loss_fn(pred, y).item()\n correct += (pred.argmax(1) == y).type(torch.float).sum().item()\n test_loss /= num_batches\n correct /= size\n return correct"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train model and report accuracy\nReport accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"epochs = 5\nfor t in range(epochs):\n print(f\"Epoch {t+1}\\n-------------------------------\")\n train(train_dataloader, model, loss_fn, optimizer)\n accuracy = test(test_dataloader, model, loss_fn)\n nni.report_intermediate_result(accuracy)\nnni.report_final_result(accuracy)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
"""
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
"""
# %%
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
# %%
# Hyperparameters to be tuned
# ---------------------------
# These are the hyperparameters that will be tuned.
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
# %%
# Get optimized hyperparameters
# -----------------------------
# If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
# But with an NNI *experiment*, it will receive optimized hyperparameters from the tuning algorithm.
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
# %%
# Load dataset
# ------------
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
# %%
# Build model with hyperparameters
# --------------------------------
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
# %%
# Define train and test
# ---------------------
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
# %%
# Train model and report accuracy
# -------------------------------
# Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
ed8bfc27e3d555d842fc4eec2635e619
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_model.py:
Port PyTorch Quickstart to NNI
==============================
This is a modified version of `PyTorch quickstart`_.
It can be run directly and will have the exact same result as the original version.
Furthermore, it enables auto-tuning with an NNI *experiment*, which will be detailed later.
It is recommended to run this script directly first to verify the environment.
There are 2 key differences from the original version:
1. In the `Get optimized hyperparameters`_ part, it receives generated hyperparameters.
2. In the `Train model and report accuracy`_ part, it reports accuracy metrics to NNI.
.. _PyTorch quickstart: https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
These are the hyperparameters that will be tuned.
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get optimized hyperparameters
-----------------------------
If run directly, :func:`nni.get_next_parameter` is a no-op and returns an empty dict.
But with an NNI *experiment*, it will receive optimized hyperparameters from the tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load dataset
------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build model with hyperparameters
--------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define train and test
---------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train model and report accuracy
-------------------------------
Report accuracy metrics to NNI so the tuning algorithm can suggest better hyperparameters.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
.. e083b4dc8e350428ddf680e97b47cc8e
:orphan:
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/hpo_quickstart_pytorch/model.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
Click :ref:`here <sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py>`
to download the full example code
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_model.py:
Port the PyTorch Quickstart to NNI
==================================
This file is a modified version of the `official PyTorch quickstart <https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html>`__.
You can run this file directly and it produces exactly the same result as the original; you can also use it in an NNI experiment for hyperparameter tuning.
We recommend running this file directly once first, to get familiar with the code and check the runtime environment at the same time.
Compared with the original, two changes were made:
1. In the `Get the optimized hyperparameters`_ part, the default parameters are replaced with parameters generated by the tuning algorithm;
2. In the `Train the model and report results`_ part, the accuracy metrics are reported to NNI.
.. GENERATED FROM PYTHON SOURCE LINES 21-28
.. code-block:: default
import nni
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
.. GENERATED FROM PYTHON SOURCE LINES 29-32
Hyperparameters to be tuned
---------------------------
The following hyperparameters will be tuned:
.. GENERATED FROM PYTHON SOURCE LINES 32-38
.. code-block:: default
params = {
'features': 512,
'lr': 0.001,
'momentum': 0,
}
.. GENERATED FROM PYTHON SOURCE LINES 39-43
Get the optimized hyperparameters
---------------------------------
When run directly, :func:`nni.get_next_parameter` returns an empty dict,
while in an NNI experiment it returns the set of hyperparameters generated by the tuning algorithm.
.. GENERATED FROM PYTHON SOURCE LINES 43-47
.. code-block:: default
optimized_params = nni.get_next_parameter()
params.update(optimized_params)
print(params)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
{'features': 512, 'lr': 0.001, 'momentum': 0}
.. GENERATED FROM PYTHON SOURCE LINES 48-50
Load the dataset
----------------
.. GENERATED FROM PYTHON SOURCE LINES 50-58
.. code-block:: default
training_data = datasets.FashionMNIST(root="data", train=True, download=True, transform=ToTensor())
test_data = datasets.FashionMNIST(root="data", train=False, download=True, transform=ToTensor())
batch_size = 64
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)
.. GENERATED FROM PYTHON SOURCE LINES 59-61
Build the model with the hyperparameters
----------------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 61-86
.. code-block:: default
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
class NeuralNetwork(nn.Module):
def __init__(self):
super(NeuralNetwork, self).__init__()
self.flatten = nn.Flatten()
self.linear_relu_stack = nn.Sequential(
nn.Linear(28*28, params['features']),
nn.ReLU(),
nn.Linear(params['features'], params['features']),
nn.ReLU(),
nn.Linear(params['features'], 10)
)
def forward(self, x):
x = self.flatten(x)
logits = self.linear_relu_stack(x)
return logits
model = NeuralNetwork().to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Using cpu device
.. GENERATED FROM PYTHON SOURCE LINES 87-89
Define the train and test functions
-----------------------------------
.. GENERATED FROM PYTHON SOURCE LINES 89-115
.. code-block:: default
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
model.train()
for batch, (X, y) in enumerate(dataloader):
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
model.eval()
test_loss, correct = 0, 0
with torch.no_grad():
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
test_loss += loss_fn(pred, y).item()
correct += (pred.argmax(1) == y).type(torch.float).sum().item()
test_loss /= num_batches
correct /= size
return correct
.. GENERATED FROM PYTHON SOURCE LINES 116-119
Train the model and report results
----------------------------------
Report the accuracy metrics to NNI's tuning algorithm so that it can suggest better hyperparameter sets.
.. GENERATED FROM PYTHON SOURCE LINES 119-126
.. code-block:: default
epochs = 5
for t in range(epochs):
print(f"Epoch {t+1}\n-------------------------------")
train(train_dataloader, model, loss_fn, optimizer)
accuracy = test(test_dataloader, model, loss_fn)
nni.report_intermediate_result(accuracy)
nni.report_final_result(accuracy)
.. rst-class:: sphx-glr-script-out
Out:
.. code-block:: none
Epoch 1
-------------------------------
[2022-03-21 01:09:37] INFO (nni/MainThread) Intermediate result: 0.461 (Index 0)
Epoch 2
-------------------------------
[2022-03-21 01:09:42] INFO (nni/MainThread) Intermediate result: 0.5529 (Index 1)
Epoch 3
-------------------------------
[2022-03-21 01:09:47] INFO (nni/MainThread) Intermediate result: 0.6155 (Index 2)
Epoch 4
-------------------------------
[2022-03-21 01:09:52] INFO (nni/MainThread) Intermediate result: 0.6345 (Index 3)
Epoch 5
-------------------------------
[2022-03-21 01:09:56] INFO (nni/MainThread) Intermediate result: 0.6505 (Index 4)
[2022-03-21 01:09:56] INFO (nni/MainThread) Final result: 0.6505
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 0 minutes 24.441 seconds)
.. _sphx_glr_download_tutorials_hpo_quickstart_pytorch_model.py:
.. only :: html
.. container:: sphx-glr-footer
:class: sphx-glr-footer-example
.. container:: sphx-glr-download sphx-glr-download-python
:download:`Download Python source code: model.py <model.py>`
.. container:: sphx-glr-download sphx-glr-download-jupyter
:download:`Download Jupyter notebook: model.ipynb <model.ipynb>`
.. only:: html
.. rst-class:: sphx-glr-signature
`Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
:orphan:
.. _sphx_glr_tutorials_hpo_quickstart_pytorch_sg_execution_times:
Computation times
=================
**01:24.367** total execution time for **tutorials_hpo_quickstart_pytorch** files:
+--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_main.py` (``main.py``) | 01:24.367 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+
| :ref:`sphx_glr_tutorials_hpo_quickstart_pytorch_model.py` (``model.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------+-----------+--------+
.. _sphx_glr_tutorials_hpo_quickstart_tensorflow:
.. raw:: html
<div class="sphx-glr-thumbnails">
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="The tutorial consists of 4 steps: ">
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_main_thumb.png
:alt: HPO Quickstart with TensorFlow
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_main.py`
.. raw:: html
<div class="sphx-glr-thumbnail-title">HPO Quickstart with TensorFlow</div>
</div>
.. raw:: html
<div class="sphx-glr-thumbcontainer" tooltip="It can be run directly and will have the exact same result as original version.">
.. only:: html
.. image:: /tutorials/hpo_quickstart_tensorflow/images/thumb/sphx_glr_model_thumb.png
:alt: Port TensorFlow Quickstart to NNI
:ref:`sphx_glr_tutorials_hpo_quickstart_tensorflow_model.py`
.. raw:: html
<div class="sphx-glr-thumbnail-title">Port TensorFlow Quickstart to NNI</div>
</div>
.. raw:: html
</div>
.. toctree::
:hidden:
/tutorials/hpo_quickstart_tensorflow/main
/tutorials/hpo_quickstart_tensorflow/model
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# HPO Quickstart with TensorFlow\nThis tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.\n\nThe tutorial consists of 4 steps: \n\n1. Modify the model for auto-tuning.\n2. Define hyperparameters' search space.\n3. Configure the experiment.\n4. Run the experiment.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: Prepare the model\nIn the first step, we need to prepare the model to be tuned.\n\nThe model should be put in a separate script.\nIt will be evaluated many times concurrently,\nand possibly will be trained on distributed platforms.\n\nIn this tutorial, the model is defined in :doc:`model.py <model>`.\n\nIn short, it is a TensorFlow model with 3 additional API calls:\n\n1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.\n2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.\n3. Use :func:`nni.report_final_result` to report final accuracy.\n\nPlease understand the model code before continuing to the next step.\n\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: Define search space\nIn the model code, we have prepared 4 hyperparameters to be tuned:\n*dense_units*, *activation_type*, *dropout_rate*, and *learning_rate*.\n\nHere we need to define their *search space* so the tuning algorithm can sample them in the desired range.\n\nAssume we have the following prior knowledge about these hyperparameters:\n\n1. *dense_units* should be one of 64, 128, 256.\n2. *activation_type* should be one of 'relu', 'tanh', 'swish', or None.\n3. *dropout_rate* should be a float between 0.5 and 0.9.\n4. *learning_rate* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.\n\nIn NNI, the space of *dense_units* and *activation_type* is called ``choice``;\nthe space of *dropout_rate* is called ``uniform``;\nand the space of *learning_rate* is called ``loguniform``.\nYou may have noticed that these names are derived from ``numpy.random``.\n\nFor full specification of search space, check :doc:`the reference </hpo/search_space>`.\n\nNow we can define the search space as follows:\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"search_space = {\n 'dense_units': {'_type': 'choice', '_value': [64, 128, 256]},\n 'activation_type': {'_type': 'choice', '_value': ['relu', 'tanh', 'swish', None]},\n 'dropout_rate': {'_type': 'uniform', '_value': [0.5, 0.9]},\n 'learning_rate': {'_type': 'loguniform', '_value': [0.0001, 0.1]},\n}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: Configure the experiment\nNNI uses an *experiment* to manage the HPO process.\nThe *experiment config* defines how to train the models and how to explore the search space.\n\nIn this tutorial we use a *local* mode experiment,\nwhich means models will be trained on the local machine, without using any special training platform.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.experiment import Experiment\nexperiment = Experiment('local')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we start to configure the experiment.\n\n### Configure trial code\nIn NNI, the evaluation of each hyperparameter set is called a *trial*.\nSo the model script is called *trial code*.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.trial_command = 'python model.py'\nexperiment.config.trial_code_directory = '.'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When ``trial_code_directory`` is a relative path, it is relative to the current working directory.\nTo run ``main.py`` in a different path, you can set trial code directory to ``Path(__file__).parent``.\n(`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__\nis only available in standard Python, not in Jupyter Notebook.)\n\n.. attention::\n\n    If you are using a Linux system without Conda,\n    you may need to change ``\"python model.py\"`` to ``\"python3 model.py\"``.\n\n"
]
},
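{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you run the tutorial as a standard Python script, a more portable variant (just a sketch, not part of the original tutorial) reuses the current interpreter and anchors the trial code directory to the script location:\n\n```python\nimport sys\nfrom pathlib import Path\n\nexperiment.config.trial_command = f'{sys.executable} model.py'\nexperiment.config.trial_code_directory = Path(__file__).parent\n```\n\nSince ``sys.executable`` points at the interpreter already running the script, this sidesteps the ``python`` vs ``python3`` naming issue.\n"
]
},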
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure search space\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.search_space = search_space"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure tuning algorithm\nHere we use :doc:`TPE tuner </hpo/tuners>`.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.tuner.name = 'TPE'\nexperiment.config.tuner.class_args['optimize_mode'] = 'maximize'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Configure how many trials to run\nHere we evaluate 10 sets of hyperparameters in total, and concurrently evaluate 2 sets at a time.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.config.max_trial_number = 10\nexperiment.config.trial_concurrency = 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You may also set ``max_experiment_duration = '1h'`` to limit running time.\n\nIf neither ``max_trial_number`` nor ``max_experiment_duration`` is set,\nthe experiment will run forever until you press Ctrl-C.\n\n<div class=\"alert alert-info\"><h4>Note</h4><p>``max_trial_number`` is set to 10 here for a fast example.\n    In the real world, it should be set to a larger number.\n    With the default config, the TPE tuner requires 20 trials to warm up.</p></div>\n\n"
]
},
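{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, to additionally cap the running time at one hour (optional; this just spells out the setting mentioned above):\n\n```python\nexperiment.config.max_experiment_duration = '1h'\n```\n"
]
},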
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Run the experiment\nNow the experiment is ready. Choose a port and launch it. (Here we use port 8080.)\n\nYou can use the web portal to view experiment status: http://localhost:8080.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"experiment.run(8080)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## After the experiment is done\nEverything is done and it is safe to exit now. The following are optional.\n\nIf you are using standard Python instead of Jupyter Notebook,\nyou can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,\nallowing you to view the web portal after the experiment is done.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# input('Press enter to quit')\nexperiment.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
":meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,\nso it can be omitted in your code.\n\nAfter the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart web portal.\n\n.. tip::\n\n This example uses :doc:`Python API </reference/experiment>` to create experiment.\n\n You can also create and manage experiments with :doc:`command line tool <../hpo_nnictl/nnictl>`.\n\n"
]
}
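,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch (not run in this notebook), reopening the web portal for a stopped experiment could look like the following, assuming ``view`` accepts an experiment ID and a port:\n\n```python\nfrom nni.experiment import Experiment\n\n# 'EXPERIMENT_ID' is a placeholder; use the ID shown in the experiment logs or web portal\nExperiment.view('EXPERIMENT_ID', port=8080)\n```\n"
]
}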
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
"""
HPO Quickstart with TensorFlow
==============================
This tutorial optimizes the model in `official TensorFlow quickstart`_ with auto-tuning.

The tutorial consists of 4 steps:

1. Modify the model for auto-tuning.
2. Define hyperparameters' search space.
3. Configure the experiment.
4. Run the experiment.

.. _official TensorFlow quickstart: https://www.tensorflow.org/tutorials/quickstart/beginner
"""
# %%
# Step 1: Prepare the model
# -------------------------
# In the first step, we need to prepare the model to be tuned.
#
# The model should be put in a separate script.
# It will be evaluated many times concurrently,
# and possibly will be trained on distributed platforms.
#
# In this tutorial, the model is defined in :doc:`model.py <model>`.
#
# In short, it is a TensorFlow model with 3 additional API calls:
#
# 1. Use :func:`nni.get_next_parameter` to fetch the hyperparameters to be evaluated.
# 2. Use :func:`nni.report_intermediate_result` to report per-epoch accuracy metrics.
# 3. Use :func:`nni.report_final_result` to report final accuracy.
#
# Please understand the model code before continuing to the next step;
# a minimal sketch of how the three calls fit together is shown below.
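#
# The sketch below is simplified and uses hypothetical helpers (``build_model``, ``train_one_epoch``);
# see :doc:`model.py <model>` for the real code.
#
# .. code-block:: python
#
#     import nni
#
#     # fetch the hyperparameters sampled by the tuner for this trial
#     params = nni.get_next_parameter()
#
#     model = build_model(params)                # hypothetical helper
#     for epoch in range(10):                    # epoch count is illustrative
#         accuracy = train_one_epoch(model)      # hypothetical helper
#         nni.report_intermediate_result(accuracy)
#
#     # the final metric is what the tuner optimizes
#     nni.report_final_result(accuracy)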
# %%
# Step 2: Define search space
# ---------------------------
# In the model code, we have prepared 4 hyperparameters to be tuned:
# *dense_units*, *activation_type*, *dropout_rate*, and *learning_rate*.
#
# Here we need to define their *search space* so the tuning algorithm can sample them in the desired range.
#
# Assume we have the following prior knowledge about these hyperparameters:
#
# 1. *dense_units* should be one of 64, 128, 256.
# 2. *activation_type* should be one of 'relu', 'tanh', 'swish', or None.
# 3. *dropout_rate* should be a float between 0.5 and 0.9.
# 4. *learning_rate* should be a float between 0.0001 and 0.1, and it follows an exponential distribution.
#
# In NNI, the spaces of *dense_units* and *activation_type* are called ``choice``;
# the space of *dropout_rate* is called ``uniform``;
# and the space of *learning_rate* is called ``loguniform``.
# You may have noticed that these names are derived from ``numpy.random``.
#
# For full specification of search space, check :doc:`the reference </hpo/search_space>`.
#
# Now we can define the search space as follows:
search_space = {
'dense_units': {'_type': 'choice', '_value': [64, 128, 256]},
'activation_type': {'_type': 'choice', '_value': ['relu', 'tanh', 'swish', None]},
'dropout_rate': {'_type': 'uniform', '_value': [0.5, 0.9]},
'learning_rate': {'_type': 'loguniform', '_value': [0.0001, 0.1]},
}
# %%
# Step 3: Configure the experiment
# --------------------------------
# NNI uses an *experiment* to manage the HPO process.
# The *experiment config* defines how to train the models and how to explore the search space.
#
# In this tutorial we use a *local* mode experiment,
# which means the models will be trained on the local machine, without any special training platform.
from nni.experiment import Experiment
experiment = Experiment('local')
# %%
# Now we start to configure the experiment.
#
# Configure trial code
# ^^^^^^^^^^^^^^^^^^^^
# In NNI, the evaluation of each hyperparameter set is called a *trial*,
# so the model script is called the *trial code*.
experiment.config.trial_command = 'python model.py'
experiment.config.trial_code_directory = '.'
# %%
# When ``trial_code_directory`` is a relative path, it is relative to the current working directory.
# To run ``main.py`` from a different path, you can set the trial code directory to ``Path(__file__).parent``,
# as sketched after the note below.
# (`__file__ <https://docs.python.org/3.10/reference/datamodel.html#index-43>`__
# is only available in standard Python, not in Jupyter Notebook.)
#
# .. attention::
#
#    If you are using a Linux system without Conda,
#    you may need to change ``"python model.py"`` to ``"python3 model.py"``.
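#
# A minimal sketch of the ``Path`` variant mentioned above, assuming this tutorial is run
# as a standard Python script (where ``__file__`` is available):
#
# .. code-block:: python
#
#     from pathlib import Path
#
#     # point the trial code directory at the folder containing the launcher script
#     experiment.config.trial_code_directory = Path(__file__).parent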
# %%
# Configure search space
# ^^^^^^^^^^^^^^^^^^^^^^
experiment.config.search_space = search_space
# %%
# Configure tuning algorithm
# ^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we use the :doc:`TPE tuner </hpo/tuners>`.
experiment.config.tuner.name = 'TPE'
experiment.config.tuner.class_args['optimize_mode'] = 'maximize'
# %%
# Configure how many trials to run
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# Here we evaluate 10 sets of hyperparameters in total, with 2 trials running concurrently.
experiment.config.max_trial_number = 10
experiment.config.trial_concurrency = 2
# %%
# You may also set ``max_experiment_duration = '1h'`` to limit the running time,
# as sketched after the note below.
#
# If neither ``max_trial_number`` nor ``max_experiment_duration`` is set,
# the experiment will keep running until you press Ctrl-C.
#
# .. note::
#
#    ``max_trial_number`` is set to 10 here for a fast example.
#    In real-world use, it should be set to a larger number.
#    With the default configuration, the TPE tuner requires 20 trials to warm up.
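#
# For example, a minimal sketch of adding a one-hour time limit on top of the settings above:
#
# .. code-block:: python
#
#     # stop the experiment after one hour, even if max_trial_number has not been reached
#     experiment.config.max_experiment_duration = '1h'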
# %%
# Step 4: Run the experiment
# --------------------------
# Now the experiment is ready. Choose a port and launch it. (Here we use port 8080.)
#
# You can use the web portal to view the experiment status: http://localhost:8080.
experiment.run(8080)
# %%
# After the experiment is done
# ----------------------------
# Everything is done and it is safe to exit now. The following steps are optional.
#
# If you are using standard Python instead of Jupyter Notebook,
# you can add ``input()`` or ``signal.pause()`` to prevent Python from exiting,
# allowing you to view the web portal after the experiment is done.
# input('Press enter to quit')
experiment.stop()
# %%
# :meth:`nni.experiment.Experiment.stop` is automatically invoked when Python exits,
# so it can be omitted in your code.
#
# After the experiment is stopped, you can run :meth:`nni.experiment.Experiment.view` to restart the web portal;
# a sketch is included at the end of this page.
#
# .. tip::
#
#    This example uses the :doc:`Python API </reference/experiment>` to create the experiment.
#
#    You can also create and manage experiments with the :doc:`command line tool <../hpo_nnictl/nnictl>`.
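#
# As a rough sketch (not executed here), reopening the web portal for a stopped experiment
# could look like the following, assuming ``view`` accepts an experiment ID and a port:
#
# .. code-block:: python
#
#     from nni.experiment import Experiment
#
#     # 'EXPERIMENT_ID' is a placeholder; use the ID shown in the experiment logs or web portal
#     Experiment.view('EXPERIMENT_ID', port=8080)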