"test/vscode:/vscode.git/clone" did not exist on "faba293a0d6c144de0a9687ffc0ed2be6699600d"
Unverified Commit a911b856 authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub
Browse files

Resolve conflicts for #4760 (#4762)

parent 14d2966b
Overview
========
NNI supports the many training services listed below. Users can go through each page to learn how to configure the corresponding training service. NNI is highly extensible by design; users can customize a new training service for their special resource, platform, or needs.
.. list-table::
   :header-rows: 1

   * - Training Service
     - Description
   * - :doc:`Local <local>`
     - The whole experiment runs on your dev machine (i.e., a single local machine)
   * - :doc:`Remote <remote>`
     - The trials are dispatched to your configured SSH servers
   * - :doc:`OpenPAI <openpai>`
     - Running trials on OpenPAI, a DNN model training platform based on Kubernetes
   * - :doc:`Kubeflow <kubeflow>`
     - Running trials with Kubeflow, a DNN model training framework based on Kubernetes
   * - :doc:`AdaptDL <adaptdl>`
     - Running trials on AdaptDL, an elastic DNN model training platform
   * - :doc:`FrameworkController <frameworkcontroller>`
     - Running trials with FrameworkController, a DNN model training framework on Kubernetes
   * - :doc:`AML <aml>`
     - Running trials on Azure Machine Learning (AML) cloud service
   * - :doc:`PAI-DLC <paidlc>`
     - Running trials on PAI-DLC, a deep learning container service based on Alibaba ACK
   * - :doc:`Hybrid <hybrid>`
     - Supports jointly using multiple of the above training services
.. _training-service-reuse:
Training Service Under Reuse Mode
---------------------------------
Since NNI v2.0, there are two sets of training service implementations in NNI. The new one is called *reuse mode*. When reuse mode is enabled, a cluster, such as a remote machine or a compute instance on AML, launches a long-running environment, and NNI submits trials to this environment iteratively, which saves the time of creating new jobs. For instance, using the OpenPAI training platform under reuse mode avoids the overhead of repeatedly pulling docker images, creating containers, and downloading data.
.. note:: In the reuse mode, users need to make sure each trial can run independently in the same job (e.g., avoid loading checkpoints from previous trials).
PAI-DLC Training Service
========================
NNI supports running an experiment on `PAI-DSW <https://help.aliyun.com/document_detail/194831.html>`__ and submitting trials to `PAI-DLC <https://help.aliyun.com/document_detail/165137.html>`__, a deep learning container service based on Alibaba ACK; this is called dlc mode.
The PAI-DSW server submits the jobs, while PAI-DLC is where the training jobs actually run.
Prerequisite
------------
Step 1. Install NNI following the :doc:`install guide </installation>`.
Step 2. Create a PAI-DSW server following this `link <https://help.aliyun.com/document_detail/163684.html?section-2cw-lsi-es9#title-ji9-re9-88x>`__. Note that since the training jobs run on PAI-DLC, the server itself does not need many resources; a CPU-only PAI-DSW server may suffice.
Step 4. On the PAI-DSW server command line, download and install the PAI-DLC Python SDK:

.. code-block:: bash

   pip install ./pai-dlc-20201203  # pai-dlc-20201203 refers to the unzipped SDK folder name; replace it accordingly.
Usage
-----
Use ``examples/trials/mnist-pytorch`` as an example; its NNI config YAML file configures the training service as described below.

Note: you should set ``platform: dlc`` in the NNI config YAML file if you want to start the experiment in dlc mode.
Compared with :doc:`local`, the training service configuration in dlc mode has these additional keys: ``type/image/jobType/podCount/ecsSpec/region/nasDataSourceId/accessKeyId/accessKeySecret``. For detailed explanations, refer to this `link <https://help.aliyun.com/document_detail/203111.html#h2-url-3>`__.
Also, since dlc mode requires DSW/DLC to mount the same NAS disk to share information, there are two extra keys related to this: ``localStorageMountPoint`` and ``containerStorageMountPoint``.
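As a hypothetical sketch of how these keys might be laid out (all values are placeholders, not a working configuration; consult the linked Alibaba documentation and the example config for the authoritative format):

```yaml
# Hypothetical sketch of the dlc-specific training service keys.
trainingService:
  platform: dlc
  type: Worker                        # placeholder job role
  image: <your-docker-image>
  jobType: <your-dlc-job-type>
  podCount: 1
  ecsSpec: <instance-spec>
  region: <aliyun-region>
  nasDataSourceId: <nas-data-source-id>
  accessKeyId: <access-key-id>
  accessKeySecret: <access-key-secret>
  localStorageMountPoint: /home/admin/workspace/   # NAS mount point on the DSW server
  containerStorageMountPoint: /root/workspace/     # NAS mount point inside the DLC container
```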
Run the following commands to start the example experiment:
Replace ``${NNI_VERSION}`` with a released version name or branch name, e.g., ``v2.3``.
Monitor your job
^^^^^^^^^^^^^^^^
To monitor your job on DLC, visit the `DLC console <https://pai-dlc.console.aliyun.com/#/jobs>`__ to check its status.
Remote Training Service
=======================
NNI can run one experiment on multiple remote machines through SSH, called ``remote`` mode. It's like a lightweight training platform. In this mode, NNI can be started from your computer, and dispatch trials to remote machines in parallel.
Supported operating systems for remote machines are ``Linux``\ , ``Windows 10``\ , and ``Windows Server 2019``.
Prerequisite
------------
1. Make sure the default environment of the remote machines meets the requirements of your trial code. If it does not, a setup script can be added to the ``command`` field of the NNI config.
2. Make sure the remote machines can be accessed through SSH from the machine that runs the ``nnictl`` command. Both password and key authentication are supported. For advanced usage, please refer to :ref:`reference-remote-config-label` in the reference.
3. Make sure the NNI version on each machine is consistent. Follow the :doc:`install guide </installation>` to install NNI.
4. Make sure the trial command is compatible with the remote OSes if you want to use remote Linux and Windows machines together. For example, the default Python 3.x executable is called ``python3`` on Linux and ``python`` on Windows.
In addition, there are several steps for Windows server.
1. Install and start ``OpenSSH Server``.
1) Open ``Settings`` app on Windows.
2) Click ``Apps``\ , then click ``Optional features``.
3) Click ``Add a feature``\ , search and select ``OpenSSH Server``\ , and then click ``Install``.
4) Once it's installed, run the commands below to start the service and set it to start automatically.

.. code-block:: bat

   sc config sshd start= auto
   net start sshd
2. Make sure the remote account is an administrator, so that it can stop running trials.
3. Make sure there is no welcome message beyond the default, since extra output causes the ssh2 library in Node.js to fail. For example, if you're using a Data Science VM on Azure, you need to remove the extra echo commands in ``C:\dsvm\tools\setup\welcome.bat``.
Output like the following is fine when opening a new command window.
.. code-block:: text

   Microsoft Windows [Version 10.0.17763.1192]
   (c) 2018 Microsoft Corporation. All rights reserved.

   (py37_default) C:\Users\AzureUser>
Usage
-----
Use ``examples/trials/mnist-pytorch`` as the example. Suppose there are two machines, both of which can be logged into with a username and password or SSH key authentication. Here is a template configuration specification.
.. code-block:: yaml

   searchSpaceFile: search_space.json
   trialCommand: python3 mnist.py
   trialGpuNumber: 0
   trialConcurrency: 4
   maxTrialNumber: 20
   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize
   trainingService:
     platform: remote
     machineList:
       - host: 192.0.2.1
         user: alice
         ssh_key_file: ~/.ssh/id_rsa
       - host: 192.0.2.2
         port: 10022
         user: bob
         password: bob123
The example configuration is saved in ``examples/trials/mnist-pytorch/config_remote.yml``.
You can run the command below on Windows, Linux, or macOS to spawn trials on remote Linux machines:

.. code-block:: bash

   nnictl create --config examples/trials/mnist-pytorch/config_remote.yml
.. _nniignore:
.. Note:: If you are planning to use remote machines or clusters as your training service, then to avoid putting too much pressure on the network, NNI limits the number of files to 2000 and the total size to 300 MB. If your trial code directory contains too many files, you can choose which files and subfolders to exclude by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
*Example:* :githublink:`config_detailed.yml <examples/trials/mnist-pytorch/config_detailed.yml>` and :githublink:`.nniignore <examples/trials/mnist-pytorch/.nniignore>`
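As a sketch, a ``.nniignore`` that excludes typical large folders might look like this (the folder and file names are hypothetical; adjust them to your project, and see the linked example for a real one):

```text
# Exclude large artifacts from upload; patterns follow gitignore syntax.
data/
checkpoints/
logs/
*.ckpt
__pycache__/
```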
More features
-------------
Configure python environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, commands and scripts are executed in the default environment on the remote machine. If there are multiple Python virtual environments on your remote machine and you want to run the experiment in a specific one, use **pythonPath** to specify the Python environment on your remote machine.
For example, with anaconda you can specify:

.. code-block:: yaml

   pythonPath: /home/bob/.conda/envs/ENV-NAME/bin
Configure shared storage
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The remote training service supports shared storage, which lets you use your own storage while using NNI. Follow the guide `here <./shared_storage.rst>`__ to learn how to use shared storage.
Monitor via TensorBoard
^^^^^^^^^^^^^^^^^^^^^^^
The remote training service supports trial visualization via TensorBoard. Follow the guide :doc:`/experiment/web_portal/tensorboard` to learn how to use TensorBoard.
How to Use Shared Storage
=========================
If you want to use your own storage while using NNI, shared storage can satisfy that need.
Compared with the training service's native storage, shared storage is more convenient.
All the information generated by the experiment will be stored under the ``/nni`` folder in your shared storage.
All the output produced by the trial will be located under ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}/nnioutput`` folder in your shared storage.
This saves you from hunting for experiment-related information in various places.
Remember that your trial working directory is ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}``, so if you upload your data to this shared storage, you can open it like a local file in your trial code, without downloading it.
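As a minimal sketch (the file name is hypothetical), a trial could read uploaded data directly, because its working directory already lives on the shared storage:

```python
import os

# Sketch: inside a trial, the working directory is already
# /nni/{EXPERIMENT_ID}/trials/{TRIAL_ID} on the shared storage,
# so an uploaded file can be opened like a local file.
data_file = os.path.join(os.getcwd(), "train.csv")  # "train.csv" is hypothetical
if os.path.exists(data_file):
    with open(data_file) as f:
        first_line = f.readline()  # read the header like any local file
```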
We will develop more practical features based on shared storage in the future. The config reference can be found :ref:`here <reference-sharedstorage-config-label>`.
.. note::
   Shared storage is currently in the experimental stage. We suggest using AzureBlob under Ubuntu/CentOS/RHEL, and NFS under Ubuntu/CentOS/RHEL/Fedora/Debian, for the remote training service.
If you want to use AzureBlob, add the snippet below to your config.
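The original snippet is elided on this page; as a hypothetical sketch of an AzureBlob ``sharedStorage`` section (the key names are an assumption drawn from the shared storage config reference, and every value is a placeholder):

```yaml
# Hypothetical AzureBlob shared storage sketch; replace all placeholders.
sharedStorage:
  storageType: AzureBlob
  localMountPoint: <local mount point>
  remoteMountPoint: <remote mount point>
  storageAccountName: <your storageAccountName>
  storageAccountKey: <your storageAccountKey>
  containerName: <your containerName>
```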
You can find ``storageAccountName``, ``storageAccountKey`` and ``containerName`` on the Azure storage account portal.
.. image:: ../../../img/azure_storage.png
   :target: ../../../img/azure_storage.png
   :alt:
If you want to use NFS, add the snippet below to your config. For the full config file, see :githublink:`mnist-sharedstorage/config_nfs.yml <examples/trials/mnist-sharedstorage/config_nfs.yml>`.
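The NFS snippet itself is elided on this page; a hypothetical sketch, under the same assumptions as the AzureBlob case (key names assumed from the shared storage config reference, values all placeholders; see the linked full config file for the authoritative version):

```yaml
# Hypothetical NFS shared storage sketch; replace all placeholders.
sharedStorage:
  storageType: NFS
  localMountPoint: <local mount point>
  remoteMountPoint: <remote mount point>
  nfsServer: <your NFS server address>
  exportedDirectory: <your exported directory>
```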
Training Service
================
.. toctree::
   :hidden:

   Overview <overview>
   Local <local>
   Remote <remote>
   OpenPAI <openpai>
   Kubeflow <kubeflow>
   AdaptDL <adaptdl>
   FrameworkController <frameworkcontroller>
   AML <aml>
   PAI-DLC <paidlc>
   Hybrid <hybrid>
   Customize a Training Service <customize>
   Shared Storage <shared_storage>
Visualize Trial with TensorBoard
================================
You can launch a TensorBoard process across one or more trials from the web portal since NNI v2.2. For now, this feature supports the local training service and reuse-mode training services with shared storage; more scenarios will be supported in later NNI versions.
Preparation
-----------
Make sure TensorBoard is installed in your environment.
Use the WebUI to Launch TensorBoard
-----------------------------------
Save Logs
^^^^^^^^^
NNI automatically fetches the ``tensorboard`` subfolder under the trial's output folder as the TensorBoard logdir. So in the trial's source code, you need to save the TensorBoard logs under ``NNI_OUTPUT_DIR/tensorboard``. This log path can be joined as:
.. code-block:: python

   log_dir = os.path.join(os.environ["NNI_OUTPUT_DIR"], 'tensorboard')
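A runnable sketch of resolving this path (pure standard library; the fallback to the current directory is an assumption for debugging outside NNI, not something NNI itself does):

```python
import os

# NNI sets NNI_OUTPUT_DIR for every trial; fall back to the current
# directory when the script is run outside NNI (debugging assumption).
output_dir = os.environ.get("NNI_OUTPUT_DIR", ".")
log_dir = os.path.join(output_dir, "tensorboard")
os.makedirs(log_dir, exist_ok=True)  # any TensorBoard writer can now log here
```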
Launch Tensorboard
^^^^^^^^^^^^^^^^^^
* Like the compare function, first select the trials whose logs you want to combine for TensorBoard, then click the ``Tensorboard`` button.
.. image:: ../../../img/Tensorboard_1.png
   :target: ../../../img/Tensorboard_1.png
   :alt:
* After clicking the ``OK`` button in the pop-up box, you will jump to the TensorBoard portal.
.. image:: ../../../img/Tensorboard_2.png
   :target: ../../../img/Tensorboard_2.png
   :alt:
* You can see the ``SequenceID-TrialID`` on the TensorBoard portal.
.. image:: ../../../img/Tensorboard_3.png
   :target: ../../../img/Tensorboard_3.png
   :alt:
Stop All
^^^^^^^^
If you want to reopen a portal you have already launched, click its tensorboard id. If you no longer need TensorBoard, click the ``Stop all tensorboard`` button.
.. image:: ../../../img/Tensorboard_4.png
   :target: ../../../img/Tensorboard_4.png
   :alt:
Web Portal
==========
.. toctree::
   :hidden:

   Experiment Web Portal <web_portal>
   Visualize with TensorBoard <tensorboard>
Web Portal
==========
The web portal lets users conveniently visualize their NNI experiments: tuning and training progress, detailed metrics, and error logs. It also lets users control their experiments and trials, such as updating an experiment's concurrency or duration and rerunning trials.
.. image:: ../../../static/img/webui.gif
   :width: 100%
Q&A
---
There are many trials in the detail table but ``Default Metric`` chart is empty
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
   First, you should know that the ``Default metric`` and ``Hyper parameter`` charts only show succeeded trials.
What should you do when a chart such as ``Default metric`` or ``Hyper parameter`` looks wrong?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Download the experiment results (``experiment config``, ``trial message`` and ``intermediate metrics``) from ``Experiment summary``, then upload them in your issue.
.. image:: ../../../img/webui-img/summary.png
   :target: ../../../img/webui-img/summary.png
   :alt: summary
What should you do when your experiment has an error?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Click the icon on the right of ``experiment status`` and take a screenshot of the error message.
* Then click ``learn about`` to download the ``nni-manager`` and ``dispatcher`` log files.
* Please file an issue via `Feedback` in the `About` menu and upload the above information.
.. image:: ../../../img/webui-img/experimentError.png
   :target: ../../../img/webui-img/experimentError.png
   :alt: experimentError
What should you do when your trial fails?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``Customized trial`` can be used here. Just submit the same parameters to the experiment to rerun the trial.
.. image:: ../../../img/webui-img/detail/customizedTrialButton.png
   :target: ../../../img/webui-img/detail/customizedTrialButton.png
   :alt: customizedTrialButton

.. image:: ../../../img/webui-img/detail/customizedTrial.png
   :target: ../../../img/webui-img/detail/customizedTrial.png
   :alt: customizedTrial
* The ``Log`` module will help you find the reason for the error. In local mode there are three buttons: ``View trial log``, ``View trial error`` and ``View trial stdout``. If you run on the OpenPAI or Kubeflow platform, you can see the trial stdout and NFS log.

If you have any questions, you can tell us in an issue.
**local mode:**
.. image:: ../../../img/webui-img/detail/log-local.png
   :target: ../../../img/webui-img/detail/log-local.png
   :alt: logOnLocal

**OpenPAI, Kubeflow and other modes:**

.. image:: ../../../img/webui-img/detail-pai.png
   :target: ../../../img/webui-img/detail-pai.png
   :alt: detailPai
How to use a dict intermediate result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`This discussion <https://github.com/microsoft/nni/discussions/4289>`_ may help you.
.. _exp-manage-webportal:
Experiments management
----------------------
The experiment management page lets you manage the experiments on your machine.
.. image:: ../../../img/webui-img/managerExperimentList/experimentListNav.png
   :target: ../../../img/webui-img/managerExperimentList/experimentListNav.png
   :alt: ExperimentList nav
* On the ``All experiments`` page, you can see all the experiments on your machine.
.. image:: ../../../img/webui-img/managerExperimentList/expList.png
   :target: ../../../img/webui-img/managerExperimentList/expList.png
   :alt: Experiments list
* When you want to see more details about an experiment, click its id, as shown below:

.. image:: ../../../img/webui-img/managerExperimentList/toAnotherExp.png
   :target: ../../../img/webui-img/managerExperimentList/toAnotherExp.png
   :alt: See this experiment detail
* If there are many experiments in the table, you can use the ``filter`` button.

.. image:: ../../../img/webui-img/managerExperimentList/expFilter.png
   :target: ../../../img/webui-img/managerExperimentList/expFilter.png
   :alt: filter button
Experiment details
------------------
View overview page
^^^^^^^^^^^^^^^^^^
* On the overview tab, you can see the experiment's information, status, and the performance of the ``top trials``.
.. image:: ../../../img/webui-img/full-oview.png
   :target: ../../../img/webui-img/full-oview.png
   :alt: overview
* If you want to see the experiment search space and config, click the ``Search space`` and ``Config`` buttons on the right (shown when you hover).

**Search space file:**

.. image:: ../../../img/webui-img/searchSpace.png
   :target: ../../../img/webui-img/searchSpace.png
   :alt: searchSpace
**Config file:**
.. image:: ../../../img/webui-img/config.png
   :target: ../../../img/webui-img/config.png
   :alt: config

* You can view and download the ``nni-manager/dispatcher log files`` here.

.. image:: ../../../img/webui-img/review-log.png
   :target: ../../../img/webui-img/review-log.png
   :alt: logfile
* If your experiment has many trials, you can change the refresh interval here.
.. image:: ../../../img/webui-img/refresh-interval.png
   :target: ../../../img/webui-img/refresh-interval.png
   :alt: refresh

* You can change some experiment configurations, such as ``maxExecDuration``, ``maxTrialNum`` and ``trial concurrency``, here.

.. image:: ../../../img/webui-img/edit-experiment-param.png
   :target: ../../../img/webui-img/edit-experiment-param.png
   :alt: editExperimentParams
View job default metric
^^^^^^^^^^^^^^^^^^^^^^^
* Click the ``Default metric`` tab to see the point chart of all trials. Hover over a point to see its specific default metric and search space message.

.. image:: ../../../img/webui-img/default-metric.png
   :target: ../../../img/webui-img/default-metric.png
   :alt: defaultMetricGraph

* Turn on the switch named ``Optimization curve`` to see the experiment's optimization curve.

.. image:: ../../../img/webui-img/best-curve.png
   :target: ../../../img/webui-img/best-curve.png
   :alt: bestCurveGraph
View hyper parameter
^^^^^^^^^^^^^^^^^^^^
Click the tab ``Hyper-parameter`` to see the parallel chart.
* You can click the ``add/remove`` button to add or remove axes.
* Drag the axes to swap axes on the chart.
* You can select the percentage to see top trials.
.. image:: ../../../img/webui-img/hyperPara.png
   :target: ../../../img/webui-img/hyperPara.png
   :alt: hyperParameterGraph
View Trial Duration
^^^^^^^^^^^^^^^^^^^
Click the tab ``Trial Duration`` to see the bar chart.
.. image:: ../../../img/webui-img/trial_duration.png
   :target: ../../../img/webui-img/trial_duration.png
   :alt: trialDurationGraph
View Trial Intermediate Result chart
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Click the tab ``Intermediate Result`` to see the line chart.
.. image:: ../../../img/webui-img/trials_intermeidate.png
   :target: ../../../img/webui-img/trials_intermeidate.png
   :alt: trialIntermediateGraph
The trial may have many intermediate results in the training process. In order to see the trend of some trials more clearly, we set a filtering function for the intermediate result chart.
You may find that some trials get better or worse at a certain intermediate result. This indicates that it is an important and relevant intermediate result. To take a closer look at a point, enter its corresponding X-value at #Intermediate, then input the range of metrics at this intermediate result. In the picture below, we choose the No. 4 intermediate result and set the range of metrics to 0.8-1.

.. image:: ../../../img/webui-img/filter-intermediate.png
   :target: ../../../img/webui-img/filter-intermediate.png
   :alt: filterIntermediateGraph
View trials status
^^^^^^^^^^^^^^^^^^
Click the tab ``Trials Detail`` to see the status of all trials. Specifically:
* Trial detail: trial's id, trial's duration, start time, end time, status, accuracy, and search space file.
.. image:: ../../../img/webui-img/detail-local.png
   :target: ../../../img/webui-img/detail-local.png
   :alt: detailLocalImage
* You can search for a specific trial by its id, status, Trial No. and trial parameters.
**Trial id:**
.. image:: ../../../img/webui-img/detail/searchId.png
   :target: ../../../img/webui-img/detail/searchId.png
   :alt: searchTrialId

**Trial No.:**

.. image:: ../../../img/webui-img/detail/searchNo.png
   :target: ../../../img/webui-img/detail/searchNo.png
   :alt: searchTrialNo.

**Trial status:**

.. image:: ../../../img/webui-img/detail/searchStatus.png
   :target: ../../../img/webui-img/detail/searchStatus.png
   :alt: searchStatus

**Trial parameters:**

``parameters whose type is choice:``

.. image:: ../../../img/webui-img/detail/searchParameterChoice.png
   :target: ../../../img/webui-img/detail/searchParameterChoice.png
   :alt: searchParameterChoice

``parameters whose type is not choice:``

.. image:: ../../../img/webui-img/detail/searchParameterRange.png
   :target: ../../../img/webui-img/detail/searchParameterRange.png
   :alt: searchParameterRange
* The ``Add column`` button lets you select which columns to show in the table. If you run an experiment whose final result is a dict, you can see the other keys in the table. You can choose the ``Intermediate count`` column to watch the trial's progress.

.. image:: ../../../img/webui-img/addColumn.png
   :target: ../../../img/webui-img/addColumn.png
   :alt: addColumnGraph
* If you want to compare some trials, you can select them and then click ``Compare`` to see the results.
.. image:: ../../../img/webui-img/select-trial.png
   :target: ../../../img/webui-img/select-trial.png
   :alt: selectTrialGraph

.. image:: ../../../img/webui-img/compare.png
   :target: ../../../img/webui-img/compare.png
   :alt: compareTrialsGraph
* You can use the ``Copy as python`` button to copy the trial's parameters.

.. image:: ../../../img/webui-img/copyParameter.png
   :target: ../../../img/webui-img/copyParameter.png
   :alt: copyTrialParameters
* Intermediate result chart: you can see the default metric in this chart by clicking the intermediate button.

.. image:: ../../../img/webui-img/intermediate.png
   :target: ../../../img/webui-img/intermediate.png
   :alt: intermediateGraph
* Kill: you can kill a job whose status is running.

.. image:: ../../../img/webui-img/kill-running.png
   :target: ../../../img/webui-img/kill-running.png
   :alt: killTrial
.. bb68c969dbc2b3a2ec79d323cbd31401
.. 424a57ff9c92c3f4738a9beabc4cfb50
Web 界面
==================
========
Experiments 管理
-----------------------
点击导航栏上的 ``All experiments`` 标签。
Q&A
---
.. image:: ../../img/webui-img/managerExperimentList/experimentListNav.png
:target: ../../img/webui-img/managerExperimentList/experimentListNav.png
在 detail 页面的表格里明明有很多 trial 但是 Default Metric 图是空的没有数据
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
首先你要明白 ``Default metric`` 和 ``Hyper parameter`` 图只展示成功 trial。
当你觉得 ``Default metric``、``Hyper parameter`` 图有问题的时候应该做什么
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* 从 Experiment summary 下载实验结果(实验配置,trial 信息,中间值),并把这些结果上传进 issue 里。
.. image:: ../../../img/webui-img/summary.png
:target: ../../../img/webui-img/summary.png
:alt: summary
当你的实验有故障时应该做什么
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* 点击实验状态右边的小图标把 error 信息截屏。
* 然后点击 learn about 去下载 log 文件。And then click the ``learn about`` to download ``nni-manager`` and ``dispatcher`` logfile.
* 点击页面导航栏的 About 按钮点 Feedback 开一个 issue,附带上以上的截屏和 log 信息。
.. image:: ../../../img/webui-img/experimentError.png
:target: ../../../img/webui-img/experimentError.png
:alt: experimentError
当你的 trial 跑失败了你应该怎么做
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* 使用 Customized trial 功能。向实验提交相同的 trial 参数即可。
.. image:: ../../../img/webui-img/detail/customizedTrialButton.png
:target: ../../../img/webui-img/detail/customizedTrialButton.png
:alt: customizedTrialButton
.. image:: ../../../img/webui-img/detail/customizedTrial.png
:target: ../../../img/webui-img/detail/customizedTrial.png
:alt: customizedTrial
* ``Log 模块`` 能帮助你找到错误原因。 有三个按钮: ``View trial log``, ``View trial error`` 和 ``View trial stdout`` 可查 log。如果你用 OpenPai 或者 Kubeflow,你能看到 trial stdout 和 nfs log。
有任何问题请在 issue 里联系我们。
**local mode:**
.. image:: ../../../img/webui-img/detail/log-local.png
:target: ../../../img/webui-img/detail/log-local.png
:alt: logOnLocal
**OpenPAI, Kubeflow and other mode:**
.. image:: ../../../img/webui-img/detail-pai.png
:target: ../../../img/webui-img/detail-pai.png
:alt: detailPai
怎样去使用 dict intermediate result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
`The discussion <https://github.com/microsoft/nni/discussions/4289>`_ 能帮助你。
.. _exp-manage-webportal:
实验管理
--------
实验管理页面能统筹你机器上的所有实验。
.. image:: ../../../img/webui-img/managerExperimentList/experimentListNav.png
:target: ../../../img/webui-img/managerExperimentList/experimentListNav.png
:alt: ExperimentList nav
* 在 ``All experiments`` 页面,可以看到机器上的所有 Experiment。
.. image:: ../../img/webui-img/managerExperimentList/expList.png
:target: ../../img/webui-img/managerExperimentList/expList.png
:alt: Experiments list
.. image:: ../../../img/webui-img/managerExperimentList/expList.png
:target: ../../../img/webui-img/managerExperimentList/expList.png
:alt: Experiments list
* 查看 Experiment 更多详细信息时,可以单击 trial ID 跳转至该 Experiment 详情页,如下所示:
.. image:: ../../img/webui-img/managerExperimentList/toAnotherExp.png
:target: ../../img/webui-img/managerExperimentList/toAnotherExp.png
:alt: See this experiment detail
* 查看 Experiment 更多详细信息时,可以单击 trial ID 跳转至该 Experiment 详情页,如下所示:
* 如果表格里有很多 Experiment,可以使用 ``filter`` 按钮。
.. image:: ../../img/webui-img/managerExperimentList/expFilter.png
:target: ../../img/webui-img/managerExperimentList/expFilter.png
:alt: filter button
.. image:: ../../../img/webui-img/managerExperimentList/toAnotherExp.png
:target: ../../../img/webui-img/managerExperimentList/toAnotherExp.png
:alt: See this experiment detail
查看概要页面
-----------------
* 如果表格里有很多 Experiment,可以使用 ``filter`` 按钮。
点击 ``Overview`` 标签。
* 在 Overview 标签上,可看到 Experiment trial 的概况、搜索空间以及 ``top trials`` 的结果。
.. image:: ../../../img/webui-img/managerExperimentList/expFilter.png
:target: ../../../img/webui-img/managerExperimentList/expFilter.png
:alt: filter button
.. image:: ../../img/webui-img/full-oview.png
:target: ../../img/webui-img/full-oview.png
:alt: overview
实验详情
--------
如果想查看 Experiment 配置和搜索空间,点击右边的 ``Search space`` 和 ``Config`` 按钮。
查看实验 overview 页面
^^^^^^^^^^^^^^^^^^^^^^^
1. 搜索空间文件:
* 在 Overview 标签上,可看到 Experiment trial 的概况、搜索空间以及 ``top trials`` 的结果。
.. image:: ../../img/webui-img/searchSpace.png
:target: ../../img/webui-img/searchSpace.png
:alt: searchSpace
.. image:: ../../../img/webui-img/full-oview.png
:target: ../../../img/webui-img/full-oview.png
:alt: overview
2. 配置文件:
.. image:: ../../img/webui-img/config.png
:target: ../../img/webui-img/config.png
:alt: config
* 如果想查看 Experiment 配置和搜索空间,点击右边的 ``Search space`` 和 ``Config`` 按钮。
**搜索空间文件:**
* 你可以在这里查看和下载 ``nni-manager/dispatcher 日志文件``。
.. image:: ../../../img/webui-img/searchSpace.png
:target: ../../../img/webui-img/searchSpace.png
:alt: searchSpace
.. image:: ../../img/webui-img/review-log.png
:target: ../../img/webui-img/review-log.png
:alt: logfile
**配置文件:**
* 如果 Experiment 包含了较多 Trial,可改变刷新间隔。
.. image:: ../../img/webui-img/refresh-interval.png
:target: ../../img/webui-img/refresh-interval.png
:alt: refresh
.. image:: ../../../img/webui-img/config.png
:target: ../../../img/webui-img/config.png
:alt: config
* 你可以在这里查看和下载 ``nni-manager/dispatcher 日志文件``。
* 单击按钮 ``Experiment summary`` ,可以查看和下载 Experiment 结果(``Experiment 配置``,``trial 信息`` 和 ``中间结果`` )。
.. image:: ../../img/webui-img/summary.png
:target: ../../img/webui-img/summary.png
:alt: summary
.. image:: ../../../img/webui-img/review-log.png
:target: ../../../img/webui-img/review-log.png
:alt: logfile
* 在这里修改 Experiment 配置(例如 ``maxExecDuration``, ``maxTrialNum`` 和 ``trial concurrency``)
* 如果 Experiment 包含了较多 Trial,可改变刷新间隔
.. image:: ../../img/webui-img/edit-experiment-param.png
:target: ../../img/webui-img/edit-experiment-param.png
:alt: editExperimentParams
.. image:: ../../../img/webui-img/refresh-interval.png
:target: ../../../img/webui-img/refresh-interval.png
:alt: refresh
* 通过单击 ``Learn about`` ,可以查看错误消息和 ``nni-manager/dispatcher 日志文件``
* 在这里修改 Experiment 配置(例如 ``maxExecDuration``, ``maxTrialNum`` 和 ``trial concurrency``)。
.. image:: ../../img/webui-img/experimentError.png
:target: ../../img/webui-img/experimentError.png
:alt: experimentError
.. image:: ../../../img/webui-img/edit-experiment-param.png
:target: ../../../img/webui-img/edit-experiment-param.png
:alt: editExperimentParams
* ``About`` 菜单内含有版本信息以及问题反馈渠道。
查看 trial 最终结果
----------------------------------------------
^^^^^^^^^^^^^^^^^^^^^
* ``Default metric`` 是所有 trial 的最终结果图。 在每一个结果上悬停鼠标可以看到 trial 信息,比如 trial id、No. 超参等。
* ``Default metric`` 是所有 trial 的最终结果图。 在每一个结果上悬停鼠标可以看到 trial 信息,比如 trial id、No.、超参等。
.. image:: ../../img/webui-img/default-metric.png
:target: ../../img/webui-img/default-metric.png
.. image:: ../../../img/webui-img/default-metric.png
:target: ../../../img/webui-img/default-metric.png
:alt: defaultMetricGraph
......@@ -138,13 +221,15 @@ Experiments 管理
* Turn on ``Optimization curve`` to see the optimization curve of the experiment.

.. image:: ../../img/webui-img/best-curve.png
   :target: ../../img/webui-img/best-curve.png
   :alt: bestCurveGraph
View hyperparameters
--------------------

Click the ``Hyper-parameter`` tab to see the parallel coordinates graph.
* You can adjust the percentage to see the top trials.

.. image:: ../../img/webui-img/hyperPara.png
   :target: ../../img/webui-img/hyperPara.png
   :alt: hyperParameterGraph
View trial duration
-------------------

Click the ``Trial Duration`` tab to see the bar chart.

.. image:: ../../img/webui-img/trial_duration.png
   :target: ../../img/webui-img/trial_duration.png
   :alt: trialDurationGraph
View trial intermediate results
-------------------------------

Click the ``Intermediate Result`` tab to see the line chart.

.. image:: ../../img/webui-img/trials_intermeidate.png
   :target: ../../img/webui-img/trials_intermeidate.png
   :alt: trialIntermediateGraph
Trials may produce many intermediate results during training. You may find that a trial becomes better or worse at a certain intermediate result; this indicates an important and relevant intermediate result. To inspect this point closely, enter its X coordinate in #Intermediate, and enter the metric range of this intermediate result. In the figure below, the fourth intermediate result is selected and the metric range is set to 0.8 - 1.

.. image:: ../../img/webui-img/filter-intermediate.png
   :target: ../../img/webui-img/filter-intermediate.png
   :alt: filterIntermediateGraph
View trial status
-----------------

Click the ``Trials Detail`` tab to see the status of all trials. Specifically:
* Trial detail: the trial id, duration, start time, end time, status, accuracy, and search space file.

.. image:: ../../img/webui-img/detail-local.png
   :target: ../../img/webui-img/detail-local.png
   :alt: detailLocalImage
* Support searching by trial id, status, trial No. and parameters.

1. Trial id:

.. image:: ../../img/webui-img/detail/searchId.png
   :target: ../../img/webui-img/detail/searchId.png
   :alt: searchTrialId

2. Trial No.:

.. image:: ../../img/webui-img/detail/searchNo.png
   :target: ../../img/webui-img/detail/searchNo.png
   :alt: searchTrialNo.

3. Trial status:

.. image:: ../../img/webui-img/detail/searchStatus.png
   :target: ../../img/webui-img/detail/searchStatus.png
   :alt: searchStatus

4. Trial parameters:

(1) parameters whose type is choice:

.. image:: ../../img/webui-img/detail/searchParameterChoice.png
   :target: ../../img/webui-img/detail/searchParameterChoice.png
   :alt: searchParameterChoice

(2) parameters whose type is not choice:

.. image:: ../../img/webui-img/detail/searchParameterRange.png
   :target: ../../img/webui-img/detail/searchParameterRange.png
   :alt: searchParameterRange
* The ``Add column`` button lets you choose which columns to show in the table. If the final result of the experiment is a dict, you can see the other keys in the table. You can also choose the ``Intermediate count`` column to see each trial's progress.

.. image:: ../../img/webui-img/addColumn.png
   :target: ../../img/webui-img/addColumn.png
   :alt: addColumnGraph
* To compare some trials, select them and click ``Compare`` to see the results.

.. image:: ../../img/webui-img/select-trial.png
   :target: ../../img/webui-img/select-trial.png
   :alt: selectTrialGraph

.. image:: ../../img/webui-img/compare.png
   :target: ../../img/webui-img/compare.png
   :alt: compareTrialsGraph
* For ``Tensorboard``, please refer to `this document <Tensorboard.rst>`__.

* Use the ``Copy as python`` button to copy the trial's parameters.

.. image:: ../../img/webui-img/copyParameter.png
   :target: ../../img/webui-img/copyParameter.png
   :alt: copyTrialParameters

* You can see the trial logs on the ``Log`` tab. In local mode there are three buttons: ``View trial log``, ``View trial error`` and ``View trial stdout``. When running on the OpenPAI or Kubeflow platform, you can also see the hdfsLog.

1. Local mode:

.. image:: ../../img/webui-img/detail/log-local.png
   :target: ../../img/webui-img/detail/log-local.png
   :alt: logOnLocal

2. OpenPAI, Kubeflow and other modes:

.. image:: ../../img/webui-img/detail-pai.png
   :target: ../../img/webui-img/detail-pai.png
   :alt: detailPai

* Intermediate result graph: you can see the default metric in this graph by clicking the intermediate button.

.. image:: ../../img/webui-img/intermediate.png
   :target: ../../img/webui-img/intermediate.png
   :alt: intermediateGraph

* Kill: you can kill a running trial.

.. image:: ../../img/webui-img/kill-running.png
   :target: ../../img/webui-img/kill-running.png
   :alt: killTrial

* Customized trial: you can change the parameters of a trial and then submit it to the experiment. To rerun a failed trial, you can submit the same parameters to the experiment.

.. image:: ../../img/webui-img/detail/customizedTrialButton.png
   :target: ../../img/webui-img/detail/customizedTrialButton.png
   :alt: customizedTrialButton

.. image:: ../../img/webui-img/detail/customizedTrial.png
   :target: ../../img/webui-img/detail/customizedTrial.png
   :alt: customizedTrial
Feature Engineering with NNI
============================
.. note:: We are glad to announce the alpha release of the Feature Engineering toolkit on top of NNI. It is still in the experimental phase and may evolve based on user feedback. We would like to invite you to use it, give feedback, and even contribute.
For now, we support the following feature selectors:

* :doc:`GradientFeatureSelector <./gradient_feature_selector>`
* :doc:`GBDTSelector <./gbdt_selector>`

These selectors are suitable for tabular data (i.e., they do not support image, speech or text data).
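As a flavor of the ``fit`` / ``get_selected_features`` interface these selectors follow, here is a minimal standalone sketch; the ``VarianceSelector`` class and its variance-threshold rule are purely illustrative and are not part of NNI:

```python
# Minimal sketch of a FeatureSelector-style interface. VarianceSelector and
# its threshold rule are illustrative, not NNI built-ins.

class VarianceSelector:
    def __init__(self, threshold=0.0):
        self.threshold = threshold
        self.selected_features_ = None

    def fit(self, X, y=None):
        # X is a list of rows; keep columns whose variance exceeds the threshold.
        n_rows = len(X)
        n_cols = len(X[0])
        selected = []
        for j in range(n_cols):
            col = [row[j] for row in X]
            mean = sum(col) / n_rows
            var = sum((v - mean) ** 2 for v in col) / n_rows
            if var > self.threshold:
                selected.append(j)
        self.selected_features_ = selected
        return self

    def get_selected_features(self):
        return self.selected_features_

# Column 0 is constant and gets dropped; columns 1 and 2 vary.
X = [[1.0, 0.0, 5.0], [1.0, 1.0, 3.0], [1.0, 2.0, 1.0]]
print(VarianceSelector().fit(X).get_selected_features())  # → [1, 2]
```

The NNI selectors expose the same two-step pattern: call ``fit`` on the training data, then read the chosen column indices back from ``get_selected_features``.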
from nni.feature_engineering.feature_selector import FeatureSelector

class CustomizedSelector(FeatureSelector):
    def __init__(self, *args, **kwargs):
        ...
**2. Implement the fit and _get_selected_features Functions**
from nni.feature_engineering.feature_selector import FeatureSelector

class CustomizedSelector(FeatureSelector):
    def __init__(self, *args, **kwargs):
        ...

    def fit(self, X, y, **kwargs):
        """
from sklearn.base import BaseEstimator  # BaseEstimator comes from scikit-learn

from nni.feature_engineering.feature_selector import FeatureSelector

class CustomizedSelector(FeatureSelector, BaseEstimator):
    def __init__(self, *args, **kwargs):
        ...

    def get_params(self, *args, **kwargs):
        """
        Get parameters for this estimator.
        """
        params = self.__dict__
        params = {key: val for (key, val) in params.items()
                  if not key.endswith('_')}
        return params
    def set_params(self, **params):
        """
        Set the parameters of this estimator.
        """
        for param in params:
            if hasattr(self, param):
                setattr(self, param, params[param])
        return self
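The ``get_params`` / ``set_params`` pair above can be exercised on its own. The following standalone sketch (the ``Estimator`` class is illustrative, not an NNI or scikit-learn API) shows how attributes ending with an underscore, i.e. fitted attributes by scikit-learn convention, are excluded from the reported parameters:

```python
# Standalone sketch of the get_params / set_params pattern shown above;
# the Estimator class is illustrative, not part of NNI.

class Estimator:
    def __init__(self, alpha=1.0, beta=2.0):
        self.alpha = alpha
        self.beta = beta
        self.coef_ = None  # fitted attribute, excluded by the trailing-underscore rule

    def get_params(self):
        # Return only constructor-style parameters, skipping fitted attributes.
        return {key: val for key, val in self.__dict__.items()
                if not key.endswith('_')}

    def set_params(self, **params):
        # Only overwrite attributes that already exist on the instance.
        for param in params:
            if hasattr(self, param):
                setattr(self, param, params[param])
        return self

est = Estimator()
print(est.get_params())  # → {'alpha': 1.0, 'beta': 2.0}
est.set_params(alpha=0.5)
print(est.alpha)         # → 0.5
```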
**2. Inherit the SelectorMixin Class and its Function**
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin

from nni.feature_engineering.feature_selector import FeatureSelector

class CustomizedSelector(FeatureSelector, BaseEstimator, SelectorMixin):
    def __init__(self, *args, **kwargs):
        ...

    def get_params(self, *args, **kwargs):
        """
        Get parameters for this estimator.
        """
        Set the parameters of this estimator.
        """
        for param in params:
            if hasattr(self, param):
                setattr(self, param, params[param])
        return self

    def get_support(self, indices=False):
The benchmark datasets can be downloaded `here <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/>`__.

The reference code is available at ``/examples/feature_engineering/gradient_feature_selector/benchmark_test.py``.
Reference and Feedback
----------------------

* To `report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature on GitHub;
* To `file a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature on GitHub;
* To learn more about :githublink:`Neural Architecture Search with NNI <docs/en_US/NAS/Overview.rst>`;
* To learn more about :githublink:`Model Compression with NNI <docs/en_US/Compression/Overview.rst>`;
* To learn more about :githublink:`Hyperparameter Tuning with NNI <docs/en_US/Tuner/BuiltinTuner.rst>`.
Feature Engineering
===================
.. toctree::
:maxdepth: 2
Overview <overview>
GradientFeatureSelector <gradient_feature_selector>
GBDTSelector <gbdt_selector>
Advanced Usage
==============
.. toctree::
:hidden:
Command Line Tool Example </tutorials/hpo_nnictl/nnictl>
Implement Custom Tuners and Assessors <custom_algorithm>
Install Custom or 3rd-party Tuners and Assessors <custom_algorithm_installation>
Tuner Benchmark <hpo_benchmark>
Tuner Benchmark Example Statistics <hpo_benchmark_stats>
Assessor: Early Stopping
========================
In HPO, some hyperparameter sets may have obviously poor performance, making it unnecessary to finish their evaluation.
This is called *early stopping*; in NNI, early stopping algorithms are called *assessors*.
An assessor monitors *intermediate results* of each *trial*.
If a trial is predicted to produce a suboptimal final result, the assessor stops that trial immediately
to save computing resources for other hyperparameter sets.
As introduced in quickstart tutorial, a trial is the evaluation process of a hyperparameter set,
and intermediate results are reported with :func:`nni.report_intermediate_result` API in trial code.
Typically, intermediate results are accuracy or loss metrics of each epoch.
Using an assessor improves the efficiency of computing resources,
but may slightly reduce the prediction accuracy of tuners.
It is recommended to use an assessor when computing resources are insufficient.
Common Usage
------------
The usage of assessors is similar to that of tuners.
To use a built-in assessor, you need to specify its name and arguments:
.. code-block:: python
   config.assessor.name = 'Medianstop'
   config.assessor.class_args = {'optimize_mode': 'maximize'}
Built-in Assessors
------------------
.. list-table::
:header-rows: 1
:widths: auto
* - Assessor
- Brief Introduction of Algorithm
* - :class:`Median Stop <nni.algorithms.hpo.medianstop_assessor.MedianstopAssessor>`
     - Stop if the hyperparameter set performs worse than the median at any step.
* - :class:`Curve Fitting <nni.algorithms.hpo.curvefitting_assessor.CurvefittingAssessor>`
     - Stop if the learning curve is likely to converge to a suboptimal result.
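The median-stop rule from the table can be sketched in a few lines of plain Python. This is an illustrative simplification of the decision, assuming the metric is being maximized; it is not NNI's actual implementation:

```python
# Illustrative simplification of the median-stop decision: compare the running
# average of a trial's intermediate results against the median of the running
# averages of other trials at the same step (assumes higher metric is better).

from statistics import median

def median_stop(trial_history, completed_histories):
    step = len(trial_history)
    avg = sum(trial_history) / step
    peers = [sum(h[:step]) / step for h in completed_histories if len(h) >= step]
    if not peers:
        return True  # nothing to compare against yet: keep running
    return avg >= median(peers)  # True = keep running, False = stop early

completed = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.6, 0.7, 0.8]]
print(median_stop([0.55, 0.65], completed))  # → True  (above the median)
print(median_stop([0.1, 0.2], completed))    # → False (stop early)
```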
Customizing Algorithms
======================
Customize Tuner
---------------
NNI provides state-of-the-art tuning algorithms as builtin tuners. NNI also supports building a tuner by yourself for your tuning needs.
from nni.tuner import Tuner

class CustomizedTuner(Tuner):
    def __init__(self, *args, **kwargs):
        ...
**2. Implement the receive_trial_result, generate_parameters and update_search_space Functions**
from nni.tuner import Tuner

class CustomizedTuner(Tuner):
    def __init__(self, *args, **kwargs):
        ...

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
Write a more advanced automl algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The methods above are usually enough to write a general tuner. However, users may want more facilities, for example, intermediate results and trials' states (i.e., the methods in an assessor), in order to build a more powerful automl algorithm. Therefore, we have another concept called ``advisor``, which directly inherits from ``MsgDispatcherBase`` in :githublink:`msg_dispatcher_base.py <nni/runtime/msg_dispatcher_base.py>`.
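To make the three required tuner methods concrete, here is a toy random-search tuner. The stub ``Tuner`` base class stands in for ``nni.tuner.Tuner`` so that the snippet is self-contained, and the ``choice``-only search space handling is a deliberate simplification, not NNI's full search space support:

```python
import random

class Tuner:  # stub standing in for nni.tuner.Tuner, for illustration only
    pass

class RandomSearchTuner(Tuner):
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.search_space = {}
        self.results = {}

    def update_search_space(self, search_space):
        # Called when the search space is (re)loaded; this sketch only
        # supports "choice" parameters.
        self.search_space = search_space

    def generate_parameters(self, parameter_id, **kwargs):
        # Sample one value per parameter, uniformly at random.
        return {name: self.rng.choice(spec['_value'])
                for name, spec in self.search_space.items()}

    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
        # Record the final metric so the tuner could, in principle, adapt.
        self.results[parameter_id] = value

tuner = RandomSearchTuner()
tuner.update_search_space({'lr': {'_type': 'choice', '_value': [0.1, 0.01]}})
params = tuner.generate_parameters(0)
assert params['lr'] in (0.1, 0.01)
```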
Customize Assessor
------------------
NNI also supports building an assessor by yourself for your tuning needs.
If you want to implement a customized Assessor, there are three things to do:
#. Inherit the base Assessor class
#. Implement assess_trial function
#. Configure your customized Assessor in experiment YAML config file
**1. Inherit the base Assessor class**
.. code-block:: python
   from nni.assessor import Assessor

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...
**2. Implement the assess_trial Function**
.. code-block:: python
   from nni.assessor import Assessor, AssessResult

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...

       def assess_trial(self, trial_history):
           """
           Determines whether a trial should be killed. Must override.

           trial_history: a list of intermediate result objects.

           Returns AssessResult.Good or AssessResult.Bad.
           """
           # your code goes here
           ...
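To make ``assess_trial`` concrete, here is a toy rule that stops a trial whose latest intermediate result falls below a fraction of its own best result so far. ``AssessResult`` is stubbed with an enum so the snippet runs standalone, and both the class name and the threshold are illustrative:

```python
from enum import Enum

class AssessResult(Enum):  # stub mirroring nni.assessor.AssessResult
    Good = True
    Bad = False

class NoImprovementAssessor:
    def __init__(self, ratio=0.9):
        self.ratio = ratio

    def assess_trial(self, trial_history):
        # Stop the trial if its latest metric dropped below `ratio` of its best.
        if len(trial_history) < 2:
            return AssessResult.Good
        if trial_history[-1] < self.ratio * max(trial_history):
            return AssessResult.Bad
        return AssessResult.Good

assessor = NoImprovementAssessor()
print(assessor.assess_trial([0.5, 0.6, 0.61]))  # → AssessResult.Good
print(assessor.assess_trial([0.5, 0.6, 0.3]))   # → AssessResult.Bad
```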
**3. Configure your customized Assessor in experiment YAML config file**
NNI needs to locate your customized Assessor class and instantiate it, so you need to specify the location of the customized Assessor class and pass literal values as parameters to its ``__init__`` constructor.
.. code-block:: yaml
   assessor:
     codeDir: /home/abc/myassessor
     classFileName: my_customized_assessor.py
     className: CustomizedAssessor
     # Any parameters you need to pass to your Assessor's __init__ constructor
     # can be specified in this optional classArgs field, for example
     classArgs:
       arg1: value1
Please note in **2** that the object ``trial_history`` is exactly the same object that the trial sends to the assessor through the SDK's ``report_intermediate_result`` function.

The working directory of your assessor is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved from the environment variable ``NNI_LOG_DIRECTORY``.
For more detailed examples, see:
* :githublink:`medianstop-assessor <nni/algorithms/hpo/medianstop_assessor.py>`
* :githublink:`curvefitting-assessor <nni/algorithms/hpo/curvefitting_assessor/>`
How to register customized algorithms as builtin tuners, assessors and advisors
===============================================================================
Overview
--------
NNI provides many :doc:`builtin tuners <tuners>` and :doc:`assessors <assessors>` that can be used directly for hyperparameter optimization. Extra algorithms can be registered via ``nnictl algo register --meta <path_to_meta_file>`` after NNI is installed. You can check the builtin algorithms via the ``nnictl algo list`` command.
NNI also provides the ability to build your own customized tuners, advisors and assessors. To use a customized algorithm, users can simply follow the spec in the experiment config file to properly reference the algorithm, as illustrated in the tutorial on :doc:`customized algorithms <custom_algorithm>`.
NNI also allows users to install a customized algorithm as a builtin algorithm, so that it can be used the same way as NNI builtin tuners/advisors/assessors. More importantly, this makes it much easier to share or distribute an implemented algorithm to others. Once customized tuners/advisors/assessors are installed into NNI as builtin algorithms, you can use them in your experiment configuration file the same way as builtin ones. For example, if you built a customized tuner and installed it into NNI under the builtin name ``mytuner``, you can use it in your configuration file like below:
   tuner:
     builtinTunerName: mytuner
Register customized algorithms like builtin tuners, assessors and advisors
--------------------------------------------------------------------------
You can follow the steps below to build a customized tuner/assessor/advisor and register it into NNI as a builtin algorithm.
1. Create a customized tuner/assessor/advisor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Refer to the following instructions: :doc:`custom_algorithm`.
2. (Optional) Create a validator to validate classArgs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Run the following command to register the customized algorithms as builtin algorithms:
.. code-block:: bash
   nnictl algo register --meta PATH_TO_META_FILE

The ``PATH_TO_META_FILE`` is the path to the yaml file you created in the above section.
See the `customized tuner example <#example-register-a-customized-tuner-as-a-builtin-tuner>`_ for a full example.
Run the following command to list the registered builtin algorithms:
.. code-block:: text
   nnictl algo list

   +-----------------+------------+-----------+----------------------+------------------------------------------+
Porting customized algorithms from v1.x to v2.x
-----------------------------------------------
All that needs to be modified is to delete ``NNI Package :: tuner`` metadata in ``setup.py`` and add a meta file mentioned in `4. Prepare meta file`_.
Then you can follow `Register customized algorithms like builtin tuners, assessors and advisors`_ to register your customized algorithms.
Example: Register a customized tuner as a builtin tuner
-------------------------------------------------------
Then run the command ``nnictl algo list``; you should be able to see that demotuner is installed:
.. code-block:: text
   +-----------------+------------+-----------+----------------------+------------------------------------------+
   | Name            | Type       | source    | Class Name           | Module Name                              |
HPO Benchmarks
==============
.. toctree::
:hidden:
HPO Benchmark Example Statistics <hpo_benchmark_stats>
We provide a benchmarking tool to compare the performances of tuners provided by NNI (and users' custom tuners) on different
types of tasks. This tool uses the `automlbenchmark repository <https://github.com/openml/automlbenchmark>`_ to run different *benchmarks* on the NNI *tuners*.
The tool is located in ``examples/trials/benchmarking/automlbenchmark``. This document provides a brief introduction to the tool, its usage, and currently available benchmarks.
Overview and Terminologies
and handle the repeated trial-evaluate-feedback loop in the **framework** abstraction. Each framework
contains two main components: a **benchmark** from the automlbenchmark library, and an **architecture** which defines the search
space and the evaluator. To further clarify, we provide the definition for the terminologies used in this document.
* **tuner**\ : a :doc:`tuner or advisor provided by NNI <tuners>`, or a custom tuner provided by the user.
* **task**\ : an abstraction used by automlbenchmark. A task can be thought of as a tuple (dataset, metric). It provides train and test datasets to the frameworks. Then, based on the returned predictions on the test set, the task evaluates the metric (e.g., mse for regression, f1 for classification) and reports the score.
* **benchmark**\ : an abstraction used by automlbenchmark. A benchmark is a set of tasks, along with other external constraints such as time limits.
* **framework**\ : an abstraction used by automlbenchmark. Given a task, a framework solves the proposed regression or classification problem using train data and produces predictions on the test set. In our implementation, each framework is an architecture, which defines a search space. To evaluate a task given by the benchmark on a specific tuner, we let the tuner continuously tune the hyperparameters (by giving it cross-validation score on the train data as feedback) until the time or trial limit is reached. Then, the architecture is retrained on the entire train set using the best set of hyperparameters.
By default, the script runs the specified tuners against the specified benchmarks one by one. To run
all tuners simultaneously in the background, set the "serialize" flag to false in ``runbenchmark_nni.sh``.
Note: the SMAC tuner, DNGO tuner, and the BOHB advisor have to be manually installed before running benchmarks on them.
Please refer to :doc:`this page <tuners>` for more details
on installation.
Run customized benchmarks on existing tuners
--------------------------------------------

Run benchmarks on custom tuners
-------------------------------
You may also use the benchmark to compare a custom tuner written by yourself with the NNI built-in tuners. To use custom
tuners, first make sure that the tuner inherits from ``nni.tuner.Tuner`` and correctly implements the required APIs. For
more information on implementing a custom tuner, please refer to :doc:`here <custom_algorithm>`.
Next, perform the following steps:
#. Install the custom tuner via the command ``nnictl algo register``. Check :doc:`this document <../reference/nnictl>` for details.
#. In ``./nni/frameworks.yaml``\ , add a new framework extending the base framework NNI. Make sure that the parameter ``tuner_type`` corresponds to the "builtinName" of tuner installed in step 1.
#. Run the following command