NNI supports the training services listed below. Users can go through each page to learn how to configure the corresponding training service. NNI is highly extensible by design, so users can also customize a new training service for their special resource, platform, or needs.
.. list-table::
   :header-rows: 1

   * - Training Service
     - Description
   * - :doc:`Local <local>`
     - The whole experiment runs on your dev machine (i.e., a single local machine)
   * - :doc:`Remote <remote>`
     - The trials are dispatched to your configured SSH servers
   * - :doc:`OpenPAI <openpai>`
     - Running trials on OpenPAI, a DNN model training platform based on Kubernetes
   * - :doc:`Kubeflow <kubeflow>`
     - Running trials with Kubeflow, a DNN model training framework based on Kubernetes
   * - :doc:`AdaptDL <adaptdl>`
     - Running trials on AdaptDL, an elastic DNN model training platform
   * - :doc:`FrameworkController <frameworkcontroller>`
     - Running trials with FrameworkController, a DNN model training framework on Kubernetes
   * - :doc:`AML <aml>`
     - Running trials on Azure Machine Learning (AML) cloud service
   * - :doc:`PAI-DLC <paidlc>`
     - Running trials on PAI-DLC, deep learning containers based on Alibaba ACK
   * - :doc:`Hybrid <hybrid>`
     - Support for jointly using multiple of the above training services
.. _training-service-reuse:
Training Service Under Reuse Mode
---------------------------------
Since NNI v2.0, there are two sets of training service implementations in NNI. The newer one is called *reuse mode*. When reuse mode is enabled, a cluster, such as a remote machine or a compute instance on AML, launches a long-running environment, and NNI submits trials to this environment iteratively, which saves the time spent creating new jobs. For instance, using the OpenPAI training platform under reuse mode avoids the overhead of repeatedly pulling Docker images, creating containers, and downloading data.
.. note:: In the reuse mode, users need to make sure each trial can run independently in the same job (e.g., avoid loading checkpoints from previous trials).
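For example, a minimal sketch of keeping trial state isolated, assuming your trial code already uses the ``nni`` Python package (the ``checkpoints`` directory name is illustrative):

.. code-block:: python

   import os
   import nni

   # Scope checkpoints to the current trial so that trials reusing the same
   # long-running environment do not accidentally load each other's state.
   checkpoint_dir = os.path.join('checkpoints', nni.get_trial_id())
   os.makedirs(checkpoint_dir, exist_ok=True)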
NNI supports running an experiment on `PAI-DSW <https://help.aliyun.com/document_detail/194831.html>`__ and submitting trials to `PAI-DLC <https://help.aliyun.com/document_detail/165137.html>`__, deep learning containers based on Alibaba ACK; this is called dlc mode. The PAI-DSW server plays the role of submitting jobs, while PAI-DLC is where the training jobs actually run.
Prerequisite
------------
Step 1. Install NNI, following the :doc:`install guide </installation>`.
Step 2. Create a PAI-DSW server following this `link <https://help.aliyun.com/document_detail/163684.html?section-2cw-lsi-es9#title-ji9-re9-88x>`__. Note that since the training jobs run on PAI-DLC, the PAI-DSW server does not need many resources; a CPU-only PAI-DSW server may be enough.
...

Step 4. Open your PAI-DSW server command line, then download and install the PAI-DLC Python SDK:

.. code-block:: bash

   pip install ./pai-dlc-20201203  # pai-dlc-20201203 refers to the unzipped SDK file name; replace it accordingly.
Usage
-----
Use ``examples/trials/mnist-pytorch`` as an example. The content of the NNI config YAML file looks like the following:
...
Note: You should set ``platform: dlc`` in the NNI config YAML file if you want to start the experiment in dlc mode.
Compared with :doc:`local`, the training service configuration in dlc mode has additional keys such as ``type``, ``image``, ``jobType``, ``podCount``, ``ecsSpec``, ``region``, ``nasDataSourceId``, ``accessKeyId``, and ``accessKeySecret``; for a detailed explanation, refer to this `link <https://help.aliyun.com/document_detail/203111.html#h2-url-3>`__.
Also, as dlc mode requires DSW/DLC to mount the same NAS disk to share information, there are two extra keys related to this: ``localStorageMountPoint`` and ``containerStorageMountPoint``.
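A hypothetical sketch of the ``trainingService`` section under dlc mode is shown below; the values are placeholders rather than working settings, and the authoritative schema is in the config reference:

.. code-block:: yaml

   trainingService:
     platform: dlc
     type: Worker
     image: ${your_docker_image}
     jobType: PyTorchJob                             # or TFJob, depending on your trial code
     podCount: 1
     ecsSpec: ${your_ecs_spec}                       # e.g., a CPU or GPU instance type
     region: ${your_region}
     nasDataSourceId: ${your_nas_data_source_id}
     accessKeyId: ${your_access_key_id}
     accessKeySecret: ${your_access_key_secret}
     localStorageMountPoint: /home/admin/workspace/  # NAS mount point on the DSW machine
     containerStorageMountPoint: /root/workspace/    # NAS mount point inside the DLC container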
...
Replace ``${NNI_VERSION}`` with a released version name or branch name, e.g., ``v2.3``.
Monitor your job
^^^^^^^^^^^^^^^^
To monitor your job on DLC, visit the `DLC console <https://pai-dlc.console.aliyun.com/#/jobs>`__ and check the job status there.
NNI can run one experiment on multiple remote machines through SSH; this is called ``remote`` mode. It's like a lightweight training platform. In this mode, NNI is started from your computer and dispatches trials to the remote machines in parallel.
Supported operating systems for remote machines are ``Linux``, ``Windows 10``, and ``Windows Server 2019``.
Prerequisite
------------
1. Make sure the default environment of the remote machines meets the requirements of your trial code. If it does not, a setup script can be added to the ``command`` field of the NNI config (see the sketch after this list).
2. Make sure the remote machines can be accessed through SSH from the machine that runs the ``nnictl`` command. Both password and SSH key authentication are supported. For advanced usage, please refer to :ref:`reference-remote-config-label` in the reference.
3. Make sure the NNI version on each machine is consistent. Follow the :doc:`install guide </installation>` to install NNI.
4. Make sure the trial command is compatible with the remote OSes if you want to use remote Linux and Windows machines together. For example, the default Python 3.x executable is called ``python3`` on Linux and ``python`` on Windows.
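For item 1, a minimal sketch of prepending a setup step to the trial command; the ``requirements.txt`` file and the setup step itself are hypothetical, so prepend whatever setup your trials actually need:

.. code-block:: yaml

   trialCommand: pip install -r requirements.txt && python3 mnist.py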
In addition, there are several extra steps for Windows servers.
1. Install and start ``OpenSSH Server``.
1) Open the ``Settings`` app on Windows.
2) Click ``Apps``, then click ``Optional features``.
3) Click ``Add a feature``, search for and select ``OpenSSH Server``, and then click ``Install``.
4) Once it's installed, run the commands below to start the service and set it to start automatically.
.. code-block:: bat

   sc config sshd start= auto
   net start sshd
2. Make sure the remote account is an administrator, so that it can stop running trials.
3. Make sure there is no welcome message beyond the default, since extra output causes the ssh2 library in Node.js to fail. For example, if you're using a Data Science VM on Azure, you need to remove the extra echo commands in ``C:\dsvm\tools\setup\welcome.bat``.
Output like the below is OK when opening a new command window.
.. code-block:: text

   Microsoft Windows [Version 10.0.17763.1192]
   (c) 2018 Microsoft Corporation. All rights reserved.

   (py37_default) C:\Users\AzureUser>
Usage
-----
Use ``examples/trials/mnist-pytorch`` as the example. Suppose there are two machines, which can be logged in to with username/password or SSH key authentication. Here is a template configuration specification.
.. code-block:: yaml

   searchSpaceFile: search_space.json
   trialCommand: python3 mnist.py
   trialGpuNumber: 0
   trialConcurrency: 4
   maxTrialNumber: 20
   tuner:
     name: TPE
     classArgs:
       optimize_mode: maximize
   trainingService:
     platform: remote
     machineList:
       - host: 192.0.2.1
         user: alice
         sshKeyFile: ~/.ssh/id_rsa
       - host: 192.0.2.2
         port: 10022
         user: bob
         password: bob123
The example configuration is saved in ``examples/trials/mnist-pytorch/config_remote.yml``.
You can run the command below on Windows, Linux, or macOS to spawn trials on the remote Linux machines:
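For example (assuming the command is run from the NNI source root so that the relative path resolves):

.. code-block:: bash

   nnictl create --config examples/trials/mnist-pytorch/config_remote.yml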
.. note:: If you are planning to use remote machines or clusters as your training service, to avoid too much pressure on the network, NNI limits the number of files to 2000 and the total size to 300MB. If your trial code directory contains too many files, you can choose which files and subfolders to exclude by adding a ``.nniignore`` file that works like a ``.gitignore`` file. For more details on how to write this file, see the `git documentation <https://git-scm.com/docs/gitignore#_pattern_format>`__.
*Example:* :githublink:`config_detailed.yml <examples/trials/mnist-pytorch/config_detailed.yml>` and :githublink:`.nniignore <examples/trials/mnist-pytorch/.nniignore>`
More features
-------------
Configure python environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
By default, commands and scripts are executed in the default environment of the remote machine. If there are multiple Python virtual environments on your remote machine and you want to run experiments in a specific one, use **pythonPath** to specify the Python environment to use.
For example, with Anaconda you can specify:
.. code-block:: yaml

   pythonPath: /home/bob/.conda/envs/ENV-NAME/bin
Configure shared storage
^^^^^^^^^^^^^^^^^^^^^^^^
The remote training service supports shared storage, which lets you use your own storage while using NNI. Follow the guide `here <./shared_storage.rst>`__ to learn how to use shared storage.
Monitor via TensorBoard
^^^^^^^^^^^^^^^^^^^^^^^
The remote training service supports trial visualization via TensorBoard. Follow the guide :doc:`/experiment/web_portal/tensorboard` to learn how to use TensorBoard.
If you want to use your own storage when using NNI, shared storage can satisfy this need. Instead of using the training service's native storage, shared storage can bring you more convenience.
...

All the information generated by the experiment will be stored under the ``/nni`` folder in your shared storage.
All the output produced by the trial will be located under the ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}/nnioutput`` folder in your shared storage.
This saves you from searching for experiment-related information in various places.
Remember that your trial working directory is ``/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}``, so if you upload your data in this shared storage, you can open it like a local file in your trial code without downloading it.
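For example, a minimal sketch, assuming a hypothetical ``data.csv`` that you uploaded to the trial's directory on the shared storage beforehand:

.. code-block:: python

   import os

   # The trial working directory lives on the shared storage
   # (/nni/{EXPERIMENT_ID}/trials/{TRIAL_ID}), so uploaded data can be
   # opened like a local file, with no explicit download step.
   data_path = os.path.join(os.getcwd(), 'data.csv')  # hypothetical uploaded file
   with open(data_path) as f:
       header = f.readline()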
We will develop more practical features based on shared storage in the future. The config reference can be found :ref:`here <reference-sharedstorage-config-label>`.
.. note::

   Shared storage is currently in the experimental stage. We suggest using AzureBlob under Ubuntu/CentOS/RHEL, and NFS under Ubuntu/CentOS/RHEL/Fedora/Debian, for remote.
...

If you want to use AzureBlob, add the below to your config. For the full config file, see :githublink:`mnist-sharedstorage/config_azureblob.yml <examples/trials/mnist-sharedstorage/config_azureblob.yml>`.
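A hypothetical sketch of the AzureBlob section follows; the keys beyond ``storageAccountName``, ``storageAccountKey``, and ``containerName`` are assumptions, so consult the shared storage config reference for the authoritative schema:

.. code-block:: yaml

   sharedStorage:
     storageType: AzureBlob
     localMountPoint: /home/nni/mount   # assumed key: local mount point
     remoteMountPoint: /tmp/nni/mount   # assumed key: remote mount point
     storageAccountName: ${your_storage_account_name}
     storageAccountKey: ${your_storage_account_key}
     containerName: ${your_container_name}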
You can find ``storageAccountName``, ``storageAccountKey``, and ``containerName`` on the Azure storage account portal.
.. image:: ../../../img/azure_storage.png
   :target: ../../../img/azure_storage.png
   :alt:
If you want to use NFS, add the below to your config. For the full config file, see :githublink:`mnist-sharedstorage/config_nfs.yml <examples/trials/mnist-sharedstorage/config_nfs.yml>`.
Since NNI v2.2, you can launch a TensorBoard process across one or multiple trials within the web portal. For now, this feature supports the local training service and reuse-mode training services with shared storage; more scenarios will be supported in later NNI versions.
Preparation
-----------
...

Make sure TensorBoard is installed in your environment.
Use WebUI to Launch TensorBoard
-------------------------------
Save Logs
^^^^^^^^^
NNI will automatically fetch the ``tensorboard`` subfolder under the trial's output folder as the TensorBoard logdir. So, in the trial's source code, you need to save the TensorBoard logs under ``NNI_OUTPUT_DIR/tensorboard``. This log path can be joined as:
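A minimal sketch; the ``SummaryWriter`` usage assumes PyTorch, but any framework that writes TensorBoard event files to this directory works:

.. code-block:: python

   import os
   from torch.utils.tensorboard import SummaryWriter

   # NNI exposes the trial output folder via the NNI_OUTPUT_DIR environment
   # variable; logs under its "tensorboard" subfolder appear in the WebUI.
   log_dir = os.path.join(os.environ['NNI_OUTPUT_DIR'], 'tensorboard')
   writer = SummaryWriter(log_dir)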
...
* Like compare, first select the trials you want to combine to launch TensorBoard, then click the ``Tensorboard`` button.
.. image:: ../../../img/Tensorboard_1.png
   :target: ../../../img/Tensorboard_1.png
   :alt:
* After clicking the ``OK`` button in the pop-up box, you will jump to the TensorBoard portal.
.. image:: ../../../img/Tensorboard_2.png
   :target: ../../../img/Tensorboard_2.png
   :alt:
* You can see the ``SequenceID-TrialID`` on the TensorBoard portal.
.. image:: ../../../img/Tensorboard_3.png
   :target: ../../../img/Tensorboard_3.png
   :alt:
Stop All
^^^^^^^^
If you want to open a portal you have already launched, click its TensorBoard ID. If you don't need TensorBoard anymore, click the ``Stop all tensorboard`` button.
The web portal lets users conveniently visualize their NNI experiments: tuning and training progress, detailed metrics, and error logs. It also allows users to control their experiments and trials, for example updating an experiment's concurrency or duration, or rerunning trials.
.. image:: ../../../static/img/webui.gif
   :width: 100%
Q&A
---
There are many trials in the detail table but the ``Default Metric`` chart is empty
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Download the experiment results (``experiment config``, ``trial message``, and ``intermediate metrics``) from ``Experiment summary`` and then upload these results in your issue.
.. image:: ../../../img/webui-img/summary.png
   :target: ../../../img/webui-img/summary.png
   :alt: summary
What should you do when your experiment has an error
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Click the icon to the right of ``experiment status`` and screenshot the error message.
* Then click ``learn about`` to download the ``nni-manager`` and ``dispatcher`` log files.
* Please file an issue via `Feedback` in the `About` menu and upload the above information.
* The ``Log model`` will help you find the reason for the error. In local mode there are three buttons: ``View trial log``, ``View trial error``, and ``View trial stdout``. If you run on the OpenPAI or Kubeflow platform, you can see the trial stdout and NFS log instead.
If you have any questions, you can tell us in the issue.
* On the overview tab, you can see the experiment information and status, and the performance of the ``top trials``.
.. image:: ../../../img/webui-img/full-oview.png
   :target: ../../../img/webui-img/full-oview.png
   :alt: overview
* If you want to see the experiment search space and config, click the ``Search space`` and ``Config`` buttons on the right (they appear when you hover over them).
**Search space file:**
.. image:: ../../../img/webui-img/searchSpace.png
   :target: ../../../img/webui-img/searchSpace.png
   :alt: searchSpace
**Config file:**
.. image:: ../../../img/webui-img/config.png
   :target: ../../../img/webui-img/config.png
   :alt: config
* You can view and download the ``nni-manager`` and ``dispatcher`` log files here.
.. image:: ../../../img/webui-img/review-log.png
   :target: ../../../img/webui-img/review-log.png
   :alt: logfile
* If your experiment has many trials, you can change the refresh interval here.
A trial may produce many intermediate results during training. To show the trend of trials more clearly, we provide a filtering function for the intermediate result chart.
You may find that some trials get better or worse at a particular intermediate result, which indicates that it is an important and relevant intermediate result. To take a closer look at such a point, enter its corresponding X-value at #Intermediate, then input the range of metrics at this intermediate result. In the picture below, we choose the No. 4 intermediate result and set the range of metrics to 0.8-1.
* The ``Add column`` button lets you select which columns to show in the table. If you run an experiment whose final result is a dict, you can see the other keys in the table. You can choose the ``Intermediate count`` column to watch a trial's progress.
.. image:: ../../../img/webui-img/addColumn.png
   :target: ../../../img/webui-img/addColumn.png
   :alt: addColumnGraph
* If you want to compare some trials, you can select them and then click ``Compare`` to see the results.
.. note::

   We are glad to announce the alpha release of the Feature Engineering toolkit on top of NNI. It is still in the experimental phase and may evolve based on user feedback. We'd like to invite you to use it, give feedback, and even contribute.
For now, we support the following feature selectors:
...
Write a more advanced automl algorithm
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The methods above are usually enough to write a general tuner. However, users may want more, for example, intermediate results and trials' state (i.e., the methods in an assessor), in order to build a more powerful AutoML algorithm. Therefore, we have another concept called ``advisor``, which directly inherits from ``MsgDispatcherBase`` in :githublink:`msg_dispatcher_base.py <nni/runtime/msg_dispatcher_base.py>`.
Customize Assessor
------------------
NNI supports building your own assessor to fit your tuning demand.
If you want to implement a customized assessor, there are three things to do:

#. Inherit the base Assessor class
#. Implement the assess_trial function
#. Configure your customized assessor in the experiment YAML config file
**1. Inherit the base Assessor class**
.. code-block:: python

   from nni.assessor import Assessor

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...
**2. Implement the assess_trial function**
.. code-block:: python

   from nni.assessor import Assessor, AssessResult

   class CustomizedAssessor(Assessor):
       def __init__(self, *args, **kwargs):
           ...

       def assess_trial(self, trial_history):
           """
           Determines whether a trial should be killed. Must override.

           trial_history: a list of intermediate result objects.

           Returns AssessResult.Good or AssessResult.Bad.
           """
           # your implementation goes here
           ...
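For instance, a minimal sketch of a threshold-based assessor following the skeleton above; the threshold logic and parameter names are illustrative, not an NNI builtin:

.. code-block:: python

   from nni.assessor import Assessor, AssessResult

   class ThresholdAssessor(Assessor):
       def __init__(self, threshold=0.5, min_history=3):
           # Illustrative parameters, passed via classArgs in the config.
           self.threshold = threshold
           self.min_history = min_history

       def assess_trial(self, trial_history):
           # Let a trial run until it has reported enough intermediate results.
           if len(trial_history) < self.min_history:
               return AssessResult.Good
           # Kill the trial if its latest intermediate result is below threshold.
           if trial_history[-1] < self.threshold:
               return AssessResult.Bad
           return AssessResult.Good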
**3. Configure your customized Assessor in the experiment YAML config file**

NNI needs to locate your customized Assessor class and instantiate it, so you need to specify the location of the customized Assessor class and pass literal values as parameters to its ``__init__`` constructor.
.. code-block:: yaml

   assessor:
     codeDir: /home/abc/myassessor
     classFileName: my_customized_assessor.py
     className: CustomizedAssessor
     # Any parameter needed by your Assessor class's __init__ constructor
     # can be specified in this optional classArgs field, for example:
     classArgs:
       arg1: value1
Please note that in **2**, the object ``trial_history`` is exactly the object that the trial sends to the assessor via the SDK function ``report_intermediate_result``.
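On the trial side, the entries of ``trial_history`` come from calls like the following (``train_one_epoch`` is a hypothetical training function):

.. code-block:: python

   import nni

   params = nni.get_next_parameter()
   for epoch in range(10):
       accuracy = train_one_epoch(params)  # hypothetical training step
       # Each value reported here becomes one entry of the trial_history
       # list that assess_trial receives.
       nni.report_intermediate_result(accuracy)
   nni.report_final_result(accuracy)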
The working directory of your assessor is ``<home>/nni-experiments/<experiment_id>/log``, which can be retrieved with the environment variable ``NNI_LOG_DIRECTORY``.
NNI provides many :doc:`builtin tuners <tuners>` and :doc:`assessors <assessors>` that can be used directly for hyperparameter optimization. Extra algorithms can be registered via ``nnictl algo register --meta <path_to_meta_file>`` after NNI is installed, and you can check the builtin algorithms via the ``nnictl algo list`` command.
NNI also provides the ability to build your own customized tuners, advisors, and assessors. To use a customized algorithm, simply follow the spec in the experiment config file to properly reference the algorithm, as illustrated in the tutorial on :doc:`customized algorithms <custom_algorithm>`.
NNI also allows users to install a customized algorithm as a builtin algorithm, so that it can be used in the same way as NNI builtin tuners/advisors/assessors. More importantly, this makes it much easier to share or distribute an implemented algorithm to others. Once customized tuners/advisors/assessors are installed into NNI as builtin algorithms, you can use them in your experiment configuration file the same way as builtin ones. For example, if you built a customized tuner and installed it into NNI under the builtin name ``mytuner``, you can use it in your configuration file like below:
.. code-block:: yaml

   tuner:
     builtinTunerName: mytuner
Register customized algorithms like builtin tuners, assessors and advisors
---------------------------------------------------------------------------
All that needs to be modified is to delete the ``NNI Package :: tuner`` metadata in ``setup.py`` and add a meta file as mentioned in `4. Prepare meta file`_.
Then you can follow `Register customized algorithms like builtin tuners, assessors and advisors`_ to register your customized algorithms.
Example: Register a customized tuner as a builtin tuner
--------------------------------------------------------
We provide a benchmarking tool to compare the performances of tuners provided by NNI (and users' custom tuners) on different
types of tasks. This tool uses the `automlbenchmark repository <https://github.com/openml/automlbenchmark>`_ to run different *benchmarks* on the NNI *tuners*.
The tool is located in ``examples/trials/benchmarking/automlbenchmark``. This document provides a brief introduction to the tool, its usage, and the currently available benchmarks.
Overview and Terminologies
--------------------------
...
contains two main components: a **benchmark** from the automlbenchmark library, and an **architecture** which defines the search
space and the evaluator. To further clarify, we provide the definition for the terminologies used in this document.
* **tuner**\ : a :doc:`tuner or advisor provided by NNI <tuners>`, or a custom tuner provided by the user.
* **task**\ : an abstraction used by automlbenchmark. A task can be thought of as a tuple (dataset, metric). It provides train and test datasets to the frameworks. Then, based on the returned predictions on the test set, the task evaluates the metric (e.g., mse for regression, f1 for classification) and reports the score.
* **benchmark**\ : an abstraction used by automlbenchmark. A benchmark is a set of tasks, along with other external constraints such as time limits.
* **framework**\ : an abstraction used by automlbenchmark. Given a task, a framework solves the proposed regression or classification problem using train data and produces predictions on the test set. In our implementation, each framework is an architecture, which defines a search space. To evaluate a task given by the benchmark on a specific tuner, we let the tuner continuously tune the hyperparameters (by giving it cross-validation score on the train data as feedback) until the time or trial limit is reached. Then, the architecture is retrained on the entire train set using the best set of hyperparameters.
...
all tuners simultaneously in the background, set the "serialize" flag to false in ``runbenchmark_nni.sh``.
Note: the SMAC tuner, DNGO tuner, and the BOHB advisor have to be manually installed before running benchmarks on them.
Please refer to :doc:`this page <tuners>` for more details on installation.
Run customized benchmarks on existing tuners
--------------------------------------------
...

Run benchmarks on custom tuners
-------------------------------
You may also use the benchmark to compare a custom tuner written by yourself with the NNI built-in tuners. To use custom
tuners, first make sure that the tuner inherits from ``nni.tuner.Tuner`` and correctly implements the required APIs. For
more information on implementing a custom tuner, please refer to :doc:`here <custom_algorithm>`.
Next, perform the following steps:
#. Install the custom tuner via the command ``nnictl algo register``. Check :doc:`this document <../reference/nnictl>` for details.
#. In ``./nni/frameworks.yaml``, add a new framework extending the base framework NNI. Make sure that the parameter ``tuner_type`` corresponds to the "builtinName" of the tuner installed in step 1 (a sketch of such an entry follows below).
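A hypothetical sketch, assuming the file follows the automlbenchmark framework-definition format; the framework name is illustrative, so mirror an existing NNI entry in the file:

.. code-block:: yaml

   MyTunerFramework:
     extends: NNI
     params:
       tuner_type: mytuner   # must match the builtinName registered in step 1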