Commit 1011377c authored by qianyj

the source code of NNI for DCU

parent abc22158
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Crying</h2>
<div class="details-container">
<img src="../_static/img/Crying.png" alt="Crying" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Cut</h2>
<div class="details-container">
<img src="../_static/img/Cut.png" alt="Cut" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Error</h2>
<div class="details-container">
<img src="../_static/img/Error.png" alt="Error" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Holiday</h2>
<div class="details-container">
<img src="../_static/img/Holiday.png" alt="NoBug" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>NoBug</h2>
<div class="details-container">
<img src="../_static/img/NoBug.png" alt="NoBug" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Sign</h2>
<div class="details-container">
<img src="../_static/img/Sign.png" alt="Sign" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Sweat</h2>
<div class="details-container">
<img src="../_static/img/Sweat.png" alt="Sweat" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Weaving</h2>
<div class="details-container">
<img src="../_static/img/Weaving.png" alt="Weaving" />
</div>
{% endblock %}
{% extends "!layout.html" %}
{% set title = "Welcome To Neural Network Intelligence !!!"%}
{% block document %}
<h2>Working</h2>
<div class="details-container">
<img src="../_static/img/Working.png" alt="Working" />
</div>
{% endblock %}
Python API Reference of Auto Tune
=================================
.. contents::
Trial
-----
.. autofunction:: nni.get_next_parameter
.. autofunction:: nni.get_current_parameter
.. autofunction:: nni.report_intermediate_result
.. autofunction:: nni.report_final_result
.. autofunction:: nni.get_experiment_id
.. autofunction:: nni.get_trial_id
.. autofunction:: nni.get_sequence_id
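For orientation, here is a minimal sketch of how these functions are typically combined in a trial script; the toy "training" loop below merely stands in for real model code.

.. code-block:: python

    import random
    import nni

    def main():
        params = {'lr': 0.01, 'momentum': 0.5}         # local defaults
        params.update(nni.get_next_parameter() or {})  # overridden by the tuner's suggestion
        accuracy = 0.0
        for epoch in range(10):
            # Stand-in for real training; the accuracy here is just a toy number.
            accuracy = min(1.0, accuracy + params['lr'] * random.random())
            nni.report_intermediate_result(accuracy)   # per-epoch metric, read by the assessor
        nni.report_final_result(accuracy)              # final metric, read by the tuner

    if __name__ == '__main__':
        main()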
Tuner
-----
.. autoclass:: nni.tuner.Tuner
:members:
.. autoclass:: nni.algorithms.hpo.tpe_tuner.TpeTuner
:members:
.. autoclass:: nni.algorithms.hpo.random_tuner.RandomTuner
:members:
.. autoclass:: nni.algorithms.hpo.hyperopt_tuner.HyperoptTuner
:members:
.. autoclass:: nni.algorithms.hpo.evolution_tuner.EvolutionTuner
:members:
.. autoclass:: nni.algorithms.hpo.smac_tuner.SMACTuner
:members:
.. autoclass:: nni.algorithms.hpo.gridsearch_tuner.GridSearchTuner
:members:
.. autoclass:: nni.algorithms.hpo.networkmorphism_tuner.NetworkMorphismTuner
:members:
.. autoclass:: nni.algorithms.hpo.metis_tuner.MetisTuner
:members:
.. autoclass:: nni.algorithms.hpo.ppo_tuner.PPOTuner
:members:
.. autoclass:: nni.algorithms.hpo.batch_tuner.BatchTuner
:members:
.. autoclass:: nni.algorithms.hpo.gp_tuner.GPTuner
:members:
Assessor
--------
.. autoclass:: nni.assessor.Assessor
:members:
.. autoclass:: nni.assessor.AssessResult
:members:
.. autoclass:: nni.algorithms.hpo.curvefitting_assessor.CurvefittingAssessor
:members:
.. autoclass:: nni.algorithms.hpo.medianstop_assessor.MedianstopAssessor
:members:
Advisor
-------
.. autoclass:: nni.runtime.msg_dispatcher_base.MsgDispatcherBase
:members:
.. autoclass:: nni.algorithms.hpo.hyperband_advisor.Hyperband
:members:
.. autoclass:: nni.algorithms.hpo.bohb_advisor.BOHB
:members:
Utilities
---------
.. autofunction:: nni.utils.merge_parameter
.. autofunction:: nni.trace
.. autofunction:: nni.dump
.. autofunction:: nni.load
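As a small illustration of ``merge_parameter`` (the argparse defaults below are made up for the example):

.. code-block:: python

    import argparse
    import nni
    from nni.utils import merge_parameter

    parser = argparse.ArgumentParser()
    parser.add_argument('--lr', type=float, default=0.01)       # hypothetical defaults
    parser.add_argument('--batch_size', type=int, default=32)
    args = parser.parse_args([])

    tuner_params = nni.get_next_parameter() or {}  # e.g. {'lr': 0.001} when run under NNI
    args = merge_parameter(args, tuner_params)     # tuner values override matching defaults
    print(args.lr, args.batch_size)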
Builtin-Assessors
=================
To save computing resources, NNI supports early stopping through an interface called **Assessor**.
An assessor receives the intermediate results from a trial and uses a specific algorithm to decide whether the trial should be killed. Once a trial meets the early stopping conditions (meaning the assessor is pessimistic about its final result), the assessor kills the trial and the trial's status becomes `EARLY_STOPPED`.
Here is an experimental result on MNIST using the 'Curvefitting' assessor in 'maximize' mode. You can see that the assessor successfully **early stopped** many trials with bad hyperparameters. With an assessor, you may find better hyperparameters while using the same computing resources.
Implemented code directory: :githublink:`config_assessor.yml <examples/trials/mnist-pytorch/config_assessor.yml>`
.. image:: ../img/Assessor.png
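To make the interface concrete, the following is an illustrative sketch (not one of the built-in assessors listed below) of what an assessor's decision logic can look like, using the ``Assessor`` base class documented in the API reference:

.. code-block:: python

    from nni.assessor import Assessor, AssessResult

    class NoImprovementAssessor(Assessor):
        """Toy assessor: stop a trial whose latest result falls well below its best so far."""

        def assess_trial(self, trial_job_id, trial_history):
            # trial_history holds the intermediate results this trial has reported so far.
            if len(trial_history) >= 5 and trial_history[-1] < 0.9 * max(trial_history):
                return AssessResult.Bad   # ask NNI to early-stop this trial
            return AssessResult.Good      # let the trial keep running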
.. toctree::
:maxdepth: 1
Overview<./Assessor/BuiltinAssessor>
Medianstop<./Assessor/MedianstopAssessor>
Curvefitting<./Assessor/CurvefittingAssessor>
Builtin-Tuners
==============
NNI provides an easy way to plug in hyperparameter tuning algorithms, which we call **Tuners**.
A tuner receives metrics from a `Trial` to evaluate the performance of a specific parameter or architecture configuration, and then sends the next hyper-parameter or architecture configuration to the trial.
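To make this interface concrete, below is a hedged sketch of a minimal custom tuner; the random sampling is purely illustrative and is not how any of the built-in tuners listed below works.

.. code-block:: python

    import random
    from nni.tuner import Tuner

    class ToyChoiceTuner(Tuner):
        """Illustrative tuner that samples uniformly from 'choice' parameters only."""

        def update_search_space(self, search_space):
            self.search_space = search_space

        def generate_parameters(self, parameter_id, **kwargs):
            # A real tuner would use the history collected in receive_trial_result here.
            return {name: random.choice(spec['_value'])
                    for name, spec in self.search_space.items()
                    if spec['_type'] == 'choice'}

        def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
            # 'value' is what the trial reported via nni.report_final_result(); ignored here.
            pass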
.. toctree::
:maxdepth: 1
Overview <Tuner/BuiltinTuner>
TPE <Tuner/TpeTuner>
Random Search <Tuner/RandomTuner>
Anneal <Tuner/AnnealTuner>
Naive Evolution <Tuner/EvolutionTuner>
SMAC <Tuner/SmacTuner>
Metis Tuner <Tuner/MetisTuner>
Batch Tuner <Tuner/BatchTuner>
Grid Search <Tuner/GridsearchTuner>
GP Tuner <Tuner/GPTuner>
Network Morphism <Tuner/NetworkmorphismTuner>
Hyperband <Tuner/HyperbandAdvisor>
BOHB <Tuner/BohbAdvisor>
PBT Tuner <Tuner/PBTTuner>
DNGO Tuner <Tuner/DngoTuner>
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import subprocess
import sys
sys.path.insert(0, os.path.abspath('../..'))
# -- Project information ---------------------------------------------------
project = 'NNI'
copyright = '2021, Microsoft'
author = 'Microsoft'
# The short X.Y version
version = ''
# The full version, including alpha/beta/rc tags
release = 'v2.6.1'
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.mathjax',
'sphinxarg.ext',
'sphinx.ext.napoleon',
'sphinx.ext.viewcode',
'sphinx.ext.intersphinx',
'nbsphinx',
'sphinx.ext.extlinks',
'IPython.sphinxext.ipython_console_highlighting',
]
# Add mock modules
autodoc_mock_imports = ['apex', 'nni_node', 'tensorrt', 'pycuda', 'nn_meter']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffixes as a list of strings:
source_suffix = ['.rst']
# The master toctree document.
master_doc = 'contents'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'Release_v1.0.md', '**.ipynb_checkpoints']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
html_additional_pages = {
'index': 'index.html',
'nnSpider': 'nnSpider.html',
'nnSpider/nobug': 'nnSpider/nobug.html',
'nnSpider/holiday': 'nnSpider/holiday.html',
'nnSpider/errorEmotion': 'nnSpider/errorEmotion.html',
'nnSpider/working': 'nnSpider/working.html',
'nnSpider/sign': 'nnSpider/sign.html',
'nnSpider/crying': 'nnSpider/crying.html',
'nnSpider/cut': 'nnSpider/cut.html',
'nnSpider/weaving': 'nnSpider/weaving.html',
'nnSpider/comfort': 'nnSpider/comfort.html',
'nnSpider/sweat': 'nnSpider/sweat.html'
}
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {
'logo_only': True,
}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['../static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
html_logo = '../img/nni_logo_dark.png'
html_title = 'An open source AutoML toolkit for neural architecture search, model compression and hyper-parameter tuning (%s %s)' % \
(project, release)
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'NeuralNetworkIntelligencedoc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'NeuralNetworkIntelligence.tex', 'Neural Network Intelligence Documentation',
'Microsoft', 'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'neuralnetworkintelligence', 'Neural Network Intelligence Documentation',
[author], 1)
]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'NeuralNetworkIntelligence', 'Neural Network Intelligence Documentation',
author, 'NeuralNetworkIntelligence', 'One line description of project.',
'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
# The unique identifier of the text. This can be an ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
# external links (for github code)
# Reference the code via :githublink:`path/to/your/example/code.py`
git_commit_id = subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode().strip()
extlinks = {
'githublink': ('https://github.com/microsoft/nni/blob/' + git_commit_id + '/%s', 'Github link: ')
}
# -- Extension configuration -------------------------------------------------
def setup(app):
    app.add_css_file('css/custom.css')
###########################
Neural Network Intelligence
###########################
.. toctree::
:caption: Table of Contents
:maxdepth: 2
:titlesonly:
Overview
Installation <installation>
QuickStart <Tutorial/QuickStart>
Auto (Hyper-parameter) Tuning <hyperparameter_tune>
Neural Architecture Search <nas>
Model Compression <model_compression>
Feature Engineering <feature_engineering>
References <reference>
Use Cases and Solutions <CommunitySharings/community_sharings>
Research and Publications <ResearchPublications>
FAQ <Tutorial/FAQ>
How to Contribute <contribution>
Change Log <Release>
###############################
Contribute to NNI
###############################
.. toctree::
Development Setup<./Tutorial/SetupNniDeveloperEnvironment>
Contribution Guide<./Tutorial/Contributing>
######################
Examples
######################
.. toctree::
:maxdepth: 2
MNIST<./TrialExample/MnistExamples>
Cifar10<./TrialExample/Cifar10Examples>
Scikit-learn<./TrialExample/SklearnExamples>
GBDT<./TrialExample/GbdtExample>
Pix2pix<./TrialExample/Pix2pixExample>
###################
Feature Engineering
###################
We are glad to introduce the Feature Engineering toolkit on top of NNI.
It is still in an experimental phase and may evolve based on usage feedback.
We invite you to use it, give feedback, and contribute; a rough usage sketch follows the tutorial list below.
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Overview <FeatureEngineering/Overview>
GradientFeatureSelector <FeatureEngineering/GradientFeatureSelector>
GBDTSelector <FeatureEngineering/GBDTSelector>
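As a rough usage sketch only (the import path and the ``FeatureGradientSelector`` constructor arguments are assumptions; please check the GradientFeatureSelector tutorial above for the exact API in your NNI version), a selector typically follows the scikit-learn ``fit`` style:

.. code-block:: python

    from sklearn.datasets import load_breast_cancer
    # Assumed import path; verify it against the GradientFeatureSelector tutorial.
    from nni.algorithms.feature_engineering.gradient_selector import FeatureGradientSelector

    X, y = load_breast_cancer(return_X_y=True)
    selector = FeatureGradientSelector(n_features=10)  # keep the 10 most relevant features
    selector.fit(X, y)
    print(selector.get_selected_features())            # indices of the selected columns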
Advanced Features
=================
.. toctree::
:maxdepth: 2
Write a New Tuner <Tuner/CustomizeTuner>
Write a New Assessor <Assessor/CustomizeAssessor>
Write a New Advisor <Tuner/CustomizeAdvisor>
Write a New Training Service <TrainingService/HowToImplementTrainingService>
Install Customized Algorithms as Builtin Tuners/Assessors/Advisors <Tutorial/InstallCustomizedAlgos>
HPO Benchmarks
==============
.. toctree::
:hidden:
HPO Benchmark Example Statistics <hpo_benchmark_stats>
We provide a benchmarking tool to compare the performance of tuners provided by NNI (and users' custom tuners) on different
types of tasks. This tool uses the `automlbenchmark repository <https://github.com/openml/automlbenchmark>`_ to run different *benchmarks* on the NNI *tuners*.
The tool is located in ``examples/trials/benchmarking/automlbenchmark``. This document provides a brief introduction to the tool, its usage, and currently available benchmarks.
Overview and Terminologies
^^^^^^^^^^^^^^^^^^^^^^^^^^
Ideally, an **HPO Benchmark** provides a tuner with a search space, calls the tuner repeatedly, and evaluates how the tuner probes
the search space and approaches good solutions. In addition, inside the benchmark, an evaluator should be associated with
each search space to evaluate the score of points in that search space and give feedback to the tuner. For instance,
the search space could be the space of hyperparameters for a neural network. Then the evaluator should contain train data,
test data, and a criterion. To evaluate a point in the search space, the evaluator will train the network on the train data
and report the score of the model on the test data as the score for the point.
However, a **benchmark** provided by the automlbenchmark repository only provides part of the functionality of the evaluator.
More concretely, it assumes that it is evaluating a **framework**. Different from a tuner, given train data, a **framework**
can directly solve a **task** and predict on the test set. The **benchmark** from the automlbenchmark repository directly provides
train and test datasets to a **framework**, evaluates the prediction on the test set, and reports this score as the final score.
Therefore, to implement an **HPO Benchmark** using automlbenchmark, we pair up a tuner with a search space to form a **framework**,
and handle the repeated trial-evaluate-feedback loop inside the **framework** abstraction (a rough sketch of this loop is given after the list below). In other words, each **HPO Benchmark**
contains two main components: a **benchmark** from the automlbenchmark library, and an **architecture** which defines the search
space and the evaluator. To further clarify, we define the terminology used in this document.
* **tuner**\ : a `tuner or advisor provided by NNI <https://nni.readthedocs.io/en/stable/builtin_tuner.html>`_, or a custom tuner provided by the user.
* **task**\ : an abstraction used by automlbenchmark. A task can be thought of as a tuple (dataset, metric). It provides train and test datasets to the frameworks. Then, based on the returned predictions on the test set, the task evaluates the metric (e.g., mse for regression, f1 for classification) and reports the score.
* **benchmark**\ : an abstraction used by automlbenchmark. A benchmark is a set of tasks, along with other external constraints such as time limits.
* **framework**\ : an abstraction used by automlbenchmark. Given a task, a framework solves the proposed regression or classification problem using train data and produces predictions on the test set. In our implementation, each framework is an architecture, which defines a search space. To evaluate a task given by the benchmark on a specific tuner, we let the tuner continuously tune the hyperparameters (by giving it cross-validation score on the train data as feedback) until the time or trial limit is reached. Then, the architecture is retrained on the entire train set using the best set of hyperparameters.
* **architecture**\ : an architecture is a specific method for solving the tasks, along with a set of hyperparameters to optimize (i.e., the search space). See ``./nni/extensions/NNI/architectures`` for examples.
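As mentioned above, the repeated trial-evaluate-feedback loop handled inside a **framework** can be sketched roughly as follows. This is illustrative pseudo-code, not the actual implementation in ``./nni/extensions/NNI``, and the evaluator methods are hypothetical:

.. code-block:: python

    def run_framework(tuner, search_space, evaluator, trial_limit):
        """Illustrative trial-evaluate-feedback loop between a tuner and an evaluator."""
        tuner.update_search_space(search_space)
        best_params, best_score = None, float('-inf')
        for trial_id in range(trial_limit):
            params = tuner.generate_parameters(trial_id)         # the tuner proposes a point
            score = evaluator.cross_validation_score(params)     # hypothetical: CV score on the train data
            tuner.receive_trial_result(trial_id, params, score)  # feedback to the tuner
            if score > best_score:
                best_params, best_score = params, score
        # Retrain on the full train set with the best configuration and predict on the test set.
        return evaluator.fit_and_predict(best_params)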
Supported HPO Benchmarks
^^^^^^^^^^^^^^^^^^^^^^^^
From the previous discussion, we can see that to define an **HPO Benchmark**, we need to specify a **benchmark** and an **architecture**.
Currently, the only architectures we support are random forest and MLP. We use the
`scikit-learn implementation <https://scikit-learn.org/stable/modules/classes.html#>`_. Typically, there are a number of
hyperparameters that may directly affect the performance of random forest and MLP models. We design the search
spaces as follows.
Search Space for Random Forest:
.. code-block:: json

    {
        "n_estimators": {"_type":"randint", "_value": [4, 2048]},
        "max_depth": {"_type":"choice", "_value": [4, 8, 16, 32, 64, 128, 256, 0]},
        "min_samples_leaf": {"_type":"randint", "_value": [1, 8]},
        "min_samples_split": {"_type":"randint", "_value": [2, 16]},
        "max_leaf_nodes": {"_type":"randint", "_value": [0, 4096]}
    }
Search Space for MLP:
.. code-block:: json

    {
        "hidden_layer_sizes": {"_type":"choice", "_value": [[16], [64], [128], [256], [16, 16], [64, 64], [128, 128], [256, 256], [16, 16, 16], [64, 64, 64], [128, 128, 128], [256, 256, 256], [256, 128, 64, 16], [128, 64, 16], [64, 16], [16, 64, 128, 256], [16, 64, 128], [16, 64]]},
        "learning_rate_init": {"_type":"choice", "_value": [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005, 0.00001]},
        "alpha": {"_type":"choice", "_value": [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]},
        "momentum": {"_type":"uniform","_value":[0, 1]},
        "beta_1": {"_type":"uniform","_value":[0, 1]},
        "tol": {"_type":"choice", "_value": [0.001, 0.0005, 0.0001, 0.00005, 0.00001]},
        "max_iter": {"_type":"randint", "_value": [2, 256]}
    }
In addition, the same search space can be written in different ways (e.g., using "choice", "randint", or "loguniform"), which may affect how a tuner explores it.
The architecture implementation and search space definition can be found in ``./nni/extensions/NNI/architectures/``.
You may replace the search space definition in this file to experiment with different search spaces.
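As a hypothetical illustration, part of the MLP space above could be rewritten with continuous ranges instead of discrete grids (shown as a Python dict, the form used inside the ``run_[architecture].py`` scripts; the variable name is made up):

.. code-block:: python

    # Hypothetical reformulation of part of the MLP search space using continuous ranges.
    SEARCH_SPACE_VARIANT = {
        "learning_rate_init": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
        "alpha": {"_type": "loguniform", "_value": [1e-4, 1e-1]},
        "momentum": {"_type": "uniform", "_value": [0, 1]},
        "tol": {"_type": "loguniform", "_value": [1e-5, 1e-3]},
        "max_iter": {"_type": "randint", "_value": [2, 256]},
    }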
For the benchmarks, in addition to the built-in benchmarks provided by automlbenchmark
(defined in ``/examples/trials/benchmarking/automlbenchmark/automlbenchmark/resources/benchmarks/``), we design several
additional benchmarks, defined in ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks``.
One example of a larger benchmark is "nnismall", which consists of 8 regression tasks, 8 binary classification tasks, and
8 multi-class classification tasks. We also provide three separate 8-task benchmarks "nnismall-regression", "nnismall-binary", and "nnismall-multiclass"
corresponding to the three types of tasks in nnismall. These tasks are suitable for solving with random forest and MLP models.
The following table summarizes the benchmarks we provide. For ``nnismall``, please check ``/examples/trials/benchmarking/automlbenchmark/automlbenchmark/resources/benchmarks/``
for a more detailed description of each task. Also, since all tasks are from the OpenML platform, you can find the descriptions
of all datasets at `this webpage <https://www.openml.org/search?type=data>`_.
.. list-table::
:header-rows: 1
:widths: 1 2 2 2
* - Benchmark name
- Description
- Task List
- Location
* - nnivalid
- A three-task benchmark to validate benchmark installation.
- ``kc2, iris, cholesterol``
- ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/``
* - nnismall-regression
- An eight-task benchmark consisting of **regression** tasks only.
- ``cholesterol, liver-disorders, kin8nm, cpu_small, titanic_2, boston, stock, space_ga``
- ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/``
* - nnismall-binary
- An eight-task benchmark consisting of **binary classification** tasks only.
- ``Australian, blood-transfusion, christine, credit-g, kc1, kr-vs-kp, phoneme, sylvine``
- ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/``
* - nnismall-multiclass
- An eight-task benchmark consisting of **multi-class classification** tasks only.
- ``car, cnae-9, dilbert, fabert, jasmine, mfeat-factors, segment, vehicle``
- ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/``
* - nnismall
- A 24-task benchmark that is the superset of nnismall-regression, nnismall-binary, and nnismall-multiclass.
- ``cholesterol, liver-disorders, kin8nm, cpu_small, titanic_2, boston, stock, space_ga, Australian, blood-transfusion, christine, credit-g, kc1, kr-vs-kp, phoneme, sylvine, car, cnae-9, dilbert, fabert, jasmine, mfeat-factors, segment, vehicle``
- ``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/``
Setup
^^^^^
Due to some incompatibilities between automlbenchmark and Python 3.8, Python 3.7 is recommended for running the experiments contained in this folder. First, run the following shell script to clone the automlbenchmark repository. Note: it is recommended to perform the following steps in a separate virtual environment, as the setup code may install several packages.
.. code-block:: bash

    ./setup.sh
Run predefined benchmarks on existing tuners
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash

    ./runbenchmark_nni.sh [tuner-names]
This script runs the benchmark 'nnivalid', which consists of a regression task, a binary classification task, and a
multi-class classification task. After the script finishes, you can find a summary of the results in the folder results_[time]/reports/.
To run on other predefined benchmarks, change the ``benchmark`` variable in ``runbenchmark_nni.sh``. To change to another
search space (by using another architecture), change the `arch_type` parameter in ``./nni/frameworks.yaml``. Note that currently,
we only support ``random_forest`` or ``mlp`` as the `arch_type`. To experiment on other search spaces with the same
architecture, please change the search space defined in ``./nni/extensions/NNI/architectures/run_[architecture].py``.
``./nni/frameworks.yaml`` is the actual configuration file for the HPO Benchmark. The ``limit_type`` parameter specifies
the limits for running the benchmark on one tuner. If ``limit_type`` is set to `ntrials`, then the tuner is called
`trial_limit` times and then stopped. If ``limit_type`` is set to `time`, then the tuner is continuously called until
the timeout for the benchmark is reached. The timeout for the benchmarks can be changed in each benchmark file located
in ``./nni/benchmarks``.
By default, the script runs the benchmark on all built-in tuners in NNI. If a list of tuners is provided in [tuner-names],
it only runs the tuners in the list. Currently, the following tuner names are supported: "TPE", "Random", "Anneal",
"Evolution", "SMAC", "GPTuner", "MetisTuner", "DNGOTuner", "Hyperband", "BOHB". It is also possible to run the benchmark
on custom tuners. See the next sections for details.
By default, the script runs the specified tuners against the specified benchmark one by one. To run the experiment for
all tuners simultaneously in the background, set the "serialize" flag to false in ``runbenchmark_nni.sh``.
Note: the SMAC tuner, the DNGO tuner, and the BOHB advisor have to be manually installed before running benchmarks on them.
Please refer to `this page <https://nni.readthedocs.io/en/stable/Tuner/BuiltinTuner.html?highlight=nni>`_ for more details
on installation.
Run customized benchmarks on existing tuners
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can design your own benchmarks and evaluate the performance of NNI tuners on them. To run customized benchmarks,
add a benchmark_name.yaml file in the folder ``./nni/benchmarks``, and change the ``benchmark`` variable in ``runbenchmark_nni.sh``.
See ``./automlbenchmark/resources/benchmarks/`` for some examples of defining a custom benchmark.
Run benchmarks on custom tuners
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You may also use the benchmark to compare a custom tuner written by yourself with the NNI built-in tuners. To use custom
tuners, first make sure that the tuner inherits from ``nni.tuner.Tuner`` and correctly implements the required APIs. For
more information on implementing a custom tuner, please refer to `here <https://nni.readthedocs.io/en/stable/Tuner/CustomizeTuner.html>`_.
Next, perform the following steps:
#. Install the custom tuner via the command ``nnictl algo register``. Check `this document <https://nni.readthedocs.io/en/stable/Tutorial/Nnictl.html>`_ for details.
#. In ``./nni/frameworks.yaml``\ , add a new framework extending the base framework NNI. Make sure that the parameter ``tuner_type`` corresponds to the "builtinName" of the tuner installed in step 1.
#. Run the following command
.. code-block:: bash

    ./runbenchmark_nni.sh new-tuner-builtinName
The benchmark will automatically find and match the tuner newly added to your NNI installation.
HPO Benchmark Example Statistics
================================
A Benchmark Example
^^^^^^^^^^^^^^^^^^^
As an example, we ran the "nnismall" benchmark with the random forest search space on the following 8 tuners: "TPE",
"Random", "Anneal", "Evolution", "SMAC", "GPTuner", "MetisTuner", "DNGOTuner". For convenience of reference, we also list
the search space we experimented on here. Note that the way in which the search space is written may significantly affect
hyperparameter optimization performance, and we plan to conduct further experiments on how well NNI built-in tuners adapt
to different search space formulations using this benchmarking tool.
.. code-block:: json

    {
        "n_estimators": {"_type":"randint", "_value": [8, 512]},
        "max_depth": {"_type":"choice", "_value": [4, 8, 16, 32, 64, 128, 256, 0]},
        "min_samples_leaf": {"_type":"randint", "_value": [1, 8]},
        "min_samples_split": {"_type":"randint", "_value": [2, 16]},
        "max_leaf_nodes": {"_type":"randint", "_value": [0, 4096]}
    }
As some of the tasks contain a considerable amount of training data, it took about 2 days to run the whole benchmark on
one tuner. For a more detailed description of the tasks, please check
``/examples/trials/benchmarking/automlbenchmark/nni/benchmarks/nnismall_description.txt``. For binary and multi-class
classification tasks, the metrics "auc" and "logloss" were used for evaluation, while for regression, "r2" and "rmse" were used.
After the script finishes, the final scores of each tuner are summarized in the file ``results[time]/reports/performances.txt``.
Since the file is large, we only show the following screenshot and summarize other important statistics instead.
.. image:: ../img/hpo_benchmark/performances.png
:target: ../img/hpo_benchmark/performances.png
:alt:
When the results are parsed, the tuners are also ranked based on their final performance. The following three tables show
the average ranking of the tuners for each metric (logloss, rmse, auc).
Also, for every tuner, its performance on each type of metric is summarized (another view of the same data).
We present these statistics in the fourth table. Note that this information can be found at ``results[time]/reports/rankings.txt``.
Average rankings for metric rmse (for regression tasks). We found that Anneal performs the best among all NNI built-in tuners.
.. list-table::
:header-rows: 1
* - Tuner Name
- Average Ranking
* - Anneal
- 3.75
* - Random
- 4.00
* - Evolution
- 4.44
* - DNGOTuner
- 4.44
* - SMAC
- 4.56
* - TPE
- 4.94
* - GPTuner
- 4.94
* - MetisTuner
- 4.94
Average rankings for metric auc (for classification tasks). We found that SMAC performs the best among all NNI built-in tuners.
.. list-table::
:header-rows: 1
* - Tuner Name
- Average Ranking
* - SMAC
- 3.67
* - GPTuner
- 4.00
* - Evolution
- 4.22
* - Anneal
- 4.39
* - MetisTuner
- 4.39
* - TPE
- 4.67
* - Random
- 5.33
* - DNGOTuner
- 5.33
Average rankings for metric logloss (for classification tasks). We found that Random performs the best among all NNI built-in tuners.
.. list-table::
:header-rows: 1
* - Tuner Name
- Average Ranking
* - Random
- 3.36
* - DNGOTuner
- 3.50
* - SMAC
- 3.93
* - GPTuner
- 4.64
* - TPE
- 4.71
* - Anneal
- 4.93
* - Evolution
- 5.00
* - MetisTuner
- 5.93
To view the same data in another way, for each tuner, we present the average rankings on different types of metrics. From the table, we can see that, for example, the DNGOTuner performs better on tasks whose metric is "logloss" than on tasks with metric "auc". We hope this information can, to some extent, guide the choice of tuners given some knowledge of the task type.
.. list-table::
:header-rows: 1
* - Tuner Name
- rmse
- auc
- logloss
* - TPE
- 4.94
- 4.67
- 4.71
* - Random
- 4.00
- 5.33
- 3.36
* - Anneal
- 3.75
- 4.39
- 4.93
* - Evolution
- 4.44
- 4.22
- 5.00
* - GPTuner
- 4.94
- 4.00
- 4.64
* - MetisTuner
- 4.94
- 4.39
- 5.93
* - SMAC
- 4.56
- 3.67
- 3.93
* - DNGOTuner
- 4.44
- 5.33
- 3.50
Besides these reports, our script also generates two graphs for each fold of each task: one graph presents the best score received by each tuner up to trial x, and the other shows the score that each tuner receives in trial x. These two graphs give some information on how the tuners "converge" to their final solution. We found that for "nnismall", tuners on the random forest model with the search space defined in ``/examples/trials/benchmarking/automlbenchmark/nni/extensions/NNI/architectures/run_random_forest.py`` generally converge to their final solution after 40 to 60 trials. As there are too many graphs to include in a single report (96 graphs in total), we only present 10 of them here.
.. image:: ../img/hpo_benchmark/car_fold1_1.jpg
:target: ../img/hpo_benchmark/car_fold1_1.jpg
:alt:
.. image:: ../img/hpo_benchmark/car_fold1_2.jpg
:target: ../img/hpo_benchmark/car_fold1_2.jpg
:alt:
The previous two graphs are generated for fold 1 of the task "car". In the first graph, we observe that most tuners find a relatively good solution within 40 trials. In this experiment, among all tuners, the DNGOTuner converges fastest to the best solution (within 10 trials); its best score improved three times over the entire experiment. In the second graph, we observe that most tuners have their scores fluctuate between 0.8 and 1 throughout the experiment. However, it seems that the Anneal tuner (green line) is more unstable (having more fluctuations), while the GPTuner shows a more stable pattern. This may be interpreted as the Anneal tuner exploring more aggressively than the GPTuner, so its scores for different trials vary more. Regardless, although this pattern can to some extent hint at a tuner's position on the explore-exploit tradeoff, it is not a comprehensive evaluation of a tuner's effectiveness.
.. image:: ../img/hpo_benchmark/christine_fold0_1.jpg
:target: ../img/hpo_benchmark/christine_fold0_1.jpg
:alt:
.. image:: ../img/hpo_benchmark/christine_fold0_2.jpg
:target: ../img/hpo_benchmark/christine_fold0_2.jpg
:alt:
.. image:: ../img/hpo_benchmark/cnae-9_fold0_1.jpg
:target: ../img/hpo_benchmark/cnae-9_fold0_1.jpg
:alt:
.. image:: ../img/hpo_benchmark/cnae-9_fold0_2.jpg
:target: ../img/hpo_benchmark/cnae-9_fold0_2.jpg
:alt:
.. image:: ../img/hpo_benchmark/credit-g_fold1_1.jpg
:target: ../img/hpo_benchmark/credit-g_fold1_1.jpg
:alt:
.. image:: ../img/hpo_benchmark/credit-g_fold1_2.jpg
:target: ../img/hpo_benchmark/credit-g_fold1_2.jpg
:alt:
.. image:: ../img/hpo_benchmark/titanic_2_fold1_1.jpg
:target: ../img/hpo_benchmark/titanic_2_fold1_1.jpg
:alt:
.. image:: ../img/hpo_benchmark/titanic_2_fold1_2.jpg
:target: ../img/hpo_benchmark/titanic_2_fold1_2.jpg
:alt: