Unverified Commit 6f3ed2bf authored by J-shang's avatar J-shang Committed by GitHub

Merge pull request #4670 from liuzhe-lz/doc-merge

parents 553e91f4 13dc0f8f
# Contributing to NNI
Welcome, and thank you for your interest in contributing to NNI!
There are many ways in which you can contribute, beyond writing code. The goal of this document is to provide a high-level overview of how you can get involved.
# Provide feedback or ask a question
* [File an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* Ask a question with NNI tags on [Stack Overflow](https://stackoverflow.com/questions/tagged/nni?sort=Newest&edited=true).
* Discuss in the NNI [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) chat.
Join IM discussion groups:
| Gitter | | WeChat |
|----|----|----|
|![image](https://user-images.githubusercontent.com/39592018/80665738-e0574a80-8acc-11ea-91bc-0836dc4cbf89.png)| OR |![image](https://github.com/scarlett2018/nniutil/raw/master/wechat.png)|
# Look for an existing issue
Before you create a new issue, please do a search in [open issues](https://github.com/microsoft/nni/issues) to see if the issue or feature request has already been filed.
Be sure to scan through the [most popular](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3AFAQ+sort%3Areactions-%2B1-desc) feature requests.
If you find your issue already exists, make relevant comments and add your [reaction](https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments). Use a reaction in place of a "+1" comment:
* 👍 - upvote
* 👎 - downvote
If you cannot find an existing issue that describes your bug or feature, create a new issue using the guidelines below.
# Writing good bug reports or feature requests
File a single issue per problem and feature request. Do not enumerate multiple bugs or feature requests in the same issue.
Provide as much information as you think might be relevant to the context (imagine the issue were assigned to you: what information would you need to debug it?). To give you a general idea of what information helps developers dig into an issue, we provide an issue template.
Once you have submitted an issue, be sure to follow it for questions and discussions.
Once the bug is fixed or the feature is addressed, be sure to close the issue.
# Contributing fixes or examples
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
# Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
# How to Contribute
After getting familiar with the contribution agreements, you are ready to create your first PR =). Follow the NNI developer tutorials to get started:
* We recommend new contributors start with simple issues: ['good first issue'](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or ['help-wanted'](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22).
* [NNI developer environment installation tutorial](docs/en_US/Tutorial/SetupNniDeveloperEnvironment.rst)
* [How to debug](docs/en_US/Tutorial/HowToDebug.rst)
* If you have any questions on usage, review the [FAQ](https://github.com/microsoft/nni/blob/master/docs/en_US/Tutorial/FAQ.rst) first. If it does not answer your question, contact the NNI dev team and users in [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* [Customize your own Tuner](docs/en_US/Tuner/CustomizeTuner.rst)
* [Implement customized TrainingService](docs/en_US/TrainingService/HowToImplementTrainingService.rst)
* [Implement a new NAS trainer on NNI](docs/en_US/NAS/Advanced.rst)
* [Customize your own Advisor](docs/en_US/Tuner/CustomizeAdvisor.rst)
# Contributing Code
Thank you very much for your interest in contributing to NNI!
Beyond writing code, there are many ways to get involved. The goal of this document is to provide a high-level overview of how you can participate.
# Provide feedback or ask a question
* [File an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* Ask a question with the nni tag on [Stack Overflow](https://stackoverflow.com/questions/tagged/nni?sort=Newest&edited=true).
* Join the discussion on [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge).
Join IM discussion groups:
| Gitter | | WeChat |
| -------------------------------------------------------------------------------------------------------------- | - | ----------------------------------------------------------------------- |
| ![image](https://user-images.githubusercontent.com/39592018/80665738-e0574a80-8acc-11ea-91bc-0836dc4cbf89.png) | OR | ![image](https://github.com/scarlett2018/nniutil/raw/master/wechat.png) |
# Look for an existing issue
Before creating a new issue, please search [open issues](https://github.com/microsoft/nni/issues) to see whether the issue or feature request has already been filed.
Be sure to scan through the [most popular](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3AFAQ+sort%3Areactions-%2B1-desc) feature requests.
If your issue already exists, comment below it or add a [reaction](https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments). Use a reaction in place of a "+1" comment:
* 👍 - upvote
* 👎 - downvote
If you cannot find an existing issue describing your bug or feature, create a new one following the guidelines below.
# Writing good bug reports or feature requests
File a single issue per bug or feature request; do not enumerate multiple bugs or feature requests in the same issue.
Provide as much contextual information as you can (imagine the issue were assigned to you: what information would you need to debug it?). To give you a general idea of what information helps developers resolve an issue, we provide an issue template.
After submitting an issue, be sure to follow up on it and join the discussion.
Once the bug is fixed or the feature is implemented, be sure to close the issue.
# Contributing fixes or examples
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., labels, comments). Simply follow the instructions provided by the bot. You only need to agree to the CLA once, and it applies to all repos using it.
# Code of Conduct
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact opencode@microsoft.com with questions or comments.
# How to Contribute
After getting familiar with the contribution agreements, you can follow the NNI developer tutorials to create your first PR =):
* We recommend new contributors start with simple issues: ['good first issue'](https://github.com/Microsoft/nni/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) or ['help-wanted'](https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22).
* [NNI developer environment installation tutorial](docs/zh_CN/Tutorial/SetupNniDeveloperEnvironment.rst)
* [How to debug](docs/zh_CN/Tutorial/HowToDebug.rst)
* If you have questions on usage, review the [FAQ](https://github.com/microsoft/nni/blob/master/docs/zh_CN/Tutorial/FAQ.rst) first. If it does not answer your question, contact the NNI dev team on [Gitter](https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) or [file an issue](https://github.com/microsoft/nni/issues/new/choose) on GitHub.
* [Customize your own Tuner](docs/zh_CN/Tuner/CustomizeTuner.rst)
* [Implement a customized TrainingService](docs/zh_CN/TrainingService/HowToImplementTrainingService.rst)
* [Implement a new NAS trainer on NNI](docs/zh_CN/NAS/Advanced.rst)
* [Customize your own Advisor](docs/zh_CN/Tuner/CustomizeAdvisor.rst)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.

FROM nvidia/cuda:11.3.1-cudnn8-runtime-ubuntu20.04

ARG NNI_RELEASE

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get -y update
RUN apt-get -y install \
    automake \
    build-essential \
    cmake \
    curl \
    git \
    openssh-server \
    python3 \
    python3-dev \
    python3-pip \
    sudo \
    unzip \
    wget \
    zip
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

RUN ln -s python3 /usr/bin/python

RUN python3 -m pip --no-cache-dir install pip==22.0.3 setuptools==60.9.1 wheel==0.37.1

RUN python3 -m pip --no-cache-dir install \
    lightgbm==3.3.2 \
    numpy==1.22.2 \
    pandas==1.4.1 \
    scikit-learn==1.0.2 \
    scipy==1.8.0

RUN python3 -m pip --no-cache-dir install \
    torch==1.10.2+cu113 \
    torchvision==0.11.3+cu113 \
    torchaudio==0.10.2+cu113 \
    -f https://download.pytorch.org/whl/cu113/torch_stable.html
RUN python3 -m pip --no-cache-dir install pytorch-lightning==1.5.10
RUN python3 -m pip --no-cache-dir install tensorflow==2.8.0
RUN python3 -m pip --no-cache-dir install azureml==0.2.7 azureml-sdk==1.38.0

COPY dist/nni-${NNI_RELEASE}-py3-none-manylinux1_x86_64.whl .
RUN python3 -m pip install nni-${NNI_RELEASE}-py3-none-manylinux1_x86_64.whl
RUN rm nni-${NNI_RELEASE}-py3-none-manylinux1_x86_64.whl

ENV PATH=/root/.local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/sbin

WORKDIR /root
NNI has a monthly release cycle (major releases).
* [OpenPAI](https://github.com/Microsoft/pai): an open-source platform that provides complete AI model training and resource management capabilities; it is easy to extend and supports on-premise, cloud, and hybrid deployments at all scales.
* [FrameworkController](https://github.com/Microsoft/frameworkcontroller): an open-source, general-purpose Kubernetes Pod controller that orchestrates all kinds of applications on Kubernetes with a single controller.
* [MMdnn](https://github.com/Microsoft/MMdnn): a comprehensive, cross-framework solution to convert, visualize, and diagnose deep neural network models. The "MM" in MMdnn stands for model management, and "dnn" is an acronym for deep neural network.
* [SPTAG](https://github.com/Microsoft/SPTAG): Space Partition Tree And Graph (SPTAG) is an open-source library for large-scale vector nearest-neighbor search scenarios.
We encourage researchers and students to leverage these projects to accelerate AI development and research.
Contributing to Neural Network Intelligence (NNI)
=================================================
Great! We are always on the lookout for more contributors to our code base.
Firstly, if you are unsure or afraid of anything, just ask or submit the issue or pull request anyways. You won't be yelled at for giving your best effort. The worst that can happen is that you'll be politely asked to change something. We appreciate any sort of contributions and don't want a wall of rules to get in the way of that.
However, for those individuals who want a bit more guidance on the best way to contribute to the project, read on. This document will cover all the points we're looking for in your contributions, raising your chances of quickly merging or addressing your contributions.
Looking for a quickstart? Get acquainted with our `Get Started <QuickStart.rst>`__ guide.
There are a few simple guidelines that you need to follow before contributing your changes.
Raising Issues
--------------
When raising issues, please specify the following:
* Setup details, filled in clearly as specified in the issue template, for the reviewer to check.
* A scenario where the issue occurred (with details on how to reproduce it).
* Errors and log messages that are displayed by the software.
* Any other details that might be useful.
Submit Proposals for New Features
---------------------------------
* There is always something more that is required to make NNI easier to suit your use-cases. Feel free to join the discussion on new features or raise a PR with your proposed change.
* Fork the repository under your own GitHub handle and clone it. Add, commit, push, and squash (if necessary) your changes with detailed commit messages to your fork, from which you can proceed to make a pull request.
Contributing to Source Code and Bug Fixes
-----------------------------------------
Provide PRs with appropriate tags for bug fixes or enhancements to the source code. Do follow the correct naming conventions and code styles when you work, and try to address all code review comments along the way.
If you are looking for How to develop and debug the NNI source code, you can refer to `How to set up NNI developer environment doc <./SetupNniDeveloperEnvironment.rst>`__ file in the ``docs`` folder.
Similarly for `Quick Start <QuickStart.rst>`__. For everything else, refer to `NNI Home page <http://nni.readthedocs.io>`__.
Solve Existing Issues
---------------------
Head over to `issues <https://github.com/Microsoft/nni/issues>`__ to find issues where help is needed from contributors. You can find issues tagged with 'good-first-issue' or 'help-wanted' to contribute to.
A contributor can take up an issue by claiming it in a comment or having their GitHub ID assigned to it. If there is no PR or update in progress on the issue for a week, it reopens for anyone to take up again. High-priority issues and regressions need a faster response, within a day or so.
Code Styles & Naming Conventions
--------------------------------
* We follow `PEP8 <https://www.python.org/dev/peps/pep-0008/>`__ for Python code and naming conventions; do try to adhere to it when making a pull request or a change. You can also take the help of linters such as ``flake8`` or ``pylint``.
* We also follow `NumPy Docstring Style <https://www.sphinx-doc.org/en/master/usage/extensions/example_numpy.html#example-numpy>`__ for Python Docstring Conventions. During the `documentation building <Contributing.rst#documentation>`__\ , we use `sphinx.ext.napoleon <https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html>`__ to generate Python API documentation from Docstring.
* For docstrings, please refer to `numpydoc docstring guide <https://numpydoc.readthedocs.io/en/latest/format.html>`__ and `pandas docstring guide <https://python-sprints.github.io/pandas/guide/pandas_docstring.html>`__
* For function docstrings, **description**, **Parameters**, and **Returns**/**Yields** are mandatory.
* For class docstrings, **description** and **Attributes** are mandatory.
* For docstring to describe ``dict``, which is commonly used in our hyper-param format description, please refer to `Internal Guideline on Writing Standards <https://ribokit.github.io/docs/text/>`__
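As a quick illustration of the conventions above, here is a small hypothetical function (the name and behavior are made up for the example) with a NumPy-style docstring carrying the mandatory description, Parameters, and Returns sections:

```python
def scale(values, factor=2.0):
    """Multiply every element of a list by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each element (default 2.0).

    Returns
    -------
    list of float
        A new list with each element multiplied by ``factor``.
    """
    return [v * factor for v in values]
```

``sphinx.ext.napoleon`` turns a docstring like this into structured API documentation during the build.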
Documentation
-------------
Our documentation is built with :githublink:`sphinx <docs>`.
* Before submitting a documentation change, please **build the homepage locally**: ``cd docs/en_US && make html``; you can then find all the built documentation pages under ``docs/en_US/_build/html``. It is also highly recommended to take care of **every WARNING** during the build, which is very likely the signal of a **dead link** or other annoying issues.
* For links, please consider using **relative paths** first. However, if the documentation is written in reStructuredText format, and:
* It's an image link which needs to be formatted with embedded HTML, so please use a global URL like ``https://user-images.githubusercontent.com/44491713/51381727-e3d0f780-1b4f-11e9-96ab-d26b9198ba65.png``, which can be automatically generated by dragging the picture onto the `Github Issue <https://github.com/Microsoft/nni/issues/new>`__ box.
* It cannot be re-formatted by sphinx, such as source code, please use its global URL. For source code that links to our github repo, please use URLs rooted at ``https://github.com/Microsoft/nni/tree/master/`` (:githublink:`mnist.py <examples/trials/mnist-pytorch/mnist.py>` for example).
Setup NNI development environment
=================================
The NNI development environment supports Ubuntu 16.04 or above, and Windows 10 with 64-bit Python 3.
Installation
------------
1. Clone source code
^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
git clone https://github.com/Microsoft/nni.git
Note: if you want to contribute code back, you need to fork the NNI repo under your own account and clone from there.
2. Install from source code
^^^^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash
python3 -m pip install -U -r dependencies/setup.txt
python3 -m pip install -r dependencies/develop.txt
python3 setup.py develop
This installs NNI in `development mode <https://setuptools.readthedocs.io/en/latest/userguide/development_mode.html>`__,
so you don't need to reinstall it after editing.
3. Check if the environment is ready
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now, you can try to start an experiment to check if your environment is ready.
For example, run the command
.. code-block:: bash
nnictl create --config examples/trials/mnist-pytorch/config.yml
Then open the WebUI to check that everything is OK.
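For reference, a minimal ``config.yml`` for the mnist-pytorch example might look like the following sketch (fields follow the legacy NNI experiment config schema; the values here are illustrative, so check the experiment configuration reference for the authoritative schema):

```yaml
authorName: default
experimentName: mnist-pytorch
trialConcurrency: 1
maxTrialNum: 10
trainingServicePlatform: local
searchSpacePath: search_space.json
useAnnotation: false
tuner:
  builtinTunerName: TPE
trial:
  command: python3 mnist.py
  codeDir: .
  gpuNum: 0
```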
4. Reload changes
^^^^^^^^^^^^^^^^^
Python
******
Nothing to do; the code is already linked to the package folders.
TypeScript (Linux and macOS)
****************************
* If ``ts/nni_manager`` is changed, run ``yarn watch`` under this folder. It will watch and build the code continuously. ``nnictl`` needs to be restarted to reload the NNI manager.
* If ``ts/webui`` is changed, run ``yarn dev``\ , which will run a mock API server and a webpack dev server simultaneously. Use ``EXPERIMENT`` environment variable (e.g., ``mnist-tfv1-running``\ ) to specify the mock data being used. Built-in mock experiments are listed in ``src/webui/mock``. An example of the full command is ``EXPERIMENT=mnist-tfv1-running yarn dev``.
TypeScript (Windows)
********************
Currently you must rebuild TypeScript modules with ``python3 setup.py build_ts`` after editing.
5. Submit Pull Request
^^^^^^^^^^^^^^^^^^^^^^
All changes are merged to the master branch from your forked repo. The description of the pull request must be meaningful and useful.
We will review the changes as soon as possible. Once it passes review, we will merge it to master branch.
For more contribution guidelines and coding styles, you can refer to the `contributing document <Contributing.rst>`__.
###############################
Contribute to NNI
###############################
.. toctree::
   Development Setup<./Tutorial/SetupNniDeveloperEnvironment>
   Contribution Guide<./Tutorial/Contributing>
.. 24da49b25d3d36c476a69aceb825cb94
###############################
Contribute to NNI
###############################
.. toctree::
   Development Setup<./Tutorial/SetupNniDeveloperEnvironment>
   Contribution Guide<./Tutorial/Contributing>
Research and Publications <misc/research_publications>
FAQ <misc/faq>
notes/build_from_source
Contribution Guide <notes/contributing>
Change Log <Release>
**NNI (Neural Network Intelligence)** is a lightweight but powerful toolkit to help users **automate**:
.. 4d622b7ee5031e9cccec635bf6c7427d
###########################
Neural Network Intelligence
Research and Publications <misc/research_publications>
FAQ <misc/faq>
Build from Source <notes/build_from_source>
Contribution Guide <notes/contributing>
Change Log <Release>
:orphan:

Architecture Overview
=====================

NNI (Neural Network Intelligence) is a toolkit to help users design and tune machine learning models (e.g., hyperparameters), neural network architectures, or complex system's parameters, in an efficient and automatic way. NNI has several appealing properties: ease-of-use, scalability, flexibility, and efficiency.

* **Ease-of-use**: NNI can be easily installed through python pip. Only several lines need to be added to your code in order to use NNI's power. You can use both the commandline tool and WebUI to work with your experiments.
* **Scalability**: Tuning hyperparameters or the neural architecture often demands a large number of computational resources, while NNI is designed to fully leverage different computation resources, such as remote machines, training platforms (e.g., OpenPAI, Kubernetes). Hundreds of trials could run in parallel, depending on the capacity of your configured training platforms.
* **Flexibility**: Besides rich built-in algorithms, NNI allows users to customize various hyperparameter tuning algorithms, neural architecture search algorithms, early stopping algorithms, etc. Users can also extend NNI with more training platforms, such as virtual machines, kubernetes service on the cloud. Moreover, NNI can connect to external environments to tune special applications/models on them.
* **Efficiency**: We are intensively working on more efficient model tuning on both the system and algorithm level. For example, we leverage early feedback to speed up the tuning procedure.

The figure below shows the high-level architecture of NNI.

.. image:: https://user-images.githubusercontent.com/16907603/92089316-94147200-ee00-11ea-9944-bf3c4544257f.png
   :width: 700
Key Concepts
------------

* *Experiment*: One task of, for example, finding out the best hyperparameters of a model, finding out the best neural network architecture, etc. It consists of trials and AutoML algorithms.
* *Search Space*: The feasible region for tuning the model. For example, the value range of each hyperparameter.
* *Configuration*: An instance from the search space, that is, each hyperparameter has a specific value.
* *Trial*: An individual attempt at applying a new configuration (e.g., a set of hyperparameter values, a specific neural architecture, etc.). Trial code should be able to run with the provided configuration.
* *Tuner*: An AutoML algorithm, which generates a new configuration for the next try. A new trial will run with this configuration.
* *Assessor*: Analyze a trial's intermediate results (e.g., periodically evaluated accuracy on the test dataset) to tell whether this trial can be early stopped or not.
* *Training Platform*: Where trials are executed. Depending on your experiment's configuration, it could be your local machine, or remote servers, or a large-scale training platform (e.g., OpenPAI, Kubernetes).
Basically, an experiment runs as follows: Tuner receives search space and generates configurations. These configurations will be submitted to training platforms, such as the local machine, remote machines, or training clusters. Their performances are reported back to Tuner. Then, new configurations are generated and submitted.
For each experiment, the user only needs to define a search space and update a few lines of code, and then leverage NNI built-in Tuner/Assessor and training platforms to search the best hyperparameters and/or neural architecture. There are basically 3 steps:

* Step 1: `Define search space <Tutorial/SearchSpaceSpec.rst>`__
* Step 2: `Update model codes <TrialExample/Trials.rst>`__
* Step 3: `Define Experiment <reference/experiment_config.rst>`__
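To make Step 1 concrete, a search space is a JSON document mapping each hyperparameter to a sampling rule. A small hypothetical example (the parameter names and ranges are made up for illustration; see the search space spec for the full set of types):

```python
import json

# Hypothetical search space: "_type" names the sampling strategy and
# "_value" gives its arguments, following NNI's search space format.
search_space = {
    "lr": {"_type": "loguniform", "_value": [1e-5, 1e-1]},
    "batch_size": {"_type": "choice", "_value": [16, 32, 64, 128]},
    "dropout_rate": {"_type": "uniform", "_value": [0.1, 0.5]},
}

# Typically written to search_space.json and referenced from the
# experiment configuration.
print(json.dumps(search_space, indent=2))
```

The tuner samples a concrete configuration from this space (e.g., a specific learning rate and batch size) for each trial.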
.. image:: https://user-images.githubusercontent.com/23273522/51816627-5d13db80-2302-11e9-8f3e-627e260203d5.jpg

For more details about how to run an experiment, please refer to `Get Started <Tutorial/QuickStart.rst>`__.
Automatic Feature Engineering
-----------------------------

Automatic feature engineering is for users to find the best features for their tasks. A detailed description of automatic feature engineering and its usage can be found `here <FeatureEngineering/Overview.rst>`__. It is supported through NNI trial SDK, which means you do not have to create an NNI experiment. Instead, simply import a built-in auto-feature-engineering algorithm in your trial code and directly run your trial code.
The auto-feature-engineering algorithms usually have a bunch of hyperparameters themselves. If you want to automatically tune those hyperparameters, you can leverage hyperparameter tuning of NNI, that is, choose a tuning algorithm (i.e., tuner) and start an NNI experiment for it.
Learn More
----------
* `Get started <Tutorial/QuickStart.rst>`__
* `How to adapt your trial code on NNI? <TrialExample/Trials.rst>`__
* `What are tuners supported by NNI? <Tuner/BuiltinTuner.rst>`__
* `How to customize your own tuner? <Tuner/CustomizeTuner.rst>`__
* `What are assessors supported by NNI? <Assessor/BuiltinAssessor.rst>`__
* `How to customize your own assessor? <Assessor/CustomizeAssessor.rst>`__
* `How to run an experiment on local? <TrainingService/LocalMode.rst>`__
* `How to run an experiment on multiple machines? <TrainingService/RemoteMachineMode.rst>`__
* `How to run an experiment on OpenPAI? <TrainingService/PaiMode.rst>`__
* `Examples <TrialExample/MnistExamples.rst>`__
* `Neural Architecture Search on NNI <NAS/Overview.rst>`__
* `Model Compression on NNI <Compression/Overview.rst>`__
* `Automatic feature engineering on NNI <FeatureEngineering/Overview.rst>`__
Contribution Guide
==================
Great! We are always on the lookout for more contributors to our code base.
First, if you are unsure or afraid of anything, just ask, or submit an issue or pull request anyway. You won't be yelled at for giving your best effort. The worst that can happen is that you'll be politely asked to change something. We appreciate contributions of any sort and don't want a wall of rules to get in the way of that.
However, for those individuals who want a bit more guidance on the best way to contribute to the project, read on. This document covers the points we're looking for in your contributions, increasing the chances that your contributions are quickly merged or addressed.
There are a few simple guidelines that you need to follow before making your contribution.
Bug Reports and Feature Requests
--------------------------------
If you encounter a problem when using NNI, or have an idea for a new feature, your feedback is always welcome. Here are some possible channels:
* `File an issue <https://github.com/microsoft/nni/issues/new/choose>`_ on GitHub.
* Open or participate in a `discussion <https://github.com/microsoft/nni/discussions>`_.
* Discuss in the NNI `Gitter <https://gitter.im/Microsoft/nni?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge>`_ channel.
* Join IM discussion groups:
.. list-table::
:widths: 50 50
:header-rows: 1
* - Gitter
- WeChat
* - .. image:: https://user-images.githubusercontent.com/39592018/80665738-e0574a80-8acc-11ea-91bc-0836dc4cbf89.png
- .. image:: https://github.com/scarlett2018/nniutil/raw/master/wechat.png
Looking for an existing issue
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Before you create a new issue, please do a search in `open issues <https://github.com/microsoft/nni/issues>`_ to see if the issue or feature request has already been filed.
Be sure to scan through the `most popular <https://github.com/microsoft/nni/issues?q=is%3Aopen+is%3Aissue+label%3AFAQ+sort%3Areactions-%2B1-desc>`_ feature requests.
If you find your issue already exists, make relevant comments and add your `reaction <https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments>`_. Use a reaction in place of a "+1" comment:
* 👍 - upvote
* 👎 - downvote
If you cannot find an existing issue that describes your bug or feature, create a new issue following the guidelines below.
Writing good bug reports or feature requests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* File a single issue per problem and feature request. Do not enumerate multiple bugs or feature requests in the same issue.
* Provide as much information as you think might be relevant to the context (imagine the issue being assigned to you: what kinds of information would you need to debug it?). To give you a general idea of what kinds of information are useful to developers, we have provided issue templates for you.
* Once you have submitted an issue, be sure to follow it for questions and discussions.
* Once the bug is fixed or the feature is addressed, be sure to close the issue.
Writing code
------------
There is always something more that can be done to make NNI better suit your use cases.
Before starting to write code, we recommend checking the `issues <https://github.com/microsoft/nni/issues>`_ on GitHub or opening a new issue to initiate a discussion. There could be cases where people are already working on a fix, or where a similar feature is already under discussion.
To contribute code, start from the NNI repository on `GitHub <https://github.com/microsoft/nni>`_. First, fork the repository under your own GitHub handle. After cloning your fork, add, commit, push and squash (if necessary) your changes with detailed commit messages. You can then open a pull request, which will be reviewed by our core maintainers before being merged into the master branch. `Here <https://github.com/firstcontributions/first-contributions>`_ is a step-by-step guide for this process.
Contributions to NNI should follow our code of conduct. Please see details :ref:`here <code-of-conduct>`.
Find the code snippet that concerns you
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The NNI repository is a large code-base. At a high level, it can be decomposed into several core parts:
* ``nni``: the core Python package that contains most features, including hyperparameter tuning, neural architecture search, and model compression.
* ``ts``: contains ``nni_manager`` that manages experiments and training services, and ``webui`` for visualization.
* ``pipelines`` and ``test``: unit test and integration test, alongside their configurations.
See :doc:`./architecture_overview` if you are interested in details.
.. _get-started-dev:
Get started with development
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The NNI development environment supports Ubuntu 16.04 (or above) and Windows 10, with Python 3.7+ (the documentation build requires Python 3.8+). We recommend using `conda <https://docs.conda.io/>`_ on Windows.
1. Fork the NNI's GitHub repository and clone the forked repository to your machine.
.. code-block:: bash
git clone https://github.com/<your_github_handle>/nni.git
2. Create a new working branch. Use any name you like.
.. code-block:: bash
cd nni
git checkout -b feature-xyz
3. Install NNI from source code if you need to modify the source code, and test it.
.. code-block:: bash
python3 -m pip install -U -r dependencies/setup.txt
python3 -m pip install -r dependencies/develop.txt
python3 setup.py develop
This installs NNI in `development mode <https://setuptools.readthedocs.io/en/latest/userguide/development_mode.html>`_,
so you don't need to reinstall it after each edit.
4. Try to start an experiment to check if your environment is ready. For example, run the command
.. code-block:: bash
nnictl create --config examples/trials/mnist-pytorch/config.yml
Then open the WebUI to check if everything is OK. Alternatively, check the version of the installed NNI:
.. code-block:: python
>>> import nni
>>> nni.__version__
'999.dev0'
.. note:: Please don't run tests under the folder that contains the NNI repository. As the repository is probably also called ``nni``, Python could import the wrong ``nni`` package.
5. Write your code along with tests to verify whether the bug is fixed, or the feature works as expected.
6. Reload changes. For Python, nothing needs to be done, because the code is already linked to package folders. For TypeScript on Linux and MacOS,
* If ``ts/nni_manager`` is changed, run ``yarn watch`` under this folder. It will watch and build the code continuously. ``nnictl`` needs to be restarted to reload the NNI manager.
* If ``ts/webui`` is changed, run ``yarn dev``, which will run a mock API server and a webpack dev server simultaneously. Use the ``EXPERIMENT`` environment variable (e.g., ``mnist-tfv1-running``) to specify the mock data being used. Built-in mock experiments are listed in ``src/webui/mock``. An example of the full command is ``EXPERIMENT=mnist-tfv1-running yarn dev``.
For TypeScript on Windows, you currently must rebuild the TypeScript modules with ``python3 setup.py build_ts`` after each edit.
7. Commit and push your changes, and submit your pull request!
Coding Tips
-----------
We expect all contributors to respect the following coding styles and naming conventions upon their contribution.
Python
^^^^^^
* We follow `PEP8 <https://www.python.org/dev/peps/pep-0008/>`__ for Python code and naming conventions; do try to adhere to them when making a pull request. Pull requests go through a mandatory code scan with ``pylint`` and ``flake8``.
.. note:: To scan your own code locally, run
.. code-block:: bash
python -m pylint --rcfile pylintrc nni
.. tip:: One can also take the help of auto-format tools such as `autopep8 <https://code.visualstudio.com/docs/python/editing#_formatting>`_, which will automatically resolve most of the styling issues.
* We recommend documenting all methods and classes in your code. Follow the `NumPy Docstring Style <https://numpydoc.readthedocs.io/en/latest/format.html>`__ for Python docstring conventions.
* For function docstrings, the **description**, **Parameters**, and **Returns** sections are mandatory.
* For class docstrings, the **description** is mandatory; **Parameters** and **Attributes** are optional. The parameters of ``__init__`` should be documented in the class docstring.
* For docstring to describe ``dict``, which is commonly used in our hyper-parameter format description, please refer to `Internal Guideline on Writing Standards <https://ribokit.github.io/docs/text/>`_.
.. tip:: Basically, you can use :ref:`ReStructuredText <restructuredtext-intro>` syntax in docstrings, with some exceptions. For example, custom headings are not allowed in docstrings.
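As a sketch, a NumPy-style function docstring following the conventions above might look like this (the function itself is a made-up example, not an NNI API):

```python
def clipped_ratio(numerator, denominator, max_value=10.0):
    """Compute a ratio, clipped to an upper bound.

    Parameters
    ----------
    numerator : float
        Value to divide.
    denominator : float
        Value to divide by. Must be non-zero.
    max_value : float, optional
        Upper bound of the result. Default: 10.0.

    Returns
    -------
    float
        The clipped ratio.
    """
    return min(numerator / denominator, max_value)
```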
TypeScript
^^^^^^^^^^
TypeScript code checks can be done with,
.. code-block:: bash
# for nni manager
cd ts/nni_manager
yarn eslint
# for webui
cd ts/webui
yarn sanity-check
Tests
-----
When a new feature is added or a bug is fixed, tests are highly recommended to make sure that the fix is effective or that the feature won't break in the future. There are two types of tests in NNI:
* Unit test (**UT**): each test targets a specific class / function / module.
* Integration test (**IT**): each test is an end-to-end example / demo.
Unit test (Python)
^^^^^^^^^^^^^^^^^^
Python UTs are located in the ``test/ut/`` folder. We use `pytest <https://docs.pytest.org/>`_ to launch them, and the working directory is ``test/ut/``.
.. tip:: pytest can be used on a single file or a single test function.
.. code-block:: bash
pytest sdk/test_tuner.py
pytest sdk/test_tuner.py::test_tpe
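As a sketch, a Python UT in this style might look like the following (the function under test is a made-up stand-in, not an actual NNI API):

```python
# Hypothetical unit test in pytest style; pytest discovers and runs any
# top-level function whose name starts with ``test_``.
def generate_parameters(search_space):
    """Toy stand-in for a tuner's parameter generation."""
    return {key: choices[0] for key, choices in search_space.items()}

def test_generate_parameters():
    space = {"lr": [0.1, 0.01], "batch_size": [32, 64]}
    params = generate_parameters(space)
    # Every hyperparameter must be assigned a value from its choice list.
    assert set(params) == {"lr", "batch_size"}
    assert params["lr"] in space["lr"]
    assert params["batch_size"] in space["batch_size"]
```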
Unit test (TypeScript)
^^^^^^^^^^^^^^^^^^^^^^
TypeScript UTs are paired with the TypeScript code. Use ``yarn test`` to run them.
Integration test
^^^^^^^^^^^^^^^^
The integration tests can be found in ``pipelines/`` folder.
The integration tests run on the Azure DevOps platform on a daily basis, to make sure that our examples and training service integrations work properly. However, for critical changes that impact the core functionalities of NNI, we recommend `triggering the pipeline on the pull request branch <https://stackoverflow.com/questions/60157818/azure-pipeline-run-build-on-pull-request-branch>`_.
The integration tests won't be automatically triggered on pull requests. You might need to contact the core developers to help you trigger the tests.
Documentation
-------------
Build and check documentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Our documentation is located under ``docs/`` folder. The following command can be used to build the documentation.
.. code-block:: bash
cd docs
make html
.. note::
If you experience issues in building documentation, and see errors like:
* ``Could not import extension xxx (exception: No module named 'xxx')`` : please check your development environment and make sure dependencies have been properly installed: :ref:`get-started-dev`.
* ``unsupported pickle protocol: 5``: please upgrade to Python 3.8.
* ``autodoc: No module named 'xxx'``: some dependencies in ``dependencies/`` are not installed. In this case, the documentation can still mostly be built successfully, but some API references could be missing.
It's also highly recommended to take care of **every WARNING** during the build, as a warning is very likely the signal of a **dead link** or another annoying issue. Our code check will also make sure that the documentation build completes with no warnings.
The built documentation can be found in ``docs/build/html`` folder.
.. attention:: Always use your web browser to check the documentation before committing your change.
.. tip:: `Live Server <https://github.com/ritwickdey/vscode-live-server>`_ is a great extension if you are looking for a static-files server to serve contents in ``docs/build/html``.
Writing new documents
^^^^^^^^^^^^^^^^^^^^^
.. |link_example| raw:: html
<code class="docutils literal notranslate">`Link text &lt;https://domain.invalid/&gt;`_</code>
.. |link_example_2| raw:: html
<code class="docutils literal notranslate">`Link text &lt;https://domain.invalid/&gt;`__</code>
.. |link_example_3| raw:: html
<code class="docutils literal notranslate">:doc:`./relative/to/my_doc`</code>
.. |githublink_example| raw:: html
<code class="docutils literal notranslate">:githublink:`path/to/file.ext`</code>
.. |githublink_example_2| raw:: html
<code class="docutils literal notranslate">:githublink:`text &lt;path/to/file.ext&gt;`</code>
.. _restructuredtext-intro:
`ReStructuredText <https://docutils.sourceforge.io/docs/user/rst/quickstart.html>`_ is our documentation language. Please find the reference of RST `here <https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html>`__.
.. tip:: Sphinx has `an excellent cheatsheet of rst <https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html>`_ which contains almost everything you might need to know to write an elegant document.
**Dealing with sections.** ``=`` for sections. ``-`` for subsections. ``^`` for subsubsections. ``"`` for paragraphs.
**Dealing with images.** Images should be put into ``docs/img`` folder. Then, reference the image in the document with relative links. For example, ``.. image:: ../../img/example.png``.
**Dealing with codes.** We recommend using ``.. code-block:: python`` to start a code block. The ``python`` here annotates the syntax highlighting.
**Dealing with links.** Use |link_example_3| for links to another doc (no suffix like ``.rst``). To reference a specific section, please use ``:ref:`` (see `Cross-referencing arbitrary locations <https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#cross-referencing-arbitrary-locations>`_). For general links that ``:doc:`` and ``:ref:`` can't handle, you can also use |link_example| for inline web links. Note that using a single underscore might cause a `"duplicated target name" error <https://stackoverflow.com/questions/27420317/restructured-text-rst-http-links-underscore-vs-use>`_ when multiple targets share the same name. In that case, use a double underscore to avoid the error: |link_example_2|.
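Putting these conventions together, a small document skeleton might look like the sketch below (section markers, a code block, and a link; all names and targets are placeholders):

```rst
My Document Title
=================

A Subsection
------------

See :doc:`./quickstart` for the basics, or the
`project homepage <https://domain.invalid/>`__ for a general web link.

.. code-block:: python

   print('hello')
```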
Other than built-in directives provided by Sphinx, we also provide some custom directives:
* ``.. cardlinkitem::``: A tutorial card, useful in :doc:`../tutorials`.
* |githublink_example| or |githublink_example_2|: reference a file on GitHub. The link points to the same commit id as the one the documentation is built from.
Writing new tutorials
^^^^^^^^^^^^^^^^^^^^^
Our tutorials are powered by `sphinx-gallery <https://sphinx-gallery.github.io/>`_. Sphinx-gallery is an extension that builds an HTML gallery of examples from any set of Python scripts.
To contribute a new tutorial, here are the steps to follow:
1. Create a notebook-styled Python file. If you want it executed and inserted into the documentation, save the file under ``examples/tutorials/``. If your tutorial contains other auxiliary scripts that are not intended to be included in the documentation, save them under ``examples/tutorials/scripts/``.
.. tip:: The syntax to write a "notebook styled python file" is very simple. In essence, you only need to write a slightly well formatted python file. Here is a useful guide of `how to structure your Python scripts for Sphinx-Gallery <https://sphinx-gallery.github.io/stable/syntax.html>`_.
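For instance, a minimal notebook-styled file might look like this (the title and content are made up; text cells are written as comment blocks that sphinx-gallery renders as prose):

```python
"""
A minimal tutorial
==================

The module docstring becomes the tutorial's title and introduction.
"""

# %%
# A comment block starting with ``# %%`` becomes a text cell in the
# rendered tutorial. The plain code below it is executed during the build.

squares = [x ** 2 for x in range(5)]
print(squares)
```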
2. Put the tutorials into ``docs/source/tutorials.rst``. You should add it both in ``toctree`` (to make it appear in the sidebar content table), and ``cardlinkitem`` (to create a card link), and specify the appropriate ``header``, ``description``, ``link``, ``image``, ``background`` (for image) and ``tags``.
``link`` is the generated link, which is usually ``tutorials/<your_python_file_name>.html``. Some useful images can be found in ``docs/img/thumbnails``, but you can always use your own. Available background colors are: ``red``, ``pink``, ``purple``, ``deep-purple``, ``blue``, ``light-blue``, ``cyan``, ``teal``, ``green``, ``deep-orange``, ``brown``, ``indigo``.
In case you prefer to write your tutorial in Jupyter, you can use `this script <https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe>`_ to convert the notebook to a Python file. After conversion and addition to the project, please make sure the section headings, etc., are in a logical order.
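An entry in ``tutorials.rst`` might then look like the following sketch (the field values are placeholders, and the exact option names should be checked against existing entries in that file):

```rst
.. cardlinkitem::
   :header: My New Tutorial
   :description: One-sentence summary of what the tutorial covers.
   :link: tutorials/my_tutorial.html
   :image: ../img/thumbnails/my_thumbnail.png
   :background: cyan
   :tags: Example
```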
3. Build the tutorials. Since some of the tutorials contain complex AutoML examples, it's very inefficient to build them over and over again. Therefore, we cache the built tutorials in ``docs/source/tutorials``, so that the unchanged tutorials won't be rebuilt. To trigger the build, run ``make html``. This will execute the tutorials and convert the scripts into HTML files. How long it takes depends on your tutorial. As ``make html`` is not very debug-friendly, we suggest making the script runnable by itself before using this building tool.
.. note::
Some useful HOW-TOs in writing new tutorials:
* `How to force rebuilding one tutorial <https://sphinx-gallery.github.io/stable/configuration.html#rerunning-stale-examples>`_.
* `How to add images to notebooks <https://sphinx-gallery.github.io/stable/configuration.html#adding-images-to-notebooks>`_.
* `How to reference a tutorial in documentation <https://sphinx-gallery.github.io/stable/advanced.html#cross-referencing>`_.
Chinese translation
^^^^^^^^^^^^^^^^^^^
We only maintain `a partial set of documents <https://github.com/microsoft/nni/issues/4298>`_ with Chinese translation. If you intend to contribute more, follow the steps:
1. Add a ``xxx_zh.rst`` in the same folder where ``xxx.rst`` exists.
2. Run ``python tools/chineselink.py`` under ``docs`` folder, to generate a hash string in your created ``xxx_zh.rst``.
3. Don't delete the hash string, add your translation after it.
In case you modify an English document that already has a Chinese translation, you also need to run ``python tools/chineselink.py`` first to update the hash string, and then update the Chinese translation accordingly.
.. _code-of-conduct:
Code of Conduct
---------------
This project has adopted the `Microsoft Open Source Code of Conduct <https://opensource.microsoft.com/codeofconduct/>`_.
For more information see the `Code of Conduct FAQ <https://opensource.microsoft.com/codeofconduct/faq/>`_ or contact `opencode@microsoft.com <mailto:opencode@microsoft.com>`_ with any additional questions or comments.
Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
@@ -4,8 +4,8 @@ import warnings
 import torch
 import torch.nn as torch_nn
-from torchvision.models.utils import load_state_dict_from_url
 import torch.nn.functional as F
+from nni.retiarii import model_wrapper
 import sys
 from pathlib import Path
@@ -111,7 +111,7 @@ def _get_depths(depths, alpha):
     rather than down. """
     return [_round_to_multiple_of(depth * alpha, 8) for depth in depths]
+@model_wrapper
 class MNASNet(nn.Module):
     """ MNASNet, as described in https://arxiv.org/pdf/1807.11626.pdf. This
     implements the B1 variant of the model.
@@ -180,7 +180,7 @@ class MNASNet(nn.Module):
             nn.ReLU(inplace=True),
         ]
         self.layers = nn.Sequential(*layers)
-        self.classifier = nn.Sequential(nn.Dropout(p=dropout, inplace=True),
+        self.classifier = nn.Sequential(nn.Dropout(p=dropout),
                                         nn.Linear(1280, num_classes))
         self._initialize_weights()
         #self.for_test = 10
@@ -10,7 +10,7 @@ from torch.optim import Optimizer
 from torch.optim.lr_scheduler import _LRScheduler
 from nni.common.serializer import _trace_cls
-from nni.common.serializer import Traceable
+from nni.common.serializer import Traceable, is_traceable
 __all__ = ['OptimizerConstructHelper', 'LRSchedulerConstructHelper']
@@ -80,14 +80,14 @@ class OptimizerConstructHelper(ConstructHelper):
     @staticmethod
     def from_trace(model: Module, optimizer_trace: Traceable):
-        assert isinstance(optimizer_trace, Traceable), \
+        assert is_traceable(optimizer_trace), \
             'Please use nni.trace to wrap the optimizer class before initialize the optimizer.'
         assert isinstance(optimizer_trace, Optimizer), \
             'It is not an instance of torch.nn.Optimizer.'
         return OptimizerConstructHelper(model,
-                                        optimizer_trace._get_nni_attr('symbol'),
-                                        *optimizer_trace._get_nni_attr('args'),
-                                        **optimizer_trace._get_nni_attr('kwargs'))
+                                        optimizer_trace.trace_symbol,
+                                        *optimizer_trace.trace_args,
+                                        **optimizer_trace.trace_kwargs)
 class LRSchedulerConstructHelper(ConstructHelper):
@@ -112,7 +112,7 @@ class LRSchedulerConstructHelper(ConstructHelper):
     @staticmethod
     def from_trace(lr_scheduler_trace: Traceable):
-        assert isinstance(lr_scheduler_trace, Traceable), \
+        assert is_traceable(lr_scheduler_trace), \
             'Please use nni.trace to wrap the lr scheduler class before initialize the scheduler.'
         assert isinstance(lr_scheduler_trace, _LRScheduler), \
             'It is not an instance of torch.nn.lr_scheduler._LRScheduler.'
...@@ -6,6 +6,7 @@ import functools ...@@ -6,6 +6,7 @@ import functools
import inspect import inspect
import numbers import numbers
import os import os
import sys
import types import types
import warnings import warnings
from io import IOBase from io import IOBase
...@@ -14,7 +15,7 @@ from typing import Any, Dict, List, Optional, TypeVar, Union ...@@ -14,7 +15,7 @@ from typing import Any, Dict, List, Optional, TypeVar, Union
import cloudpickle # use cloudpickle as backend for unserializable types and instances import cloudpickle # use cloudpickle as backend for unserializable types and instances
import json_tricks # use json_tricks as serializer backend import json_tricks # use json_tricks as serializer backend
__all__ = ['trace', 'dump', 'load', 'PayloadTooLarge', 'Translatable', 'Traceable', 'is_traceable'] __all__ = ['trace', 'dump', 'load', 'PayloadTooLarge', 'Translatable', 'Traceable', 'is_traceable', 'is_wrapped_with_trace']
T = TypeVar('T') T = TypeVar('T')
...@@ -24,46 +25,43 @@ class PayloadTooLarge(Exception): ...@@ -24,46 +25,43 @@ class PayloadTooLarge(Exception):
pass pass
class Traceable(abc.ABC): class Traceable:
""" """
A traceable object have copy and dict. Copy and mutate are used to copy the object for further mutations. A traceable object have copy and dict. Copy and mutate are used to copy the object for further mutations.
Dict returns a TraceDictType to enable serialization. Dict returns a TraceDictType to enable serialization.
""" """
@abc.abstractmethod
def trace_copy(self) -> 'Traceable': def trace_copy(self) -> 'Traceable':
""" """
Perform a shallow copy. Perform a shallow copy.
NOTE: NONE of the attributes will be preserved. NOTE: NONE of the attributes will be preserved.
This is the one that should be used when you want to "mutate" a serializable object. This is the one that should be used when you want to "mutate" a serializable object.
""" """
... raise NotImplementedError()
@property @property
@abc.abstractmethod
def trace_symbol(self) -> Any: def trace_symbol(self) -> Any:
""" """
Symbol object. Could be a class or a function. Symbol object. Could be a class or a function.
``get_hybrid_cls_or_func_name`` and ``import_cls_or_func_from_hybrid_name`` is a pair to ``get_hybrid_cls_or_func_name`` and ``import_cls_or_func_from_hybrid_name`` is a pair to
convert the symbol into a string and convert the string back to symbol. convert the symbol into a string and convert the string back to symbol.
""" """
... raise NotImplementedError()
@property @property
@abc.abstractmethod
def trace_args(self) -> List[Any]: def trace_args(self) -> List[Any]:
""" """
List of positional arguments passed to symbol. Usually empty if ``kw_only`` is true, List of positional arguments passed to symbol. Usually empty if ``kw_only`` is true,
in which case all the positional arguments are converted into keyword arguments. in which case all the positional arguments are converted into keyword arguments.
""" """
... raise NotImplementedError()
@property @property
@abc.abstractmethod
def trace_kwargs(self) -> Dict[str, Any]: def trace_kwargs(self) -> Dict[str, Any]:
""" """
Dict of keyword arguments. Dict of keyword arguments.
""" """
... raise NotImplementedError()
class Translatable(abc.ABC): class Translatable(abc.ABC):
...@@ -85,13 +83,27 @@ class Translatable(abc.ABC): ...@@ -85,13 +83,27 @@ class Translatable(abc.ABC):
def is_traceable(obj: Any) -> bool: def is_traceable(obj: Any) -> bool:
""" """
Check whether an object is a traceable instance (not type). Check whether an object is a traceable instance or type.
Note that an object is traceable only means that it implements the "Traceable" interface,
and the properties have been implemented. It doesn't necessary mean that its type is wrapped with trace,
because the properties could be added **after** the instance has been created.
""" """
return hasattr(obj, 'trace_copy') and \ return hasattr(obj, 'trace_copy') and \
hasattr(obj, 'trace_symbol') and \ hasattr(obj, 'trace_symbol') and \
hasattr(obj, 'trace_args') and \ hasattr(obj, 'trace_args') and \
hasattr(obj, 'trace_kwargs') and \ hasattr(obj, 'trace_kwargs')
not inspect.isclass(obj)
def is_wrapped_with_trace(cls_or_func: Any) -> bool:
"""
Check whether a function or class is already wrapped with ``@nni.trace``.
If a class or function is already wrapped with trace, then the created object must be "traceable".
"""
return getattr(cls_or_func, '_traced', False) and (
not hasattr(cls_or_func, '__dict__') or # in case it's a function
'_traced' in cls_or_func.__dict__ # must be in this class, super-class traced doesn't count
)
class SerializableObject(Traceable): class SerializableObject(Traceable):
...@@ -161,6 +173,15 @@ class SerializableObject(Traceable): ...@@ -161,6 +173,15 @@ class SerializableObject(Traceable):
def inject_trace_info(obj: Any, symbol: T, args: List[Any], kwargs: Dict[str, Any]) -> Any: def inject_trace_info(obj: Any, symbol: T, args: List[Any], kwargs: Dict[str, Any]) -> Any:
# If an object is already created, this can be a fix so that the necessary info are re-injected into the object. # If an object is already created, this can be a fix so that the necessary info are re-injected into the object.
# Make obj complying with the interface of traceable, though we cannot change its base class.
obj.__dict__.update(_nni_symbol=symbol, _nni_args=args, _nni_kwargs=kwargs)
return obj
def _make_class_traceable(cls: T, create_wrapper: bool = False) -> T:
# Make an already exist class traceable, without creating a new class.
# Should be used together with `inject_trace_info`.
def getter_factory(x): def getter_factory(x):
return lambda self: self.__dict__['_nni_' + x] return lambda self: self.__dict__['_nni_' + x]
...@@ -185,20 +206,18 @@ def inject_trace_info(obj: Any, symbol: T, args: List[Any], kwargs: Dict[str, An ...@@ -185,20 +206,18 @@ def inject_trace_info(obj: Any, symbol: T, args: List[Any], kwargs: Dict[str, An
'trace_copy': trace_copy 'trace_copy': trace_copy
} }
if hasattr(obj, '__class__') and hasattr(obj, '__dict__'): if not create_wrapper:
for name, method in attributes.items(): for name, method in attributes.items():
setattr(obj.__class__, name, method) setattr(cls, name, method)
return cls
else: else:
wrapper = type('wrapper', (Traceable, type(obj)), attributes) # sometimes create_wrapper is mandatory, e.g., for built-in types like list/int.
obj = wrapper(obj) # pylint: disable=abstract-class-instantiated # but I don't want to check here because it's unreliable.
wrapper = type('wrapper', (Traceable, cls), attributes)
# make obj complying with the interface of traceable, though we cannot change its base class return wrapper
obj.__dict__.update(_nni_symbol=symbol, _nni_args=args, _nni_kwargs=kwargs)
return obj
def trace(cls_or_func: T = None, *, kw_only: bool = True) -> Union[T, Traceable]: def trace(cls_or_func: T = None, *, kw_only: bool = True, inheritable: bool = False) -> Union[T, Traceable]:
""" """
Annotate a function or a class if you want to preserve where it comes from. Annotate a function or a class if you want to preserve where it comes from.
This is usually used in the following scenarios: This is usually used in the following scenarios:
...@@ -222,6 +241,9 @@ def trace(cls_or_func: T = None, *, kw_only: bool = True) -> Union[T, Traceable] ...@@ -222,6 +241,9 @@ def trace(cls_or_func: T = None, *, kw_only: bool = True) -> Union[T, Traceable]
list and types. This can be useful to extract semantics, but can be tricky in some corner cases. list and types. This can be useful to extract semantics, but can be tricky in some corner cases.
Therefore, in some cases, some positional arguments will still be kept. Therefore, in some cases, some positional arguments will still be kept.
If ``inheritable`` is true, the trace information from superclass will also be available in subclass.
This however, will make the subclass un-trace-able. Note that this argument has no effect when tracing functions.
.. warning:: .. warning::
Generators will be first expanded into a list, and the resulting list will be further passed into the wrapped function/class. Generators will be first expanded into a list, and the resulting list will be further passed into the wrapped function/class.
@@ -245,10 +267,10 @@ def trace(cls_or_func: T = None, *, kw_only: bool = True) -> Union[T, Traceable]
    def wrap(cls_or_func):
        # already annotated, do nothing
        if is_wrapped_with_trace(cls_or_func):
            return cls_or_func
        if isinstance(cls_or_func, type):
            cls_or_func = _trace_cls(cls_or_func, kw_only, inheritable=inheritable)
        elif _is_function(cls_or_func):
            cls_or_func = _trace_func(cls_or_func, kw_only)
        else:
@@ -361,11 +383,60 @@ def load(string: Optional[str] = None, *, fp: Optional[Any] = None, ignore_comme
    return json_tricks.load(fp, obj_pairs_hooks=hooks, **json_tricks_kwargs)


def _trace_cls(base, kw_only, call_super=True, inheritable=False):
    # the implementation to trace a class is to store a copy of the init arguments
    # this won't support classes that define a customized __new__, but should work for most cases
    if sys.platform != 'linux':
        if not call_super:
            raise ValueError("'call_super' must be set to true on non-linux platforms")

        try:
            # In non-linux envs, dynamically creating new classes doesn't work with pickle.
            # We have to replace the ``__init__`` with a new ``__init__``.
            # This, however, causes side effects where the replacement is not intended.
            # This also doesn't work for built-in types (e.g., OrderedDict), and the replacement
            # won't be effective anymore if ``nni.trace`` is called in-place (e.g., ``nni.trace(nn.Conv2d)(...)``).
            original_init = base.__init__

            # Make the new init have the exact same signature as the old one,
            # so as to make pytorch-lightning happy.
            # https://github.com/PyTorchLightning/pytorch-lightning/blob/4cc05b2cf98e49168a5f5dc265647d75d1d3aae9/pytorch_lightning/utilities/parsing.py#L143
            @functools.wraps(original_init)
            def new_init(self, *args, **kwargs):
                args, kwargs = _formulate_arguments(original_init, args, kwargs, kw_only, is_class_init=True)
                original_init(
                    self,
                    *[_argument_processor(arg) for arg in args],
                    **{kw: _argument_processor(arg) for kw, arg in kwargs.items()}
                )
                inject_trace_info(self, base, args, kwargs)

            base.__init__ = new_init
            base = _make_class_traceable(base)
            return base
        except TypeError:
            warnings.warn("In-place __init__ replacement failed in `@nni.trace`, probably because the type is a built-in/extension type, "
                          "and its __init__ can't be replaced. `@nni.trace` is now falling back to the 'inheritance' approach. "
                          "However, this could cause issues when using pickle. See https://github.com/microsoft/nni/issues/4434",
                          RuntimeWarning)

    # This is trying to solve the case where superclass and subclass are both decorated with @nni.trace.
    # We use a metaclass to "unwrap" the superclass.
    # However, this doesn't work if:
    # 1. The base class already has a customized metaclass. We will raise an error in that case.
    # 2. SerializableObject is an ancestor (instead of a parent). I think this case is rare and I haven't handled it yet. FIXME
    if type(base) is type and not inheritable:
        metaclass = _unwrap_metaclass
    else:
        metaclass = type
        if SerializableObject in inspect.getmro(base):
            raise TypeError(f"{base} has a superclass already decorated with trace, and it's using a customized metaclass {type(base)}. "
                            "Please either use the default metaclass, or remove trace from the superclass.")
    class wrapper(SerializableObject, base, metaclass=metaclass):
        def __init__(self, *args, **kwargs):
            # store a copy of initial parameters
            args, kwargs = _formulate_arguments(base.__init__, args, kwargs, kw_only, is_class_init=True)
@@ -373,6 +444,32 @@ def _trace_cls(base, kw_only, call_super=True):
            # calling serializable object init to initialize the full object
            super().__init__(symbol=base, args=args, kwargs=kwargs, call_super=call_super)
        def __reduce__(self):
            # The issue that decorators and picklers don't play well together is well known.
            # The workaround is to use a dummy class (_pickling_object) which pretends to be the pickled object.
            # We then put the original type, as well as args and kwargs, in its ``__new__`` arguments.
            # I suspect that there could still be problems when things get complex,
            # e.g., when the wrapped class has customized pickling (``__reduce__``) or ``__new__``.
            # But it can't be worse, because the previous pickle didn't work at all.
            #
            # Linked issue: https://github.com/microsoft/nni/issues/4434
            # SO: https://stackoverflow.com/questions/52185507/pickle-and-decorated-classes-picklingerror-not-the-same-object

            # Store the inner class. The wrapped class couldn't be properly pickled.
            type_ = cloudpickle.dumps(type(self).__wrapped__)

            # in case they have a customized ``__getstate__``
            if hasattr(self, '__getstate__'):
                obj_ = self.__getstate__()
            else:
                obj_ = self.__dict__

            # Pickle can't handle type objects.
            if '_nni_symbol' in obj_:
                obj_['_nni_symbol'] = cloudpickle.dumps(obj_['_nni_symbol'])

            return _pickling_object, (type_, kw_only, obj_)
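The ``__reduce__`` workaround follows the standard callable-plus-arguments pickle protocol: instead of pickling the (possibly wrapped) class directly, pickle a restorer callable together with the captured state. A minimal stdlib-only sketch of the same idea, where ``_restorer`` and ``Point`` are invented names for illustration:

```python
import pickle

class _restorer:
    # Stand-in for nni's _pickling_object: rebuild the real object from
    # its original class and captured state inside __new__.
    def __new__(cls, type_, state):
        obj = type_.__new__(type_)
        obj.__dict__.update(state)
        return obj  # not an instance of _restorer, so __init__ is skipped

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # Return a callable + args instead of relying on the class lookup,
        # sidestepping "it's not the same object" errors for decorated classes.
        return _restorer, (Point, self.__dict__)

p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p.x, p.y)  # 1 2
```

The real code additionally serializes the inner class with ``cloudpickle`` because the wrapper type itself is not importable by name.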
    _copy_class_wrapper_attributes(base, wrapper)
    return wrapper
@@ -399,6 +496,8 @@ def _trace_func(func, kw_only):
        elif hasattr(res, '__class__') and hasattr(res, '__dict__'):
            # is a class, inject interface directly
            # need to be done before primitive types because there could be inheritance here
            if not getattr(type(res), '_traced', False):
                _make_class_traceable(type(res), False)  # in-place
            res = inject_trace_info(res, func, args, kwargs)
        elif isinstance(res, (collections.abc.Callable, types.ModuleType, IOBase)):
            raise TypeError(f'Try to add trace info to {res}, but functions and modules are not supported.')
@@ -408,6 +507,8 @@ def _trace_func(func, kw_only):
        # will be directly captured by python json encoder
        # and thus not possible to restore the trace parameters after dump and reload.
        # this is a known limitation.
        new_type = _make_class_traceable(type(res), True)
        res = new_type(res)  # re-creating the object
        res = inject_trace_info(res, func, args, kwargs)
    else:
        raise TypeError(f'Try to add trace info to {res}, but the type "{type(res)}" is unknown. '
@@ -433,6 +534,48 @@ def _copy_class_wrapper_attributes(base, wrapper):
    wrapper.__wrapped__ = base
class _unwrap_metaclass(type):
    # When a subclass is created, it detects whether the superclass is already annotated with @nni.trace.
    # If yes, it gets the ``__wrapped__`` inner class, so that it doesn't inherit SerializableObject twice.
    # Note that this doesn't work when a metaclass is already defined (such as ABCMeta). We give up in that case.
    def __new__(cls, name, bases, dct):
        bases = tuple([getattr(base, '__wrapped__', base) for base in bases])
        return super().__new__(cls, name, bases, dct)

    # Using customized "bases" breaks the default isinstance and issubclass.
    # We recover this by overriding the subclass and instance checks, which concern the wrapped class only.
    def __subclasscheck__(cls, subclass):
        inner_cls = getattr(cls, '__wrapped__', cls)
        return inner_cls in inspect.getmro(subclass)

    def __instancecheck__(cls, instance):
        inner_cls = getattr(cls, '__wrapped__', cls)
        return inner_cls in inspect.getmro(type(instance))
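The base-swapping trick in ``__new__`` above can be demonstrated with plain classes. This is an illustrative sketch of the same mechanism, not nni's code; ``unwrap_meta``, ``Inner``, ``Wrapper``, and ``Child`` are invented names:

```python
class unwrap_meta(type):
    # When a class is created, swap any wrapped base for the inner class
    # stored on its ``__wrapped__`` attribute (mirrors _unwrap_metaclass).
    def __new__(mcs, name, bases, dct):
        bases = tuple(getattr(b, '__wrapped__', b) for b in bases)
        return super().__new__(mcs, name, bases, dct)

class Inner:
    pass

class Wrapper(Inner):
    # stand-in for a trace-decorated wrapper class
    __wrapped__ = Inner

class Child(Wrapper, metaclass=unwrap_meta):
    pass

# Child skips Wrapper entirely and derives from Inner directly
print([c.__name__ for c in Child.__mro__])  # ['Child', 'Inner', 'object']
```

Because ``Wrapper`` disappears from the MRO, ``issubclass(Child, Wrapper)`` would normally turn false, which is why the real metaclass also overrides ``__subclasscheck__`` and ``__instancecheck__``.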
class _pickling_object:
    # A dummy class used in ``_trace_cls``: it rebuilds the real object in ``__new__``.
    # ``cloudpickle.loads`` is needed here because the callable was pickled with cloudpickle.
    def __new__(cls, type_, kw_only, data):
        type_ = cloudpickle.loads(type_)
        # restore the traced type
        type_ = _trace_cls(type_, kw_only)
        # restore the symbol, which was dumped with cloudpickle
        if '_nni_symbol' in data:
            data['_nni_symbol'] = cloudpickle.loads(data['_nni_symbol'])
        # https://docs.python.org/3/library/pickle.html#pickling-class-instances
        obj = type_.__new__(type_)
        if hasattr(obj, '__setstate__'):
            obj.__setstate__(data)
        else:
            obj.__dict__.update(data)
        return obj
def _argument_processor(arg):
    # 1) translate
    # handle cases like ValueChoice
@@ -541,7 +684,9 @@ def _import_cls_or_func_from_name(target: str) -> Any:
def _strip_trace_type(traceable: Any) -> Any:
    if getattr(traceable, '_traced', False):
        # sometimes ``__wrapped__`` could be unavailable (e.g., with ``inject_trace_info``),
        # so we need a default value
        return getattr(traceable, '__wrapped__', traceable)
    return traceable
@@ -606,7 +751,7 @@ def _json_tricks_serializable_object_encode(obj: Any, primitives: bool = False,
    # Encodes a serializable object instance to json.
    # do nothing to an instance that is not a serializable object and does not use trace
    if not (use_trace and hasattr(obj, '__class__') and is_traceable(type(obj))):
        return obj
    if isinstance(obj.trace_symbol, property):
......
@@ -41,7 +41,8 @@ replace_module = {
    'Dropout3d': lambda module, masks: no_replace(module, masks),
    'Upsample': lambda module, masks: no_replace(module, masks),
    'LayerNorm': lambda module, masks: replace_layernorm(module, masks),
    'ConvTranspose2d': lambda module, masks: replace_convtranspose2d(module, masks),
    'Flatten': lambda module, masks: no_replace(module, masks)
}
......
@@ -171,10 +171,14 @@ class AutoMaskInference:
        # apply the input mask
        for tid, in_tensor in enumerate(self.dummy_input):
            if isinstance(in_tensor, torch.Tensor) and self.in_masks[tid] is not None:
                # in_tensor.data = in_tensor.data * \
                #     self.in_masks[tid] + \
                #     (1 - self.in_masks[tid]) * self.in_constants[tid]
                # issue-4540: when two tensors are multiplied, the constant part makes
                # the propagation weaker and leads to shape misalignment. Currently we
                # do not support constant folding, so we just remove the constant here.
                in_tensor.data = in_tensor.data * \
                    self.in_masks[tid]
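The old and new behavior differ only in the dropped constant term. A stdlib-only sketch of the new masking rule (the real code multiplies torch tensors in place; ``apply_in_mask`` is an invented helper name):

```python
def apply_in_mask(tensor, mask):
    # new behavior per issue-4540: masked positions become exactly zero,
    # with no (1 - mask) * constant term folded back in
    return [t * m for t, m in zip(tensor, mask)]

print(apply_in_mask([1.5, 2.0, 3.0], [1, 0, 1]))  # [1.5, 0.0, 3.0]
```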
    def __apply_weight_mask(self):
        """
......
@@ -165,7 +165,13 @@ class ChannelDependency(Dependency):
        parent_layers = []
        # find the node that contains aten::add
        # or aten::cat operations
        if node.op_type in ADD_TYPES or node.op_type in MUL_TYPES:
            # Refer to issue 4540 for more details. Multiplication does not actually
            # introduce a channel dependency, because the misaligned channels can
            # propagate to each other. However, when one of the input tensors comes
            # from a skip connection (residual), the channel propagation may fail
            # (the input is also used by another layer and cannot be pruned);
            # in this case, we need to fix the conflict manually.
            parent_layers = self._get_parent_layers(node)
        elif node.op_type == CAT_TYPE:
            # To determine if this cat operation will introduce channel
......
@@ -79,7 +79,7 @@ class Experiment:
        self.id: str = management.generate_experiment_id()
        self.port: Optional[int] = None
        self._proc: Optional[Popen] = None
        self.action = 'create'
        self.url_prefix: Optional[str] = None
        if isinstance(config_or_platform, (str, list)):
@@ -114,7 +114,7 @@ class Experiment:
        log_dir = Path.home() / f'nni-experiments/{self.id}/log'
        nni.runtime.log.start_experiment_log(self.id, log_dir, debug)
        self._proc = launcher.start_experiment(self.action, self.id, config, port, debug, run_mode, self.url_prefix)
        assert self._proc is not None
        self.port = port  # port will be None if start up failed
@@ -247,7 +247,7 @@ class Experiment:
    def _resume(exp_id, exp_dir=None):
        exp = Experiment(None)
        exp.id = exp_id
        exp.action = 'resume'
        exp.config = launcher.get_stopped_experiment_config(exp_id, exp_dir)
        return exp
@@ -255,7 +255,7 @@ class Experiment:
    def _view(exp_id, exp_dir=None):
        exp = Experiment(None)
        exp.id = exp_id
        exp.action = 'view'
        exp.config = launcher.get_stopped_experiment_config(exp_id, exp_dir)
        return exp
......
@@ -27,23 +27,27 @@ _logger = logging.getLogger('nni.experiment')
@dataclass(init=False)
class NniManagerArgs:
    # argv sent to "ts/nni_manager/main.js"
    port: int
    experiment_id: int
    action: str  # 'new', 'resume', 'view'
    mode: str  # training service platform, to be removed
    experiments_directory: str  # renamed "config.nni_experiments_directory", must be absolute
    log_level: str
    foreground: bool = False
    url_prefix: Optional[str] = None  # leading and trailing "/" must be stripped
    dispatcher_pipe: Optional[str] = None

    def __init__(self, action, exp_id, config, port, debug, foreground, url_prefix):
        self.port = port
        self.experiment_id = exp_id
        self.action = action
        self.foreground = foreground
        self.url_prefix = url_prefix
        # config field name "experiment_working_directory" is a mistake
        # see "ts/nni_manager/common/globals/arguments.ts" for details
        self.experiments_directory = config.experiment_working_directory
        if isinstance(config.training_service, list):
            self.mode = 'hybrid'
@@ -54,20 +58,14 @@ class NniManagerArgs:
        if debug and self.log_level not in ['debug', 'trace']:
            self.log_level = 'debug'

    def to_command_line_args(self):
        # reformat fields to meet the yargs library's format
        # see "ts/nni_manager/common/globals/arguments.ts" for details
        ret = []
        for field in fields(self):
            value = getattr(self, field.name)
            if value is not None:
                ret.append('--' + field.name.replace('_', '-'))
                if isinstance(value, bool):
                    ret.append(str(value).lower())
                else:
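The snake_case to kebab-case conversion done in ``to_command_line_args`` can be sketched with a standalone dataclass. The field names below are illustrative, not nni's actual argument set:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class Args:
    # invented fields for illustration
    log_level: str = 'info'
    url_prefix: Optional[str] = None
    foreground: bool = False

def to_cli(args):
    # mirror the reformatting done for the yargs library
    ret = []
    for field in fields(args):
        value = getattr(args, field.name)
        if value is None:
            continue  # unset options are simply not emitted
        ret.append('--' + field.name.replace('_', '-'))
        # booleans are passed as literal "true"/"false"
        ret.append(str(value).lower() if isinstance(value, bool) else str(value))
    return ret

print(to_cli(Args()))  # ['--log-level', 'info', '--foreground', 'false']
```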
@@ -76,6 +74,8 @@ class NniManagerArgs:
def start_experiment(action, exp_id, config, port, debug, run_mode, url_prefix):
    foreground = run_mode.value == 'foreground'
    if url_prefix is not None:
        url_prefix = url_prefix.strip('/')
    nni_manager_args = NniManagerArgs(action, exp_id, config, port, debug, foreground, url_prefix)
    _ensure_port_idle(port)
@@ -135,7 +135,7 @@ def _start_rest_server(nni_manager_args, run_mode) -> Tuple[int, Popen]:
    cmd += nni_manager_args.to_command_line_args()
    if run_mode.value == 'detach':
        log = Path(nni_manager_args.experiments_directory, nni_manager_args.experiment_id, 'log')
        out = (log / 'nnictl_stdout.log').open('a')
        err = (log / 'nnictl_stderr.log').open('a')
        header = f'Experiment {nni_manager_args.experiment_id} start: {datetime.now()}'
@@ -201,7 +201,7 @@ def _ensure_port_idle(port: int, message: Optional[str] = None) -> None:
def _start_rest_server_retiarii(config: ExperimentConfig, port: int, debug: bool, experiment_id: str,
                                pipe_path: str, mode: str = 'create') -> Tuple[int, Popen]:
    if isinstance(config.training_service, list):
        ts = 'hybrid'
    else:
@@ -213,24 +213,20 @@ def _start_rest_server_retiarii(config: ExperimentConfig, port: int, debug: bool
        'port': port,
        'mode': ts,
        'experiment_id': experiment_id,
        'action': mode,
        'experiments_directory': config.experiment_working_directory,
        'log_level': 'debug' if debug else 'info'
    }
    if pipe_path is not None:
        args['dispatcher_pipe'] = pipe_path

    import nni_node
    node_dir = Path(nni_node.__path__[0])
    node = str(node_dir / ('node.exe' if sys.platform == 'win32' else 'node'))
    main_js = str(node_dir / 'main.js')
    cmd = [node, '--max-old-space-size=4096', main_js]
    for arg_key, arg_value in args.items():
        cmd.append('--' + arg_key.replace('_', '-'))
        cmd.append(str(arg_value))
    if sys.platform == 'win32':
......