"vscode:/vscode.git/clone" did not exist on "c7c0b57541bec67bcf0b9bf2fde935375fc66380"
Commit f9ea49ff authored by Yuge Zhang, committed via GitHub

One-shot documentation update (#4880)

:orphan:
One-shot Strategy (legacy)
==========================
.. warning:: This page will be removed in future releases.
.. _darts-strategy:
DARTS
-----
The paper `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
The authors' code optimizes the network weights and architecture weights alternately in mini-batches. They further explore using second-order optimization (unrolling) instead of first-order optimization to improve performance.
The NNI implementation is based on the `official implementation <https://github.com/quark0/darts>`__ and a `popular third-party repo <https://github.com/khanrc/pt.darts>`__. DARTS on NNI is designed to work with arbitrary search spaces. A CNN search space tailored for CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.
.. autoclass:: nni.retiarii.oneshot.pytorch.DartsTrainer
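For orientation, here is a minimal usage sketch of the trainer. This is a sketch only: ``model`` (containing ``LayerChoice``/``InputChoice``), ``criterion``, ``optimizer`` and ``dataset_train`` are assumed to be defined by the user, and argument names should be double-checked against the API reference for your NNI version.

.. code-block:: python

   from nni.retiarii.oneshot.pytorch import DartsTrainer

   # assumes `model`, `criterion`, `optimizer` and `dataset_train` are created beforehand
   def top1_accuracy(output, target):
       return (output.argmax(dim=1) == target).float().mean().item()

   trainer = DartsTrainer(
       model=model,
       loss=criterion,
       metrics=lambda output, target: {'acc': top1_accuracy(output, target)},
       optimizer=optimizer,
       num_epochs=50,
       dataset=dataset_train,
       batch_size=64,
       log_frequency=10,
       unrolled=False,          # set True for second-order optimization
   )
   trainer.fit()
   print(trainer.export())      # dict of selected choices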
Reproduction Results
^^^^^^^^^^^^^^^^^^^^
The above-mentioned example is meant to reproduce the results in the paper; we run experiments with both first-order and second-order optimization. Due to time limits, we retrain *only the best architecture* derived from the search phase and repeat the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when they are ready.
.. list-table::
   :header-rows: 1
   :widths: auto

   * -
     - In paper
     - Reproduction
   * - First order (CIFAR10)
     - 3.00 +/- 0.14
     - 2.78
   * - Second order (CIFAR10)
     - 2.76 +/- 0.09
     - 2.80
Examples
^^^^^^^^
:githublink:`Example code <examples/nas/oneshot/darts>`
.. code-block:: bash
# Clone the NNI repo if it is not cloned yet; otherwise skip this line and enter the repo folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/oneshot/darts
python3 search.py
# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
Limitations
^^^^^^^^^^^
* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.
.. _enas-strategy:
ENAS
----
The paper `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__ uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.
The NNI implementation is based on the `official implementation in Tensorflow <https://github.com/melodyguan/enas>`__, including a general-purpose reinforcement-learning controller and a trainer that trains the target network and this controller alternately. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since the code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
.. autoclass:: nni.retiarii.oneshot.pytorch.EnasTrainer
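A minimal usage sketch follows. Parameter names mirror the DARTS trainer pattern; ``reward_function`` is the metric used as the RL reward, and ``model``, ``criterion``, ``optimizer``, ``dataset_train`` and an ``accuracy(output, target)`` helper are assumed to be defined by the user. Double-check the signature against the API reference for your NNI version.

.. code-block:: python

   from nni.retiarii.oneshot.pytorch import EnasTrainer

   trainer = EnasTrainer(
       model=model,
       loss=criterion,
       metrics=lambda output, target: {'acc': accuracy(output, target)},
       reward_function=lambda output, target: accuracy(output, target),
       optimizer=optimizer,
       num_epochs=10,
       dataset=dataset_train,
       batch_size=128,
       log_frequency=10,
   )
   trainer.fit()
   print(trainer.export())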
Examples
^^^^^^^^
:githublink:`Example code <examples/nas/oneshot/enas>`
.. code-block:: bash
# Clone the NNI repo if it is not cloned yet; otherwise skip this line and enter the repo folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/oneshot/enas
# search in macro search space
python3 search.py --search-for macro
# search in micro search space
python3 search.py --search-for micro
# view more options for search
python3 search.py -h
.. _fbnet-strategy:
FBNet
-----
.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will `be migrated to the Retiarii framework in the near future <https://github.com/microsoft/nni/issues/3814>`__.
For the mobile application of facial landmark detection, based on the basic architecture of the PFLD model, we apply FBNet (block-wise DNAS) to design a concise model with a good trade-off between latency and accuracy. References are listed below:
* `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__
* `PFLD: A Practical Facial Landmark Detector <https://arxiv.org/abs/1902.10859>`__
FBNet is a block-wise differentiable NAS method (block-wise DNAS), in which the best candidate building blocks are chosen by Gumbel-Softmax random sampling and differentiable training. At each layer (or stage) to be searched, the diverse candidate blocks are placed side by side (similar in spirit to structural re-parameterization), which enables sufficient pre-training of the supernet. The pre-trained supernet is then sampled for fine-tuning of the subnet to achieve better performance. A small PyTorch sketch of the sampling idea is given after the figure below.
.. image:: ../../img/fbnet.png
:width: 800
:align: center
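To make the sampling idea concrete, the following self-contained PyTorch sketch (not NNI's actual implementation) shows how candidate blocks at one searchable layer can be mixed with Gumbel-Softmax weights:

.. code-block:: python

   import torch
   import torch.nn as nn
   import torch.nn.functional as F

   class GumbelBlockChoice(nn.Module):
       """Mixes the outputs of candidate blocks with Gumbel-Softmax weights."""

       def __init__(self, candidate_blocks):
           super().__init__()
           self.candidates = nn.ModuleList(candidate_blocks)
           # one architecture logit per candidate block
           self.alpha = nn.Parameter(torch.zeros(len(candidate_blocks)))

       def forward(self, x, tau=1.0):
           # differentiable, nearly one-hot weights sampled from the Gumbel-Softmax distribution
           weights = F.gumbel_softmax(self.alpha, tau=tau)
           return sum(w * block(x) for w, block in zip(weights, self.candidates))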
PFLD is a lightweight facial landmark model for real-time applications. The architecture of PFLD is first simplified for acceleration, using the stem block of PeleeNet, average pooling with depthwise convolution, and the eSE module.
To achieve a better trade-off between latency and accuracy, FBNet is then applied to the simplified PFLD to search for the best block at each specific layer. The search space is based on the FBNet space and optimized for mobile deployment by using average pooling with depthwise convolution, the eSE module, etc.
Experiments
^^^^^^^^^^^
To verify the effectiveness of FBNet applied on PFLD, we choose the open source dataset with 106 landmark points as the benchmark:
* `Grand Challenge of 106-Point Facial Landmark Localization <https://arxiv.org/abs/1905.03469>`__
The baseline model is denoted as MobileNet-V3 PFLD (`reference baseline <https://github.com/Hsintao/pfld_106_face_landmarks>`__), and the searched model is denoted as Subnet. The experimental results are listed below, where the latency is tested on a Qualcomm 625 CPU (ARMv8):
.. list-table::
   :header-rows: 1
   :widths: auto

   * - Model
     - Size
     - Latency
     - Validation NME
   * - MobileNet-V3 PFLD
     - 1.01MB
     - 10ms
     - 6.22%
   * - Subnet
     - 693KB
     - 1.60ms
     - 5.58%
Example
^^^^^^^
`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/pfld>`__
Please run the following scripts in the example directory.
The Python dependencies used here are listed below:
.. code-block:: bash
numpy==1.18.5
opencv-python==4.5.1.48
torch==1.6.0
torchvision==0.7.0
onnx==1.8.1
onnx-simplifier==0.3.5
onnxruntime==1.7.0
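They can be installed in one step, for example:

.. code-block:: bash

   pip install numpy==1.18.5 opencv-python==4.5.1.48 torch==1.6.0 torchvision==0.7.0 \
       onnx==1.8.1 onnx-simplifier==0.3.5 onnxruntime==1.7.0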
To run the tutorial, follow the steps below:
1. **Data Preparation**: First, download the `106points dataset <https://drive.google.com/file/d/1I7QdnLxAlyG2Tq3L66QYzGhiBEoVfzKo/view?usp=sharing>`__ to the path ``./data/106points``. The dataset includes the training set and test set:
.. code-block:: bash
./data/106points/train_data/imgs
./data/106points/train_data/list.txt
./data/106points/test_data/imgs
./data/106points/test_data/list.txt
2. **Search**: Based on the architecture of the simplified PFLD, the multi-stage search space and the hyper-parameters for searching should first be configured to construct the supernet. For example:
.. code-block:: python
from lib.builder import search_space
from lib.ops import PRIMITIVES
from lib.supernet import PFLDInference, AuxiliaryNet
from nni.algorithms.nas.pytorch.fbnet import LookUpTable, NASConfig
# configuration of hyper-parameters
# search_space defines the multi-stage search space
nas_config = NASConfig(
model_dir="./ckpt_save",
nas_lr=0.01,
mode="mul",
alpha=0.25,
beta=0.6,
search_space=search_space,
)
# lookup table to manage the information
lookup_table = LookUpTable(config=nas_config, primitives=PRIMITIVES)
# created supernet
pfld_backbone = PFLDInference(lookup_table)
After creating the supernet with the specified search space and hyper-parameters, we can run the following command to start searching and training the supernet:
.. code-block:: bash
python train.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points"
The validation accuracy will be shown during training, and the model with the best accuracy will be saved as ``./ckpt_save/supernet/checkpoint_best.pth``.
3. **Finetune**: After pre-training the supernet, we can run the following command to sample the subnet and fine-tune it:
.. code-block:: bash
python retrain.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points" \
--supernet "./ckpt_save/supernet/checkpoint_best.pth"
The validation accuracy will be shown during training, and the model with the best accuracy will be saved as ``./ckpt_save/subnet/checkpoint_best.pth``.
4. **Export**: After fine-tuning the subnet, we can run the following command to export the ONNX model:
.. code-block:: bash
python export.py --supernet "./ckpt_save/supernet/checkpoint_best.pth" \
--resume "./ckpt_save/subnet/checkpoint_best.pth"
The ONNX model is saved as ``./output/subnet.onnx``, which can be further converted to a mobile inference engine by using `MNN <https://github.com/alibaba/MNN>`__.
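For example, assuming MNN's ``MNNConvert`` tool is available on the path (the exact flags may differ across MNN versions), the conversion might look like:

.. code-block:: bash

   # convert the exported ONNX model into MNN format (illustrative invocation)
   MNNConvert -f ONNX --modelFile ./output/subnet.onnx --MNNModel ./output/subnet.mnn --bizCode MNN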
The checkpoints of the pre-trained supernet and subnet are provided below:
* `Supernet <https://drive.google.com/file/d/1TCuWKq8u4_BQ84BWbHSCZ45N3JGB9kFJ/view?usp=sharing>`__
* `Subnet <https://drive.google.com/file/d/160rkuwB7y7qlBZNM3W_T53cb6MQIYHIE/view?usp=sharing>`__
* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
.. _spos-strategy:
SPOS
----
`Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ proposes a one-shot NAS method that addresses the difficulty of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) get trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
The NNI implementation is based on the `official repo <https://github.com/megvii-model/SinglePathOneShot>`__. We implement a trainer that trains the supernet, and an evolution tuner that leverages the NNI framework to speed up the evolutionary search phase.
.. autoclass:: nni.retiarii.oneshot.pytorch.SinglePathTrainer
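As a rough sketch of supernet training with this trainer (argument names may differ slightly across versions; ``model``, ``criterion``, ``optimizer``, the datasets and an ``accuracy(output, target)`` helper are assumed to be defined by the user):

.. code-block:: python

   from nni.retiarii.oneshot.pytorch import SinglePathTrainer

   trainer = SinglePathTrainer(
       model=model,
       loss=criterion,
       metrics=lambda output, target: {'acc': accuracy(output, target)},
       optimizer=optimizer,
       num_epochs=120,
       dataset_train=dataset_train,
       dataset_valid=dataset_valid,
       batch_size=96,
       log_frequency=10,
   )
   trainer.fit()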
Examples
^^^^^^^^
Here is a use case based on the search space in the paper. However, we applied a latency limit instead of a FLOPs limit in the architecture search phase.
:githublink:`Example code <examples/nas/oneshot/spos>`
**Requirements:** Prepare ImageNet in the standard format (follow the script `here <https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4>`__). Linking it to ``data/imagenet`` will be more convenient. Download the checkpoint file from `here <https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN>`__ (maintained by `Megvii <https://github.com/megvii-model>`__) if you don't want to retrain the supernet. Put ``checkpoint-150000.pth.tar`` under ``data`` directory. After preparation, it's expected to have the following code structure:
.. code-block:: bash
spos
├── architecture_final.json
├── blocks.py
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── checkpoint-150000.pth.tar
├── network.py
├── readme.md
├── supernet.py
├── evaluation.py
├── search.py
└── utils.py
Then follow the 3 steps:
1. **Train Supernet**:
.. code-block:: bash
python supernet.py
This will export the checkpoint to ``checkpoints`` directory, for the next step.
.. note:: The data loading used in the official repo is `slightly different from usual <https://github.com/megvii-model/SinglePathOneShot/issues/5>`__, as they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option ``--spos-preprocessing`` simulates the original behavior and enables you to use the pretrained checkpoints.
2. **Evolution Search**: Single Path One-Shot leverages evolution algorithm to search for the best architecture. In the paper, the search module, which is responsible for testing the sampled architecture, recalculates all the batch norm for a subset of training images, and evaluates the architecture on the full validation set.
In this example, it will inherit the ``state_dict`` of the supernet from ``./data/checkpoint-150000.pth.tar`` and search for the best architecture with the regularized evolution strategy. Search in the supernet with the following command:
.. code-block:: bash
python search.py
NNI supports a latency filter to filter out unqualified models during the search phase. Latency is predicted by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__. To apply the latency filter, run ``search.py`` with the additional argument ``--latency-filter``. Here is an example:
.. code-block:: bash
python search.py --latency-filter cortexA76cpu_tflite21
Note that the latency filter is only supported by the base execution engine.
The final architecture exported from every epoch of evolution can be found in ``trials`` under the working directory of your tuner, which, by default, is ``$HOME/nni-experiments/your_experiment_id/trials``.
3. **Train for Evaluation**:
.. code-block:: bash
python evaluation.py
By default, it will use ``architecture_final.json``. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with the ``--fixed-arc`` option.
Known Limitations
^^^^^^^^^^^^^^^^^
* Block search only. Channel search is not supported yet.
Current Reproduction Results
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (our own run) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to `this issue <https://github.com/megvii-model/SinglePathOneShot/issues/6>`__.
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% reported by the official release and the 74.3% reported in the original paper.
.. _proxylessnas-strategy:
ProxylessNAS
------------
The paper `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__ removes the proxy: it directly learns architectures for large-scale target tasks and target hardware platforms. It addresses the high memory consumption issue of differentiable NAS and reduces the computational cost to the same level as regular training, while still allowing a large candidate set. Please refer to the paper for details.
.. autoclass:: nni.retiarii.oneshot.pytorch.ProxylessTrainer
To use the ProxylessNAS training/searching approach, users need to specify the search space in their model using the :doc:`NNI NAS interface </nas/construct_space>`, e.g., ``LayerChoice`` and ``InputChoice``. After defining and instantiating the model, the remaining work can be left to ``ProxylessNasTrainer`` by instantiating the trainer and passing the model to it.
.. code-block:: python
trainer = ProxylessTrainer(model,
loss=LabelSmoothingLoss(),
dataset=None,
optimizer=optimizer,
metrics=lambda output, target: accuracy(output, target, topk=(1, 5,)),
num_epochs=120,
log_frequency=10,
grad_reg_loss_type=args.grad_reg_loss_type,
grad_reg_loss_params=grad_reg_loss_params,
applied_hardware=args.applied_hardware, dummy_input=(1, 3, 224, 224),
ref_latency=args.reference_latency)
trainer.train()
trainer.export(args.arch_path)
The complete example code can be found :githublink:`here <examples/nas/oneshot/proxylessnas>`.
Implementation
^^^^^^^^^^^^^^
The NNI implementation is based on the `official implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches: gradient descent and RL-based. Our current implementation on NNI supports the gradient-descent training approach. Complete support of ProxylessNAS is ongoing.
The official implementation supports different target hardware, including 'mobile', 'cpu', 'gpu8', and 'flops'. In the NNI repo, hardware latency prediction is supported by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__, an accurate inference latency predictor for DNN models on diverse edge devices. nn-Meter currently supports four hardware platforms: ``cortexA76cpu_tflite21``, ``adreno640gpu_tflite21``, ``adreno630gpu_tflite21``, and ``myriadvpu_openvino2019r2``. Users can find more information about nn-Meter on its website; more hardware will be supported in the future. More details about applying ``nn-Meter`` can be found :doc:`here </nas/hardware_aware_nas>`.
Below we describe the implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. Users who want to flexibly define their own search space while using the built-in ProxylessNAS training approach can refer to the :githublink:`example code <examples/nas/oneshot/proxylessnas>`.
.. image:: ../../img/proxylessnas.png
:width: 450
:align: center
The ProxylessNAS training approach is composed of ``ProxylessLayerChoice`` and ``ProxylessNasTrainer``. ``ProxylessLayerChoice`` instantiates a ``MixedOp`` for each mutable (i.e., ``LayerChoice``) and manages the architecture weights in the ``MixedOp``. **For DataParallel**, architecture weights should be included in the user model. Specifically, in the ProxylessNAS implementation, we add the ``MixedOp`` to the corresponding mutable (i.e., ``LayerChoice``) as a member variable. The ``ProxylessLayerChoice`` class also exposes two member functions, ``resample`` and ``finalize_grad``, for the trainer to control the training of architecture weights.
Reproduction Results
^^^^^^^^^^^^^^^^^^^^
To reproduce the result, we first ran the search phase. We found that, although it runs for many epochs, the chosen architecture converges within the first several epochs. This is probably caused by the hyper-parameters or the implementation; we are investigating it.
Customization
-------------
.. autoclass:: nni.retiarii.oneshot.BaseOneShotTrainer
:members:
.. autofunction:: nni.retiarii.oneshot.pytorch.utils.replace_layer_choice
.. autofunction:: nni.retiarii.oneshot.pytorch.utils.replace_input_choice
...@@ -42,6 +42,8 @@ The simplest way to customize a new evaluator is with :class:`FunctionalEvaluato
If the conversion is successful, the model will be able to be visualized with powerful tools `Netron <https://netron.app/>`__.
.. _lightning-evaluator:
Evaluators with PyTorch-Lightning
---------------------------------
...@@ -30,19 +30,19 @@ Here is the list of exploration strategies that NNI has supported.
* - :class:`PolicyBasedRL <nni.retiarii.strategy.PolicyBasedRL>`
- :ref:`Multi-trial <multi-trial-nas>`
- Policy-based reinforcement learning, based on implementation of tianshou. `Reference <https://arxiv.org/abs/1611.01578>`__
* - :class:`DARTS <nni.retiarii.strategy.DARTS>`
- :ref:`One-shot <one-shot-nas>`
- Continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. `Reference <https://arxiv.org/abs/1806.09055>`__
* - :class:`ENAS <nni.retiarii.strategy.ENAS>`
- :ref:`One-shot <one-shot-nas>`
- RL controller learns to generate the best network on a super-net. `Reference <https://arxiv.org/abs/1802.03268>`__
* - :class:`GumbelDARTS <nni.retiarii.strategy.GumbelDARTS>`
- :ref:`One-shot <one-shot-nas>`
- Choose the best block by using Gumbel Softmax random sampling and differentiable training. `Reference <https://arxiv.org/abs/1812.03443>`__
* - :class:`RandomOneShot <nni.retiarii.strategy.RandomOneShot>`
- :ref:`One-shot <one-shot-nas>`
- Train a super-net with uniform path sampling. `Reference <https://arxiv.org/abs/1904.00420>`__
* - :class:`Proxyless <nni.retiarii.strategy.Proxyless>`
- :ref:`One-shot <one-shot-nas>`
- A low-memory-consuming optimized version of differentiable architecture search. `Reference <https://arxiv.org/abs/1812.00332>`__
...@@ -53,7 +53,7 @@ Multi-trial strategy
Multi-trial NAS means each sampled model from the model space is trained independently. A typical multi-trial NAS is `NASNet <https://arxiv.org/abs/1707.07012>`__. In multi-trial NAS, users need a model evaluator to evaluate the performance of each sampled model, and need an exploration strategy to sample models from a defined model space. Here, users could use NNI-provided model evaluators or write their own model evaluator. They can simply choose an exploration strategy. Advanced users can also customize new exploration strategies.
To use an exploration strategy, users simply instantiate an exploration strategy and pass the instantiated object to :class:`~nni.retiarii.experiment.pytorch.RetiariiExperiment`. Below is a simple example.
.. code-block:: python
...@@ -69,7 +69,25 @@ One-shot strategy
One-shot NAS algorithms leverage weight sharing among models in the neural architecture search space to train a supernet, and use this supernet to guide the selection of better models. This type of algorithm greatly reduces computational resources compared to independently training each model from scratch (which we call "Multi-trial NAS").
Starting from v2.8, the usage of one-shot strategies is much like that of multi-trial strategies. Users simply need to create a strategy and run :class:`~nni.retiarii.experiment.pytorch.RetiariiExperiment`. Since one-shot strategies manipulate the training recipe, the evaluator needs to be one of the :ref:`PyTorch-Lightning evaluators <lightning-evaluator>`, either built-in or customized. Example follows:
.. code-block:: python
import nni.retiarii.strategy as strategy
import nni.retiarii.evaluator.pytorch.lightning as pl
evaluator = pl.Classification(...)
exploration_strategy = strategy.DARTS()
One-shot strategies only support a limited set of :ref:`mutation-primitives`, and do not support :doc:`customizing mutators <mutator>` at all. See the :ref:`reference <one-shot-strategy-reference>` for the detailed support list of each algorithm.
*New in v2.8*: One-shot strategies are now compatible with `Lightning accelerators <https://pytorch-lightning.readthedocs.io/en/stable/accelerators/gpu.html>`__. This means you can accelerate one-shot strategies on hardware such as multiple GPUs. To enable this feature, simply pass the keyword arguments that used to be set in ``pytorch_lightning.Trainer`` to your evaluator. See :doc:`this reference </reference/nas/evaluator>` for more details.
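For orientation, a rough sketch of wiring such a strategy into an experiment is shown below. ``base_model`` stands for a model space defined with NNI mutation primitives, and the execution-engine configuration field is an assumption that should be verified against the experiment reference:

.. code-block:: python

   import nni.retiarii.strategy as strategy
   import nni.retiarii.evaluator.pytorch.lightning as pl
   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   evaluator = pl.Classification(...)             # a Lightning evaluator, as above
   exploration_strategy = strategy.DARTS()

   # `base_model` is a model space defined with NNI mutation primitives
   exp = RetiariiExperiment(base_model, evaluator, strategy=exploration_strategy)
   exp_config = RetiariiExeConfig()
   exp_config.execution_engine = 'oneshot'        # assumed engine name; check the experiment reference
   exp.run(exp_config)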
One-shot strategy (legacy)
--------------------------
.. warning:: The following usages are deprecated and will be removed in future releases. If you intend to use them, the references can be found :doc:`here </deprecated/oneshot_legacy>`.
The usage of the legacy one-shot NAS strategies is a little different from multi-trial strategies. A legacy one-shot strategy is implemented with a special type of object named *Trainer*. Following the common practice of one-shot NAS, the *Trainer* trains the super-net and searches for the optimal architecture in a single run. For example,
.. code-block:: python
...@@ -86,7 +104,7 @@ Currently, the usage of one-shot NAS strategy is a little different from multi-t
)
trainer.fit()
One-shot strategy can be used without :class:`~nni.retiarii.experiment.pytorch.RetiariiExperiment`. Thus, the ``trainer.fit()`` here runs the experiment locally.
After ``trainer.fit()`` completes, we can use ``trainer.export()`` to export the searched architecture (a dict of choices) to a file.
...@@ -218,7 +218,7 @@ Our documentation is located under ``docs/`` folder. The following command can b
.. code-block:: bash
cd docs
make en
.. note::
...@@ -295,7 +295,7 @@ To contribute a new tutorial, here are the steps to follow:
In case you prefer to write your tutorial in jupyter, you can use `this script <https://gist.github.com/chsasank/7218ca16f8d022e02a9c0deb94a310fe>`_ to convert the notebook to a python file. After conversion and addition to the project, please make sure the section headings etc. are in logical order.
3. Build the tutorials. Since some of the tutorials contain complex AutoML examples, it's very inefficient to build them over and over again. Therefore, we cache the built tutorials in ``docs/source/tutorials``, so that the unchanged tutorials won't be rebuilt. To trigger the build, run ``make en``. This will execute the tutorials and convert the scripts into HTML files. How long it takes depends on your tutorial. As ``make en`` is not very debug-friendly, we suggest making the script runnable by itself before using this building tool.
.. note::
...@@ -327,7 +327,7 @@ To build the translated documentation (for example Chinese documentation), pleas
.. code-block:: bash
make zh
If you ever encounter problems with translation builds, try removing the previous build via ``rm -r docs/build/``.
...@@ -306,6 +306,18 @@ class Classification(Lightning):
trainer_kwargs : dict
Optional keyword arguments passed to trainer. See
`Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`__ for details.
Examples
--------
>>> evaluator = Classification()
To use customized criterion and optimizer:
>>> evaluator = Classification(nn.LabelSmoothingCrossEntropy, optimizer=torch.optim.SGD)
Extra keyword arguments will be passed to trainer, some of which might be necessary to enable GPU acceleration:
>>> evaluator = Classification(accelerator='gpu', devices=2, strategy='ddp')
"""
def __init__(self, criterion: Type[nn.Module] = nn.CrossEntropyLoss,
...@@ -363,6 +375,14 @@ class Regression(Lightning):
trainer_kwargs : dict
Optional keyword arguments passed to trainer. See
`Lightning documentation <https://pytorch-lightning.readthedocs.io/en/stable/common/trainer.html>`__ for details.
Examples
--------
>>> evaluator = Regression()
Extra keyword arguments will be passed to trainer, some of which might be necessary to enable GPU acceleration:
>>> evaluator = Regression(gpus=1)
"""
def __init__(self, criterion: Type[nn.Module] = nn.MSELoss,
...@@ -21,7 +21,13 @@ from nni.retiarii.nn.pytorch.api import ValueChoiceX
from nni.typehint import Literal
from .supermodule.base import BaseSuperNetModule
__all__ = [
'MutationHook',
'BaseSuperNetModule',
'BaseOneShotLightningModule',
'traverse_and_mutate_submodules',
'no_default_hook'
]
MutationHook = Callable[[nn.Module, str, Dict[str, Any], Dict[str, Any]], Union[nn.Module, bool, Tuple[nn.Module, bool]]]
...@@ -147,35 +153,46 @@ def no_default_hook(module: nn.Module, name: str, memo: dict[str, Any], mutate_k
class BaseOneShotLightningModule(pl.LightningModule):
_mutation_hooks_note = """mutation_hooks : list[MutationHook]
Extra mutation hooks to support customized mutation on primitives other than built-ins.
Mutation hooks are callable that inputs an Module and returns a
:class:`~nni.retiarii.oneshot.pytorch.supermodule.base.BaseSuperNetModule`.
They are invoked in :meth:`traverse_and_mutate_submodules`, on each submodules.
For each submodule, the hook list are invoked subsequently,
the later hooks can see the result from previous hooks.
The modules that are processed by ``mutation_hooks`` will be replaced by the returned module,
stored in :attr:`nas_modules`, and be the focus of the NAS algorithm.
The hook list will be appended by ``default_mutation_hooks`` in each one-shot module.
To be more specific, the input arguments are four arguments:
1. a module that might be processed,
2. name of the module in its parent module,
3. a memo dict whose usage depends on the particular algorithm.
4. keyword arguments (configurations).
Note that the memo should be read/written by hooks.
There won't be any hooks called on root module.
The returned arguments can be also one of the three kinds:
1. tuple of: :class:`~nni.retiarii.oneshot.pytorch.supermodule.base.BaseSuperNetModule` or None, and boolean,
2. boolean,
3. :class:`~nni.retiarii.oneshot.pytorch.supermodule.base.BaseSuperNetModule` or None.
The boolean value is ``suppress`` indicates whether the following hooks should be called.
When it's true, it suppresses the subsequent hooks, and they will never be invoked.
Without boolean value specified, it's assumed to be false.
If a none value appears on the place of
:class:`~nni.retiarii.oneshot.pytorch.supermodule.base.BaseSuperNetModule`,
it means the hook suggests to
keep the module unchanged, and nothing will happen.
An example of mutation hook is given in :func:`no_default_hook`.
However it's recommended to implement mutation hooks by deriving
:class:`~nni.retiarii.oneshot.pytorch.supermodule.base.BaseSuperNetModule`,
and add its classmethod ``mutate`` to this list.
"""
_inner_module_note = """inner_module : pytorch_lightning.LightningModule
...@@ -203,6 +220,8 @@ class BaseOneShotLightningModule(pl.LightningModule):
----------
nas_modules : list[BaseSuperNetModule]
Modules that have been mutated, which the search algorithms should care about.
model : pl.LightningModule
PyTorch lightning module. A model space with training recipe defined (wrapped by LightningModule in evaluator).
Parameters
----------
...@@ -235,7 +254,7 @@ class BaseOneShotLightningModule(pl.LightningModule):
self.model, mutation_hooks, self.mutate_kwargs(), topdown=True)
def search_space_spec(self) -> dict[str, ParameterSpec]:
"""Get the search space specification from :attr:`nas_modules`.
Returns
-------
...@@ -248,7 +267,7 @@ class BaseOneShotLightningModule(pl.LightningModule):
return result
def resample(self) -> dict[str, Any]:
"""Trigger the resample for each :attr:`nas_modules`.
Sometimes (e.g., in differentiable cases), it does nothing.
Returns
...@@ -263,8 +282,8 @@ class BaseOneShotLightningModule(pl.LightningModule):
def export(self) -> dict[str, Any]:
"""
Export the NAS result, ideally the best choice of each :attr:`nas_modules`.
You may implement an ``export`` method for your customized :attr:`nas_modules`.
Returns
--------
...@@ -288,8 +307,9 @@ class BaseOneShotLightningModule(pl.LightningModule):
def configure_optimizers(self):
"""
Combine architecture optimizers and user's model optimizers.
You can overwrite :meth:`configure_architecture_optimizers` if architecture optimizers are needed in your NAS algorithm.
For now :attr:`model` is tested against evaluators in :mod:`nni.retiarii.evaluator.pytorch.lightning`
and it only returns 1 optimizer.
But for extendibility, codes for other return value types are also implemented.
"""
...@@ -468,12 +488,12 @@ class BaseOneShotLightningModule(pl.LightningModule):
def architecture_optimizers(self) -> list[Optimizer] | Optimizer | None:
"""
Get architecture optimizers from all optimizers. Use this to get your architecture optimizers in :meth:`training_step`.
Returns
----------
opts : list[Optimizer], Optimizer, None
Architecture optimizers defined in :meth:`configure_architecture_optimizers`. This will be None if there is no
architecture optimizers.
"""
opts = self.optimizers()
...@@ -490,7 +510,7 @@ class BaseOneShotLightningModule(pl.LightningModule):
def weight_optimizers(self) -> list[Optimizer] | Optimizer | None:
"""
Get user optimizers from all optimizers. Use this to get user optimizers in :meth:`training_step`.
Returns
----------
...@@ -7,6 +7,8 @@ from typing import Any
from pytorch_lightning.trainer.supporters import CombinedLoader, CombinedLoaderIterator
__all__ = ['ConcatLoader']
class ConcatLoader(CombinedLoader):
"""This loader is same as CombinedLoader in PyTorch-Lightning, but concatenate sub-loaders
...@@ -16,13 +16,15 @@ from .supermodule.differentiable import (
DifferentiableMixedCell, DifferentiableMixedRepeat
)
from .supermodule.proxyless import ProxylessMixedInput, ProxylessMixedLayer
from .supermodule.operation import NATIVE_MIXED_OPERATIONS, NATIVE_SUPPORTED_OP_NAMES
class DartsLightningModule(BaseOneShotLightningModule):
_darts_note = """
Continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
`Reference <https://arxiv.org/abs/1806.09055>`__.
DARTS algorithm is one of the most fundamental one-shot algorithm.
DARTS repeats iterations, where each iteration consists of 2 training phases.
The phase 1 is architecture step, in which model parameters are frozen and the architecture parameters are trained.
The phase 2 is model step, in which architecture parameters are frozen and model parameters are trained.
...@@ -33,6 +35,15 @@ class DartsLightningModule(BaseOneShotLightningModule):
`FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions <https://arxiv.org/abs/2004.05565>`__.
One difference is that, in DARTS, we are using Softmax instead of GumbelSoftmax.
The supported mutation primitives of DARTS are:
* :class:`nni.retiarii.nn.pytorch.LayerChoice`.
* :class:`nni.retiarii.nn.pytorch.InputChoice`.
* :class:`nni.retiarii.nn.pytorch.ValueChoice` (only when used in {supported_ops}).
* :class:`nni.retiarii.nn.pytorch.Repeat`.
* :class:`nni.retiarii.nn.pytorch.Cell`.
* :class:`nni.retiarii.nn.pytorch.NasBench201Cell`.
{{module_notes}}
Parameters
...@@ -41,7 +52,10 @@
{base_params}
arc_learning_rate : float
Learning rate for architecture optimizer. Default: 3.0e-4
""".format(
base_params=BaseOneShotLightningModule._mutation_hooks_note,
supported_ops=', '.join(NATIVE_SUPPORTED_OP_NAMES)
)
__doc__ = _darts_note.format(
module_notes='The DARTS Module should be trained with :class:`nni.retiarii.oneshot.utils.InterleavedTrainValDataLoader`.',
...@@ -123,11 +137,17 @@ class DartsLightningModule(BaseOneShotLightningModule):
class ProxylessLightningModule(DartsLightningModule):
_proxyless_note = """
A low-memory-consuming optimized version of differentiable architecture search. See `reference <https://arxiv.org/abs/1812.00332>`__.
This is a DARTS-based method that resamples the architecture to reduce memory consumption.
Essentially, it samples one path on forward,
and implements its own backward to update the architecture parameters based on only one path.
The supported mutation primitives of Proxyless are:
* :class:`nni.retiarii.nn.pytorch.LayerChoice`.
* :class:`nni.retiarii.nn.pytorch.InputChoice`.
{{module_notes}}
Parameters
...@@ -160,14 +180,25 @@ class ProxylessLightningModule(DartsLightningModule):
class GumbelDartsLightningModule(DartsLightningModule):
_gumbel_darts_note = """
Choose the best block by using Gumbel Softmax random sampling and differentiable training.
See `FBNet <https://arxiv.org/abs/1812.03443>`__ and `SNAS <https://arxiv.org/abs/1812.09926>`__.
This is a DARTS-based method that uses gumbel-softmax to simulate one-hot distribution.
Essentially, it samples one path on forward,
and implements its own backward to update the architecture parameters based on only one path.
*New in v2.8*: Supports searching for ValueChoices on operations, with the technique described in
`FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions <https://arxiv.org/abs/2004.05565>`__.
The supported mutation primitives of GumbelDARTS are:
* :class:`nni.retiarii.nn.pytorch.LayerChoice`.
* :class:`nni.retiarii.nn.pytorch.InputChoice`.
* :class:`nni.retiarii.nn.pytorch.ValueChoice` (only when used in {supported_ops}).
* :class:`nni.retiarii.nn.pytorch.Repeat`.
* :class:`nni.retiarii.nn.pytorch.Cell`.
* :class:`nni.retiarii.nn.pytorch.NasBench201Cell`.
{{module_notes}}
Parameters
...@@ -178,12 +209,15 @@ class GumbelDartsLightningModule(DartsLightningModule):
The initial temperature used in gumbel-softmax.
use_temp_anneal : bool
If true, a linear annealing will be applied to ``gumbel_temperature``.
Otherwise, run at a fixed temperature. See `SNAS <https://arxiv.org/abs/1812.09926>`__ for details.
min_temp : float
The minimal temperature for annealing. No need to set this if you set ``use_temp_anneal`` False.
arc_learning_rate : float
Learning rate for architecture optimizer. Default: 3.0e-4
""".format(
base_params=BaseOneShotLightningModule._mutation_hooks_note,
supported_ops=', '.join(NATIVE_SUPPORTED_OP_NAMES)
)
def mutate_kwargs(self):
"""Use gumbel softmax."""
...@@ -12,25 +12,38 @@ import torch.nn as nn
import torch.optim as optim
from .base_lightning import BaseOneShotLightningModule, MutationHook, no_default_hook
from .supermodule.operation import NATIVE_MIXED_OPERATIONS, NATIVE_SUPPORTED_OP_NAMES
from .supermodule.sampling import (
PathSamplingInput, PathSamplingLayer, MixedOpPathSamplingPolicy,
PathSamplingCell, PathSamplingRepeat
)
from .enas import ReinforceController, ReinforceField
class RandomSamplingLightningModule(BaseOneShotLightningModule):
_random_note = """
Train a super-net with uniform path sampling. See `reference <https://arxiv.org/abs/1904.00420>`__.
In each epoch, model parameters are trained after a uniformly random sampling of each choice.
Notably, the exporting result is **also a random sample** of the search space.
The supported mutation primitives of RandomOneShot are:
* :class:`nni.retiarii.nn.pytorch.LayerChoice`.
* :class:`nni.retiarii.nn.pytorch.InputChoice`.
* :class:`nni.retiarii.nn.pytorch.ValueChoice` (only when used in {supported_ops}).
* :class:`nni.retiarii.nn.pytorch.Repeat`.
* :class:`nni.retiarii.nn.pytorch.Cell`.
* :class:`nni.retiarii.nn.pytorch.NasBench201Cell`.
Parameters
----------
{{module_params}}
{base_params}
""".format(
base_params=BaseOneShotLightningModule._mutation_hooks_note,
supported_ops=', '.join(NATIVE_SUPPORTED_OP_NAMES)
)
__doc__ = _random_note.format(
module_params=BaseOneShotLightningModule._inner_module_note,
...@@ -66,9 +79,24 @@ class RandomSamplingLightningModule(BaseOneShotLightningModule):
class EnasLightningModule(RandomSamplingLightningModule):
_enas_note = """
RL controller learns to generate the best network on a super-net. See `ENAS paper <https://arxiv.org/abs/1802.03268>`__.
There are 2 steps in an epoch.
- Firstly, training model parameters.
- Secondly, training ENAS RL agent. The agent will produce a sample of model architecture to get the best reward.
ENAS requires the evaluator to report metrics via ``self.log`` in its ``validation_step``.
See explanation of ``reward_metric_name`` for details.
The supported mutation primitives of ENAS are:
* :class:`nni.retiarii.nn.pytorch.LayerChoice`.
* :class:`nni.retiarii.nn.pytorch.InputChoice`.
* :class:`nni.retiarii.nn.pytorch.ValueChoice` (only when used in {supported_ops}).
* :class:`nni.retiarii.nn.pytorch.Repeat`.
* :class:`nni.retiarii.nn.pytorch.Cell`.
* :class:`nni.retiarii.nn.pytorch.NasBench201Cell`.
{{module_notes}}
...@@ -94,7 +122,10 @@ class EnasLightningModule(RandomSamplingLightningModule):
If there are multiple, it will find the metric with key name ``reward_metric_name``,
which is "default" by default.
Otherwise it raises an exception indicating multiple metrics are found.
""".format(
base_params=BaseOneShotLightningModule._mutation_hooks_note,
supported_ops=', '.join(NATIVE_SUPPORTED_OP_NAMES)
)
__doc__ = _enas_note.format(
module_notes='``ENASModule`` should be trained with :class:`nni.retiarii.oneshot.utils.ConcatenateTrainValDataloader`.',
...@@ -34,6 +34,16 @@ from typing import Callable, Iterator, TypeVar, Any, Optional, Tuple, Union, Lis
import numpy as np
import torch
__all__ = [
'slice_type',
'multidim_slice',
'scalar_or_scalar_dict',
'int_or_int_dict',
'zeros_like',
'Slicable',
'MaybeWeighted',
]
T = TypeVar('T')
slice_type = Union[slice, List[slice]]
...@@ -9,6 +9,8 @@ import torch.nn as nn
from nni.common.hpo_utils import ParameterSpec
__all__ = ['BaseSuperNetModule']
class BaseSuperNetModule(nn.Module):
"""
...@@ -88,6 +90,6 @@ class BaseSuperNetModule(nn.Module):
-------
Union[BaseSuperNetModule, bool, tuple[BaseSuperNetModule, bool]]
The mutation result, along with an optional boolean flag indicating whether to suppress follow-up mutation hooks.
See :class:`BaseOneShotLightningModule <nni.retiarii.oneshot.pytorch.base_lightning.BaseOneShotLightningModule>` for details.
"""
raise NotImplementedError()
...@@ -25,6 +25,12 @@ from ._valuechoice_utils import traverse_all_options, dedup_inner_choices
_logger = logging.getLogger(__name__)
__all__ = [
'DifferentiableMixedLayer', 'DifferentiableMixedInput',
'DifferentiableMixedRepeat', 'DifferentiableMixedCell',
'MixedOpDifferentiablePolicy'
]
class GumbelSoftmax(nn.Softmax):
"""Wrapper of ``F.gumbel_softmax``. dim = -1 by default."""
...@@ -28,6 +28,15 @@ from ._operation_utils import Slicable as _S, MaybeWeighted as _W, int_or_int_di
T = TypeVar('T')
__all__ = [
'MixedOperationSamplingPolicy',
'MixedOperation',
'MixedLinear',
'MixedConv2d',
'MixedBatchNorm2d',
'MixedMultiHeadAttention',
'NATIVE_MIXED_OPERATIONS',
]
class MixedOperationSamplingPolicy:
"""
...@@ -66,9 +75,11 @@ class MixedOperationSamplingPolicy:
class MixedOperation(BaseSuperNetModule):
"""This is the base class for all mixed operations.
It contains commonly used utilities that will ease the effort to write customized mixed oeprations,
i.e., operations with ValueChoice in its arguments.
To customize, please write your own mixed operation, and add the hook into ``mutation_hooks`` parameter when using the strategy.
By design, for a mixed operation to work in a specific algorithm,
at least two classes are needed.
...@@ -574,3 +585,6 @@ NATIVE_MIXED_OPERATIONS: list[Type[MixedOperation]] = [
MixedBatchNorm2d,
MixedMultiHeadAttention,
]
# For the supported operations to be properly rendered in documentation
NATIVE_SUPPORTED_OP_NAMES: list[str] = [op.bound_type.__name__ for op in NATIVE_MIXED_OPERATIONS]
...@@ -18,6 +18,8 @@ import torch.nn as nn
from .differentiable import DifferentiableMixedLayer, DifferentiableMixedInput
__all__ = ['ProxylessMixedLayer', 'ProxylessMixedInput']
class _ArchGradientFunction(torch.autograd.Function):
@staticmethod
...@@ -19,6 +19,12 @@ from .base import BaseSuperNetModule
from ._valuechoice_utils import evaluate_value_choice_with_dict, dedup_inner_choices
from .operation import MixedOperationSamplingPolicy, MixedOperation
__all__ = [
'PathSamplingLayer', 'PathSamplingInput',
'PathSamplingRepeat', 'PathSamplingCell',
'MixedOpPathSamplingPolicy'
]
class PathSamplingLayer(BaseSuperNetModule):
"""