To empower affordable DNN on the edge and mobile devices, hardware-aware NAS searches both high accuracy and low latency models. In particular, the search algorithm only considers the models within the target latency constraints during the search process.
To run this demo, first install nn-Meter from source code (Github repo link: https://github.com/microsoft/nn-Meter. Currently we haven't released this package, so development installation is required).
To run this demo, first install nn-Meter by running:
.. code-block:: bash
python setup.py develop
pip install nn-meter
Then run multi-trail SPOS demo:
...
...
@@ -20,10 +20,11 @@ Then run multi-trail SPOS demo:
To support hardware-aware NAS, you first need a `Strategy` that supports filtering the models by latency. We provide such a filter named `LatencyFilter` in NNI and initialize a `Random` strategy with the filter:
To support hardware-aware NAS, you first need a ``Strategy`` that supports filtering the models by latency. We provide such a filter named ``LatencyFilter`` in NNI and initialize a ``Random`` strategy with the filter:
.. code-block:: python
...
...
@@ -45,3 +46,34 @@ Then, pass this strategy to ``RetiariiExperiment``:
exp.run(exp_config, port)
In ``exp_config``, ``dummy_input`` is required for tracing shape info.
End-to-end ProxylessNAS with Latency Constraints
------------------------------------------------
`ProxylessNAS <https://arxiv.org/pdf/1812.00332.pdf>`__ is a hardware-aware one-shot NAS algorithm. ProxylessNAS applies the expected latency of the model to build a differentiable metric and design efficient neural network architectures for hardware. The latency loss is added as a regularization term for architecture parameter optimization. In this example, nn-Meter provides a latency estimator to predict expected latency for the mixed operation on other types of mobile and edge hardware.
To run the one-shot ProxylessNAS demo, first install nn-Meter by running:
In the implementation of ProxylessNAS ``trainer``, we provide a ``HardwareLatencyEstimator`` which currently builds a lookup table, that stores the measured latency of each candidate building block in the search space. The latency sum of all building blocks in a candidate model will be treated as the model inference latency. The latency prediction is obtained by ``nn-Meter``. ``HardwareLatencyEstimator`` predicts expected latency for the mixed operation based on the path weight of `ProxylessLayerChoice`. With leveraging ``nn-Meter`` in NNI, users can apply ProxylessNAS to search efficient DNN models on more types of edge devices.
Despite of ``applied_hardware`` and ``reference_latency``, There are some other parameters related to hardware-aware ProxylessNAS training in this :githublink:`example <examples/nas/oneshot/proxylessnas/main.py>`:
* ``grad_reg_loss_type``: Regularization type to add hardware related loss. Allowed types include ``"mul#log"`` and ``"add#linear"``. Type of ``mul#log`` is calculate by ``(torch.log(expected_latency) / math.log(reference_latency)) ** beta``. Type of ``"add#linear"`` is calculate by ``reg_lambda * (expected_latency - reference_latency) / reference_latency``.
* ``grad_reg_loss_lambda``: Regularization params, is set to ``0.1`` by default.
* ``grad_reg_loss_alpha``: Regularization params, is set to ``0.2`` by default.
* ``grad_reg_loss_beta``: Regularization params, is set to ``0.3`` by default.
* ``dummy_input``: The dummy input shape when applied to the target hardware. This parameter is set as (1, 3, 224, 224) by default.
The implementation on NNI is based on the `offical implementation <https://github.com/mit-han-lab/ProxylessNAS>`__. The official implementation supports two training approaches: gradient descent and RL based. In our current implementation on NNI, gradient descent training approach is supported. The complete support of ProxylessNAS is ongoing.
The official implementation supports different targeted hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In NNI repo, the hardware latency prediction is supported by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__. nn-Meter is an accurate inference latency predictor for DNN models on diverse edge devices. nn-Meter support four hardwares up to now, including *'cortexA76cpu_tflite21'*, *'adreno640gpu_tflite21'*, *'adreno630gpu_tflite21'*, and *'myriadvpu_openvino2019r2'*. Users can find more information about nn-Meter on its website. More hardware will be supported in the future.
The official implementation supports different targeted hardware, including 'mobile', 'cpu', 'gpu8', 'flops'. In NNI repo, the hardware latency prediction is supported by `Microsoft nn-Meter <https://github.com/microsoft/nn-Meter>`__. nn-Meter is an accurate inference latency predictor for DNN models on diverse edge devices. nn-Meter support four hardwares up to now, including *'cortexA76cpu_tflite21'*, *'adreno640gpu_tflite21'*, *'adreno630gpu_tflite21'*, and *'myriadvpu_openvino2019r2'*. Users can find more information about nn-Meter on its website. More hardware will be supported in the future. Users could find more details about applying ``nn-Meter`` `here <./HardwareAwareNAS.rst>`__ .
Below we will describe implementation details. Like other one-shot NAS algorithms on NNI, ProxylessNAS is composed of two parts: *search space* and *training approach*. For users to flexibly define their own search space and use built-in ProxylessNAS training approach, we put the specified search space in :githublink:`example code <examples/nas/oneshot/proxylessnas>` using :githublink:`NNI NAS interface <nni/retiarii/oneshot/pytorch/proxyless>`.