Unverified Commit b63b5246 authored by Yan Ni, committed by GitHub

Merge pull request #1911 from leckie-chn/v1.3-rc0

merge v1.3 (conflict resolved) back to master
ActivationRankFilterPruner on NNI Compressor
===
## 1. Introduction
ActivationRankFilterPruner is a series of pruners which prune filters according to some importance criterion calculated from the filters' output activations.
| Pruner | Importance criterion | Reference paper |
| :----------------------------: | :-------------------------------: | :----------------------------------------------------------: |
| ActivationAPoZRankFilterPruner | APoZ (average percentage of zeros) | [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) |
| ActivationMeanRankFilterPruner | mean value of output activations | [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) |
## 2. Pruners
### ActivationAPoZRankFilterPruner
Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
"[Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250)", ICLR 2016.
ActivationAPoZRankFilterPruner prunes the filters with the smallest APoZ (average percentage of zeros) of output activations.
The APoZ is defined as:
![](../../img/apoz.png)
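For reference, a transcription of the formula in the image, where f(·) is 1 when its condition holds and 0 otherwise, M is the dimension of the output feature map O_c^(i), and N is the number of validation examples:

```latex
APoZ_c^{(i)} = APoZ\left(O_c^{(i)}\right)
             = \frac{\sum_{k=1}^{N} \sum_{j=1}^{M} f\left(O_{c,j}^{(i)}(k) = 0\right)}{N \times M}
```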
### ActivationMeanRankFilterPruner
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
"[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)", ICLR 2017.
ActivationMeanRankFilterPruner prunes the filters with the smallest mean value of output activations.
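A hedged transcription of this criterion: with O_{c,j}(k) denoting the activation at position j of output channel c for sample k, over N samples and M spatial positions,

```latex
\Theta_{\text{mean}}(O_c) = \frac{1}{N \times M} \sum_{k=1}^{N} \sum_{j=1}^{M} O_{c,j}(k)
```

Filters with the smallest mean are pruned first.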
## 3. Usage
PyTorch code
```python
from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'], 'op_names': ['conv1', 'conv2'] }]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
#### User configuration for ActivationAPoZRankFilterPruner
- **sparsity:** The sparsity ratio to which the specified operations should be compressed
- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner
## 4. Experiment
TODO.
# Model Compression with NNI
As larger neural networks with more layers and nodes are considered, reducing their storage and computational cost becomes critical, especially for some real-time applications. Model compression can be used to address this problem.
We are glad to announce the alpha release of the model compression toolkit on top of NNI. It is still in the experimental phase and may evolve based on usage feedback. We'd like to invite you to use it, give feedback, and even contribute.
NNI provides an easy-to-use toolkit to help users design and use compression algorithms. It currently supports PyTorch with a unified interface. To compress their models, users only need to add several lines to their code. There are some popular model compression algorithms built into NNI. Users can further use NNI's auto-tuning power to find the best compressed model, which is detailed in [Auto Model Compression](./AutoCompression.md). On the other hand, users can easily customize new compression algorithms using NNI's interface; refer to the tutorial [here](#customize-new-compression-algorithms).
For a survey of model compression, you can refer to this paper: [Recent Advances in Efficient Computation of Deep Convolutional Neural Networks](https://arxiv.org/pdf/1802.00939.pdf).
## Supported algorithms
We have provided several compression algorithms, including several pruning and quantization algorithms.
**Pruning**
Pruning algorithms compress the original network by removing redundant weights or channels of layers, which can reduce model complexity and address the over-fitting issue.
|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](./Pruner.md#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](./Pruner.md#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](./Pruner.md#ActivationAPoZRankFilterPruner) | Pruning filters based on the metric APoZ (average percentage of zeros), which measures the percentage of zeros in the activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](./Pruner.md#ActivationMeanRankFilterPruner) | Pruning filters based on the mean value of output activations. [Reference Paper](https://arxiv.org/abs/1611.06440) |
| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
**Quantization**
Quantization algorithms compress the original network by reducing the number of bits required to represent weights or activations, which can reduce the computations and the inference time.
|Name|Brief Introduction of Algorithm|
|---|---|
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](./Quantizer.md#BNN-Quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
## Usage of built-in compression algorithms
The function call `pruner.compress()` modifies the user-defined model.
When instantiating a compression algorithm, a `config_list` is passed in. We describe how to write this config below.
### User configuration for a compression algorithm
When compressing a model, users may want to specify the ratio for sparsity, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object.
The `dict`s in the `list` are applied one by one; that is, the configurations in a latter `dict` overwrite the configurations in former ones for the operations that are within the scope of both.
#### Common keys
In each `dict`, there are some keys commonly supported by NNI compression:
* __op_types__: This is to specify what types of operations to be compressed. 'default' means following the algorithm's default setting.
* __op_names__: This is to specify by name what operations to be compressed. If this field is omitted, operations will not be filtered by it.
* __exclude__: Default is False. If this field is True, it means the operations with specified types and names will be excluded from the compression.
#### Keys for quantization algorithms
**If you use quantization algorithms, you need to specify more keys. If you use pruning algorithms, you can safely skip these keys.**
* __quant_types__ : list of string
Type of quantization you want to apply; currently supported: 'weight', 'input', 'output'. 'weight' means applying quantization to the weight parameter of modules. 'input' means applying quantization to the input of the module's forward method. 'output' means applying quantization to the output of the module's forward method, which is often called 'activation' in some papers.
* __quant_bits__ : int or dict of {str : int}
Bit length of quantization; the key is the quantization type and the value is the bit length, e.g.
```
{
    'quant_bits': {
        'weight': 8,
        'output': 4,
    },
}
```
When the value is an int, all quantization types share the same bit length, e.g.
```
{
    'quant_bits': 8,  # weight and output quantization both use 8 bits
}
```
#### Other keys specified for every compression algorithm
There are also other keys in the `dict`, but they are specific to each compression algorithm. For example, [Level Pruner](./Pruner.md#level-pruner) requires the `sparsity` key to specify how much a model should be pruned.
#### Example
A simple example of configuration is shown below:
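The sketch below uses placeholder op names: the first `dict` applies 80% sparsity to all default operations, the second overrides two named operations to 60%, and the third excludes `op_name3` from compression.

```python
config_list = [{
    'sparsity': 0.8,
    'op_types': ['default']
}, {
    'sparsity': 0.6,
    'op_names': ['op_name1', 'op_name2']
}, {
    'exclude': True,
    'op_names': ['op_name3']
}]
```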
The interface for customizing a quantization algorithm is similar to that of pruning algorithms. The only difference is that `calc_mask` is replaced with `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.
```python
from nni.compression.torch.compressor import Quantizer

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        """
        Suggest you to use the NNI defined spec for config
        """
        super().__init__(model, config_list)

    def quantize_weight(self, weight, config, **kwargs):
        """
        Overload this method to return the quantized weight; it is effectively
        hooked to the forward method of the model. (A placeholder sketch.)
        """
        # put your quantization logic here
        new_weight = weight
        return new_weight

    def quantize_input(self, *inputs, config, **kwargs):
        """
        Overload this method to return the quantized input. (A placeholder sketch.)
        """
        # put your quantization logic here
        new_input = inputs
        return new_input

    def update_epoch(self, epoch_num):
        pass

    def step(self):
        """
        Can do some processing based on the model or weights bound
        in the func bind_model
        """
        pass
```
#### Customize backward function
Sometimes it's necessary for a quantization operation to have a customized backward function, such as the [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste); users can customize a backward function as follows:
```python
import torch
from nni.compression.torch.compressor import Quantizer, QuantGrad, QuantType

class ClipGrad(QuantGrad):
    @staticmethod
    def quant_backward(tensor, grad_output, quant_type):
        """
        This method should be overridden by subclasses to provide a customized backward function;
        the default implementation is the Straight-Through Estimator.
        Parameters
        ----------
        tensor : Tensor
            input of the quantization operation
        grad_output : Tensor
            gradient of the output of the quantization operation
        quant_type : QuantType
            the type of quantization; it can be `QuantType.QUANT_INPUT`, `QuantType.QUANT_WEIGHT`, or
            `QuantType.QUANT_OUTPUT`, and you can define different behavior for different types.
        Returns
        -------
        tensor
            gradient of the input of the quantization operation
        """
        # for output quantization, zero the gradient wherever the absolute value of the input exceeds 1
        if quant_type == QuantType.QUANT_OUTPUT:
            grad_output[torch.abs(tensor) > 1] = 0
        return grad_output

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)
        # set your customized backward function to overwrite the default backward function
        self.quant_grad = ClipGrad
```
If you do not customize `QuantGrad`, the default backward function is the Straight-Through Estimator.
### Usage of user customized compression algorithm
_Coming Soon_ ...
## **Reference and Feedback**
Pruner on NNI Compressor
===
Index of supported pruning algorithms
* [Level Pruner](#level-pruner)
* [AGP Pruner](#agp-pruner)
* [Lottery Ticket Hypothesis](#lottery-ticket-hypothesis)
* [Slim Pruner](#slim-pruner)
* [Filter Pruners with Weight Rank](#weightrankfilterpruner)
* [FPGM Pruner](#fpgm-pruner)
* [L1Filter Pruner](#l1filter-pruner)
* [L2Filter Pruner](#l2filter-pruner)
* [Filter Pruners with Activation Rank](#activationrankfilterpruner)
* [APoZ Rank Pruner](#activationapozrankfilterpruner)
* [Activation Mean Rank Pruner](#activationmeanrankfilterpruner)
## Level Pruner
This is one basic one-shot pruner: you can set a target sparsity level (expressed as a fraction, 0.6 means we will prune 60%).
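A minimal usage sketch, in the same style as the other pruners in this document (`model` is the user's network; `op_types: ['default']` follows the algorithm's default setting):

```python
from nni.compression.torch import LevelPruner

config_list = [{ 'sparsity': 0.6, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()
```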
* **sparsity:** The final sparsity when the compression is done.
***
## Slim Pruner
This is a one-shot pruner proposed by Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang in ['Learning Efficient Convolutional Networks through Network Slimming'](https://arxiv.org/pdf/1708.06519.pdf).
![](../../img/slim_pruner.png)
> Slim Pruner **prunes channels in the convolution layers by masking corresponding scaling factors in the later BN layers**. L1 regularization on the scaling factors should be applied in batch normalization (BN) layers while training; scaling factors of BN layers are **globally ranked** while pruning, so the sparse model can be automatically found given the sparsity.
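A hedged sketch of the training-time L1 regularization on BN scaling factors mentioned above (`lam` is a hypothetical penalty coefficient; add the returned term to the training loss before using SlimPruner):

```python
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    # sum of absolute BN scaling factors across the model
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))
```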
### Usage
PyTorch code
```python
from nni.compression.torch import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
pruner.compress()
```
#### User configuration for Slim Pruner
- **sparsity:** The sparsity ratio to which the specified operations should be compressed
- **op_types:** Only BatchNorm2d is supported in Slim Pruner
## WeightRankFilterPruner
WeightRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the weights in convolution layers to achieve a preset level of network sparsity.
### FPGM Pruner
This is a one-shot pruner. FPGM Pruner is an implementation of the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).
FPGMPruner prunes filters with the smallest geometric median.
![](../../img/fpgm_fig1.png)
>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
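A hedged sketch of the criterion (not NNI's internal implementation): a filter's score is its total distance to all other filters in the same layer, and the filters with the smallest scores, i.e. those closest to the geometric median, are pruned.

```python
import torch

def fpgm_scores(weight):
    # weight: Conv2d weight of shape (out_channels, in_channels, k, k)
    flat = weight.view(weight.size(0), -1)
    return torch.cdist(flat, flat).sum(dim=1)   # distance of each filter to all others

weight = torch.randn(64, 32, 3, 3)              # a hypothetical Conv2d weight
prune_idx = fpgm_scores(weight).argsort()[:32]  # filters to prune at 50% sparsity
```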
#### Usage
PyTorch code
```python
from nni.compression.torch import FPGMPruner
config_list = [{ 'sparsity': 0.5, 'op_types': ['Conv2d'] }]
pruner = FPGMPruner(model, config_list)
pruner.compress()
```
You can view the example for more information.
***
### L1Filter Pruner
This is a one-shot pruner proposed in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf. The reproduced experiment results can be found [here](l1filterpruner.md).
![](../../img/l1filter_pruner.png)
#### Usage
PyTorch code
```python
from nni.compression.torch import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
```
***
### L2Filter Pruner
This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.
#### Usage
PyTorch code
```python
from nni.compression.torch import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
```
## ActivationRankFilterPruner
ActivationRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the output activations of convolution layers to achieve a preset level of network sparsity.
### ActivationAPoZRankFilterPruner
We implemented it as a one-shot pruner; it prunes convolutional layers based on the criterion `APoZ`, which is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250). Iterative pruning based on `APoZ` will be supported in a future release.
The APoZ is defined as:
![](../../img/apoz.png)
#### Usage
PyTorch code
```python
from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
You can view the example for more information.
***
### ActivationMeanRankFilterPruner
We implemented it as a one-shot pruner; it prunes convolutional layers based on the criterion `mean activation`, which is explained in section 2.2 of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in a future release.
#### Usage
PyTorch code
```python
from nni.compression.torch import ActivationMeanRankFilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
You can view the example for more information.
#### User configuration for ActivationMeanRankFilterPruner
- **sparsity:** The percentage of convolutional filters to be pruned.
- **op_types:** Only Conv2d is supported in ActivationMeanRankFilterPruner
***
Quantizer on NNI Compressor
===
## Naive Quantizer
We provide Naive Quantizer to quantize weights to 8 bits by default; you can use it to test quantization algorithms without any configuration.
### Usage
tensorflow
```python
nni.compression.tensorflow.NaiveQuantizer(model_graph).compress()
```
pytorch
```python
nni.compression.torch.NaiveQuantizer(model).compress()
```
***
## QAT Quantizer
You can quantize your model to 8 bits with the code below before your training code.
PyTorch code
```python
from nni.compression.torch import QAT_Quantizer
model = Mnist()
config_list = [{
    # a sketch configuration: quantize weights of Conv2d/Linear layers to
    # 8 bits, and outputs of ReLU6 to 8 bits starting from step 7000
    'quant_types': ['weight'],
    'quant_bits': {'weight': 8},
    'op_types': ['Conv2d', 'Linear']
}, {
    'quant_types': ['output'],
    'quant_bits': 8,
    'quant_start_step': 7000,
    'op_types': ['ReLU6']
}]
quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()
```
You can view the example for more information.
#### User configuration for QAT Quantizer
Common configuration needed by compression algorithms can be found at [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm).
Configuration needed by this algorithm:
* **quant_start_step:** int
Disable quantization until the model has run for the given number of steps. This allows activation ranges to stabilize before quantization starts; the default value is 0.
## DoReFa Quantizer
To implement DoReFa Quantizer, you can add code below before your training code.
PyTorch code
```python
from nni.compression.torch import DoReFaQuantizer
config_list = [{
    'quant_types': ['weight'],
    'quant_bits': 8,
    'op_types': ['default']  # a sketch: apply to the algorithm's default op types
}]
quantizer = DoReFaQuantizer(model, config_list)
quantizer.compress()
```
You can view the example for more information.
#### User configuration for DoReFa Quantizer
Common configuration needed by compression algorithms can be found at [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm).
Configuration needed by this algorithm:
## BNN Quantizer
PyTorch code
```python
from nni.compression.torch import BNNQuantizer
model = VGG_Cifar10(num_classes=10)

configure_list = [{
    'quant_bits': 1,
    'quant_types': ['weight'],
    'op_types': ['Conv2d', 'Linear'],
    'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3']
}, {
    'quant_bits': 1,
    'quant_types': ['output'],
    'op_types': ['Hardtanh'],
    'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
quantizer = BNNQuantizer(model, configure_list)
model = quantizer.compress()
```
You can view example [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information.
#### User configuration for BNN Quantizer
Common configuration needed by compression algorithms can be found at [Common configuration](./Overview.md#User-configuration-for-a-compression-algorithm).
Configuration needed by this algorithm:
### Experiment
We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830); we quantized the **VGGNet** for CIFAR-10 from the paper. Our experiment results are as follows:
L1FilterPruner on NNI
===
## Introduction
L1FilterPruner is a general structured pruning algorithm for pruning filters in the convolutional layers.
In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), pruning m filters from the ith convolutional layer proceeds as follows:
> 1. For each filter, calculate the sum of its absolute kernel weights.
> 2. Sort the filters by this sum.
> 3. Prune the m filters with the smallest sums and their corresponding feature maps, together with the kernels in the next convolutional layer that correspond to the pruned feature maps.
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
> weights are copied to the new model.
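A hedged sketch of steps 1–3 (the weight shape and `m` are illustrative):

```python
import torch

weight = torch.randn(64, 32, 3, 3)   # a hypothetical Conv2d weight
s = weight.abs().sum(dim=(1, 2, 3))  # step 1: per-filter sum of absolute weights
order = s.argsort()                  # step 2: sort filters by the sums
m = 16
prune_idx = order[:m]                # step 3: the m filters to prune
```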
## Experiment
We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**. We pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** in the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:
# DARTS
## Introduction
The paper [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
The authors' code optimizes the network weights and architecture weights alternately in mini-batches. They further explore the possibility of using second-order optimization (unroll) instead of first-order to improve performance.
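A first-order sketch of this alternating scheme (not NNI's `DartsTrainer`; `model`, `criterion`, the two data loaders, and the `model.weights()`/`model.alphas()` accessors separating network weights from architecture parameters are assumptions of this sketch):

```python
import torch

w_optim = torch.optim.SGD(model.weights(), lr=0.025, momentum=0.9)
alpha_optim = torch.optim.Adam(model.alphas(), lr=3e-4)

for (trn_x, trn_y), (val_x, val_y) in zip(train_loader, valid_loader):
    # architecture step: update alphas on a validation mini-batch
    alpha_optim.zero_grad()
    criterion(model(val_x), val_y).backward()
    alpha_optim.step()
    # weight step: update network weights on a training mini-batch
    w_optim.zero_grad()
    criterion(model(trn_x), trn_y).backward()
    w_optim.step()
```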
The implementation on NNI is based on the [official implementation](https://github.com/quark0/darts) and a [popular 3rd-party repo](https://github.com/khanrc/pt.darts). DARTS on NNI is designed to be general for arbitrary search spaces. A CNN search space tailored for CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.
## Reproduction Results
The example below is meant to reproduce the results in the paper; we ran experiments with both first- and second-order optimization. Due to time limits, we retrain *only the best architecture* derived from the search phase and repeat the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when ready.
| | In paper | Reproduction |
| ---------------------- | ------------- | ------------ |
| First order (CIFAR10) | 3.00 +/- 0.14 | 2.78 |
| Second order (CIFAR10) | 2.76 +/- 0.09 | 2.89 |
## Examples
### CNN Search Space
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/darts)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/darts
python3 search.py
# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
```
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.darts.DartsTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.darts.DartsMutator
:members:
```
# ENAS
## Introduction
The paper [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268) uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss.
The implementation on NNI is based on the [official implementation in Tensorflow](https://github.com/melodyguan/enas), including a general-purpose reinforcement-learning controller and a trainer that trains the target network and this controller alternately. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
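A hedged sketch of the controller update described above (REINFORCE with a moving-average baseline; `controller.sample()` returning a subgraph and its log-probability, and the `validate` helper, are assumptions of this sketch):

```python
import torch

ctrl_optim = torch.optim.Adam(controller.parameters(), lr=3.5e-4)
baseline = 0.0

arch, log_prob = controller.sample()    # sample a subgraph of the large graph
reward = validate(arch)                 # e.g., accuracy on a validation batch
baseline = 0.999 * baseline + 0.001 * reward
loss = -(reward - baseline) * log_prob  # policy gradient: maximize expected reward
ctrl_optim.zero_grad()
loss.backward()
ctrl_optim.step()
```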
## Examples
### CIFAR10 Macro/Micro Search Space
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/enas)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/enas
# search in macro search space
python3 search.py --search-for macro
# search in micro search space
python3 search.py --search-for micro
# view more options for search
python3 search.py -h
```
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.enas.EnasTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.enas.EnasMutator
:members:
.. automethod:: __init__
```
# Neural Architecture Search (NAS) on NNI
However, it takes great effort to implement NAS algorithms, and it is hard to reuse the code base of existing algorithms for a new one.
With this motivation, our ambition is to provide a unified architecture in NNI, to accelerate innovations on NAS and apply state-of-the-art algorithms to real-world problems faster.
With [the unified interface](./NasInterface.md), there are two different modes for architecture search. [One](#supported-one-shot-nas-algorithms) is the so-called one-shot NAS, where a super-net is built based on the search space and one-shot training is used to generate a good-performing child model. [The other](./NasInterface.md#classic-distributed-search) is the traditional searching approach, where each child model in the search space runs as an independent trial; the performance result is sent to the tuner and the tuner generates a new child model.
* [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms) * [Supported One-shot NAS Algorithms](#supported-one-shot-nas-algorithms)
* [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search) * [Classic Distributed NAS with NNI experiment](./NasInterface.md#classic-distributed-search)
## Supported One-shot NAS Algorithms
NNI currently supports the NAS algorithms listed below and is adding more. Users can reproduce an algorithm or use it on their own dataset. We also encourage users to implement other algorithms with the [NNI API](#use-nni-api), to benefit more people.
|Name|Brief Introduction of Algorithm|
|---|---|
| [ENAS](ENAS.md) | [Efficient Neural Architecture Search via Parameter Sharing](https://arxiv.org/abs/1802.03268). In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance. |
| [DARTS](DARTS.md) | [DARTS: Differentiable Architecture Search](https://arxiv.org/abs/1806.09055) introduces a novel algorithm for differentiable network architecture search on bilevel optimization. |
| [P-DARTS](PDARTS.md) | [Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation](https://arxiv.org/abs/1904.12760) is based on DARTS. It introduces an efficient algorithm which allows the depth of searched architectures to grow gradually during the training procedure. |
| [SPOS](SPOS.md) | [Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) constructs a simplified supernet trained with an uniform path sampling method, and applies an evolutionary algorithm to efficiently search for the best-performing architectures. |
One-shot algorithms run **standalone without nnictl**. Only the PyTorch version has been implemented. Tensorflow 2.x will be supported in a future release.
Here are some common dependencies to run the examples. PyTorch needs to be above 1.2 to use ``BoolTensor``.
* NNI 1.2+
* tensorboard
* PyTorch 1.2+
* git
## Use NNI API
NOTE: we are trying to support various NAS algorithms with a unified programming interface, and it is in a very experimental stage. The current programming interface may be updated in the future.
The programming interface of designing and searching a model is often demanded in two scenarios.
1. When designing a neural network, there may be multiple operation choices on a layer, sub-model, or connection, and it's undetermined which one or combination performs best. So, it needs an easy way to express the candidate layers or sub-models.
2. When applying NAS on a neural network, it needs a unified way to express the search space of architectures, so that it doesn't need to update trial code for different searching algorithms.
The API proposed by NNI is [here](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/nas/pytorch). And [here](https://github.com/microsoft/nni/tree/master/examples/nas/naive) is an example NAS implementation based on this interface.
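For illustration, a hedged sketch of expressing candidate operations with this API (the exact module path `nni.nas.pytorch.mutables` is an assumption and may differ across NNI versions):

```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class Cell(nn.Module):
    def __init__(self):
        super().__init__()
        # candidate operations for this layer; the search algorithm
        # decides which one is used
        self.op = mutables.LayerChoice([
            nn.Conv2d(16, 16, 3, padding=1),
            nn.Conv2d(16, 16, 5, padding=2),
            nn.MaxPool2d(3, stride=1, padding=1),
        ])

    def forward(self, x):
        return self.op(x)
```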
[1]: https://arxiv.org/abs/1802.03268
[2]: https://arxiv.org/abs/1707.07012
# P-DARTS
## Examples
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/pdarts)
```bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/pdarts
python3 search.py
# train the best architecture; it's the same process as darts.
cd ../darts
python3 retrain.py --arc-checkpoint ../pdarts/checkpoints/epoch_2.json
```
# Single Path One-Shot (SPOS)
## Introduction
[Single Path One-Shot Neural Architecture Search with Uniform Sampling](https://arxiv.org/abs/1904.00420) proposes a one-shot NAS method that addresses the difficulties of training one-shot NAS models by constructing a simplified supernet trained with a uniform path sampling method, so that all underlying architectures (and their weights) get trained fully and equally. An evolutionary algorithm is then applied to efficiently search for the best-performing architectures without any fine-tuning.
The implementation on NNI is based on the [official repo](https://github.com/megvii-model/SinglePathOneShot). We implement a trainer that trains the supernet and an evolution tuner that leverages the power of the NNI framework to speed up the evolutionary search phase. An example use case is shown below.
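A hedged sketch of the uniform path sampling used during supernet training (layer and choice counts are illustrative):

```python
import random

def uniform_sample(num_layers, num_choices):
    # pick one candidate block per layer, uniformly at random,
    # for each supernet training step
    return [random.randrange(num_choices) for _ in range(num_layers)]

path = uniform_sample(num_layers=20, num_choices=4)
```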
## Examples
Here is a use case, which is the search space in paper, and the way to use flops limit to perform uniform sampling.
[Example code](https://github.com/microsoft/nni/tree/master/examples/nas/spos)
### Requirements
NVIDIA DALI >= 0.16 is needed as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
Download the flops lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN) (maintained by [Megvii](https://github.com/megvii-model)).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under `data` directory.
Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.
After preparation, it's expected to have the following code structure:
```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│ ├── imagenet
│ │ ├── train
│ │ └── val
│ └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```
### Step 1. Train Supernet
```
python supernet.py
```
This will export the checkpoint to the `checkpoints` directory for the next step.
NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5), as they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
### Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing the sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
In order to make the tuner aware of the flops limit and have the ability to calculate the flops, we created a new tuner called `EvolutionWithFlops` in `tuner.py`, inheriting the tuner in SDK.
To have a search space ready for NNI framework, first run
```
nnictl ss_gen -t "python tester.py"
```
This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
By default, it will use the `checkpoint-150000.pth.tar` downloaded previously. If you want to use the checkpoint you trained yourself in the last step, specify it with `--checkpoint` in the command in `config_search.yml`.
Then launch the search with the evolution tuner.
```
nnictl create --config config_search.yml
```
The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
### Step 3. Train from Scratch
```
python scratch.py
```
By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in Step 2) with the `--fixed-arc` option.
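Under the hood, the retraining script applies the chosen architecture to the supernet definition. A minimal sketch of that step is shown below; the supernet class name is assumed from the example's `network.py`:
```python
from nni.nas.pytorch.fixed import apply_fixed_architecture
from network import ShuffleNetV2OneShot  # supernet class assumed from the example

model = ShuffleNetV2OneShot()
apply_fixed_architecture(model, "architecture_final.json")
# `model` now runs only the chosen ops and can be trained like an ordinary network.
```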
## Reference
### PyTorch
```eval_rst
.. autoclass:: nni.nas.pytorch.spos.SPOSEvolution
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainer
:members:
.. automethod:: __init__
.. autoclass:: nni.nas.pytorch.spos.SPOSSupernetTrainingMutator
:members:
.. automethod:: __init__
```
## Known Limitations
* Block search only. Channel search is not supported yet.
* Only the GPU version is provided here.
## Current Reproduction Results
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (rerun by us) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap with the 73.61% from the official release and the 74.3% reported in the original paper.
# ChangeLog
## Release 1.3 - 12/30/2019
### Major Features
#### Neural Architecture Search Algorithms Support
* [Single Path One Shot](https://github.com/microsoft/nni/tree/v1.3/examples/nas/spos/) algorithm and the example using it
#### Model Compression Algorithms Support
* [Knowledge Distillation](https://github.com/microsoft/nni/blob/v1.3/docs/en_US/TrialExample/KDExample.md) algorithm and the example using it
* Pruners
* [L2Filter Pruner](https://github.com/microsoft/nni/blob/v1.3/docs/en_US/Compressor/Pruner.md#3-l2filter-pruner)
* [ActivationAPoZRankFilterPruner](https://github.com/microsoft/nni/blob/v1.3/docs/en_US/Compressor/Pruner.md#1-activationapozrankfilterpruner)
* [ActivationMeanRankFilterPruner](https://github.com/microsoft/nni/blob/v1.3/docs/en_US/Compressor/Pruner.md#2-activationmeanrankfilterpruner)
* [BNN Quantizer](https://github.com/microsoft/nni/blob/v1.3/docs/en_US/Compressor/Quantizer.md#bnn-quantizer)
#### Training Service
* NFS Support for PAI
Instead of using HDFS as the default storage, OpenPAI can, since v0.11, use NFS, AzureBlob, or other storage as its default. In this release, NNI extended support for this change and can integrate with OpenPAI v0.11 or later using various default storage backends.
* Kubeflow update adoption
Adopted Kubeflow 0.7's new support for tf-operator.
### Engineering (code and build automation)
* Enforced [ESLint](https://eslint.org/) for static code analysis.
### Small changes & Bug Fixes
* Correctly recognize built-in tuners and customized tuners
* Fixed logging in the dispatcher base
* Fixed a bug where a tuner/assessor's failure sometimes killed the experiment
* Fixed using the local system as a remote machine ([issue](https://github.com/microsoft/nni/issues/1852))
* De-duplicated trial configurations in the SMAC tuner ([ticket](https://github.com/microsoft/nni/issues/1364))
## Release 1.2 - 12/02/2019
### Major Features
...@@ -30,4 +30,4 @@ for batch_idx, (data, target) in enumerate(train_loader):
* **kd_teacher_model:** The pre-trained teacher model
* **kd_T:** Temperature for smoothing teacher model's output
The complete code can be found [here](https://github.com/microsoft/nni/tree/v1.3/examples/model_compress/knowledge_distill/)
...@@ -22,4 +22,5 @@ For details, please refer to the following tutorials:
NAS Interface <NAS/NasInterface>
ENAS <NAS/ENAS>
DARTS <NAS/DARTS>
P-DARTS <NAS/PDARTS>
SPOS <NAS/SPOS>
[Documentation](https://nni.readthedocs.io/en/latest/NAS/DARTS.html)
[Documentation](https://nni.readthedocs.io/en/latest/NAS/ENAS.html)
This is a naive example that demonstrates how to use the NNI interface to implement a NAS search space.
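For a flavor of what such a search space looks like with NNI's PyTorch NAS interface (a sketch following the v1.3 module layout, not the example's actual model):
```python
import torch.nn as nn
from nni.nas.pytorch import mutables

class Net(nn.Module):
    """A toy search space: one layer whose op is chosen among two candidates."""
    def __init__(self):
        super().__init__()
        self.conv = mutables.LayerChoice([
            nn.Conv2d(3, 16, 3, padding=1),  # candidate 1: 3x3 conv
            nn.Conv2d(3, 16, 5, padding=2),  # candidate 2: 5x5 conv
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        # During search, the NNI trainer/mutator decides which candidate runs.
        return self.pool(self.conv(x))
```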
[Documentation](https://nni.readthedocs.io/en/latest/NAS/PDARTS.html)
[Documentation](https://nni.readthedocs.io/en/latest/NAS/SPOS.html)
# Single Path One-Shot Neural Architecture Search with Uniform Sampling
Single Path One-Shot by Megvii Research. [Paper link](https://arxiv.org/abs/1904.00420). [Official repo](https://github.com/megvii-model/SinglePathOneShot).
Block search only. Channel search is not supported yet.
Only the GPU version is provided here.
## Preparation
### Requirements
* PyTorch >= 1.2
* NVIDIA DALI >= 0.16 as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
### Data
Download the FLOPs lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under the `data` directory.
Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.
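If you want to sanity-check the downloaded lookup table, it is a plain pickled dict that can be inspected directly (the key layout is whatever the file ships with; this only peeks at a few entries):
```python
import pickle

with open("data/op_flops_dict.pkl", "rb") as fp:
    op_flops = pickle.load(fp)

print(type(op_flops), len(op_flops))
for key in list(op_flops)[:5]:  # peek at a few entries
    print(key, op_flops[key])
```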
After preparation, it's expected to have the following code structure:
```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```
## Step 1. Train Supernet
```
python supernet.py
```
This will export the checkpoint to the `checkpoints` directory for use in the next step.
NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5), as they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
## Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing a sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
To prepare a search space for the NNI framework, first run
```
nnictl ss_gen -t "python tester.py"
```
This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
Then launch the search with the evolution tuner.
```
nnictl create --config config_search.yml
```
The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
## Step 3. Train from Scratch
```
python scratch.py
```
By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in Step 2) with the `--fixed-arc` option.
## Current Reproduction Results
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with both the official repo (rerun by us) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap with the 73.61% from the official release and the 74.3% reported in the original paper.
...@@ -11,6 +11,6 @@ tuner:
classFileName: tuner.py
className: EvolutionWithFlops
trial:
command: python tester.py --spos-prep
codeDir: .
gpuNum: 1
...@@ -9,7 +9,7 @@ import { TrialJobApplicationForm, TrialJobDetail, TrialJobStatus } from '../../
export class PAIClusterConfig {
public readonly userName: string;
public readonly passWord?: string;
public host: string;
public readonly token?: string;
/**