For the pruners with scheduling, `L1Filter Pruner` is used as the base algorithm. That is to say, after the sparsity distribution is decided by the scheduling algorithm, `L1Filter Pruner` is used to perform the actual pruning (see the sketch below).
- All the pruners listed above are implemented in [nni](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/Overview.md).
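For reference, a minimal sketch of invoking `L1Filter Pruner` directly. The import path follows NNI's 1.x PyTorch layout and may differ in newer releases, and the uniform sparsity here is a placeholder for what a scheduling algorithm would produce:

```python
import torch
from torchvision.models import vgg16
# Import path follows the NNI 1.x layout; newer releases move the pruners to
# nni.algorithms.compression.pytorch.pruning.
from nni.compression.torch import L1FilterPruner

model = vgg16()
# In the experiments, per-layer sparsities come from the scheduling algorithm;
# a uniform 50% on all Conv2d layers is used here purely for illustration.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = L1FilterPruner(model, config_list)
model = pruner.compress()  # applies masks; fine-tune the masked model afterwards
```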
## Experiment Result
* The experiment results are all collected with the default configuration of the pruners in nni, i.e., when we instantiate a pruner class in nni, we do not change any of its default arguments.
* Both FLOPs and the number of parameters are counted with [Model FLOPs/Parameters Counter](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/CompressionUtils.md#model-flopsparameters-counter) after [model speed up](https://github.com/microsoft/nni/tree/master/docs/en_US/Compression/ModelSpeedup.md).
This avoids the potential issue of counting them on masked models (a sketch of this flow follows the list).
* The experiment code can be found [here](https://github.com/microsoft/nni/tree/master/examples/model_compress/auto_pruners_torch.py).
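A hedged sketch of that prune → export → speed-up → count flow. Exact import paths and signatures vary between NNI releases; the ones below follow the 1.x PyTorch layout:

```python
import torch
from torchvision.models import vgg16
# Paths follow the NNI 1.x layout and may differ in newer releases.
from nni.compression.torch import L1FilterPruner, ModelSpeedup
from nni.compression.torch.utils.counter import count_flops_params

model = vgg16()
pruner = L1FilterPruner(model, [{'sparsity': 0.5, 'op_types': ['Conv2d']}])
pruner.compress()
pruner.export_model(model_path='pruned.pth', mask_path='mask.pth')

# Speed-up replaces masked layers with really-shrunk ones, so the counters
# below measure the compact network rather than a masked full-size one.
model = vgg16()
model.load_state_dict(torch.load('pruned.pth'))
dummy_input = torch.randn(1, 3, 224, 224)
ModelSpeedup(model, dummy_input, 'mask.pth').speedup_model()

flops, params = count_flops_params(model, (1, 3, 224, 224))
print(f'FLOPs: {flops}, #Params: {params}')
```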
In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface, which covers pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then how to customize a new quantization algorithm.
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI; see [Framework overview of model compression](https://nni.readthedocs.io/en/latest/Compression/Framework.html).
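To give a concrete taste of the interface, here is a minimal sketch of a customized pruner, assuming the NNI 1.x PyTorch API in which a pruner subclasses `Pruner` and implements `calc_mask`. `MyMagnitudePruner` is a hypothetical example, not a built-in algorithm; the framework overview linked above explains the `wrapper` object in detail:

```python
import torch
from nni.compression.torch import Pruner  # NNI 1.x import path

class MyMagnitudePruner(Pruner):
    """A hypothetical pruner that masks the smallest-magnitude weights."""

    def __init__(self, model, config_list):
        super().__init__(model, config_list)

    def calc_mask(self, wrapper, **kwargs):
        # wrapper.module is the wrapped layer; wrapper.config is its config entry.
        weight = wrapper.module.weight.data
        sparsity = wrapper.config['sparsity']
        k = int(weight.numel() * sparsity)
        if k == 0:
            return {'weight_mask': torch.ones_like(weight)}
        # Zero out the k weights with the smallest absolute values.
        threshold = weight.abs().view(-1).kthvalue(k).values
        return {'weight_mask': (weight.abs() > threshold).type_as(weight)}
```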
|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Lottery Ticket Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#lottery-ticket-hypothesis) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635)|
| [FPGM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
| [L1Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationapozrankfilterpruner) | Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#admm-pruner) | Pruning based on the ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
You can refer to this [benchmark](https://github.com/microsoft/nni/tree/master/docs/en_US/CommunitySharings/ModelCompressionComparison.md) for the performance of these pruners on some benchmark problems.
| [QAT Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
| [BNN Quantizer](https://nni.readthedocs.io/en/latest/Compression/Quantizer.html#bnn-quantizer) | Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. [Reference Paper](https://arxiv.org/abs/1602.02830)|
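For illustration, a minimal sketch of applying `QAT Quantizer`, assuming the NNI 1.x PyTorch API; signatures vary across releases (newer versions also require passing the optimizer), so check the reference for your version:

```python
from torchvision.models import resnet18
from nni.compression.torch import QAT_Quantizer  # NNI 1.x import path

model = resnet18()  # in practice, a model you intend to train
# Quantize the weights of all Conv2d and Linear layers to 8 bits.
config_list = [{
    'quant_types': ['weight'],
    'quant_bits': {'weight': 8},
    'op_types': ['Conv2d', 'Linear']
}]
quantizer = QAT_Quantizer(model, config_list)
quantizer.compress()
# QAT is quantization-aware *training*: keep training/fine-tuning the model
# so the quantization parameters are learned alongside the weights.
```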
## Automatic Model Compression
Given a targeted compression ratio, it is hard to obtain the best compressed model in one shot. An automatic model compression algorithm usually needs to explore the compression space by compressing different layers with different sparsities. NNI provides such algorithms to free users from specifying the sparsity of each layer in a model, as sketched below. Moreover, users could leverage NNI's auto tuning power to automatically compress a model. A detailed document can be found [here](./AutoPruningUsingTuners.md).
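For instance, a hedged sketch of `SimulatedAnnealing Pruner`, which searches per-layer sparsities automatically given only an overall sparsity target and an evaluation function. The NNI 1.x API is assumed, and `evaluate_accuracy` is a placeholder you would implement:

```python
from torchvision.models import resnet18
from nni.compression.torch import SimulatedAnnealingPruner  # NNI 1.x import path

def evaluate_accuracy(model):
    # Placeholder: evaluate `model` on a validation set and return its accuracy.
    raise NotImplementedError

model = resnet18()  # in practice, a trained model
# Only the overall target is given; the pruner searches how to distribute
# the sparsity across the Conv2d layers.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluate_accuracy)
model = pruner.compress()
```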
## Quick Start to Compress a Model
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. Their usage is the same, so here we use [slim pruner](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#slim-pruner) as an example to show the usage.
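The whole flow takes only a few lines. Below is a minimal sketch, assuming the NNI 1.x PyTorch API and an already-trained `model`; the following sections walk through each step:

```python
from nni.compression.torch import SlimPruner  # NNI 1.x import path

# Slim pruner prunes channels via the scaling factors of BatchNorm layers,
# so the configuration targets BatchNorm2d rather than Conv2d.
config_list = [{'sparsity': 0.7, 'op_types': ['BatchNorm2d']}]
pruner = SlimPruner(model, config_list)
pruner.compress()
```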
### Write configuration
### APIs for Updating Fine Tuning Status
Some compression algorithms use epochs to control the progress of compression (e.g. [AGP](https://nni.readthedocs.io/en/latest/Compression/Pruner.html#agp-pruner)), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: `pruner.update_epoch(epoch)` and `pruner.step()`.
`update_epoch` should be invoked in every epoch, while `step` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
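A sketch of where the two calls sit in an ordinary PyTorch training loop; `model`, `pruner`, `optimizer`, `criterion`, `train_loader`, and `num_epochs` are assumed to be set up already:

```python
for epoch in range(num_epochs):
    # Lets epoch-driven algorithms such as AGP update their sparsity schedule.
    pruner.update_epoch(epoch)
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        # Lets minibatch-driven algorithms take their per-step action;
        # a no-op for algorithms that do not need it.
        pruner.step()
```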
Other than one-shot NAS, NAS can also run in a classic mode where each candidate architecture runs as an independent trial job. In this mode, similar to hyperparameter tuning, users have to start an NNI experiment and choose a tuner for NAS.
### Model Compression
NNI provides an easy-to-use model compression framework to compress deep neural networks; the compressed networks typically have a much smaller model size and much faster inference speed without losing performance significantly. Model compression on NNI includes pruning algorithms and quantization algorithms. NNI provides many pruning and quantization algorithms through the NNI trial SDK. Users can directly use them in their trial code and run the trial code without starting an NNI experiment. Users can also use the NNI model compression framework to customize their own pruning and quantization algorithms.
There are different types of hyperparameters in model compression. One type is the hyperparameters in the input configuration (e.g., sparsity, quantization bits) of a compression algorithm. The other type is the hyperparameters of the compression algorithms themselves. Here, NNI's hyperparameter tuning can help a lot in finding the best compressed model automatically. A simple example can be found [here](Compression/AutoPruningUsingTuners.md).
A detailed description of model compression and its usage can be found [here](Compression/Overview.md).
### Automatic Feature Engineering
Automatic feature engineering is for users to find the best features for their tasks. A detailed description of automatic feature engineering and its usage can be found [here](FeatureEngineering/Overview.md). It is supported through NNI trial SDK, which means you do not have to create an NNI experiment. Instead, simply import a built-in auto-feature-engineering algorithm in your trial code and directly run your trial code.
* [How to run an experiment on OpenPAI?](TrainingService/PaiMode.md)
* [Examples](TrialExample/MnistExamples.md)
* [Neural Architecture Search on NNI](NAS/Overview.md)
* [Model Compression on NNI](Compression/Overview.md)
* [Automatic feature engineering on NNI](FeatureEngineering/Overview.md)