Pruning algorithms compress the original network by removing redundant weights or channels of layers.
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
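Each of these pruners follows the same basic usage pattern: construct it with a model and a config list describing what to prune, then call `compress()`. A minimal sketch, assuming NNI's v1.x PyTorch compression API (`nni.compression.torch`); see each pruner's linked documentation for its extra constructor arguments:

```python
import torch
from torchvision.models import resnet18
from nni.compression.torch import LevelPruner  # import path in NNI v1.x

model = resnet18()
# prune 80% of the weights in all layers of the default op types
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
model = pruner.compress()  # the returned model carries the pruning masks
```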
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **sparsity_per_iteration:** The sparsity to prune in each iteration. NetAdapt Pruner prunes the model by the same sparsity level in each iteration to meet the resource budget progressively.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm and the performance of the pruned model.
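A usage sketch, assuming the NNI v1.x PyTorch API; `short_term_fine_tuner` and `evaluator` here are user-supplied functions, and the exact constructor signature should be checked against the NetAdapt Pruner documentation linked above:

```python
from nni.compression.torch import NetAdaptPruner  # import path in NNI v1.x

def short_term_fine_tuner(model, epochs=1):
    ...  # briefly fine-tune the masked model with your own training loop

def evaluator(model):
    ...  # evaluate the masked model, returning e.g. top-1 accuracy

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = NetAdaptPruner(model, config_list,
                        short_term_fine_tuner=short_term_fine_tuner,
                        evaluator=evaluator,
                        optimize_mode='maximize',
                        base_algo='l1',
                        sparsity_per_iteration=0.05,
                        experiment_data_dir='./experiment_data')
model = pruner.compress()
```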
## SimulatedAnnealing Pruner
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm, enhanced with guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights can often tolerate a higher degree of compression with less impact on overall accuracy.
- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature > stop_temperature:
  1. Generate a perturbation of the current distribution.
  2. Perform a fast evaluation on the perturbed distribution.
  3. Accept the perturbation according to the performance and an acceptance probability; if not accepted, return to step 1 (see the sketch below).
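The loop can be sketched as follows. This is a simplified illustration of the accept/reject logic, not NNI's implementation; `evaluate` and `perturb` are hypothetical user-supplied functions:

```python
import math
import random

def simulated_annealing(init_sparsities, evaluate, perturb,
                        start_temperature=100.0, stop_temperature=20.0,
                        cool_down_rate=0.9):
    """Simplified SA search over per-layer sparsities (illustration only)."""
    current, current_score = init_sparsities, evaluate(init_sparsities)
    temperature = start_temperature
    while temperature > stop_temperature:
        candidate = perturb(current, temperature)  # perturbation magnitude shrinks with temperature
        score = evaluate(candidate)                # fast evaluation, e.g. accuracy after pruning
        delta = score - current_score
        # always accept improvements; accept worse moves with probability exp(delta / T)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
        temperature *= cool_down_rate
    return current
```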
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141).
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Initial temperature of the Simulated Annealing schedule.
- **stop_temperature:** The search stops once the temperature cools down below this value.
- **cool_down_rate:** Multiplicative factor by which the temperature is decreased after each iteration.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
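A usage sketch, assuming the NNI v1.x PyTorch API; `evaluator` is a user-supplied function, and the temperature values shown simply echo the parameters listed above:

```python
from nni.compression.torch import SimulatedAnnealingPruner  # import path in NNI v1.x

def evaluator(model):
    ...  # fast evaluation of the masked model, returning e.g. top-1 accuracy

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
pruner = SimulatedAnnealingPruner(model, config_list,
                                  evaluator=evaluator,
                                  optimize_mode='maximize',
                                  base_algo='l1',
                                  start_temperature=100,
                                  stop_temperature=20,
                                  cool_down_rate=0.9,
                                  perturbation_magnitude=0.35,
                                  experiment_data_dir='./experiment_data')
model = pruner.compress()
```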
## AutoCompress Pruner
For each round, AutoCompressPruner prunes the model by the same sparsity to progressively achieve the overall target sparsity:
1. Generate a sparsity distribution using SimulatedAnnealingPruner.
2. Perform ADMM-based structured pruning to generate the pruning result for the next round.
Here we use `speedup` to perform the real pruning.
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141).
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for AutoCompress Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem. Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments. Here `callback` acts as an L2 regularizer, as presented in formula (7) of the original paper. The logic of `callback` is implemented inside the Pruner; users only need to insert `callback()` between `loss.backward()` and `optimizer.step()`, as shown in the trainer sketch in the ADMM Pruner section below.
- **dummy_input:** The dummy input for model speedup; users should put it on the right device before passing it in.
- **iterations:** The number of overall iterations.
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Initial temperature of the Simulated Annealing schedule.
- **stop_temperature:** The search stops once the temperature cools down below this value.
- **cool_down_rate:** Multiplicative factor by which the temperature is decreased after each iteration.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature.
- **admm_num_iterations:** Number of iterations of ADMM Pruner.
- **admm_training_epochs:** Training epochs of the first optimization subproblem of ADMMPruner.
- **experiment_data_dir:** Path to store temporary experiment data.
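A usage sketch, assuming the NNI v1.x PyTorch API; the parameter names follow the configuration list above, `trainer` and `evaluator` are the user-supplied functions described there (a trainer sketch is given in the ADMM Pruner section below), and the exact constructor signature should be checked against the linked example:

```python
import torch
from nni.compression.torch import AutoCompressPruner  # import path in NNI v1.x

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
dummy_input = torch.randn(1, 3, 32, 32).to(device)  # put it on the same device as the model
pruner = AutoCompressPruner(model, config_list,
                            trainer=trainer,        # first-subproblem training function
                            evaluator=evaluator,    # fast evaluation for the SA search
                            dummy_input=dummy_input,
                            iterations=3,
                            optimize_mode='maximize',
                            base_algo='l1',
                            experiment_data_dir='./experiment_data')
model = pruner.compress()  # `speedup` is applied internally to perform the real pruning
```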
## ADMM Pruner
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique that decomposes the original nonconvex problem into two subproblems which can be solved iteratively. In the weight pruning problem, these two subproblems are solved via 1) a gradient descent algorithm and 2) Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model will be changed. A one-shot pruner is then applied to prune the model according to the given config list.
This solution framework applies both to non-structured and different variations of structured pruning schemes.
For more details, please refer to [A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers](https://arxiv.org/abs/1804.03294).
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for ADMM Pruner
- **sparsity:** The target sparsity for the operations to be compressed.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem in the ADMM optimization; note that this is not used for fine-tuning. Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments. Here `callback` acts as an L2 regularizer, as presented in formula (7) of the original paper. The logic of `callback` is implemented inside the Pruner; users only need to insert `callback()` between `loss.backward()` and `optimizer.step()`, for example:
```python
def trainer(model, optimizer, criterion, epoch, callback):
    for data, target in train_loader:  # `train_loader` is your own DataLoader
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        # callback should be inserted between loss.backward() and optimizer.step()
        if callback:
            callback()
        optimizer.step()
```
- **num_iterations:** Total number of iterations.
- **training_epochs:** Training epochs of the first subproblem.
- **row:** Penalty parameter (rho) for ADMM training.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
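Putting the pieces together, a minimal usage sketch assuming the NNI v1.x PyTorch API and the `trainer` function sketched above:

```python
from nni.compression.torch import ADMMPruner  # import path in NNI v1.x

config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
pruner = ADMMPruner(model, config_list, trainer=trainer,
                    num_iterations=30, training_epochs=5)
model = pruner.compress()  # ADMM training, then one-shot pruning per the config list
```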
## Lottery Ticket Hypothesis
[The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis, and articulates the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
...
...
We try to reproduce the experiment result of the fully connected network on MNIST.

The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% also obtains performance similar to the unpruned network, and converges a little faster. If pruned too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. One small difference from the paper is that the trend of the data in the paper is clearer.