@@ -160,9 +160,13 @@ class AMCTaskGenerator(TaskGenerator):
class AMCPruner(IterativePruner):
r"""
AMC pruner leverages reinforcement learning to provide the model compression policy.
According to the author, this learning-based compression policy outperforms conventional rule-based compression policy by achieving a higher compression ratio, better preserving accuracy, and reducing human labor.
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
It is suggested to set all `total_sparsity` values in `config_list` to the same value.
AMC pruner will treat the first sparsity in `config_list` as the global sparsity.
...
@@ -181,7 +185,7 @@ class AMCPruner(IterativePruner):
- op_partial_names : Operation partial names to be pruned, will be autocompleted by NNI.
- exclude : Set True to exclude the layers specified by op_types and op_names from pruning.
dummy_input : torch.Tensor
`dummy_input` is required for speedup and tracing the model in the RL environment.
evaluator : Callable[[Module], float]
Evaluate the pruned model and give a score.
pruning_algorithm : str
...
@@ -216,6 +220,18 @@ class AMCPruner(IterativePruner):
target : str
'flops' or 'params'. Note that in other pruners sparsity always refers to parameter sparsity, but in AMC you can choose FLOPs sparsity instead.
This parameter specifies what the sparsity setting in `config_list` refers to.
Examples
--------
>>> from nni.compression.pytorch.pruning import AMCPruner
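>>> # a usage sketch only -- the argument order follows the parameters documented above,
>>> # plus an assumed leading episode-count argument; `model`, `dummy_input` and
>>> # `evaluator` are assumed to be defined elsewhere
>>> config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.5}]
>>> pruner = AMCPruner(400, model, config_list, dummy_input, evaluator, target='flops')
>>> pruner.compress()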
@@ -51,7 +51,16 @@ class AutoCompressTaskGenerator(LotteryTicketTaskGenerator):
class AutoCompressPruner(IterativePruner):
r"""
For a total iteration number :math:`N`, AutoCompressPruner prunes the model that survived the previous iteration by a fixed sparsity ratio (e.g., :math:`1-{(1-0.8)}^{(1/N)}`) to achieve the overall sparsity (e.g., :math:`0.8`):
.. code-block:: bash
1. Generate sparsities distribution using SimulatedAnnealingPruner
2. Perform ADMM-based pruning to generate pruning result for the next iteration.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
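As a quick sanity check of the per-iteration ratio above, a small illustrative sketch (plain Python, values assumed):

.. code-block:: python

    overall_sparsity = 0.8
    N = 4
    per_iteration = 1 - (1 - overall_sparsity) ** (1 / N)
    # compounding the per-iteration ratio N times recovers the overall sparsity
    remaining = (1 - per_iteration) ** N
    assert abs((1 - remaining) - overall_sparsity) < 1e-9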
Parameters
----------
model : Module
...
@@ -70,7 +79,7 @@ class AutoCompressPruner(IterativePruner):
The model will be trained or inferenced for `training_epochs` epochs.
For detailed example please refer to :githublink:`examples/model_compress/pruning/norm_pruning_torch.py <examples/model_compress/pruning/norm_pruning_torch.py>`
@@ -338,11 +374,18 @@ class L2NormPruner(NormPruner):
class FPGMPruner(BasicPruner):
r"""
FPGM pruner prunes the weight blocks on the first dimension that are closest to the geometric median of all blocks in the same layer.
FPGM chooses the weight blocks with the most replaceable contribution.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/abs/1811.00250>`__.
FPGM pruner also supports dependency-aware mode.
Parameters
----------
model : torch.nn.Module
Model to be pruned.
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
...
@@ -363,6 +406,16 @@ class FPGMPruner(BasicPruner):
dummy_input : Optional[torch.Tensor]
The dummy input to analyze the topology constraints. Note that the dummy_input
should be on the same device as the model.
Examples
--------
>>> model = ...
>>> from nni.compression.pytorch.pruning import FPGMPruner
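>>> # a sketch continuing the example above; the sparsity value is illustrative and the
>>> # (pruned_model, masks) return value follows the common BasicPruner interface
>>> config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
>>> pruner = FPGMPruner(model, config_list)
>>> masked_model, masks = pruner.compress()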
For detailed example please refer to :githublink:`examples/model_compress/pruning/fpgm_pruning_torch.py <examples/model_compress/pruning/fpgm_pruning_torch.py>`
The traced optimizer instance whose optimizer class is wrapped by nni.trace.
E.g. ``traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())``.
criterion : Callable[[Tensor, Tensor], Tensor]
The criterion function used in the trainer. It takes model output and target value as input, and returns the loss.
training_batches
...
@@ -627,6 +700,82 @@ class ActivationPruner(BasicPruner):
class ActivationAPoZRankPruner(ActivationPruner):
r"""
Activation APoZ rank pruner is a pruner which prunes on the first weight dimension,
removing the blocks with the smallest importance according to the criterion ``APoZ``, which is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.
The pruning criterion ``APoZ`` is explained in the paper `Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures <https://arxiv.org/abs/1607.03250>`__.
For detailed example please refer to :githublink:`examples/model_compress/pruning/activation_pruning_torch.py <examples/model_compress/pruning/activation_pruning_torch.py>`
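As a rough illustration (not the pruner's internal implementation), the ``APoZ`` score of a convolution channel can be computed from its activations as the average fraction of zeros:

.. code-block:: python

    import torch

    def apoz_per_channel(activation: torch.Tensor) -> torch.Tensor:
        # activation: (batch, channels, H, W) output of a ReLU-activated conv layer
        zero_mask = torch.eq(activation, 0).float()
        # average over batch and spatial dimensions; a higher APoZ means a less important channel
        return zero_mask.mean(dim=(0, 2, 3))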
"""
def _activation_trans(self, output: Tensor) -> Tensor:
    # return a matrix where positions of zeros in `output` are one, and others are zero.
@@ -636,6 +785,80 @@ class ActivationAPoZRankPruner(ActivationPruner):
class ActivationMeanRankPruner(ActivationPruner):
r"""
Activation mean rank pruner is a pruner which prunes on the first weight dimension,
removing the blocks with the smallest importance according to the criterion ``mean activation``, which is calculated from the output activations of convolution layers, to achieve a preset level of network sparsity.
The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__.
Activation mean rank pruner also supports dependency-aware mode.
Parameters
----------
model : torch.nn.Module
Model to be pruned.
config_list : List[Dict]
Supported keys:
- sparsity : This is to specify the sparsity for each layer in this config to be compressed.
- sparsity_per_layer : Equals to sparsity.
- op_types : Conv2d and Linear are supported in ActivationPruner.
- op_names : Operation names to be pruned.
- op_partial_names: Operation partial names to be pruned, will be autocompleted by NNI.
- exclude : Set True to exclude the layers specified by op_types and op_names from pruning.
For detailed example please refer to :githublink:`examples/model_compress/pruning/activation_pruning_torch.py <examples/model_compress/pruning/activation_pruning_torch.py>`
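Similarly to the ``APoZ`` sketch above, a rough illustration (not the pruner's internal implementation) of the per-channel ``mean activation`` score:

.. code-block:: python

    import torch

    def mean_activation_per_channel(activation: torch.Tensor) -> torch.Tensor:
        # activation: (batch, channels, H, W) output of a ReLU-activated conv layer
        # a lower mean activation means a less important channel
        return activation.mean(dim=(0, 2, 3))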
"""
def _activation_trans(self, output: Tensor) -> Tensor:
    # return the activation of `output` directly.
    return self._activation(output.detach())
...
@@ -645,11 +868,21 @@ class ActivationMeanRankPruner(ActivationPruner):
class TaylorFOWeightPruner(BasicPruner):
r"""
Taylor FO weight pruner is a pruner which prunes on the first weight dimension,
based on the estimated importance calculated from the first-order Taylor expansion on weights, to achieve a preset level of network sparsity.
The estimated importance is defined in the paper `Importance Estimation for Neural Network Pruning <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__.
For detailed example please refer to :githublink:`examples/model_compress/pruning/taylorfo_pruning_torch.py <examples/model_compress/pruning/taylorfo_pruning_torch.py>`
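A rough sketch of a first-order Taylor importance score for convolution filters (assuming the squared gradient-weight product criterion from the paper; not the pruner's internal code):

.. code-block:: python

    import torch

    def taylor_fo_importance(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
        # weight / grad: (out_channels, in_channels, kH, kW) of a Conv2d layer
        # importance per output channel: sum over the filter of (gradient * weight) ** 2
        return (grad * weight).pow(2).sum(dim=(1, 2, 3))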
@@ -772,13 +1020,17 @@ class TaylorFOWeightPruner(BasicPruner):
class ADMMPruner(BasicPruner):
r"""
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems which can be solved iteratively.
In the weight pruning problem, these two subproblems are solved via 1) gradient descent and 2) Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model will be changed.
Then a fine-grained pruning will be applied to prune the model according to the given config list.
This solution framework applies to both non-structured and different variations of structured pruning schemes.
For more details, please refer to `A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers <https://arxiv.org/abs/1804.03294>`__.
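A schematic sketch of one ADMM round for a single weight tensor (illustrative only, not the pruner's internal implementation; ``rho`` and ``sparsity`` are assumed hyper-parameters):

.. code-block:: python

    import torch

    def euclidean_projection(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # subproblem 2: project onto the sparse set by keeping the largest-magnitude entries
        k = max(int(weight.numel() * (1 - sparsity)), 1)
        threshold = weight.abs().flatten().topk(k).values.min()
        return torch.where(weight.abs() >= threshold, weight, torch.zeros_like(weight))

    # subproblem 1 (inside the training loop): gradient descent on the augmented loss
    #     loss = task_loss + rho / 2 * ((weight - Z + U) ** 2).sum()
    # after each ADMM iteration, update the auxiliary variables:
    #     Z = euclidean_projection(weight + U, sparsity)
    #     U = U + weight - Z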
Parameters
----------
...
@@ -814,17 +1066,38 @@ class ADMMPruner(BasicPruner):
For detailed example please refer to :githublink:`examples/model_compress/pruning/admm_pruning_torch.py <examples/model_compress/pruning/admm_pruning_torch.py>`
@@ -70,7 +70,11 @@ class IterativePruner(PruningScheduler):
class LinearPruner(IterativePruner):
r"""
Linear pruner is an iterative pruner that increases sparsity evenly from scratch during each iteration.
For example, if the final sparsity is set to 0.5 and the iteration number is 5, the sparsity used in each iteration is ``[0, 0.1, 0.2, 0.3, 0.4, 0.5]``.
Parameters
----------
model : Module
...
@@ -89,20 +93,31 @@ class LinearPruner(IterativePruner):
finetuner : Optional[Callable[[Module], None]]
The finetuner handles all finetuning logic, taking a pytorch module as input.
It will be called at the end of each iteration, usually for neutralizing the accuracy loss brought by the pruning in this iteration.
speedup : bool
If set True, speed up the model at the end of each iteration to make the pruned model compact.
dummy_input : Optional[torch.Tensor]
If `speedup` is True, `dummy_input` is required for tracing the model in speedup.
evaluator : Optional[Callable[[Module], float]]
Evaluate the pruned model and give a score.
If evaluator is None, the best result refers to the latest result.
pruning_params : Dict
If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.
Examples
--------
>>> from nni.compression.pytorch.pruning import LinearPruner
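>>> # a usage sketch assuming `model` and a `finetuner` callable are defined elsewhere;
>>> # the `pruning_algorithm` and `total_iteration` keyword names are assumptions here
>>> config_list = [{'sparsity': 0.8, 'op_types': ['Conv2d']}]
>>> pruner = LinearPruner(model, config_list, pruning_algorithm='l1',
...                       total_iteration=10, finetuner=finetuner)
>>> pruner.compress()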
For detailed example please refer to :githublink:`examples/model_compress/pruning/iterative_pruning_torch.py <examples/model_compress/pruning/iterative_pruning_torch.py>`
This is an iterative pruner in which the sparsity is increased from an initial sparsity value :math:`s_{i}` (usually 0) to a final sparsity value :math:`s_{f}` over a span of :math:`n` pruning iterations,
starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
For more details please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__\.
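A quick sketch of the schedule above (plain Python, illustrative values):

.. code-block:: python

    def agp_sparsity(t, s_i=0.0, s_f=0.8, t_0=0, n=10, delta_t=1):
        # cubic ramp from s_i to s_f between t_0 and t_0 + n * delta_t
        return s_f + (s_i - s_f) * (1 - (t - t_0) / (n * delta_t)) ** 3

    schedule = [round(agp_sparsity(t), 4) for t in range(0, 11)]
    # starts at s_i = 0.0 and reaches s_f = 0.8 at the final pruning step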
Parameters
----------
model : Module
...
@@ -136,20 +158,31 @@ class AGPPruner(IterativePruner):
finetuner : Optional[Callable[[Module], None]]
The finetuner handles all finetuning logic, taking a pytorch module as input.
It will be called at the end of each iteration, usually for neutralizing the accuracy loss brought by the pruning in this iteration.
speedup : bool
If set True, speed up the model at the end of each iteration to make the pruned model compact.
dummy_input : Optional[torch.Tensor]
If `speedup` is True, `dummy_input` is required for tracing the model in speedup.
evaluator : Optional[Callable[[Module], float]]
Evaluate the pruned model and give a score.
If evaluator is None, the best result refers to the latest result.
pruning_params : Dict
If the chosen pruning_algorithm has extra parameters, put them as a dict to pass in.
Examples
--------
>>> from nni.compression.pytorch.pruning import AGPPruner
For detailed example please refer to :githublink:`examples/model_compress/pruning/iterative_pruning_torch.py <examples/model_compress/pruning/iterative_pruning_torch.py>`
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm. As mentioned in the paper, this method enhances the guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy.
* Randomly initialize a pruning rate distribution (sparsities).
* While current_temperature > stop_temperature:

    #. Generate a perturbation to the current distribution.
    #. Perform a fast evaluation on the perturbed distribution.
    #. Accept the perturbation according to the performance and probability; if not accepted, return to step 1.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
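A schematic sketch of the annealing loop described above (illustrative only; the real pruner perturbs per-layer sparsities and rescales them to meet the overall target):

.. code-block:: python

    import math
    import random

    def simulated_annealing(evaluate, init_sparsities, start_t=100.0, stop_t=20.0, cool_down=0.9):
        current, best = list(init_sparsities), list(init_sparsities)
        current_score = best_score = evaluate(current)
        t = start_t
        while t > stop_t:
            # 1. generate a perturbation of the current distribution
            candidate = [max(0.0, min(1.0, s + random.uniform(-0.1, 0.1))) for s in current]
            # 2. fast evaluation of the perturbed distribution
            score = evaluate(candidate)
            # 3. accept by performance, or with a temperature-dependent probability
            if score > current_score or random.random() < math.exp((score - current_score) / t):
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = list(candidate), score
            t *= cool_down
        return best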
Parameters
----------
model : Module
...
@@ -242,16 +318,29 @@ class SimulatedAnnealingPruner(IterativePruner):
Whether to keep the intermediate result, including the intermediate model and masks during each iteration.
finetuner : Optional[Callable[[Module], None]]
The finetuner handles all finetuning logic, taking a pytorch module as input; it will be called in each iteration.
speedup : bool
If set True, speed up the model at the end of each iteration to make the pruned model compact.
dummy_input : Optional[torch.Tensor]
If `speedup` is True, `dummy_input` is required for tracing the model in speedup.
Examples
--------
>>> from nni.compression.pytorch.pruning import SimulatedAnnealingPruner
For detailed example please refer to :githublink:`examples/model_compress/pruning/simulated_anealing_pruning_torch.py <examples/model_compress/pruning/simulated_anealing_pruning_torch.py>`
Movement pruner is an implementation of movement pruning.
This is a "fine-pruning" algorithm, which means the masks may change during each fine-tuning step.
Each weight element is scored by the opposite of the accumulated sum of the product of the weight and its gradient over the fine-tuning steps.
This means the weight elements moving towards zero accumulate negative scores, while the weight elements moving away from zero accumulate positive scores.
The weight elements with low scores will be masked during inference.
The following figure from the paper shows the weight pruning by movement pruning.
.. image:: ../../../img/movement_pruning.png
:target: ../../../img/movement_pruning.png
:alt:
For more details, please refer to `Movement Pruning: Adaptive Sparsity by Fine-Tuning <https://arxiv.org/abs/2005.07683>`__.
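A schematic sketch of the score accumulation described above (illustrative only, not the pruner's internal implementation):

.. code-block:: python

    import torch

    def update_movement_scores(scores: torch.Tensor, weight: torch.nn.Parameter) -> torch.Tensor:
        # accumulate the opposite of weight * gradient at each fine-tuning step
        return scores - weight.detach() * weight.grad.detach()

    def movement_mask(scores: torch.Tensor, sparsity: float) -> torch.Tensor:
        # mask out the elements with the lowest accumulated scores
        k = max(int(scores.numel() * sparsity), 1)
        threshold = scores.flatten().kthvalue(k).values
        return (scores > threshold).type_as(scores)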
Parameters
----------
model : torch.nn.Module
...
@@ -158,7 +129,7 @@ class MovementPruner(BasicPruner):
For detailed example please refer to :githublink:`examples/model_compress/pruning/movement_pruning_glue.py <examples/model_compress/pruning/movement_pruning_glue.py>`