The 'default' op_type stands for the module types defined in [default_layers.py](https://github.com/microsoft/nni/blob/v1.9/src/sdk/pynni/nni/compression/pytorch/default_layers.py) for PyTorch.
Therefore ```{ 'sparsity': 0.8, 'op_types': ['default'] }``` means that **all layers with the specified op_types will be compressed with the same 0.8 sparsity**. When ```pruner.compress()``` is called, the model is compressed with masks; after that you can fine-tune the model as usual, and the **masked (pruned) weights won't be updated**.
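As a concrete illustration, a `config_list` can mix per-type and per-name entries; the sketch below assumes NNI's documented rule that later entries take precedence for the layers they match, and `'conv1'` is a hypothetical layer name:

```python
# a config_list is a list of dicts; entries later in the list take
# precedence for the layers they match (per NNI's config specification)
config_list = [
    {'sparsity': 0.8, 'op_types': ['default']},  # all supported layer types
    {'sparsity': 0.5, 'op_names': ['conv1']},    # 'conv1' is a hypothetical layer name
]
```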
When the masks of different layers in a model conflict (for example, when different sparsities are assigned to layers that have channel dependency), we can fix the conflict with MaskConflict. Specifically, MaskConflict loads the masks exported by the pruners (L1FilterPruner, etc.), checks whether there is a mask conflict, and if so, sets the conflicting masks to the same value.
```python
from nni.compression.pytorch.utils.mask_conflict import fix_mask_conflict
```
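A usage sketch follows; the file name, `net`, and `dummy_input` are illustrative placeholders, assuming `fix_mask_conflict` takes the exported mask file, the model, and a sample input:

```python
# './mask.pth' stands for a mask file exported by a pruner;
# net and dummy_input are placeholders for your model and a sample input
fixed_mask = fix_mask_conflict('./mask.pth', net, dummy_input)
```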
You can refer to the NNI-provided [weight masker](https://github.com/microsoft/nni/blob/v1.9/src/sdk/pynni/nni/compression/pytorch/pruning/structured_pruning.py) implementations to implement your own weight masker.
A basic `pruner` looks like this:
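Below is a minimal sketch, assuming the v1.9-era `Pruner` interface (`calc_mask` receives a layer wrapper and returns a mask dict); the all-ones mask stands in for a real pruning criterion:

```python
import torch
from nni.compression.pytorch import Pruner

class MyPruner(Pruner):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)

    def calc_mask(self, wrapper, **kwargs):
        # compute a binary mask of the same shape as the layer's weight;
        # the all-ones mask below is a placeholder for a real criterion
        weight = wrapper.module.weight.data
        return {'weight_mask': torch.ones_like(weight)}
```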
Refer to the NNI-provided [pruner](https://github.com/microsoft/nni/blob/v1.9/src/sdk/pynni/nni/compression/pytorch/pruning/one_shot.py) implementations to implement your own pruner class.
***
## Customize a new quantization algorithm
To write a new quantization algorithm, you can write a class that inherits `nni.compression.pytorch.Quantizer`. Then, override the member functions with the logic of your algorithm. The member function to override is `quantize_weight`. `quantize_weight` directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.
```python
from nni.compression.pytorch import Quantizer

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        ...
```
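To make the `quantize_weight` override concrete, here is a sketch assuming the v1.9-era `Quantizer` interface; the symmetric 8-bit rounding scheme is purely illustrative, not an NNI built-in:

```python
import torch
from nni.compression.pytorch import Quantizer

class YourQuantizer(Quantizer):
    def __init__(self, model, config_list):
        super().__init__(model, config_list)

    def quantize_weight(self, weight, config, **kwargs):
        # illustrative symmetric 8-bit rounding; substitute your own scheme
        scale = weight.abs().max().clamp(min=1e-8) / 127
        return torch.round(weight / scale) * scale
```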
Sometimes it is necessary for a quantization operation to have a customized backward function, such as the [Straight-Through Estimator](https://stackoverflow.com/questions/38361314/the-concept-of-straight-through-estimator-ste); users can customize a backward function as follows:
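As a self-contained illustration using plain `torch.autograd.Function` (NNI ships its own gradient hooks for this; the class below is a stand-alone sketch), a straight-through estimator can look like:

```python
import torch

class RoundSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # non-differentiable rounding in the forward pass
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through: treat round() as the identity for the gradient
        return grad_output
```

Calling `RoundSTE.apply(weight)` quantizes in the forward pass while letting gradients flow through unchanged.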
```python
# dummy_input is necessary for the dependency_aware mode
dummy_input = torch.ones(1, 3, 224, 224).cuda()
```
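This `dummy_input` is then handed to a pruner created with dependency-aware mode turned on; a minimal sketch (the `config_list` is illustrative, and the import path follows the renamed `nni.compression.pytorch` package used in this document):

```python
from nni.compression.pytorch import L1FilterPruner

config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
# dependency_aware=True makes the pruner respect channel dependencies,
# which it discovers by tracing the model with dummy_input
pruner = L1FilterPruner(model, config_list,
                        dependency_aware=True, dummy_input=dummy_input)
pruner.compress()
```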


In the figure, `Dependency-aware` represents the L1FilterPruner with dependency-aware mode enabled, `L1 Filter` is the normal `L1FilterPruner` without dependency-aware mode, and `No-Dependency` means the pruner only prunes the layers that have no channel dependency with other layers. As the figure shows, with dependency-aware mode enabled the pruner achieves higher accuracy at the same FLOPs.
There are 3 major components/classes in the NNI model compression framework: `Compressor`, `Pruner` and `Quantizer`.
Compressor is the base class for pruners and quantizers; it provides a unified interface for both, so that pruners and quantizers can be used in the same way. For example, to use a pruner:
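A minimal usage sketch with the built-in `LevelPruner` (assuming `model` is an existing `torch.nn.Module`):

```python
from nni.compression.pytorch import LevelPruner

# prune 80% of the weights in every supported layer type
config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
pruner = LevelPruner(model, config_list)
model = pruner.compress()  # masks are now in place; fine-tune as usual
```

After fine-tuning, `pruner.export_model(...)` can write out the checkpoint and the mask file.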
We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830):

| Model | Accuracy |
| ------ | -------- |
| VGGNet | 86.93% |
The code for this experiment can be found at [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/v1.9/examples/model_compress/BNN_quantizer_cifar10.py).
Masks do not provide a real speedup of your model. The model should be sped up based on the exported masks; thus, we provide an API to speed up your model as shown below. After invoking `apply_compression_results` on your model, your model becomes a smaller one with shorter inference latency.
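A usage sketch; the file name `'mask.pth'` is an illustrative placeholder for the mask file exported by the pruner:

```python
from nni.compression.pytorch import apply_compression_results

# 'mask.pth' stands for the mask file produced by pruner.export_model(...)
apply_compression_results(model, 'mask.pth')
```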
You can use other compression algorithms in the package of `nni.compression`. The algorithms are implemented in both PyTorch and TensorFlow (partial support on TensorFlow), under `nni.compression.pytorch` and `nni.compression.tensorflow` respectively. You can refer to [Pruner](./Pruner.md) and [Quantizer](./Quantizer.md) for detailed descriptions of the supported algorithms. If you want to use knowledge distillation, you can refer to [KDExample](../TrialExample/KDExample.md).
A compression algorithm is first instantiated with a `config_list` passed in. The specification of this `config_list` will be described later.
CDARTS builds a cyclic feedback mechanism between the search and evaluation networks. First, the search network generates an initial topology for evaluation, so that the weights of the evaluation network can be optimized. Second, the architecture topology in the search network is further optimized by the label supervision in classification, as well as the regularization from the evaluation network through feature distillation. Repeating the above cycle results in a joint optimization of the search and evaluation networks, and thus enables the evolution of the topology to fit the final evaluation network.
In the implementation of `CdartsTrainer`, it first instantiates two models and two mutators (one for each). The first model is the so-called "search network", which is mutated with a `RegularizedDartsMutator` -- a mutator with subtle differences from `DartsMutator`. The second model is the "evaluation network", which is mutated with a discrete mutator that leverages the previous search network's mutator to sample a single path each time. The trainer trains the models and mutators alternately. Users can refer to the [references](#reference) if they are interested in more details on these trainers and mutators.
## Reproduction Results
This is CDARTS based on the NNI platform, which currently supports CIFAR10 search and retrain. ImageNet search and retrain should also be supported, and we provide the corresponding interfaces. Our reproduced results on NNI are slightly lower than those in the paper, but much higher than those of the original DARTS. Here we show the results of three independent experiments on CIFAR10.