# Supported Pruning Algorithms on NNI

We provide several pruning algorithms that support fine-grained weight pruning and structural filter pruning. **Fine-grained pruning** generally results in unstructured models, which need specialized hardware or software to speed up the sparse network. **Filter pruning** achieves acceleration by removing entire filters. We also provide algorithms to control the **pruning schedule**.


**Fine-grained Pruning**
* [Level Pruner](#level-pruner)

**Filter Pruning**
* [Slim Pruner](#slim-pruner)
* [FPGM Pruner](#fpgm-pruner)
* [L1Filter Pruner](#l1filter-pruner)
* [L2Filter Pruner](#l2filter-pruner)
* [Activation APoZ Rank Filter Pruner](#activationapozrankfilter-pruner)
* [Activation Mean Rank Filter Pruner](#activationmeanrankfilter-pruner)
* [Taylor FO On Weight Pruner](#taylorfoweightfilter-pruner)

**Pruning Schedule**
* [AGP Pruner](#agp-pruner)
* [NetAdapt Pruner](#netadapt-pruner)
* [SimulatedAnnealing Pruner](#simulatedannealing-pruner)
* [AutoCompress Pruner](#autocompress-pruner)
* [AMC Pruner](#amc-pruner)
* [Sensitivity Pruner](#sensitivity-pruner)

**Others**
* [ADMM Pruner](#admm-pruner)
* [Lottery Ticket Hypothesis](#lottery-ticket-hypothesis)

## Level Pruner

This is a basic one-shot pruner: you set a target sparsity level expressed as a fraction (for example, 0.6 means that 60% of the weight parameters are pruned).

We first sort the weights in the specified layer by their absolute values, and then mask to zero the smallest-magnitude weights until the desired sparsity level is reached.
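
The two steps above can be sketched in plain Python. This is a minimal illustration rather than NNI's implementation: `level_mask` is a hypothetical helper, and it operates on a flat list of weights instead of a tensor.

```python
def level_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    num_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.5, -0.1, 0.05, -0.9, 0.3]
mask = level_mask(weights, sparsity=0.6)
masked = [w * m for w, m in zip(weights, mask)]
```

Here the three smallest-magnitude weights (`-0.1`, `0.05` and `0.3`) are masked, which gives 60% sparsity.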

### Usage

TensorFlow code
```python
from nni.algorithms.compression.tensorflow.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()
```

PyTorch code
```python
from nni.algorithms.compression.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
pruner.compress()
```

#### User configuration for Level Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.LevelPruner
```

##### TensorFlow

```eval_rst
..  autoclass:: nni.algorithms.compression.tensorflow.pruning.LevelPruner
```

## Slim Pruner

This is a one-shot pruner. It implements the method proposed in ['Learning Efficient Convolutional Networks through Network Slimming'](https://arxiv.org/pdf/1708.06519.pdf) by Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan and Changshui Zhang.

![](../../img/slim_pruner.png)

> Slim Pruner **prunes channels in the convolution layers by masking the corresponding scaling factors in the subsequent BN layers**. L1 regularization should be applied to the scaling factors of the batch normalization (BN) layers during training. When pruning, the scaling factors of the BN layers are **globally ranked**, so the sparse model can be found automatically for a given sparsity.
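
The global ranking described above can be illustrated with a small sketch. It assumes the BN scaling factors have already been collected into plain Python lists; `slim_masks` and the layer names are hypothetical and not part of NNI.

```python
def slim_masks(bn_scales, sparsity):
    """bn_scales maps a BN layer name to its list of scaling factors."""
    # Pool the absolute scaling factors of every BN layer into one sorted list.
    all_scales = sorted(abs(g) for scales in bn_scales.values() for g in scales)
    num_prune = int(len(all_scales) * sparsity)
    if num_prune == 0:
        return {name: [1] * len(scales) for name, scales in bn_scales.items()}
    # Channels whose factor is at or below this global threshold are masked
    # (ties at the threshold are all masked in this simplified version).
    threshold = all_scales[num_prune - 1]
    return {
        name: [0 if abs(g) <= threshold else 1 for g in scales]
        for name, scales in bn_scales.items()
    }


bn_scales = {
    "bn1": [0.9, 0.01, 0.4, 0.02],
    "bn2": [0.03, 0.8, 0.7, 0.6],
}
masks = slim_masks(bn_scales, sparsity=0.5)
```

Because the threshold is shared across layers, `bn1` loses three of its four channels while `bn2` loses only one, even though the overall sparsity is 50%.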

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
pruner.compress()
```

#### User configuration for Slim Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.SlimPruner
```

### Reproduced Experiment

We implemented one of the experiments in ['Learning Efficient Convolutional Networks through Network Slimming'](https://arxiv.org/pdf/1708.06519.pdf): we pruned $70\%$ of the channels in **VGGNet** for CIFAR-10 as in the paper, which prunes $88.5\%$ of the parameters. Our experiment results are as follows:

| Model         | Error (paper/ours) | Parameters | Pruned |
| ------------- | ------------------ | ---------- | ------ |
| VGGNet        | 6.34/6.40          | 20.04M     |        |
| Pruned-VGGNet | 6.20/6.26          | 2.30M      | 88.5%  |

The experiment code can be found at [examples/model_compress](https://github.com/microsoft/nni/tree/v1.9/examples/model_compress/)

***

## FPGM Pruner

This is a one-shot pruner that implements the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).

FPGMPruner prunes the filters closest to the geometric median of the filters in a layer, that is, the most replaceable ones.

 ![](../../img/fpgm_fig1.png)

>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance. 

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.
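
As a rough illustration of the geometric-median criterion quoted above, the sketch below scores each filter by its total Euclidean distance to the other filters in the same layer and selects the filters with the smallest totals, which are the ones closest to the geometric median. The helper `fpgm_prune_indices` and the toy two-weight filters are assumptions made for this example (it needs Python 3.8+ for `math.dist`); it is not NNI's implementation.

```python
import math


def fpgm_prune_indices(filters, num_prune):
    """Pick the filters with the smallest total distance to all other filters."""
    def total_distance(i):
        return sum(math.dist(filters[i], other) for other in filters)

    order = sorted(range(len(filters)), key=total_distance)
    return sorted(order[:num_prune])


filters = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],   # sits between the others, so it is the most replaceable
    [-1.0, 0.0],
]
to_prune = fpgm_prune_indices(filters, num_prune=1)
```

Note that the selected filter does not have the smallest norm; it is chosen because the remaining filters can best represent it.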

### Usage

PyTorch code
```python
from nni.algorithms.compression.pytorch.pruning import FPGMPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = FPGMPruner(model, config_list)
pruner.compress()
```

#### User configuration for FPGM Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.FPGMPruner
```

## L1Filter Pruner

This is a one-shot pruner. It implements the method proposed in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.

![](../../img/l1filter_pruner.png)

> L1Filter Pruner prunes filters in the **convolution layers**.
>
> The procedure of pruning m filters from the ith convolutional layer is as follows:
>
> 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights ![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|).
> 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
> 3. Prune the ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The kernels in the next convolutional layer corresponding to the pruned feature maps are also removed.
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel weights are copied to the new model.
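
Steps 1 to 3 of the procedure above can be sketched as follows. The helper `l1_prune_indices` is hypothetical, and each filter is represented as a flat list of its kernel weights; rebuilding the kernel matrices (step 4) is left out.

```python
def l1_prune_indices(filters, num_prune):
    """filters[j] holds every kernel weight of filter j as a flat list."""
    # Step 1: s_j is the sum of absolute kernel weights of filter j.
    scores = [sum(abs(w) for w in f) for f in filters]
    # Step 2: sort the filters by s_j, smallest first.
    order = sorted(range(len(filters)), key=lambda j: scores[j])
    # Step 3: the m filters with the smallest sums are pruned.
    return sorted(order[:num_prune])


filters = [
    [0.2, -0.1, 0.3],     # s_0 = 0.6
    [0.01, 0.02, -0.01],  # s_1 = 0.04
    [-0.5, 0.4, 0.1],     # s_2 = 1.0
    [0.05, -0.05, 0.0],   # s_3 = 0.1
]
to_prune = l1_prune_indices(filters, num_prune=2)
```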

In addition, we provide a dependency-aware mode for the L1FilterPruner. For more details, please refer to [dependency-aware mode](./DependencyAware.md).

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
pruner.compress()
```

#### User configuration for L1Filter Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.L1FilterPruner
```

### Reproduced Experiment

We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**: we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** as in the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:

| Model           | Error(paper/ours) | Parameters      | Pruned   |
| --------------- | ----------------- | --------------- | -------- |
| VGG-16          | 6.75/6.49     | 1.5x10^7 |          |
| VGG-16-pruned-A | 6.60/6.47     | 5.4x10^6 | 64.0% |

The experiment code can be found at [examples/model_compress](https://github.com/microsoft/nni/tree/v1.9/examples/model_compress/)

***

## L2Filter Pruner

This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights. It is implemented as a one-shot pruner.

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
```

#### User configuration for L2Filter Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.L2FilterPruner
```

***

## ActivationAPoZRankFilter Pruner

ActivationAPoZRankFilter Pruner prunes the filters with the smallest importance, where importance is derived from the `APoZ` (Average Percentage of Zeros) of the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `APoZ` is explained in the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).

The APoZ is defined as:

![](../../img/apoz.png)
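
Assuming the figure above shows the definition from the paper, APoZ is the fraction of zeros among a filter's post-ReLU outputs, measured over a set of input samples. A minimal sketch on made-up activations (the `apoz` helper and the data are hypothetical):

```python
def apoz(activations):
    """Fraction of zeros in one filter's post-ReLU outputs."""
    flat = [a for feature_map in activations for a in feature_map]
    return sum(1 for a in flat if a == 0) / len(flat)


# Hypothetical post-ReLU outputs: outputs[filter][sample] is a flattened feature map.
outputs = [
    [[0.0, 1.2, 0.0, 0.3], [0.5, 0.0, 0.1, 0.9]],
    [[0.0, 0.0, 0.0, 0.2], [0.0, 0.0, 0.0, 0.0]],
    [[0.7, 0.4, 0.9, 0.1], [0.3, 0.0, 0.6, 0.8]],
]
scores = [apoz(o) for o in outputs]
# The filter that is zero most often is the first pruning candidate.
candidate = max(range(len(scores)), key=lambda j: scores[j])
```

A filter that outputs zero most of the time contributes little to later layers, so the filter with the highest APoZ (index 1 in this toy data) is treated as the least important.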

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import ActivationAPoZRankFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```

Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/model_prune_torch.py) for more information.

#### User configuration for ActivationAPoZRankFilter Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.ActivationAPoZRankFilterPruner
```

***


## ActivationMeanRankFilter Pruner

ActivationMeanRankFilterPruner prunes the filters with the smallest importance criterion `mean activation`, calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion `mean activation` is explained in section 2.2 of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440). Other pruning criteria mentioned in this paper will be supported in a future release.

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import ActivationMeanRankFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = ActivationMeanRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```

Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/model_prune_torch.py) for more information.

#### User configuration for ActivationMeanRankFilter Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.ActivationMeanRankFilterPruner
```

***

## TaylorFOWeightFilter Pruner

TaylorFOWeightFilter Pruner prunes convolutional layers based on the estimated importance of their filters, calculated from the first-order Taylor expansion on the weights, to achieve a preset level of network sparsity. The estimated importance of filters is defined as in the paper [Importance Estimation for Neural Network Pruning](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf). Other pruning criteria mentioned in this paper will be supported in a future release.

![](../../img/importance_estimation_sum.png)
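
Assuming the figure above shows the per-filter estimate from the paper, in which the squared product of gradient and weight is summed over a filter's parameters, the importance can be sketched as follows. `taylor_fo_importance` and the toy numbers are hypothetical; in practice the gradients would be accumulated over training batches.

```python
def taylor_fo_importance(weights, grads):
    """Sum of squared (gradient * weight) over one filter's parameters."""
    return sum((g * w) ** 2 for w, g in zip(weights, grads))


# Hypothetical flattened filters and their accumulated gradients.
filter_weights = [[0.5, -0.5], [1.0, 2.0], [0.1, 0.1]]
filter_grads = [[0.1, 0.1], [0.0, 0.01], [1.0, -1.0]]
scores = [taylor_fo_importance(w, g) for w, g in zip(filter_weights, filter_grads)]
least_important = min(range(len(scores)), key=lambda j: scores[j])
```

The second filter has the largest weights but almost no gradient, so removing it is estimated to change the loss the least.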

We also provide a dependency-aware mode for this pruner to get a better speedup from pruning. Please refer to [dependency-aware](./DependencyAware.md) for more details.

### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import TaylorFOWeightFilterPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = TaylorFOWeightFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```


#### User configuration for TaylorFOWeightFilter Pruner

##### PyTorch
```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.TaylorFOWeightFilterPruner
```

***

## AGP Pruner

This is an iterative pruner. In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), Michael Zhu and Suyog Gupta provide an algorithm to prune the weights gradually.

>We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
![](../../img/agp_pruner.png)

>The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf, the weight masks are no longer updated. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network.
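
Assuming the figure above shows the cubic schedule from the paper, $s_t = s_f + (s_i - s_f)\left(1 - \frac{t - t_0}{n\Delta t}\right)^3$, the target sparsity at each step can be computed as in this sketch. `agp_sparsity` is a hypothetical helper written for illustration, not an NNI function.

```python
def agp_sparsity(step, initial_sparsity, final_sparsity, start_step, num_prunings, frequency):
    """Target sparsity at a training step under the cubic gradual-pruning schedule."""
    if step <= start_step:
        return initial_sparsity
    span = num_prunings * frequency
    # Clamp progress so the sparsity stays at final_sparsity after the last pruning.
    progress = min(1.0, (step - start_step) / span)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


# Sparsity targets for epochs 0..10 when going from 0 to 0.8 in 10 prunings.
schedule = [round(agp_sparsity(e, 0.0, 0.8, 0, 10, 1), 4) for e in range(11)]
```

For this schedule the sparsity rises quickly at first (about 0.22 after the first pruning and 0.7 halfway through) and flattens out as it approaches 0.8.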

### Usage

The code below prunes all weights from 0% to 80% sparsity over 10 epochs.

PyTorch code
```python
import torch
from nni.algorithms.compression.pytorch.pruning import AGPPruner

config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 0,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': ['default']
}]

# load a pretrained model or train a model before using a pruner
# model = MyModel()
# model.load_state_dict(torch.load('mycheckpoint.pth'))

# AGP pruner prunes the model while fine-tuning it by adding a hook on
# optimizer.step(), so an optimizer is required to prune the model.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)

pruner = AGPPruner(model, config_list, optimizer, pruning_algorithm='level')
pruner.compress()
```

AGP pruner uses the `LevelPruner` algorithm to prune the weights by default. You can set the `pruning_algorithm` parameter to one of the following values to use another pruning algorithm:
* `level`: LevelPruner
* `slim`: SlimPruner
* `l1`: L1FilterPruner
* `l2`: L2FilterPruner
* `fpgm`: FPGMPruner
* `taylorfo`: TaylorFOWeightFilterPruner
* `apoz`: ActivationAPoZRankFilterPruner
* `mean_activation`: ActivationMeanRankFilterPruner

Add the code below to your training code to update the epoch number each time an epoch finishes.

PyTorch code
```python
pruner.update_epoch(epoch)
```
See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/model_prune_torch.py) for more information.

#### User configuration for AGP Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.AGPPruner
```

***

## NetAdapt Pruner
NetAdapt allows a user to automatically simplify a pretrained network to meet a resource budget.
Given the overall sparsity, NetAdapt automatically generates the sparsity distribution among different layers by iterative pruning.

For more details, please refer to [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230).

![](../../img/algo_NetAdapt.png)

#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import NetAdaptPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
# short_term_fine_tuner and evaluator are callables that you define for your own model and data
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator, base_algo='l1', experiment_data_dir='./')
pruner.compress()
```

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/auto_pruners_torch.py) for more information.

#### User configuration for NetAdapt Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.NetAdaptPruner
```


## SimulatedAnnealing Pruner

We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm, enhanced with guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with more weights can often be compressed to a higher degree with less impact on overall accuracy.

- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature > stop_temperature:
    1. Generate a perturbation to the current distribution.
    2. Perform a fast evaluation of the perturbed distribution.
    3. Accept the perturbation according to the performance and probability; if not accepted, return to step 1.
    4. Cool down: current_temperature <- current_temperature * cool_down_rate
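
The loop above can be sketched in plain Python. This is an illustrative toy rather than NNI's implementation: the function name, the default temperatures, the perturbation `magnitude`, the acceptance rule and the toy evaluator are all assumptions, and a higher evaluation score is taken to be better.

```python
import math
import random


def simulated_annealing(evaluate, num_layers, start_temperature=100.0,
                        stop_temperature=20.0, cool_down_rate=0.9,
                        magnitude=0.1, seed=0):
    """Search for per-layer sparsities that maximize evaluate(sparsities)."""
    rng = random.Random(seed)
    # Randomly initialize a pruning rate distribution.
    current = [rng.random() for _ in range(num_layers)]
    current_score = evaluate(current)
    best, best_score = current, current_score
    temperature = start_temperature
    while temperature > stop_temperature:
        while True:
            # 1. Perturb the current distribution, clamped to [0, 1].
            candidate = [min(1.0, max(0.0, s + rng.uniform(-magnitude, magnitude)))
                         for s in current]
            # 2. Fast evaluation of the perturbed distribution.
            score = evaluate(candidate)
            # 3. Accept improvements outright; accept a worse candidate with a
            #    probability that depends on the loss and the temperature.
            delta = score - current_score
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                break
        current, current_score = candidate, score
        if current_score > best_score:
            best, best_score = current, current_score
        # 4. Cool down.
        temperature *= cool_down_rate
    return best, best_score


# Toy evaluator: prefers sparsities close to 0.5 in every layer.
def toy_evaluate(sparsities):
    return -sum((s - 0.5) ** 2 for s in sparsities)


best, best_score = simulated_annealing(toy_evaluate, num_layers=3)
```

In a real search the evaluator would prune the model with the candidate sparsities and measure its accuracy, which is why a fast evaluation matters.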

For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141).

#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
# evaluator is a callable that you define for your own model and data
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
```

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/auto_pruners_torch.py) for more information.

#### User configuration for SimulatedAnnealing Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.SimulatedAnnealingPruner
```

## AutoCompress Pruner
For each round, AutoCompressPruner prunes the model by the same sparsity to achieve the overall sparsity:

1. Generate a sparsity distribution using SimulatedAnnealingPruner.
2. Perform ADMM-based structured pruning to generate the pruning result for the next round. Here we use `speedup` to perform real pruning.

For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141).

#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import AutoCompressPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
    model, config_list, trainer=trainer, evaluator=evaluator,
    dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
    cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
```

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/auto_pruners_torch.py) for more information.

#### User configuration for AutoCompress Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.AutoCompressPruner
```

## AMC Pruner

AMC pruner leverages reinforcement learning to provide the model compression policy.
This learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio,
better preserving accuracy, and reducing human labor.

![](../../img/amc_pruner.jpg)

For more details, please refer to [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf).


#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import AMCPruner
config_list = [{
    'op_types': ['Conv2d', 'Linear']
}]
pruner = AMCPruner(model, config_list, evaluator, val_loader, flops_ratio=0.5)
pruner.compress()
```

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/amc/) for more information.

#### User configuration for AMC Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.AMCPruner
```

### Reproduced Experiment

We implemented one of the experiments in [AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf): we pruned **MobileNet** to 50% FLOPs for ImageNet as in the paper. Our experiment results are as follows:

| Model     | Top 1 acc. (paper/ours) | Top 5 acc. (paper/ours) | FLOPs |
| --------- | ----------------------- | ----------------------- | ----- |
| MobileNet | 70.5% / 69.9%           | 89.3% / 89.1%           | 50%   |

The experiment code can be found at [examples/model_compress](https://github.com/microsoft/nni/tree/v1.9/examples/model_compress/amc/)

## ADMM Pruner
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems that can be solved iteratively. In the weight pruning problem, these two subproblems are solved via 1) gradient descent and 2) Euclidean projection, respectively.

During the process of solving these two subproblems, the weights of the original model are changed. A one-shot pruner is then applied to prune the model according to the given config list.
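
The alternation between the two subproblems can be sketched on a toy problem. Everything here is an assumption made for illustration: the loss is a simple quadratic, `0.5 * ||w - target||^2`, rather than a network's training loss, and `project_sparse` and `admm_prune` are hypothetical helpers, not NNI functions.

```python
def project_sparse(values, num_keep):
    """Euclidean projection onto vectors with at most num_keep non-zeros."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:num_keep])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def admm_prune(target, num_keep, rho=1.0, lr=0.1, iterations=30, inner_steps=20):
    """ADMM on the toy loss 0.5 * ||w - target||^2 with a sparsity constraint."""
    w = list(target)
    z = project_sparse(w, num_keep)
    u = [0.0] * len(w)
    for _ in range(iterations):
        # Subproblem 1: gradient descent on loss(w) + rho / 2 * ||w - z + u||^2.
        for _ in range(inner_steps):
            w = [wi - lr * ((wi - ti) + rho * (wi - zi + ui))
                 for wi, ti, zi, ui in zip(w, target, z, u)]
        # Subproblem 2: Euclidean projection of w + u onto the sparse set.
        z = project_sparse([wi + ui for wi, ui in zip(w, u)], num_keep)
        # Dual variable update.
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


w, z = admm_prune(target=[0.9, -0.05, 0.4, 0.02], num_keep=2)
```

After the loop, `w` has been pulled towards the sparse `z`, so a one-shot magnitude pruner can remove the near-zero entries with little change to the loss.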

This solution framework applies both to non-structured and different variations of structured pruning schemes.

For more details, please refer to [A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers](https://arxiv.org/abs/1804.03294).

#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import ADMMPruner
config_list = [{
    'sparsity': 0.8,
    'op_types': ['Conv2d'],
    'op_names': ['conv1']
}, {
    'sparsity': 0.92,
    'op_types': ['Conv2d'],
    'op_names': ['conv2']
}]
pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=30, epochs=5)
pruner.compress()
```

See the [example](https://github.com/microsoft/nni/blob/v1.9/examples/model_compress/auto_pruners_torch.py) for more information.

#### User configuration for ADMM Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.ADMMPruner
```


## Lottery Ticket Hypothesis
In [The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), Jonathan Frankle and Michael Carbin provide comprehensive measurement and analysis, and articulate the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.

In this paper, the authors use the following process to prune a model, called *iterative pruning*:
>1. Randomly initialize a neural network f(x;theta_0) (where theta_0 follows D_{theta}).
>2. Train the network for j iterations, arriving at parameters theta_j.
>3. Prune p% of the parameters in theta_j, creating a mask m.
>4. Reset the remaining parameters to their values in theta_0, creating the winning ticket f(x;m*theta_0).
>5. Repeat steps 2, 3, and 4.

If the configured final sparsity is P (e.g., 0.8) and there are n rounds of iterative pruning, each round prunes 1-(1-P)^(1/n) of the weights that survived the previous round.
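
As a quick check of the formula above, the sketch below computes the per-round pruning fraction for P = 0.8 and n = 5 and tracks the cumulative sparsity after each round. The function name is hypothetical and only used for this illustration.

```python
def per_round_prune_fraction(final_sparsity, prune_iterations):
    """Fraction of the surviving weights to prune in each round."""
    return 1 - (1 - final_sparsity) ** (1 / prune_iterations)


rate = per_round_prune_fraction(final_sparsity=0.8, prune_iterations=5)

remaining = 1.0
sparsity_after_round = []
for _ in range(5):
    # Each round removes the same fraction of the weights that are still alive.
    remaining *= 1 - rate
    sparsity_after_round.append(round(1 - remaining, 4))
```

Each round prunes roughly 27.5% of the surviving weights, and the cumulative sparsity reaches the configured 0.8 after the fifth round.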

### Usage

PyTorch code
```python
from nni.algorithms.compression.pytorch.pruning import LotteryTicketPruner
config_list = [{
    'prune_iterations': 5,
    'sparsity': 0.8,
    'op_types': ['default']
}]
pruner = LotteryTicketPruner(model, config_list, optimizer)
pruner.compress()
for _ in pruner.get_prune_iterations():
    pruner.prune_iteration_start()
    for epoch in range(epoch_num):
        ...
```

The above configuration means that there are 5 rounds of iterative pruning. As the 5 rounds are executed in the same run, LotteryTicketPruner needs `model` and `optimizer` (**note that you should also pass `lr_scheduler` if one is used**) to reset their states every time a new prune iteration starts. Please use `get_prune_iterations` to get the pruning iterations, and invoke `prune_iteration_start` at the beginning of each iteration. `epoch_num` should be large enough for the model to converge, because the hypothesis is that the performance (accuracy) obtained in later rounds with high sparsity can be comparable to that obtained in the first round.


*The TensorFlow version will be supported later.*

#### User configuration for LotteryTicket Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.LotteryTicketPruner
```

### Reproduced Experiment

We try to reproduce the experiment result of the fully connected network on MNIST using the same configuration as in the paper. The code can be found [here](https://github.com/microsoft/nni/tree/v1.9/examples/model_compress/lottery_torch_mnist_fc.py). In this experiment, we prune 10 times, and after each pruning we train the pruned model for 50 epochs.

![](../../img/lottery_ticket_mnist_fc.png)

The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% of the weights obtains performance similar to the unpruned network and converges a little faster. If we prune too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. One difference from the paper is that the trend in the paper's data is somewhat clearer.


## Sensitivity Pruner
For each round, SensitivityPruner prunes the model based on each layer's sensitivity to accuracy, until the final configured sparsity of the whole model is met:

1. Analyze the sensitivity of each layer in the current state of the model.
2. Prune each layer according to the sensitivity.

For more details, please refer to [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626).

#### Usage

PyTorch code

```python
from nni.algorithms.compression.pytorch.pruning import SensitivityPruner
config_list = [{
    'sparsity': 0.5,
    'op_types': ['Conv2d']
}]
pruner = SensitivityPruner(model, config_list, finetuner=fine_tuner, evaluator=evaluator)
# eval_args and finetune_args are the parameters passed to the evaluator and finetuner respectively
pruner.compress(eval_args=[model], finetune_args=[model])
```


#### User configuration for Sensitivity Pruner

##### PyTorch

```eval_rst
..  autoclass:: nni.algorithms.compression.pytorch.pruning.SensitivityPruner
```