"...git@developer.sourcefind.cn:renzhc/diffusers_dcu.git" did not exist on "29ece0db7966a8d25fdd8f314a13096194bc5568"
Unverified commit f5caa193 authored by Guoxin, committed by GitHub

Auto pruners (#2490)

parent a3b0bd7d
...@@ -144,6 +144,10 @@ Within the following table, we summarized the current NNI capabilities, we are g
<li><a href="docs/en_US/Compressor/Pruner.md#agp-pruner">AGP Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#slim-pruner">Slim Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#fpgm-pruner">FPGM Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#netadapt-pruner">NetAdapt Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#simulatedannealing-pruner">SimulatedAnnealing Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#admm-pruner">ADMM Pruner</a></li>
<li><a href="docs/en_US/Compressor/Pruner.md#autocompress-pruner">AutoCompress Pruner</a></li>
</ul>
<b>Quantization</b>
<ul>
......
...@@ -37,6 +37,10 @@ Pruning algorithms compress the original network by removing redundant weights o
| [ActivationMeanRankFilterPruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#activationmeanrankfilterpruner) | Pruning filters based on the metric that calculates the smallest mean value of output activations |
| [Slim Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers(Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [TaylorFO Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#taylorfoweightfilterpruner) | Pruning filters based on the first order taylor expansion on weights(Importance Estimation for Neural Network Pruning) [Reference Paper](http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf) |
| [ADMM Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#admm-pruner) | Pruning based on ADMM optimization technique [Reference Paper](https://arxiv.org/abs/1804.03294) |
| [NetAdapt Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#netadapt-pruner) | Automatically simplify a pretrained network to meet the resource budget by iterative pruning [Reference Paper](https://arxiv.org/abs/1804.03230) |
| [SimulatedAnnealing Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#simulatedannealing-pruner) | Automatic pruning with a guided heuristic search method, the Simulated Annealing algorithm [Reference Paper](https://arxiv.org/abs/1907.03141) |
| [AutoCompress Pruner](https://nni.readthedocs.io/en/latest/Compressor/Pruner.html#autocompress-pruner) | Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner [Reference Paper](https://arxiv.org/abs/1907.03141) |
### Quantization Algorithms
......
...@@ -17,8 +17,12 @@ We provide several pruning algorithms that support fine-grained weight pruning a
**Pruning Schedule**
* [AGP Pruner](#agp-pruner)
* [NetAdapt Pruner](#netadapt-pruner)
* [SimulatedAnnealing Pruner](#simulatedannealing-pruner)
* [AutoCompress Pruner](#autocompress-pruner)
**Others**
* [ADMM Pruner](#admm-pruner)
* [Lottery Ticket Hypothesis](#lottery-ticket-hypothesis)
## Level Pruner
...@@ -349,6 +353,290 @@ You can view example for more information
***
## NetAdapt Pruner
NetAdapt allows a user to automatically simplify a pretrained network to meet the resource budget.
Given the overall sparsity, NetAdapt automatically generates the sparsity distribution among the different layers by iterative pruning.
For more details, please refer to [NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications](https://arxiv.org/abs/1804.03230).
![](../../img/algo_NetAdapt.png)
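At a high level, each iteration tries pruning every candidate layer by a small per-iteration budget, short-term fine-tunes each candidate, and keeps the one that evaluates best. Below is a minimal, hedged sketch of that greedy loop; `prunable_layers`, `prune_one_layer`, `short_term_fine_tune` and `evaluate` are hypothetical helpers standing in for the pruner's internals, not NNI APIs.
```python
# Hypothetical sketch of the NetAdapt-style greedy loop described above.
def netadapt_sketch(model, prunable_layers, overall_sparsity, sparsity_per_iteration,
                    prune_one_layer, short_term_fine_tune, evaluate):
    pruned_so_far = 0.0
    while pruned_so_far < overall_sparsity:
        candidates = []
        # try pruning each layer by the per-iteration budget ...
        for name in prunable_layers:
            candidate = prune_one_layer(model, name, sparsity_per_iteration)
            short_term_fine_tune(candidate)
            candidates.append((evaluate(candidate), candidate))
        # ... and keep the candidate that evaluates best
        _, model = max(candidates, key=lambda c: c[0])
        pruned_so_far += sparsity_per_iteration
    return model
```
The actual NetAdaptPruner tracks the remaining resource budget in terms of weight counts rather than a flat sparsity counter; see the usage below for the real API.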
#### Usage
PyTorch code
```python
from nni.compression.torch import NetAdaptPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator, base_algo='l1', experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for NetAdapt Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **short_term_fine_tuner:** Function to short-term fine tune the masked model.
This function should include `model` as the only parameter, and fine tune the model for a short term after each pruning iteration.
Example:
```python
>>> def short_term_fine_tuner(model, epoch=3):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> criterion = torch.nn.CrossEntropyLoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> model.train()
>>> for _ in range(epoch):
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> optimizer.step()
```
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **sparsity_per_iteration:** The sparsity to prune in each iteration. NetAdapt Pruner prunes the model by the same amount in each iteration to meet the resource budget progressively.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm and the performance of the pruned model.
## SimulatedAnnealing Pruner
We implement a guided heuristic search method, the Simulated Annealing (SA) algorithm, with an enhancement of guided search based on prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy.
- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature > stop_temperature:
1. Generate a perturbation to the current distribution
2. Perform fast evaluation on the perturbed distribution
3. Accept the perturbation according to the performance and probability; if not accepted, return to step 1
4. Cool down: current_temperature <- current_temperature * cool_down_rate
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141). A minimal sketch of this annealing loop is shown below.
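Below is a minimal, self-contained sketch of that annealing loop, assuming a hypothetical `evaluate_sparsities` callable in place of the pruner's internal fast evaluation; it omits the rescaling step that keeps the overall sparsity on target.
```python
import math
import random

def simulated_annealing_sketch(init_sparsities, evaluate_sparsities,
                               start_temperature=100, stop_temperature=20,
                               cool_down_rate=0.9, perturbation_magnitude=0.35):
    """Search for a per-layer sparsity distribution by simulated annealing."""
    current = list(init_sparsities)
    current_score = evaluate_sparsities(current)
    temperature = start_temperature
    while temperature > stop_temperature:
        # the perturbation magnitude shrinks as the temperature cools down
        magnitude = perturbation_magnitude * temperature / start_temperature
        candidate = [min(max(s + random.uniform(-magnitude, magnitude), 0.0), 1.0)
                     for s in current]
        candidate_score = evaluate_sparsities(candidate)
        delta = candidate_score - current_score
        # always accept improvements; accept worse candidates with a probability
        # that decreases as the temperature drops
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
        temperature *= cool_down_rate
    return current
```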
#### Usage
PyTorch code
```python
from nni.compression.torch import SimulatedAnnealingPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = SimulatedAnnealingPruner(model, config_list, evaluator=evaluator, base_algo='l1', cool_down_rate=0.9, experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for SimulatedAnnealing Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Simulated Annealing related parameter.
- **stop_temperature:** Simulated Annealing related parameter.
- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature.
- **experiment_data_dir:** Path to save experiment data, including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
## AutoCompress Pruner
For each round, AutoCompressPruner prunes the model with the same per-round sparsity so that the rounds compound to the overall sparsity:
1. Generate a sparsity distribution using SimulatedAnnealingPruner
2. Perform ADMM-based structured pruning to generate the pruning result for the next round.
Here we use `speedup` to perform real pruning.
For more details, please refer to [AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates](https://arxiv.org/abs/1907.03141). A short example of the per-round sparsity schedule is shown below.
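The per-round sparsity is chosen so that the rounds compound to the overall target; the small example below mirrors the formula used in `AutoCompressPruner.compress` in this PR.
```python
# After each round the remaining weights shrink by (1 - sparsity_each_round),
# so num_iterations rounds hit the overall target when
# (1 - sparsity_each_round) ** num_iterations == 1 - overall_sparsity.
overall_sparsity = 0.5
num_iterations = 3
sparsity_each_round = 1 - (1 - overall_sparsity) ** (1 / num_iterations)
print(round(sparsity_each_round, 4))                              # ~0.2063
print(round(1 - (1 - sparsity_each_round) ** num_iterations, 4))  # 0.5, the overall target
```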
#### Usage
PyTorch code
```python
from nni.compression.torch import AutoCompressPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator,
dummy_input=dummy_input, num_iterations=3, optimize_mode='maximize', base_algo='l1',
cool_down_rate=0.9, admm_num_iterations=30, admm_training_epochs=5, experiment_data_dir='./')
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for AutoCompress Pruner
- **sparsity:** The target overall sparsity.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem of the ADMM optimization.
Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments.
Here `callback` acts as an L2 regularizer as presented in formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner; users are only required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example:
```python
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
- **evaluator:** Function to evaluate the masked model. This function should include `model` as the only parameter, and returns a scalar value.
Example:
```python
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
```
- **dummy_input:** The dummy input for model speedup; users should put it on the right device before passing it in.
- **num_iterations:** The number of overall iterations.
- **optimize_mode:** Optimize mode, `maximize` or `minimize`, by default `maximize`.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
- **start_temperature:** Simulated Annealing related parameter.
- **stop_temperature:** Simulated Annealing related parameter.
- **cool_down_rate:** Simulated Annealing related parameter.
- **perturbation_magnitude:** Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature.
- **admm_num_iterations:** Number of iterations of ADMM Pruner.
- **admm_training_epochs:** Training epochs of the first optimization subproblem of ADMMPruner.
- **experiment_data_dir:** Path to store temporary experiment data.
## ADMM Pruner
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems which can be solved iteratively. In the weight pruning problem, these two subproblems are solved via 1) the gradient descent algorithm and 2) Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model are changed. A one-shot pruner is then applied to prune the model according to the given config list.
This solution framework applies to both non-structured and different variations of structured pruning schemes.
For more details, please refer to [A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers](https://arxiv.org/abs/1804.03294). A condensed sketch of the ADMM loop is shown below.
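The sketch below mirrors the update order of `ADMMPruner.compress` added in this PR; `train_one_epoch` and `project_to_sparsity` are hypothetical stand-ins for the user-supplied trainer and the pruner's internal Euclidean projection.
```python
import torch

# Condensed sketch of the ADMM optimization loop described above.
def admm_sketch(weights, sparsities, train_one_epoch, project_to_sparsity,
                num_iterations=30, training_epochs=5, rho=1e-4):
    Z = [w.data.clone() for w in weights]        # Z_i^0 = W_i^0
    U = [torch.zeros_like(w) for w in weights]   # U_i^0 = 0
    for _ in range(num_iterations):
        def callback():
            # extra gradient step contributed by the quadratic penalty term
            for w, z, u in zip(weights, Z, U):
                w.data -= rho * (w.data - z + u)
        # subproblem 1: update W by ordinary training plus the callback
        for _ in range(training_epochs):
            train_one_epoch(callback)
        # subproblem 2: update Z by Euclidean projection, then the dual variable U
        for i, (w, s) in enumerate(zip(weights, sparsities)):
            Z[i] = project_to_sparsity(w.data + U[i], s)
            U[i] = U[i] + w.data - Z[i]
    return Z
```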
#### Usage
PyTorch code
```python
from nni.compression.torch import ADMMPruner
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_types': ['Conv2d'],
'op_names': ['conv2']
}]
pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=30, training_epochs=5)
pruner.compress()
```
You can view [example](https://github.com/microsoft/nni/blob/master/examples/model_compress/auto_pruners_torch.py) for more information.
#### User configuration for ADMM Pruner
- **sparsity:** The target sparsity to which the specified operations are compressed.
- **op_types:** The operation type to prune. If `base_algo` is `l1` or `l2`, then only `Conv2d` is supported as `op_types`.
- **trainer:** Function used for the first subproblem in the ADMM optimization; note that this is not used for fine-tuning.
Users should write this function as a normal function to train the PyTorch model and include `model, optimizer, criterion, epoch, callback` as function arguments.
Here `callback` acts as an L2 regularizer as presented in formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner; users are only required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example:
```python
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
- **num_iterations:** Total number of iterations.
- **training_epochs:** Training epochs of the first subproblem.
- **row:** Penalty parameter for ADMM training.
- **base_algo:** Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`.
Given the sparsity distribution among the ops, the assigned `base_algo` is used to decide which filters/channels/weights to prune.
## Lottery Ticket Hypothesis
[The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks](https://arxiv.org/abs/1803.03635), by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis, and articulates the *lottery ticket hypothesis*: dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*) that -- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
...@@ -396,7 +684,3 @@ We try to reproduce the experiment result of the fully connected network on MNIS
![](../../img/lottery_ticket_mnist_fc.png)
The above figure shows the result of the fully connected network. `round0-sparsity-0.0` is the performance without pruning. Consistent with the paper, pruning around 80% also obtains similar performance compared to no pruning, and converges a little faster. If pruning too much, e.g., more than 94%, the accuracy becomes lower and convergence becomes a little slower. Slightly different from the paper, the trend of the data in the paper is relatively clearer.
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
'''
Examples for automatic pruners
'''
import argparse
import os
import json
import torch
from torch.optim.lr_scheduler import StepLR, MultiStepLR
from torchvision import datasets, transforms, models
from models.mnist.lenet import LeNet
from models.cifar10.vgg import VGG
from nni.compression.torch import L1FilterPruner, SimulatedAnnealingPruner, ADMMPruner, NetAdaptPruner, AutoCompressPruner
from nni.compression.torch import ModelSpeedup
def get_data(args):
'''
get data
'''
    kwargs = {'num_workers': 1, 'pin_memory': True} if torch.cuda.is_available() else {}
if args.dataset == 'mnist':
train_loader = torch.utils.data.DataLoader(
datasets.MNIST(args.data_dir, train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.MNIST(args.data_dir, train=False,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.NLLLoss()
elif args.dataset == 'cifar10':
normalize = transforms.Normalize(
(0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(args.data_dir, train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
normalize,
]), download=True),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.CIFAR10(args.data_dir, train=False, transform=transforms.Compose([
transforms.ToTensor(),
normalize,
])),
batch_size=args.batch_size, shuffle=False, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
elif args.dataset == 'imagenet':
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(args.data_dir, 'train'),
transform=transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize,
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
val_loader = torch.utils.data.DataLoader(
datasets.ImageFolder(os.path.join(args.data_dir, 'val'),
transform=transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize,
])),
batch_size=args.test_batch_size, shuffle=True, **kwargs)
criterion = torch.nn.CrossEntropyLoss()
return train_loader, val_loader, criterion
def train(args, model, device, train_loader, criterion, optimizer, epoch, callback=None):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
# callback should be inserted between loss.backward() and optimizer.step()
if callback:
callback()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def test(model, device, criterion, val_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in val_loader:
data, target = data.to(device), target.to(device)
output = model(data)
# sum up batch loss
test_loss += criterion(output, target).item()
# get the index of the max log-probability
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(val_loader.dataset)
accuracy = correct / len(val_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
test_loss, correct, len(val_loader.dataset), 100. * accuracy))
return accuracy
def get_trained_model(args, device, train_loader, val_loader, criterion):
if args.model == 'LeNet':
model = LeNet().to(device)
optimizer = torch.optim.Adadelta(model.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
momentum=0.9,
weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'resnet18':
model = models.resnet18(pretrained=False, num_classes=10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
momentum=0.9,
weight_decay=5e-4)
scheduler = MultiStepLR(
optimizer, milestones=[int(args.pretrain_epochs*0.5), int(args.pretrain_epochs*0.75)], gamma=0.1)
for epoch in range(args.pretrain_epochs):
train(args, model, device, train_loader,
criterion, optimizer, epoch)
scheduler.step()
elif args.model == 'mobilenet_v2':
        model = models.mobilenet_v2(pretrained=True).to(device)
        # torchvision's pretrained weights are used directly here; still create an
        # optimizer so that the `return model, optimizer` below is well-defined
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
if args.save_model:
torch.save(model.state_dict(), os.path.join(
args.experiment_data_dir, 'model_trained.pth'))
        print('Trained model saved to %s' % args.experiment_data_dir)
return model, optimizer
def get_dummy_input(args, device):
if args.dataset == 'mnist':
dummy_input = torch.randn(
[args.test_batch_size, 1, 28, 28]).to(device)
elif args.dataset in ['cifar10', 'imagenet']:
dummy_input = torch.randn(
[args.test_batch_size, 3, 32, 32]).to(device)
return dummy_input
def main(args):
# prepare dataset
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader, val_loader, criterion = get_data(args)
model, optimizer = get_trained_model(args, device, train_loader, val_loader, criterion)
def short_term_fine_tuner(model, epochs=1):
for epoch in range(epochs):
train(args, model, device, train_loader, criterion, optimizer, epoch)
def trainer(model, optimizer, criterion, epoch, callback):
return train(args, model, device, train_loader, criterion, optimizer, epoch=epoch, callback=callback)
def evaluator(model):
return test(model, device, criterion, val_loader)
# used to save the performance of the original & pruned & finetuned models
result = {}
evaluation_result = evaluator(model)
print('Evaluation result (original model): %s' % evaluation_result)
result['original'] = evaluation_result
# module types to prune, only "Conv2d" supported for channel pruning
if args.base_algo in ['l1', 'l2']:
op_types = ['Conv2d']
elif args.base_algo == 'level':
op_types = ['default']
config_list = [{
'sparsity': args.sparsity,
'op_types': op_types
}]
dummy_input = get_dummy_input(args, device)
if args.pruner == 'L1FilterPruner':
pruner = L1FilterPruner(model, config_list)
elif args.pruner == 'NetAdaptPruner':
pruner = NetAdaptPruner(model, config_list, short_term_fine_tuner=short_term_fine_tuner, evaluator=evaluator,
base_algo=args.base_algo, experiment_data_dir=args.experiment_data_dir)
elif args.pruner == 'ADMMPruner':
# users are free to change the config here
if args.model == 'LeNet':
if args.base_algo in ['l1', 'l2']:
config_list = [{
'sparsity': 0.8,
'op_types': ['Conv2d'],
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_types': ['Conv2d'],
'op_names': ['conv2']
}]
elif args.base_algo == 'level':
config_list = [{
'sparsity': 0.8,
'op_names': ['conv1']
}, {
'sparsity': 0.92,
'op_names': ['conv2']
}, {
'sparsity': 0.991,
'op_names': ['fc1']
}, {
'sparsity': 0.93,
'op_names': ['fc2']
}]
else:
raise ValueError('Example only implemented for LeNet.')
pruner = ADMMPruner(model, config_list, trainer=trainer, num_iterations=2, training_epochs=2)
elif args.pruner == 'SimulatedAnnealingPruner':
pruner = SimulatedAnnealingPruner(
model, config_list, evaluator=evaluator, base_algo=args.base_algo,
cool_down_rate=args.cool_down_rate, experiment_data_dir=args.experiment_data_dir)
elif args.pruner == 'AutoCompressPruner':
pruner = AutoCompressPruner(
model, config_list, trainer=trainer, evaluator=evaluator, dummy_input=dummy_input,
num_iterations=3, optimize_mode='maximize', base_algo=args.base_algo,
cool_down_rate=args.cool_down_rate, admm_num_iterations=30, admm_training_epochs=5,
experiment_data_dir=args.experiment_data_dir)
else:
raise ValueError(
"Please use L1FilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner or AutoCompressPruner in this example.")
# Pruner.compress() returns the masked model
# but for AutoCompressPruner, Pruner.compress() returns directly the pruned model
model_masked = pruner.compress()
evaluation_result = evaluator(model_masked)
print('Evaluation result (masked model): %s' % evaluation_result)
result['pruned'] = evaluation_result
if args.save_model:
pruner.export_model(
os.path.join(args.experiment_data_dir, 'model_masked.pth'), os.path.join(args.experiment_data_dir, 'mask.pth'))
        print('Masked model saved to %s' % args.experiment_data_dir)
if args.fine_tune:
if args.dataset == 'mnist':
optimizer = torch.optim.Adadelta(model_masked.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.fine_tune_epochs):
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
test(model_masked, device, criterion, val_loader)
elif args.dataset == 'cifar10':
optimizer = torch.optim.SGD(model_masked.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
for epoch in range(args.fine_tune_epochs):
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
scheduler.step()
test(model_masked, device, criterion, val_loader)
elif args.dataset == 'imagenet':
for epoch in range(args.fine_tune_epochs):
optimizer = torch.optim.SGD(model_masked.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
train(args, model_masked, device, train_loader, criterion, optimizer, epoch)
test(model_masked, device, criterion, val_loader)
evaluation_result = evaluator(model_masked)
print('Evaluation result (fine tuned): %s' % evaluation_result)
result['finetuned'] = evaluation_result
if args.save_model:
pruner.export_model(os.path.join(
args.experiment_data_dir, 'model_fine_tuned.pth'), os.path.join(args.experiment_data_dir, 'mask.pth'))
            print('Fine-tuned model saved to %s' % args.experiment_data_dir)
# model speed up
if args.speed_up and args.pruner != 'AutoCompressPruner':
if args.model == 'LeNet':
model = LeNet().to(device)
elif args.model == 'vgg16':
model = VGG(depth=16).to(device)
elif args.model == 'resnet18':
model = models.resnet18(pretrained=False, num_classes=10).to(device)
elif args.model == 'mobilenet_v2':
model = models.mobilenet_v2(pretrained=False).to(device)
model.load_state_dict(torch.load(os.path.join(args.experiment_data_dir, 'model_fine_tuned.pth')))
masks_file = os.path.join(args.experiment_data_dir, 'mask.pth')
m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
m_speedup.speedup_model()
evaluation_result = evaluator(model)
print('Evaluation result (speed up model): %s' % evaluation_result)
result['speedup'] = evaluation_result
torch.save(model.state_dict(), os.path.join(args.experiment_data_dir, 'model_speed_up.pth'))
        print('Speed up model saved to %s' % args.experiment_data_dir)
with open(os.path.join(args.experiment_data_dir, 'performance.json'), 'w+') as f:
json.dump(result, f)
if __name__ == '__main__':
def str2bool(v):
if isinstance(v, bool):
return v
if v.lower() in ('yes', 'true', 't', 'y', '1'):
return True
elif v.lower() in ('no', 'false', 'f', 'n', '0'):
return False
else:
raise argparse.ArgumentTypeError('Boolean value expected.')
parser = argparse.ArgumentParser(description='PyTorch Example for SimulatedAnnealingPruner')
parser.add_argument('--pruner', type=str, default='SimulatedAnnealingPruner',
help='pruner to use, L1FilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner or AutoCompressPruner')
parser.add_argument('--base-algo', type=str, default='l1',
help='base pruning algorithm. level, l1 or l2')
parser.add_argument('--sparsity', type=float, default=0.3,
help='overall target sparsity')
parser.add_argument('--speed-up', type=str2bool, default=False,
help='Whether to speed-up the pruned model')
# param for SimulatedAnnealingPruner
parser.add_argument('--cool-down-rate', type=float, default=0.9,
help='cool down rate')
# param for NetAdaptPruner
parser.add_argument('--sparsity-per-iteration', type=float, default=0.05,
help='sparsity_per_iteration of NetAdaptPruner')
parser.add_argument('--dataset', type=str, default='mnist',
help='dataset to use, mnist, cifar10 or imagenet (default MNIST)')
parser.add_argument('--model', type=str, default='LeNet',
help='model to use, LeNet, vgg16, resnet18 or mobilenet_v2')
parser.add_argument('--fine-tune', type=str2bool, default=True,
help='whether to fine-tune the pruned model')
parser.add_argument('--fine-tune-epochs', type=int, default=10,
help='epochs to fine tune')
parser.add_argument('--data-dir', type=str, default='/datasets/',
help='dataset directory')
parser.add_argument('--experiment-data-dir', type=str, default='./',
help='For saving experiment data')
parser.add_argument('--batch-size', type=int, default=64,
help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=64,
help='input batch size for testing (default: 64)')
parser.add_argument('--pretrain-epochs', type=int, default=1,
help='number of epochs to pretrain the model')
parser.add_argument('--log-interval', type=int, default=200,
help='how many batches to wait before logging training status')
parser.add_argument('--save-model', type=str2bool, default=True,
help='For Saving the current Model')
args = parser.parse_args()
if not os.path.exists(args.experiment_data_dir):
os.makedirs(args.experiment_data_dir)
main(args)
import torch
import torch.nn as nn
import torch.nn.functional as F
class LeNet(nn.Module):
def __init__(self):
super(LeNet, self).__init__()
self.conv1 = nn.Conv2d(1, 32, 3, 1)
self.conv2 = nn.Conv2d(32, 64, 3, 1)
self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5)
self.fc1 = nn.Linear(9216, 128)
self.fc2 = nn.Linear(128, 10)
def forward(self, x):
x = self.conv1(x)
x = F.relu(x)
x = self.conv2(x)
x = F.relu(x)
x = F.max_pool2d(x, 2)
x = self.dropout1(x)
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.dropout2(x)
x = self.fc2(x)
output = F.log_softmax(x, dim=1)
return output
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .speedup import ModelSpeedup
from .pruning import *
from .quantization import *
from .compressor import Compressor, Pruner, Quantizer
from .speedup import ModelSpeedup
...@@ -346,7 +346,7 @@ class Pruner(Compressor):
config : dict
the configuration for generating the mask
"""
_logger.info("Module detected to compress : %s.", layer.name)
wrapper = PrunerModuleWrapper(layer.module, layer.name, layer.type, config, self)
assert hasattr(layer.module, 'weight'), "module %s does not have 'weight' attribute" % layer.name
# move newly registered buffers to the same device of weight
...@@ -381,7 +381,7 @@ class Pruner(Compressor):
if weight_mask is not None:
mask_sum = weight_mask.sum().item()
mask_num = weight_mask.numel()
_logger.info('Layer: %s Sparsity: %.4f', wrapper.name, 1 - mask_sum / mask_num)
wrapper.module.weight.data = wrapper.module.weight.data.mul(weight_mask)
if bias_mask is not None:
wrapper.module.bias.data = wrapper.module.bias.data.mul(bias_mask)
......
...@@ -7,3 +7,7 @@ from .apply_compression import apply_compression_results
from .one_shot import *
from .agp import *
from .lottery_ticket import LotteryTicketPruner
from .simulated_annealing_pruner import SimulatedAnnealingPruner
from .net_adapt_pruner import NetAdaptPruner
from .admm_pruner import ADMMPruner
from .auto_compress_pruner import AutoCompressPruner
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import torch
from schema import And, Optional
from ..utils.config_validation import CompressorSchema
from .constants import MASKER_DICT
from .one_shot import OneshotPruner
_logger = logging.getLogger(__name__)
class ADMMPruner(OneshotPruner):
"""
    This is a PyTorch implementation of the ADMM Pruner algorithm.
    Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
    that decomposes the original nonconvex problem into two subproblems which can be solved iteratively.
    In the weight pruning problem, these two subproblems are solved via 1) the gradient descent algorithm and 2) Euclidean projection respectively.
This solution framework applies both to non-structured and different variations of structured pruning schemes.
For more details, please refer to the paper: https://arxiv.org/abs/1804.03294.
"""
def __init__(self, model, config_list, trainer, num_iterations=30, training_epochs=5, row=1e-4, base_algo='l1'):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
trainer : function
Function used for the first subproblem.
Users should write this function as a normal function to train the Pytorch model
and include `model, optimizer, criterion, epoch, callback` as function arguments.
        Here `callback` acts as an L2 regularizer as presented in the formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner,
users are just required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example::
```
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
num_iterations : int
Total number of iterations.
training_epochs : int
Training epochs of the first subproblem.
row : float
Penalty parameters for ADMM training.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
"""
self._base_algo = base_algo
super().__init__(model, config_list)
self._trainer = trainer
self._num_iterations = num_iterations
self._training_epochs = training_epochs
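        # 'row' is the ADMM penalty coefficient (rho in the original paper's augmented Lagrangian)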
self._row = row
self.set_wrappers_attribute("if_calculated", False)
self.masker = MASKER_DICT[self._base_algo](self.bound_model, self)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def _projection(self, weight, sparsity):
'''
Return the Euclidean projection of the weight matrix according to the pruning mode.
Parameters
----------
weight : tensor
original matrix
sparsity : float
the ratio of parameters which need to be set to zero
Returns
-------
tensor
the projected matrix
'''
w_abs = weight.abs()
if self._base_algo == 'level':
k = int(weight.numel() * sparsity)
if k == 0:
mask_weight = torch.ones(weight.shape).type_as(weight)
else:
threshold = torch.topk(w_abs.view(-1), k, largest=False)[0].max()
mask_weight = torch.gt(w_abs, threshold).type_as(weight)
elif self._base_algo in ['l1', 'l2']:
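            # structured (filter-level) projection: rank filters by the sum of their
            # absolute weights and zero out the num_prune smallest filters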
filters = weight.size(0)
num_prune = int(filters * sparsity)
if filters < 2 or num_prune < 1:
mask_weight = torch.ones(weight.size()).type_as(weight).detach()
else:
w_abs_structured = w_abs.view(filters, -1).sum(dim=1)
threshold = torch.topk(w_abs_structured.view(-1), num_prune, largest=False)[0].max()
mask_weight = torch.gt(w_abs_structured, threshold)[:, None, None, None].expand_as(weight).type_as(weight)
return weight.data.mul(mask_weight)
def compress(self):
"""
Compress the model with ADMM.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting ADMM Compression...')
        # initialize Z, U
# Z_i^0 = W_i^0
# U_i^0 = 0
Z = []
U = []
for wrapper in self.get_modules_wrapper():
z = wrapper.module.weight.data
Z.append(z)
U.append(torch.zeros_like(z))
optimizer = torch.optim.Adam(
self.bound_model.parameters(), lr=1e-3, weight_decay=5e-5)
        # Loss = cross_entropy + L2 regularization + \Sum_{i=1}^N \rho_i ||W_i - Z_i^k + U_i^k||^2
criterion = torch.nn.CrossEntropyLoss()
        # callback function to do additional optimization, refer to the derivatives of Formula (7)
def callback():
for i, wrapper in enumerate(self.get_modules_wrapper()):
wrapper.module.weight.data -= self._row * \
(wrapper.module.weight.data - Z[i] + U[i])
# optimization iteration
for k in range(self._num_iterations):
_logger.info('ADMM iteration : %d', k)
# step 1: optimize W with AdamOptimizer
for epoch in range(self._training_epochs):
self._trainer(self.bound_model, optimizer=optimizer,
criterion=criterion, epoch=epoch, callback=callback)
# step 2: update Z, U
# Z_i^{k+1} = projection(W_i^{k+1} + U_i^k)
# U_i^{k+1} = U^k + W_i^{k+1} - Z_i^{k+1}
for i, wrapper in enumerate(self.get_modules_wrapper()):
z = wrapper.module.weight.data + U[i]
Z[i] = self._projection(z, wrapper.config['sparsity'])
U[i] = U[i] + wrapper.module.weight.data - Z[i]
# apply prune
self.update_mask()
_logger.info('Compression finished.')
return self.bound_model
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import copy
import torch
from schema import And, Optional
from nni.utils import OptimizeMode
from nni.compression.torch import ModelSpeedup
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from .simulated_annealing_pruner import SimulatedAnnealingPruner
from .admm_pruner import ADMMPruner
_logger = logging.getLogger(__name__)
class AutoCompressPruner(Pruner):
"""
    This is a PyTorch implementation of the AutoCompress pruning algorithm.
    For each round, AutoCompressPruner prunes the model with the same per-round sparsity to achieve the overall sparsity:
1. Generate sparsities distribution using SimualtedAnnealingPruner
2. Perform ADMM-based structured pruning to generate pruning result for the next round.
Here we use 'speedup' to perform real pruning.
For more details, please refer to the paper: https://arxiv.org/abs/1907.03141.
"""
def __init__(self, model, config_list, trainer, evaluator, dummy_input,
num_iterations=3, optimize_mode='maximize', base_algo='l1',
# SimulatedAnnealing related
start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35,
# ADMM related
admm_num_iterations=30, admm_training_epochs=5, row=1e-4,
experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
trainer : function
Function used for the first subproblem of ADMM Pruner.
Users should write this function as a normal function to train the Pytorch model
and include `model, optimizer, criterion, epoch, callback` as function arguments.
        Here `callback` acts as an L2 regularizer as presented in the formula (7) of the original paper.
The logic of `callback` is implemented inside the Pruner,
users are just required to insert `callback()` between `loss.backward()` and `optimizer.step()`.
Example::
```
>>> def trainer(model, criterion, optimizer, epoch, callback):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> model.train()
>>> for batch_idx, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> # callback should be inserted between loss.backward() and optimizer.step()
>>> if callback:
>>> callback()
>>> optimizer.step()
```
evaluator : function
function to evaluate the pruned model.
This function should include `model` as the only parameter, and returns a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
dummy_input : pytorch tensor
            The dummy input for ```jit.trace```; users should put it on the right device before passing it in
num_iterations : int
Number of overall iterations
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
        start_temperature : float
            Simulated Annealing related parameter
        stop_temperature : float
            Simulated Annealing related parameter
        cool_down_rate : float
            Simulated Annealing related parameter
perturbation_magnitude : float
Initial perturbation magnitude to the sparsities. The magnitude decreases with current temperature
admm_num_iterations : int
Number of iterations of ADMM Pruner
admm_training_epochs : int
Training epochs of the first optimization subproblem of ADMMPruner
row : float
Penalty parameters for ADMM training
experiment_data_dir : string
PATH to store temporary experiment data
"""
# original model
self._model_to_prune = model
self._base_algo = base_algo
self._trainer = trainer
self._evaluator = evaluator
self._dummy_input = dummy_input
self._num_iterations = num_iterations
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for SA algorithm
self._start_temperature = start_temperature
self._stop_temperature = stop_temperature
self._cool_down_rate = cool_down_rate
self._perturbation_magnitude = perturbation_magnitude
# hyper parameters for ADMM algorithm
self._admm_num_iterations = admm_num_iterations
self._admm_training_epochs = admm_training_epochs
self._row = row
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List on pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def calc_mask(self, wrapper, **kwargs):
return None
def compress(self):
"""
Compress the model with AutoCompress.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting AutoCompress pruning...')
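        # per-round sparsity is chosen so that num_iterations rounds compound to the
        # overall target: (1 - sparsity_each_round) ** num_iterations == 1 - sparsity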
sparsity_each_round = 1 - pow(1-self._sparsity, 1/self._num_iterations)
for i in range(self._num_iterations):
_logger.info('Pruning iteration: %d', i)
_logger.info('Target sparsity this round: %s',
1-pow(1-sparsity_each_round, i+1))
# SimulatedAnnealingPruner
_logger.info(
'Generating sparsities with SimulatedAnnealingPruner...')
SApruner = SimulatedAnnealingPruner(
model=copy.deepcopy(self._model_to_prune),
config_list=[
{"sparsity": sparsity_each_round, "op_types": ['Conv2d']}],
evaluator=self._evaluator,
optimize_mode=self._optimize_mode,
base_algo=self._base_algo,
start_temperature=self._start_temperature,
stop_temperature=self._stop_temperature,
cool_down_rate=self._cool_down_rate,
perturbation_magnitude=self._perturbation_magnitude,
experiment_data_dir=self._experiment_data_dir)
config_list = SApruner.compress(return_config_list=True)
_logger.info("Generated config_list : %s", config_list)
# ADMMPruner
_logger.info('Performing structured pruning with ADMMPruner...')
ADMMpruner = ADMMPruner(
model=copy.deepcopy(self._model_to_prune),
config_list=config_list,
trainer=self._trainer,
num_iterations=self._admm_num_iterations,
training_epochs=self._admm_training_epochs,
row=self._row,
base_algo=self._base_algo)
ADMMpruner.compress()
ADMMpruner.export_model(os.path.join(self._experiment_data_dir, 'model_admm_masked.pth'), os.path.join(
self._experiment_data_dir, 'mask.pth'))
# use speed up to prune the model before next iteration, because SimulatedAnnealingPruner & ADMMPruner don't take masked models
self._model_to_prune.load_state_dict(torch.load(os.path.join(
self._experiment_data_dir, 'model_admm_masked.pth')))
masks_file = os.path.join(self._experiment_data_dir, 'mask.pth')
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
_logger.info('Speeding up models...')
m_speedup = ModelSpeedup(self._model_to_prune, self._dummy_input, masks_file, device)
m_speedup.speedup_model()
evaluation_result = self._evaluator(self._model_to_prune)
_logger.info('Evaluation result of the pruned model in iteration %d: %s', i, evaluation_result)
_logger.info('----------Compression finished--------------')
os.remove(os.path.join(self._experiment_data_dir, 'model_admm_masked.pth'))
os.remove(os.path.join(self._experiment_data_dir, 'mask.pth'))
return self._model_to_prune
def export_model(self, model_path, mask_path=None, onnx_path=None, input_shape=None, device=None):
_logger.info("AutoCompressPruner export directly the pruned model without mask")
torch.save(self._model_to_prune.state_dict(), model_path)
_logger.info('Model state_dict saved to %s', model_path)
if onnx_path is not None:
assert input_shape is not None, 'input_shape must be specified to export onnx model'
# input info needed
if device is None:
device = torch.device('cpu')
input_data = torch.Tensor(*input_shape)
torch.onnx.export(self._model_to_prune, input_data.to(device), onnx_path)
_logger.info('Model in onnx with input shape %s saved to %s', input_data.shape, onnx_path)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .one_shot import LevelPruner, L1FilterPruner, L2FilterPruner
PRUNER_DICT = {
'level': LevelPruner,
'l1': L1FilterPruner,
'l2': L2FilterPruner
}
...@@ -29,4 +29,3 @@ class LevelPrunerMasker(WeightMasker):
mask_weight = torch.gt(w_abs, threshold).type_as(weight)
mask = {'weight_mask': mask_weight}
return mask
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import copy
import json
import torch
from schema import And, Optional
from nni.utils import OptimizeMode
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from ..utils.num_param_counter import get_total_num_weights
from .constants_pruner import PRUNER_DICT
_logger = logging.getLogger(__name__)
class NetAdaptPruner(Pruner):
"""
This is a Pytorch implementation of NetAdapt compression algorithm.
The pruning procedure can be described as follows:
While Res_i > Bud:
1. Con = Res_i - delta_Res
2. for every layer:
Choose Num Filters to prune
Choose which filter to prune
Short-term fine tune the pruned model
3. Pick the best layer to prune
Long-term fine tune
For the details of this algorithm, please refer to the paper: https://arxiv.org/abs/1804.03230
"""
def __init__(self, model, config_list, short_term_fine_tuner, evaluator,
optimize_mode='maximize', base_algo='l1', sparsity_per_iteration=0.05, experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
short_term_fine_tuner : function
function to short-term fine tune the masked model.
This function should include `model` as the only parameter,
and fine tune the model for a short term after each pruning iteration.
Example:
>>> def short_term_fine_tuner(model, epoch=3):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> train_loader = ...
>>> criterion = torch.nn.CrossEntropyLoss()
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
>>> model.train()
>>> for _ in range(epoch):
>>> for _, (data, target) in enumerate(train_loader):
>>> data, target = data.to(device), target.to(device)
>>> optimizer.zero_grad()
>>> output = model(data)
>>> loss = criterion(output, target)
>>> loss.backward()
>>> optimizer.step()
evaluator : function
function to evaluate the masked model.
This function should include `model` as the only parameter, and returns a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
sparsity_per_iteration : float
sparsity to prune in each iteration
experiment_data_dir : str
PATH to save experiment data,
including the config_list generated for the base pruning algorithm and the performance of the pruned model.
"""
# models used for iterative pruning and evaluation
self._model_to_prune = copy.deepcopy(model)
self._base_algo = base_algo
super().__init__(model, config_list)
self._short_term_fine_tuner = short_term_fine_tuner
self._evaluator = evaluator
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for NetAdapt algorithm
self._sparsity_per_iteration = sparsity_per_iteration
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
# config_list
self._config_list_generated = []
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
self._tmp_model_path = os.path.join(self._experiment_data_dir, 'tmp_model.pth')
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List of pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def calc_mask(self, wrapper, **kwargs):
return None
def _update_config_list(self, config_list, op_name, sparsity):
'''
update sparsity of op_name in config_list
'''
config_list_updated = copy.deepcopy(config_list)
for idx, item in enumerate(config_list):
if op_name in item['op_names']:
config_list_updated[idx]['sparsity'] = sparsity
return config_list_updated
# if op_name is not in self._config_list_generated, create a new json item
if self._base_algo in ['l1', 'l2']:
config_list_updated.append(
{'sparsity': sparsity, 'op_types': ['Conv2d'], 'op_names': [op_name]})
elif self._base_algo == 'level':
config_list_updated.append(
{'sparsity': sparsity, 'op_names': [op_name]})
return config_list_updated
def _get_op_num_weights_remained(self, op_name, module):
'''
Get the number of weights remaining after channel pruning with the current sparsity
Returns
-------
int
remaining number of weights of the op
'''
# if op is wrapped by the pruner
for wrapper in self.get_modules_wrapper():
if wrapper.name == op_name:
return wrapper.weight_mask.sum().item()
# if op is not wrapped by the pruner
return module.weight.data.numel()
def _get_op_sparsity(self, op_name):
for config in self._config_list_generated:
if 'op_names' in config and op_name in config['op_names']:
return config['sparsity']
return 0
def _calc_num_related_weights(self, op_name):
'''
Calculate the total number of weights of the op and the next op; applicable only to models without dependencies among ops
Parameters
----------
op_name : str
Returns
-------
int
total number of all the related (current and next) op weights
'''
num_weights = 0
flag_found = False
previous_name = None
previous_module = None
for name, module in self._model_to_prune.named_modules():
if not flag_found and name != op_name and type(module).__name__ in ['Conv2d', 'Linear']:
previous_name = name
previous_module = module
if not flag_found and name == op_name:
_logger.debug("original module found: %s", name)
num_weights = module.weight.data.numel()
# consider related pruning in this op caused by previous op's pruning
if previous_module:
sparsity_previous_op = self._get_op_sparsity(previous_name)
if sparsity_previous_op:
_logger.debug(
"decrease op's weights by %s due to previous op %s's pruning...", sparsity_previous_op, previous_name)
num_weights *= (1-sparsity_previous_op)
flag_found = True
continue
if flag_found and type(module).__name__ in ['Conv2d', 'Linear']:
_logger.debug("related module found: %s", name)
# cross-layer effects of channel/filter pruning are considered here, so only the number of weights remaining after channel pruning is counted
num_weights += self._get_op_num_weights_remained(name, module)
break
_logger.debug("num related weights of op %s : %d", op_name, num_weights)
return num_weights
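# Worked example (illustrative numbers, not taken from the original file):
# for a 3x3 Conv2d(16, 32) followed by a 3x3 Conv2d(32, 64), the first op
# holds 32*16*3*3 = 4608 weights and the next op holds 64*32*3*3 = 18432
# weights before masking. Removing one output filter of the first op drops
# 16*3*3 = 144 of its own weights plus 64*3*3 = 576 weights from the next
# op's input channels, which is why the weights of both ops enter the
# denominator of the per-layer target sparsity computed in compress() below.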
def compress(self):
"""
Compress the model.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting NetAdapt Compression...')
pruning_iteration = 0
current_sparsity = 0
delta_num_weights_per_iteration = \
int(get_total_num_weights(self._model_to_prune, ['Conv2d', 'Linear']) * self._sparsity_per_iteration)
# stop condition
while current_sparsity < self._sparsity:
_logger.info('Pruning iteration: %d', pruning_iteration)
# calculate target sparsity of this iteration
target_sparsity = current_sparsity + self._sparsity_per_iteration
# variable to store the info of the best layer found in this iteration
best_op = {}
for wrapper in self.get_modules_wrapper():
_logger.debug("op name : %s", wrapper.name)
_logger.debug("op weights : %d", wrapper.weight_mask.numel())
_logger.debug("op left weights : %d", wrapper.weight_mask.sum().item())
current_op_sparsity = 1 - wrapper.weight_mask.sum().item() / wrapper.weight_mask.numel()
_logger.debug("current op sparsity : %s", current_op_sparsity)
# sparsity that this layer needs to prune to satisfy the requirement
target_op_sparsity = current_op_sparsity + delta_num_weights_per_iteration / self._calc_num_related_weights(wrapper.name)
if target_op_sparsity >= 1:
_logger.info('Layer %s does not have enough remaining weights to prune', wrapper.name)
continue
config_list = self._update_config_list(self._config_list_generated, wrapper.name, target_op_sparsity)
_logger.debug("config_list used : %s", config_list)
pruner = PRUNER_DICT[self._base_algo](copy.deepcopy(self._model_to_prune), config_list)
model_masked = pruner.compress()
# Short-term fine tune the pruned model
self._short_term_fine_tuner(model_masked)
performance = self._evaluator(model_masked)
_logger.info("Layer : %s, evaluation result after short-term fine tuning : %s", wrapper.name, performance)
if not best_op \
or (self._optimize_mode is OptimizeMode.Maximize and performance > best_op['performance']) \
or (self._optimize_mode is OptimizeMode.Minimize and performance < best_op['performance']):
_logger.debug("updating best layer to %s...", wrapper.name)
# find weight mask of this layer
for w in pruner.get_modules_wrapper():
if w.name == wrapper.name:
masks = {'weight_mask': w.weight_mask,
'bias_mask': w.bias_mask}
break
best_op = {
'op_name': wrapper.name,
'sparsity': target_op_sparsity,
'performance': performance,
'masks': masks
}
# save model weights
pruner.export_model(self._tmp_model_path)
if not best_op:
# decrease pruning step
self._sparsity_per_iteration *= 0.5
_logger.info("No more layers to prune, decrease pruning step to %s", self._sparsity_per_iteration)
continue
# Pick the best layer to prune, update iterative information
# update config_list
self._config_list_generated = self._update_config_list(
self._config_list_generated, best_op['op_name'], best_op['sparsity'])
# update weights parameters
self._model_to_prune.load_state_dict(torch.load(self._tmp_model_path))
# update mask of the chosen op
for wrapper in self.get_modules_wrapper():
if wrapper.name == best_op['op_name']:
for k in best_op['masks']:
setattr(wrapper, k, best_op['masks'][k])
break
current_sparsity = target_sparsity
_logger.info('Pruning iteration %d finished, current sparsity: %s', pruning_iteration, current_sparsity)
_logger.info('Layer %s selected with sparsity %s, performance after pruning & short-term fine-tuning : %s',
best_op['op_name'], best_op['sparsity'], best_op['performance'])
pruning_iteration += 1
self._final_performance = best_op['performance']
# load weights parameters
self.load_model_state_dict(torch.load(self._tmp_model_path))
os.remove(self._tmp_model_path)
_logger.info('----------Compression finished--------------')
_logger.info('config_list generated: %s', self._config_list_generated)
_logger.info("Performance after pruning: %s", self._final_performance)
_logger.info("Masked sparsity: %.6f", current_sparsity)
# save best config found and best performance
with open(os.path.join(self._experiment_data_dir, 'search_result.json'), 'w') as jsonfile:
json.dump({
'performance': self._final_performance,
'config_list': json.dumps(self._config_list_generated)
}, jsonfile)
_logger.info('search history and result saved to folder: %s', self._experiment_data_dir)
return self.bound_model
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import logging
import os
import math
import copy
import csv
import json
import numpy as np
from schema import And, Optional
from nni.utils import OptimizeMode
from ..compressor import Pruner
from ..utils.config_validation import CompressorSchema
from .constants_pruner import PRUNER_DICT
_logger = logging.getLogger(__name__)
class SimulatedAnnealingPruner(Pruner):
"""
This is a PyTorch implementation of the Simulated Annealing compression algorithm.
- Randomly initialize a pruning rate distribution (sparsities).
- While current_temperature < stop_temperature:
1. generate a perturbation to the current distribution
2. Perform fast evaluation on the perturbed distribution
3. accept the perturbation according to the performance and acceptance probability; if not accepted, return to step 1
4. cool down, current_temperature <- current_temperature * cool_down_rate
"""
def __init__(self, model, config_list, evaluator, optimize_mode='maximize', base_algo='l1',
start_temperature=100, stop_temperature=20, cool_down_rate=0.9, perturbation_magnitude=0.35, experiment_data_dir='./'):
"""
Parameters
----------
model : pytorch model
The model to be pruned
config_list : list
Supported keys:
- sparsity : The target overall sparsity.
- op_types : The operation type to prune.
evaluator : function
Function to evaluate the pruned model.
This function should take `model` as its only parameter and return a scalar value.
Example::
>>> def evaluator(model):
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> val_loader = ...
>>> model.eval()
>>> correct = 0
>>> with torch.no_grad():
>>> for data, target in val_loader:
>>> data, target = data.to(device), target.to(device)
>>> output = model(data)
>>> # get the index of the max log-probability
>>> pred = output.argmax(dim=1, keepdim=True)
>>> correct += pred.eq(target.view_as(pred)).sum().item()
>>> accuracy = correct / len(val_loader.dataset)
>>> return accuracy
optimize_mode : str
optimize mode, `maximize` or `minimize`, by default `maximize`.
base_algo : str
Base pruning algorithm. `level`, `l1` or `l2`, by default `l1`. Given the sparsity distribution among the ops,
the assigned `base_algo` is used to decide which filters/channels/weights to prune.
start_temperature : float
Simulated Annealing related parameter
stop_temperature : float
Simulated Annealing related parameter
cool_down_rate : float
Simulated Annealing related parameter
perturbation_magnitude : float
Initial perturbation magnitude to the sparsities. The magnitude decreases with the current temperature
experiment_data_dir : string
Path to save experiment data,
including the config_list generated for the base pruning algorithm, the performance of the pruned model and the pruning history.
"""
# original model
self._model_to_prune = copy.deepcopy(model)
self._base_algo = base_algo
super().__init__(model, config_list)
self._evaluator = evaluator
self._optimize_mode = OptimizeMode(optimize_mode)
# hyper parameters for SA algorithm
self._start_temperature = start_temperature
self._current_temperature = start_temperature
self._stop_temperature = stop_temperature
self._cool_down_rate = cool_down_rate
self._perturbation_magnitude = perturbation_magnitude
# overall pruning rate
self._sparsity = config_list[0]['sparsity']
# pruning rates of the layers
self._sparsities = None
# init current performance & best performance
self._current_performance = -np.inf
self._best_performance = -np.inf
self._best_config_list = []
self._search_history = []
self._experiment_data_dir = experiment_data_dir
if not os.path.exists(self._experiment_data_dir):
os.makedirs(self._experiment_data_dir)
def validate_config(self, model, config_list):
"""
Parameters
----------
model : torch.nn.module
Model to be pruned
config_list : list
List of pruning configs
"""
if self._base_algo == 'level':
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
Optional('op_types'): [str],
Optional('op_names'): [str],
}], model, _logger)
elif self._base_algo in ['l1', 'l2']:
schema = CompressorSchema([{
'sparsity': And(float, lambda n: 0 < n < 1),
'op_types': ['Conv2d'],
Optional('op_names'): [str]
}], model, _logger)
schema.validate(config_list)
def _sparsities_2_config_list(self, sparsities):
'''
convert sparsities vector into config_list for LevelPruner or L1FilterPruner
Parameters
----------
sparsities : list
list of sparsities
Returns
-------
list of dict
config_list for LevelPruner or L1FilterPruner
'''
config_list = []
sparsities = sorted(sparsities)
self.modules_wrapper = sorted(
self.modules_wrapper, key=lambda wrapper: wrapper.module.weight.data.numel())
# a layer with more weights is assigned a pruning rate no smaller than a layer with fewer weights
for idx, wrapper in enumerate(self.get_modules_wrapper()):
# L1FilterPruner requires op_types to be specified
if self._base_algo in ['l1', 'l2']:
config_list.append(
{'sparsity': sparsities[idx], 'op_types': ['Conv2d'], 'op_names': [wrapper.name]})
elif self._base_algo == 'level':
config_list.append(
{'sparsity': sparsities[idx], 'op_names': [wrapper.name]})
config_list = [val for val in config_list if not math.isclose(val['sparsity'], 0, abs_tol=1e-6)]
return config_list
def _rescale_sparsities(self, sparsities, target_sparsity):
'''
Rescale the sparsities list to satisfy the target overall sparsity
Parameters
----------
sparsities : list
target_sparsity : float
the target overall sparsity
Returns
-------
list
the rescaled sparsities
'''
num_weights = []
for wrapper in self.get_modules_wrapper():
num_weights.append(wrapper.module.weight.data.numel())
num_weights = sorted(num_weights)
sparsities = sorted(sparsities)
total_weights = 0
total_weights_pruned = 0
# calculate the scale
for idx, num_weight in enumerate(num_weights):
total_weights += num_weight
total_weights_pruned += int(num_weight*sparsities[idx])
if total_weights_pruned == 0:
return None
scale = target_sparsity / (total_weights_pruned/total_weights)
# rescale the sparsities
sparsities = np.asarray(sparsities)*scale
return sparsities
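# Numeric sketch (illustrative values, not from the original file): with
# per-layer weight counts [100, 900] and candidate sparsities [0.2, 0.4],
# the overall pruned fraction is (100*0.2 + 900*0.4) / 1000 = 0.38; to hit
# a target overall sparsity of 0.5 the vector is scaled by 0.5 / 0.38,
# giving roughly [0.26, 0.53]. If scaling pushes any entry to 1 or above,
# the caller rejects the vector and samples a new one.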
def _init_sparsities(self):
'''
Generate a sorted sparsities vector
'''
# repeatedly generate a distribution until it satisfies the overall sparsity requirement
_logger.info('Generating sparsities...')
while True:
sparsities = sorted(np.random.uniform(
0, 1, len(self.get_modules_wrapper())))
sparsities = self._rescale_sparsities(
sparsities, target_sparsity=self._sparsity)
if sparsities is not None and sparsities[0] >= 0 and sparsities[-1] < 1:
_logger.info('Initial sparsities generated : %s', sparsities)
self._sparsities = sparsities
break
def _generate_perturbations(self):
'''
Generate perturbation to the current sparsities distribution.
Returns
-------
list
perturbed sparsities
'''
_logger.info("Gererating perturbations to the current sparsities...")
# decrease magnitude with current temperature
magnitude = self._current_temperature / \
self._start_temperature * self._perturbation_magnitude
_logger.info('current perturbation magnitude: %s', magnitude)
while True:
perturbation = np.random.uniform(-magnitude,
magnitude, len(self.get_modules_wrapper()))
sparsities = np.clip(self._sparsities + perturbation, 0, None)
_logger.debug("sparsities before rescalling:%s", sparsities)
sparsities = self._rescale_sparsities(
sparsities, target_sparsity=self._sparsity)
_logger.debug("sparsities after rescalling:%s", sparsities)
if sparsities is not None and sparsities[0] >= 0 and sparsities[-1] < 1:
_logger.info("Sparsities perturbated:%s", sparsities)
return sparsities
def calc_mask(self, wrapper, **kwargs):
return None
def compress(self, return_config_list=False):
"""
Compress the model with Simulated Annealing.
Returns
-------
torch.nn.Module
model with specified modules compressed.
"""
_logger.info('Starting Simulated Annealing Compression...')
# initialize a randomized sparsity distribution
pruning_iteration = 0
self._init_sparsities()
# stop condition
self._current_temperature = self._start_temperature
while self._current_temperature > self._stop_temperature:
_logger.info('Pruning iteration: %d', pruning_iteration)
_logger.info('Current temperature: %d, Stop temperature: %d',
self._current_temperature, self._stop_temperature)
while True:
# generate perturbation
sparsities_perturbated = self._generate_perturbations()
config_list = self._sparsities_2_config_list(
sparsities_perturbated)
_logger.info(
"config_list for Pruner generated: %s", config_list)
# fast evaluation
pruner = PRUNER_DICT[self._base_algo](copy.deepcopy(self._model_to_prune), config_list)
model_masked = pruner.compress()
evaluation_result = self._evaluator(model_masked)
self._search_history.append(
{'sparsity': self._sparsity, 'performance': evaluation_result, 'config_list': config_list})
if self._optimize_mode is OptimizeMode.Minimize:
evaluation_result *= -1
# if better evaluation result, then accept the perturbation
if evaluation_result > self._current_performance:
self._current_performance = evaluation_result
self._sparsities = sparsities_perturbated
# save best performance and best params
if evaluation_result > self._best_performance:
_logger.info('updating best model...')
self._best_performance = evaluation_result
self._best_config_list = config_list
# save the overall best masked model
self.bound_model = model_masked
break
# if not, accept with probability e^(-deltaE/current_temperature)
else:
delta_E = np.abs(evaluation_result -
self._current_performance)
probability = math.exp(-1 * delta_E /
self._current_temperature)
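# e.g. (illustrative numbers): with delta_E = 0.05 and
# current_temperature = 50 this evaluates to exp(-0.001) ≈ 0.999;
# a larger performance drop or a lower temperature both shrink the
# acceptance probability.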
if np.random.uniform(0, 1) < probability:
self._current_performance = evaluation_result
self._sparsities = sparsities_perturbated
break
# cool down
self._current_temperature *= self._cool_down_rate
pruning_iteration += 1
_logger.info('----------Compression finished--------------')
_logger.info('Best performance: %s', self._best_performance)
_logger.info('config_list found : %s',
self._best_config_list)
# save search history
with open(os.path.join(self._experiment_data_dir, 'search_history.csv'), 'w') as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=['sparsity', 'performance', 'config_list'])
writer.writeheader()
for item in self._search_history:
writer.writerow({'sparsity': item['sparsity'], 'performance': item['performance'], 'config_list': json.dumps(
item['config_list'])})
# save best config found and best performance
if self._optimize_mode is OptimizeMode.Minimize:
self._best_performance *= -1
with open(os.path.join(self._experiment_data_dir, 'search_result.json'), 'w+') as jsonfile:
json.dump({
'performance': self._best_performance,
'config_list': json.dumps(self._best_config_list)
}, jsonfile)
_logger.info('search history and result saved to folder: %s',
self._experiment_data_dir)
if return_config_list:
return self._best_config_list
return self.bound_model
def get_total_num_weights(model, op_types=['default']):
'''
Calculate the total number of weights of the specified op types in the model
Returns
-------
int
total number of weights of all the ops considered
'''
num_weights = 0
for _, module in model.named_modules():
if module == model:
continue
if 'default' in op_types or type(module).__name__ in op_types:
num_weights += module.weight.data.numel()
return num_weights
\ No newline at end of file
...@@ -9,7 +9,7 @@ import math ...@@ -9,7 +9,7 @@ import math
from unittest import TestCase, main from unittest import TestCase, main
from nni.compression.torch import LevelPruner, SlimPruner, FPGMPruner, L1FilterPruner, \ from nni.compression.torch import LevelPruner, SlimPruner, FPGMPruner, L1FilterPruner, \
L2FilterPruner, AGP_Pruner, ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner, \ L2FilterPruner, AGP_Pruner, ActivationMeanRankFilterPruner, ActivationAPoZRankFilterPruner, \
TaylorFOWeightFilterPruner TaylorFOWeightFilterPruner, NetAdaptPruner, SimulatedAnnealingPruner, ADMMPruner, AutoCompressPruner
def validate_sparsity(wrapper, sparsity, bias=False): def validate_sparsity(wrapper, sparsity, bias=False):
masks = [wrapper.weight_mask] masks = [wrapper.weight_mask]
...@@ -113,6 +113,47 @@ prune_config = { ...@@ -113,6 +113,47 @@ prune_config = {
'validators': [ 'validators': [
lambda model: validate_sparsity(model.conv1, 0.5, model.bias) lambda model: validate_sparsity(model.conv1, 0.5, model.bias)
] ]
},
'netadapt': {
'pruner_class': NetAdaptPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}],
'short_term_fine_tuner': lambda model:model,
'evaluator':lambda model: 0.9,
'validators': []
},
'simulatedannealing': {
'pruner_class': SimulatedAnnealingPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}],
'evaluator':lambda model: 0.9,
'validators': []
},
'admm': {
'pruner_class': ADMMPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
}],
'trainer': lambda model, optimizer, criterion, epoch, callback : model,
'validators': [
lambda model: validate_sparsity(model.conv1, 0.5, model.bias)
]
},
'autocompress': {
'pruner_class': AutoCompressPruner,
'config_list': [{
'sparsity': 0.5,
'op_types': ['Conv2d'],
}],
'trainer': lambda model, optimizer, criterion, epoch, callback : model,
'evaluator': lambda model: 0.9,
'dummy_input': torch.randn([64, 1, 28, 28]),
'validators': []
} }
} }
...@@ -127,25 +168,36 @@ class Model(nn.Module): ...@@ -127,25 +168,36 @@ class Model(nn.Module):
def forward(self, x): def forward(self, x):
return self.fc(self.pool(self.bn1(self.conv1(x))).view(x.size(0), -1)) return self.fc(self.pool(self.bn1(self.conv1(x))).view(x.size(0), -1))
def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'taylorfo', 'mean_activation', 'apoz'], bias=True): def pruners_test(pruner_names=['level', 'agp', 'slim', 'fpgm', 'l1', 'l2', 'taylorfo', 'mean_activation', 'apoz', 'netadapt', 'simulatedannealing', 'admm', 'autocompress'], bias=True):
for pruner_name in pruner_names: for pruner_name in pruner_names:
model = Model(bias=bias) print('testing {}...'.format(pruner_name))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(bias=bias).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01) optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
config_list = prune_config[pruner_name]['config_list'] config_list = prune_config[pruner_name]['config_list']
x = torch.randn(2, 1, 28, 28) x = torch.randn(2, 1, 28, 28).to(device)
y = torch.tensor([0, 1]).long() y = torch.tensor([0, 1]).long().to(device)
out = model(x) out = model(x)
loss = F.cross_entropy(out, y) loss = F.cross_entropy(out, y)
optimizer.zero_grad() optimizer.zero_grad()
loss.backward() loss.backward()
optimizer.step() optimizer.step()
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, optimizer) if pruner_name == 'netadapt':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, short_term_fine_tuner=prune_config[pruner_name]['short_term_fine_tuner'], evaluator=prune_config[pruner_name]['evaluator'])
elif pruner_name == 'simulatedannealing':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, evaluator=prune_config[pruner_name]['evaluator'])
elif pruner_name == 'admm':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, trainer=prune_config[pruner_name]['trainer'])
elif pruner_name == 'autocompress':
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, trainer=prune_config[pruner_name]['trainer'], evaluator=prune_config[pruner_name]['evaluator'], dummy_input=x)
else:
pruner = prune_config[pruner_name]['pruner_class'](model, config_list, optimizer)
pruner.compress() pruner.compress()
x = torch.randn(2, 1, 28, 28) x = torch.randn(2, 1, 28, 28).to(device)
y = torch.tensor([0, 1]).long() y = torch.tensor([0, 1]).long().to(device)
out = model(x) out = model(x)
loss = F.cross_entropy(out, y) loss = F.cross_entropy(out, y)
optimizer.zero_grad() optimizer.zero_grad()
...@@ -157,14 +209,16 @@ def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'tayl ...@@ -157,14 +209,16 @@ def pruners_test(pruner_names=['agp', 'level', 'slim', 'fpgm', 'l1', 'l2', 'tayl
# when iteration >= statistics_batch_num (default 1) # when iteration >= statistics_batch_num (default 1)
optimizer.step() optimizer.step()
pruner.export_model('./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', input_shape=(2,1,28,28)) pruner.export_model('./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', input_shape=(2,1,28,28), device=device)
for v in prune_config[pruner_name]['validators']: for v in prune_config[pruner_name]['validators']:
v(model) v(model)
os.remove('./model_tmp.pth')
os.remove('./mask_tmp.pth') filePaths = ['./model_tmp.pth', './mask_tmp.pth', './onnx_tmp.pth', './search_history.csv', './search_result.json']
os.remove('./onnx_tmp.pth') for f in filePaths:
if os.path.exists(f):
os.remove(f)
def test_agp(pruning_algorithm): def test_agp(pruning_algorithm):
model = Model() model = Model()
......