Unverified Commit b9a7a95d authored by SparkSnail, committed by GitHub

Merge pull request #223 from microsoft/master

merge master
parents f9ee589c 0c7f22fb
ActivationRankFilterPruner on NNI Compressor
===
## 1. Introduction
ActivationRankFilterPruner is a series of pruners which prune filters according to some importance criterion calculated from the filters' output activations.
| Pruner | Importance criterion | Reference paper |
| :----------------------------: | :-------------------------------: | :----------------------------------------------------------: |
| ActivationAPoZRankFilterPruner | APoZ(average percentage of zeros) | [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) |
| ActivationMeanRankFilterPruner | mean value of output activations | [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440) |
## 2. Pruners
### ActivationAPoZRankFilterPruner
Hengyuan Hu, Rui Peng, Yu-Wing Tai and Chi-Keung Tang,
"[Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250)", ICLR 2016.
ActivationAPoZRankFilterPruner prunes the filters with the highest APoZ (average percentage of zeros) of output activations, i.e. the filters judged least important by this criterion.
The APoZ is defined as:
![](../../img/apoz.png)
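For intuition, here is a minimal sketch of how APoZ could be computed from a batch of post-ReLU activations; the helper name and tensor shapes are illustrative, not NNI's internal API.
```python
import torch

def compute_apoz(activations):
    """APoZ of each filter: the average fraction of zero entries in its
    (post-ReLU) output feature map over a batch of inputs. A larger APoZ
    means the filter outputs zeros more often, i.e. it is less important.
    `activations` is assumed to be shaped (batch, filters, height, width)."""
    zeros = (activations == 0).float()   # 1 where the activation is zero
    return zeros.mean(dim=(0, 2, 3))     # average over batch and spatial dims

# illustrative usage: rank the filters of one conv layer by APoZ
activations = torch.relu(torch.randn(8, 16, 32, 32))  # fake post-ReLU outputs
apoz = compute_apoz(activations)                      # shape: (16,)
prune_first = torch.argsort(apoz, descending=True)    # most-zero filters first
```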
### ActivationMeanRankFilterPruner
Pavlo Molchanov, Stephen Tyree, Tero Karras, Timo Aila and Jan Kautz,
"[Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440)", ICLR 2017.
ActivationMeanRankFilterPruner prunes the filters with the smallest mean value of output activations.
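A corresponding sketch of the mean-activation criterion, under the same illustrative shape assumptions:
```python
import torch

def mean_activation(activations):
    """Mean output activation per filter over a batch; filters with the
    smallest mean contribute least and are pruned first.
    `activations` is assumed to be shaped (batch, filters, height, width)."""
    return activations.mean(dim=(0, 2, 3))
```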
## 3. Usage
PyTorch code
```python
from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'], 'op_names': ['conv1', 'conv2'] }]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
#### User configuration for ActivationAPoZRankFilterPruner
- **sparsity:** the target sparsity, i.e. the percentage of convolutional filters to be pruned
- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner
## 4. Experiment
TODO.
......@@ -14,10 +14,14 @@ We have provided several compression algorithms, including several pruning and q
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning the least important filters in convolution layers (PRUNING FILTERS FOR EFFICIENT CONVNETS) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
| [Lottery Ticket Pruner](./Pruner.md#agp-pruner) | The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. [Reference Paper](https://arxiv.org/abs/1803.03635) |
| [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf) |
| [L1Filter Pruner](./Pruner.md#l1filter-pruner) | Pruning filters with the smallest L1 norm of weights in convolution layers (PRUNING FILTERS FOR EFFICIENT CONVNETS) [Reference Paper](https://arxiv.org/abs/1608.08710) |
| [L2Filter Pruner](./Pruner.md#l2filter-pruner) | Pruning filters with the smallest L2 norm of weights in convolution layers |
| [ActivationAPoZRankFilterPruner](./Pruner.md#ActivationAPoZRankFilterPruner) | Pruning filters with the highest APoZ (average percentage of zeros) of output activations (Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures) [Reference Paper](https://arxiv.org/abs/1607.03250) |
| [ActivationMeanRankFilterPruner](./Pruner.md#ActivationMeanRankFilterPruner) | Pruning filters with the smallest mean value of output activations (Pruning Convolutional Neural Networks for Resource Efficient Inference) [Reference Paper](https://arxiv.org/abs/1611.06440) |
| [Slim Pruner](./Pruner.md#slim-pruner) | Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) [Reference Paper](https://arxiv.org/abs/1708.06519) |
**Quantization**
......
......@@ -10,7 +10,7 @@ We first sort the weights in the specified layer by their absolute values. And t
### Usage
Tensorflow code
```
```python
from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model_graph, config_list)
......@@ -18,7 +18,7 @@ pruner.compress()
```
PyTorch code
```
```python
from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
......@@ -40,8 +40,6 @@ This is an iterative pruner, In [To prune, or not to prune: exploring the effica
### Usage
You can prune all weights from 0% to 80% sparsity over 10 epochs with the code below.
First, you should import the pruner and add a mask to the model.
Tensorflow code
```python
from nni.compression.tensorflow import AGP_Pruner
......@@ -71,7 +69,7 @@ pruner = AGP_Pruner(model, config_list)
pruner.compress()
```
Second, you should add code below to update epoch number when you finish one epoch in your training code.
You should add the code below to update the epoch number when you finish one epoch in your training code.
Tensorflow code
```python
......@@ -133,13 +131,16 @@ The above configuration means that there are 5 times of iterative pruning. As th
* **sparsity:** The final sparsity when the compression is done.
***
## FPGM Pruner
## WeightRankFilterPruner
WeightRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the weights in convolution layers, in order to achieve a preset level of network sparsity.
### 1, FPGM Pruner
This is a one-shot pruner. FPGM Pruner is an implementation of the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).
>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
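As a rough illustration of this criterion (not NNI's internal code), each filter can be scored by its total distance to the other filters in the same layer; filters closest to the geometric median get the lowest scores and are treated as redundant:
```python
import torch

def fpgm_redundancy_scores(conv_weight):
    """For each filter, the summed Euclidean distance to all other filters
    in the same layer. Filters with the smallest total distance sit closest
    to the geometric median of the layer and are considered redundant.
    `conv_weight` is assumed shaped (out_channels, in_channels, k, k)."""
    flat = conv_weight.view(conv_weight.size(0), -1)  # one row per filter
    dist = torch.cdist(flat, flat)                    # pairwise L2 distances
    return dist.sum(dim=1)                            # small score => redundant

# illustrative usage on a random conv layer's weights
weight = torch.randn(64, 32, 3, 3)
scores = fpgm_redundancy_scores(weight)
prune_first = torch.argsort(scores)  # prune the lowest-scoring filters first
```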
### Usage
First, you should import the pruner and add a mask to the model.
#### Usage
Tensorflow code
```python
......@@ -163,7 +164,7 @@ pruner.compress()
```
Note: FPGM Pruner is used to prune convolutional layers within deep neural networks; therefore, the `op_types` field supports only convolutional layers.
Second, you should add code below to update epoch number at beginning of each epoch.
You should add the code below to update the epoch number at the beginning of each epoch.
Tensorflow code
```python
......@@ -180,7 +181,7 @@ You can view example for more information
***
## L1Filter Pruner
### 2, L1Filter Pruner
This is a one-shot pruner based on ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
......@@ -193,12 +194,16 @@ This is an one-shot pruner, In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https:
> 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights ![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
> 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
> 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
> kernels in the next convolutional layer corresponding to the pruned feature maps are also
> removed.
> kernels in the next convolutional layer corresponding to the pruned feature maps are also
> removed.
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
> weights are copied to the new model.
> weights are copied to the new model.
```
#### Usage
PyTorch code
```python
from nni.compression.torch import L1FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1FilterPruner(model, config_list)
......@@ -208,7 +213,90 @@ pruner.compress()
#### User configuration for L1Filter Pruner
- **sparsity:** the target sparsity, i.e. the percentage of convolutional filters to be pruned
- **op_types:** Only Conv2d is supported in L1Filter Pruner
- **op_types:** Only Conv1d and Conv2d are supported in L1Filter Pruner
***
### 3, L2Filter Pruner
This is a structured pruning algorithm that prunes the filters with the smallest L2 norm of the weights.
#### Usage
PyTorch code
```python
from nni.compression.torch import L2FilterPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2FilterPruner(model, config_list)
pruner.compress()
```
#### User configuration for L2Filter Pruner
- **sparsity:** the target sparsity, i.e. the percentage of convolutional filters to be pruned
- **op_types:** Only Conv1d and Conv2d are supported in L2Filter Pruner
## ActivationRankFilterPruner
ActivationRankFilterPruner is a series of pruners which prune the filters with the smallest importance criterion calculated from the output activations of convolution layers, in order to achieve a preset level of network sparsity.
### 1, ActivationAPoZRankFilterPruner
This is a one-shot pruner. ActivationAPoZRankFilterPruner is an implementation of the paper [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250).
#### Usage
PyTorch code
```python
from nni.compression.torch import ActivationAPoZRankFilterPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = ActivationAPoZRankFilterPruner(model, config_list, statistics_batch_num=1)
pruner.compress()
```
Note: ActivationAPoZRankFilterPruner is used to prune convolutional layers within deep neural networks; therefore, the `op_types` field supports only convolutional layers.
You can view the example for more information.
#### User configuration for ActivationAPoZRankFilterPruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** Only Conv2d is supported in ActivationAPoZRankFilterPruner
***
### 2, ActivationMeanRankFilterPruner
This is a one-shot pruner. ActivationMeanRankFilterPruner is an implementation of the paper [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://arxiv.org/abs/1611.06440).
#### Usage
PyTorch code
```python
from nni.compression.torch import ActivationMeanRankFilterPruner
config_list = [{
'sparsity': 0.5,
'op_types': ['Conv2d']
}]
pruner = ActivationMeanRankFilterPruner(model, config_list)
pruner.compress()
```
Note: ActivationMeanRankFilterPruner is used to prune convolutional layers within deep neural networks; therefore, the `op_types` field supports only convolutional layers.
You can view the example for more information.
#### User configuration for ActivationMeanRankFilterPruner
- **sparsity:** the percentage of convolutional filters to be pruned.
- **op_types:** Only Conv2d is supported in ActivationMeanRankFilterPruner
***
## Slim Pruner
......@@ -222,7 +310,7 @@ This is an one-shot pruner, In ['Learning Efficient Convolutional Networks throu
PyTorch code
```
```python
from nni.compression.torch import SlimPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list)
......
Quantizer on NNI Compressor
===
## Naive Quantizer
We provide the Naive Quantizer, which quantizes weights to 8 bits by default. You can use it to test a quantization algorithm without any configuration.
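Assuming the Naive Quantizer follows the same constructor-and-`compress()` pattern as the other compressors in this document, a minimal PyTorch usage sketch looks like this:
```python
# a minimal sketch; the pattern mirrors the other PyTorch examples in this doc
from nni.compression.torch import NaiveQuantizer

config_list = [{ 'op_types': ['default'] }]  # quantize default op types (8 bits by default)
quantizer = NaiveQuantizer(model, config_list)
quantizer.compress()
```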
......@@ -53,11 +52,24 @@ You can view example for more information
#### User configuration for QAT Quantizer
* **quant_types:** list of string
type of quantization you want to apply, currently support 'weight', 'input', 'output'
the types of quantization you want to apply; currently 'weight', 'input' and 'output' are supported.
* **op_types:** list of string
specify the types of modules that will be quantized, e.g. 'Conv2D'
* **op_names:** list of string
specify the names of modules that will be quantized, e.g. 'conv1'
* **quant_bits:** int or dict of {str : int}
bits length of quantization, key is the quantization type, value is the length, eg. {'weight', 8},
when the type is int, all quantization types share same bits length
bit length of quantization; the key is the quantization type and the value is the length, e.g. {'weight': 8};
when the type is int, all quantization types share the same bit length.
* **quant_start_step:** int
disable quantization until the model has been run for a certain number of steps. This allows the network to reach a more stable
state, where activation quantization ranges do not exclude a significant fraction of values. The default value is 0.
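For illustration, here is a hedged example `config_list` combining the fields described above; the values are arbitrary, and depending on the NNI version some fields may instead be passed as constructor arguments:
```python
# illustrative QAT configuration; values are arbitrary
config_list = [{
    'quant_types': ['weight', 'output'],
    'quant_bits': {'weight': 8, 'output': 8},  # per-type bit widths; a plain int (e.g. 8) applies to all types
    'op_types': ['Conv2d'],                    # quantize all Conv2d modules
    'quant_start_step': 1000                   # keep full precision for the first 1000 steps
}]
```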
......@@ -71,17 +83,14 @@ In [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bit
### Usage
To use the DoReFa Quantizer, add the code below before your training code.
Tensorflow code
```python
from nni.compression.tensorflow import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = DoReFaQuantizer(tf.get_default_graph(), config_list)
quantizer.compress()
```
PyTorch code
```python
from nni.compression.torch import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
config_list = [{
'quant_types': ['weight'],
'quant_bits': 8,
'op_types': 'default'
}]
quantizer = DoReFaQuantizer(model, config_list)
quantizer.compress()
```
......@@ -89,4 +98,79 @@ quantizer.compress()
You can view the example for more information.
#### User configuration for DoReFa Quantizer
* **q_bits:** the number of bits the operations will be quantized to
* **quant_types:** list of string
the types of quantization you want to apply; currently 'weight', 'input' and 'output' are supported.
* **op_types:** list of string
specify the types of modules that will be quantized, e.g. 'Conv2D'
* **op_names:** list of string
specify the names of modules that will be quantized, e.g. 'conv1'
* **quant_bits:** int or dict of {str : int}
bit length of quantization; the key is the quantization type and the value is the length, e.g. {'weight': 8};
when the type is int, all quantization types share the same bit length.
## BNN Quantizer
In [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830),
>We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency.
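For intuition, binarization of this kind is commonly realized in PyTorch as a sign function with a straight-through estimator (STE) for the backward pass. The sketch below is a generic illustration of that idea, not necessarily the exact rule used by NNI's BNNQuantizer:
```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator: the forward
    pass maps values to {-1, +1}; the backward pass lets gradients through
    unchanged where |x| <= 1 (a common BNN choice)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # note: torch.sign maps 0 to 0; real BNN code often maps 0 to +1
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()  # zero gradient outside [-1, 1]

binarize = BinarizeSTE.apply  # usage: w_bin = binarize(weight)
```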
### Usage
PyTorch code
```python
from nni.compression.torch import BNNQuantizer
model = VGG_Cifar10(num_classes=10)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': 1,
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.0', 'features.3', 'features.7', 'features.10', 'features.14', 'features.17', 'classifier.0', 'classifier.3']
}, {
'quant_types': ['output'],
'quant_bits': 1,
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
quantizer = BNNQuantizer(model, configure_list)
model = quantizer.compress()
```
You can view the example [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py) for more information.
#### User configuration for BNN Quantizer
* **quant_types:** list of string
the types of quantization you want to apply; currently 'weight', 'input' and 'output' are supported.
* **op_types:** list of string
specify the types of modules that will be quantized, e.g. 'Conv2D'
* **op_names:** list of string
specify the names of modules that will be quantized, e.g. 'conv1'
* **quant_bits:** int or dict of {str : int}
bit length of quantization; the key is the quantization type and the value is the length, e.g. {'weight': 8};
when the type is int, all quantization types share the same bit length.
### Experiment
We implemented one of the experiments in [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830): we quantized the **VGGNet** for CIFAR-10 used in the paper. Our experiment results are as follows:
| Model | Accuracy |
| ------------- | --------- |
| VGGNet | 86.93% |
The experiment code can be found at [examples/model_compress/BNN_quantizer_cifar10.py](https://github.com/microsoft/nni/tree/master/examples/model_compress/BNN_quantizer_cifar10.py).
\ No newline at end of file
L1FilterPruner on NNI Compressor
WeightRankFilterPruner on NNI Compressor
===
## 1. Introduction
WeightRankFilterPruner is a series of pruners which prune filters according to some importance criterion calculated from the filters' weights.
| Pruner | Importance criterion | Reference paper |
| :------------: | :-------------------------: | :----------------------------------------------------------: |
| L1FilterPruner | L1 norm of weights | [PRUNING FILTERS FOR EFFICIENT CONVNETS](https://arxiv.org/abs/1608.08710) |
| L2FilterPruner | L2 norm of weights | |
| FPGMPruner | Geometric Median of weights | [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf) |
## 2. Pruners
### L1FilterPruner
L1FilterPruner is a general structured pruning algorithm for pruning filters in the convolutional layers.
It is proposed in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
......@@ -16,12 +28,26 @@ In ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710),
> 1. For each filter ![](http://latex.codecogs.com/gif.latex?F_{i,j}), calculate the sum of its absolute kernel weights ![](http://latex.codecogs.com/gif.latex?s_j=\sum_{l=1}^{n_i}\sum|K_l|)
> 2. Sort the filters by ![](http://latex.codecogs.com/gif.latex?s_j).
> 3. Prune ![](http://latex.codecogs.com/gif.latex?m) filters with the smallest sum values and their corresponding feature maps. The
> kernels in the next convolutional layer corresponding to the pruned feature maps are also
> removed.
> kernels in the next convolutional layer corresponding to the pruned feature maps are also
> removed.
> 4. A new kernel matrix is created for both the ![](http://latex.codecogs.com/gif.latex?i)th and ![](http://latex.codecogs.com/gif.latex?i+1)th layers, and the remaining kernel
> weights are copied to the new model.
> weights are copied to the new model.
### L2FilterPruner
L2FilterPruner is similar to L1FilterPruner, but replaces the L1 norm with the L2 norm as the importance criterion.
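Both criteria reduce to ranking filters by a norm of their flattened weights. A minimal sketch (illustrative, not NNI's internal code):
```python
import torch

def filter_norms(conv_weight, p=1):
    """L1 (p=1) or L2 (p=2) norm of each filter in a conv layer;
    filters with the smallest norm are pruned first.
    `conv_weight` is assumed shaped (out_channels, in_channels, k, k)."""
    return conv_weight.view(conv_weight.size(0), -1).norm(p=p, dim=1)

weight = torch.randn(64, 32, 3, 3)                   # a hypothetical conv layer's weights
l1_order = torch.argsort(filter_norms(weight, p=1))  # L1FilterPruner criterion
l2_order = torch.argsort(filter_norms(weight, p=2))  # L2FilterPruner criterion
```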
### FPGMPruner
Yang He, Ping Liu, Ziwei Wang, Zhilan Hu, Yi Yang
"[Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/abs/1811.00250)", CVPR 2019.
FPGMPruner prunes the filters with the smallest distance to the geometric median of the filters in the same layer, i.e. the most redundant filters.
![](../../img/fpgm_fig1.png)
## 2. Usage
## 3. Usage
PyTorch code
......@@ -37,9 +63,9 @@ pruner.compress()
- **sparsity:** the target sparsity, i.e. the percentage of convolutional filters to be pruned
- **op_types:** Only Conv2d is supported in L1Filter Pruner
## 3. Experiment
## 4. Experiment
We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710), we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** in the paper, in which $64\%$ parameters are pruned. Our experiments results are as follows:
We implemented one of the experiments in ['PRUNING FILTERS FOR EFFICIENT CONVNETS'](https://arxiv.org/abs/1608.08710) with **L1FilterPruner**. We pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A** from the paper, in which $64\%$ of the parameters are pruned. Our experiment results are as follows:
| Model | Error(paper/ours) | Parameters | Pruned |
| --------------- | ----------------- | --------------- | -------- |
......
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.torch import ActivationAPoZRankFilterPruner
from models.cifar10.vgg import VGG
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
    print('Loss: {} Accuracy: {}%\n'.format(
        test_loss, acc))
return acc
def main():
torch.manual_seed(0)
device = torch.device('cuda')
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.Pad(4),
transforms.RandomCrop(32),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG(depth=16)
model.to(device)
# Train the base VGG-16 model
print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
for epoch in range(160):
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
lr_scheduler.step(epoch)
torch.save(model.state_dict(), 'vgg16_cifar10.pth')
# Test base model accuracy
print('=' * 10 + 'Test on the original model' + '=' * 10)
model.load_state_dict(torch.load('vgg16_cifar10.pth'))
test(model, device, test_loader)
# top1 = 93.51%
# Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
# Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
configure_list = [{
'sparsity': 0.5,
'op_types': ['default'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
# Prune model and test accuracy without fine tuning.
print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
pruner = ActivationAPoZRankFilterPruner(model, configure_list)
model = pruner.compress()
test(model, device, test_loader)
# top1 = 88.19%
# Fine tune the pruned model for 40 epochs and test accuracy
print('=' * 10 + 'Fine tuning' + '=' * 10)
optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
best_top1 = 0
for epoch in range(40):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer_finetune)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
# Test the exported model
print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
new_model = VGG(depth=16)
new_model.to(device)
new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
test(new_model, device, test_loader)
# top1 = 93.53%
if __name__ == '__main__':
main()
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.torch import BNNQuantizer
class VGG_Cifar10(nn.Module):
def __init__(self, num_classes=1000):
super(VGG_Cifar10, self).__init__()
self.features = nn.Sequential(
nn.Conv2d(3, 128, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(128, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(128, 256, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(256, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(256, 512, kernel_size=3, padding=1, bias=False),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True),
nn.Conv2d(512, 512, kernel_size=3, padding=1, bias=False),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.BatchNorm2d(512, eps=1e-4, momentum=0.1),
nn.Hardtanh(inplace=True)
)
self.classifier = nn.Sequential(
nn.Linear(512 * 4 * 4, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, 1024, bias=False),
nn.BatchNorm1d(1024),
nn.Hardtanh(inplace=True),
nn.Linear(1024, num_classes), # do not quantize output
nn.BatchNorm1d(num_classes, affine=False)
)
def forward(self, x):
x = self.features(x)
x = x.view(-1, 512 * 4 * 4)
x = self.classifier(x)
return x
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
for name, param in model.named_parameters():
if name.endswith('old_weight'):
                param.data.clamp_(-1, 1)  # clamp in place; rebinding the loop variable would have no effect
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
    print('Loss: {} Accuracy: {}%\n'.format(
        test_loss, acc))
return acc
def adjust_learning_rate(optimizer, epoch):
update_list = [55, 100, 150, 200, 400, 600]
if epoch in update_list:
for param_group in optimizer.param_groups:
param_group['lr'] = param_group['lr'] * 0.1
return
def main():
torch.manual_seed(0)
device = torch.device('cuda')
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG_Cifar10(num_classes=10)
model.to(device)
configure_list = [{
'quant_types': ['weight'],
'quant_bits': 1,
'op_types': ['Conv2d', 'Linear'],
'op_names': ['features.3', 'features.7', 'features.10', 'features.14', 'classifier.0', 'classifier.3']
}, {
'quant_types': ['output'],
'quant_bits': 1,
'op_types': ['Hardtanh'],
'op_names': ['features.6', 'features.9', 'features.13', 'features.16', 'features.20', 'classifier.2', 'classifier.5']
}]
quantizer = BNNQuantizer(model, configure_list)
model = quantizer.compress()
print('=' * 10 + 'train' + '=' * 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
best_top1 = 0
for epoch in range(400):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
adjust_learning_rate(optimizer, epoch)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
print(best_top1)
if __name__ == '__main__':
main()
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from nni.compression.torch import L1FilterPruner
from models.cifar10.vgg import VGG
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.cross_entropy(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.argmax(dim=1, keepdim=True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
acc = 100 * correct / len(test_loader.dataset)
    print('Loss: {} Accuracy: {}%\n'.format(
        test_loss, acc))
return acc
def main():
torch.manual_seed(0)
device = torch.device('cuda')
train_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=True, download=True,
transform=transforms.Compose([
transforms.Pad(4),
transforms.RandomCrop(32),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(
datasets.CIFAR10('./data.cifar10', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
])),
batch_size=200, shuffle=False)
model = VGG(depth=16)
model.to(device)
# Train the base VGG-16 model
print('=' * 10 + 'Train the unpruned base model' + '=' * 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, 160, 0)
for epoch in range(160):
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
lr_scheduler.step(epoch)
torch.save(model.state_dict(), 'vgg16_cifar10.pth')
# Test base model accuracy
print('=' * 10 + 'Test on the original model' + '=' * 10)
model.load_state_dict(torch.load('vgg16_cifar10.pth'))
test(model, device, test_loader)
# top1 = 93.51%
# Pruning Configuration, in paper 'PRUNING FILTERS FOR EFFICIENT CONVNETS',
# Conv_1, Conv_8, Conv_9, Conv_10, Conv_11, Conv_12 are pruned with 50% sparsity, as 'VGG-16-pruned-A'
configure_list = [{
'sparsity': 0.5,
'op_types': ['default'],
'op_names': ['feature.0', 'feature.24', 'feature.27', 'feature.30', 'feature.34', 'feature.37']
}]
# Prune model and test accuracy without fine tuning.
print('=' * 10 + 'Test on the pruned model before fine tune' + '=' * 10)
pruner = L1FilterPruner(model, configure_list)
model = pruner.compress()
test(model, device, test_loader)
# top1 = 88.19%
# Fine tune the pruned model for 40 epochs and test accuracy
print('=' * 10 + 'Fine tuning' + '=' * 10)
optimizer_finetune = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=1e-4)
best_top1 = 0
for epoch in range(40):
pruner.update_epoch(epoch)
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer_finetune)
top1 = test(model, device, test_loader)
if top1 > best_top1:
best_top1 = top1
# Export the best model, 'model_path' stores state_dict of the pruned model,
# mask_path stores mask_dict of the pruned model
pruner.export_model(model_path='pruned_vgg16_cifar10.pth', mask_path='mask_vgg16_cifar10.pth')
# Test the exported model
print('=' * 10 + 'Test on the pruned model after fine tune' + '=' * 10)
new_model = VGG(depth=16)
new_model.to(device)
new_model.load_state_dict(torch.load('pruned_vgg16_cifar10.pth'))
test(new_model, device, test_loader)
# top1 = 93.53%
if __name__ == '__main__':
main()
data
checkpoints
runs
nni_auto_gen_search_space.json
# Single Path One-Shot Neural Architecture Search with Uniform Sampling
Single Path One-Shot by Megvii Research. [Paper link](https://arxiv.org/abs/1904.00420). [Official repo](https://github.com/megvii-model/SinglePathOneShot).
Block search only. Channel search is not supported yet.
Only GPU version is provided here.
## Preparation
### Requirements
* PyTorch >= 1.2
* NVIDIA DALI >= 0.16 as we use DALI to accelerate the data loading of ImageNet. [Installation guide](https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/installation.html)
### Data
You need to download the flops lookup table from [here](https://1drv.ms/u/s!Am_mmG2-KsrnajesvSdfsq_cN48?e=aHVppN).
Put `op_flops_dict.pkl` and `checkpoint-150000.pth.tar` (if you don't want to retrain the supernet) under the `data` directory.
Prepare ImageNet in the standard format (follow the script [here](https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4)). Linking it to `data/imagenet` will be more convenient.
After preparation, it's expected to have the following code structure:
```
spos
├── architecture_final.json
├── blocks.py
├── config_search.yml
├── data
│   ├── imagenet
│   │   ├── train
│   │   └── val
│   └── op_flops_dict.pkl
├── dataloader.py
├── network.py
├── readme.md
├── scratch.py
├── supernet.py
├── tester.py
├── tuner.py
└── utils.py
```
## Step 1. Train Supernet
```
python supernet.py
```
This will export checkpoints to the `checkpoints` directory for use in the next step.
NOTE: The data loading used in the official repo is [slightly different from usual](https://github.com/megvii-model/SinglePathOneShot/issues/5): they use BGR tensors and intentionally keep the values between 0 and 255 to align with their own DL framework. The option `--spos-preprocessing` will simulate the original behavior and enable you to use the pretrained checkpoints.
## Step 2. Evolution Search
Single Path One-Shot leverages an evolutionary algorithm to search for the best architecture. The tester, which is responsible for testing a sampled architecture, recalculates all the batch norm statistics on a subset of training images and evaluates the architecture on the full validation set.
To have a search space ready for the NNI framework, first run
```
nnictl ss_gen -t "python tester.py"
```
This will generate a file called `nni_auto_gen_search_space.json`, which is a serialized representation of your search space.
Then, search with the evolution tuner.
```
nnictl create --config config_search.yml
```
The final architecture exported from every epoch of evolution can be found in `checkpoints` under the working directory of your tuner, which, by default, is `$HOME/nni/experiments/your_experiment_id/log`.
## Step 3. Train from Scratch
```
python scratch.py
```
By default, it will use `architecture_final.json`. This architecture is provided by the official repo (converted into NNI format). You can use any architecture (e.g., the architecture found in step 2) with the `--fixed-arc` option.
## Current Reproduction Results
Reproduction is still in progress. Due to the gap between the official release and the original paper, we compare our current results with the official repo (our own run) and the paper.
* The evolution phase is almost aligned with the official repo. Our evolution algorithm shows a converging trend and reaches ~65% accuracy at the end of the search. Nevertheless, this result is not on par with the paper. For details, please refer to [this issue](https://github.com/megvii-model/SinglePathOneShot/issues/6).
* The retrain phase is not aligned. Our retraining code, which uses the architecture released by the authors, reaches 72.14% accuracy, still leaving a gap to the 73.61% achieved by the official release and the 74.3% reported in the original paper.
{
"LayerChoice1": [false, false, true, false],
"LayerChoice2": [false, true, false, false],
"LayerChoice3": [true, false, false, false],
"LayerChoice4": [false, true, false, false],
"LayerChoice5": [false, false, true, false],
"LayerChoice6": [true, false, false, false],
"LayerChoice7": [false, false, true, false],
"LayerChoice8": [true, false, false, false],
"LayerChoice9": [false, false, true, false],
"LayerChoice10": [true, false, false, false],
"LayerChoice11": [false, false, true, false],
"LayerChoice12": [false, false, false, true],
"LayerChoice13": [true, false, false, false],
"LayerChoice14": [true, false, false, false],
"LayerChoice15": [true, false, false, false],
"LayerChoice16": [true, false, false, false],
"LayerChoice17": [false, false, false, true],
"LayerChoice18": [false, false, true, false],
"LayerChoice19": [false, false, false, true],
"LayerChoice20": [false, false, false, true]
}
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import torch.nn as nn
class ShuffleNetBlock(nn.Module):
"""
When stride = 1, the block receives input with 2 * inp channels. Otherwise inp channels.
"""
def __init__(self, inp, oup, mid_channels, ksize, stride, sequence="pdp"):
super().__init__()
assert stride in [1, 2]
assert ksize in [3, 5, 7]
self.channels = inp // 2 if stride == 1 else inp
self.inp = inp
self.oup = oup
self.mid_channels = mid_channels
self.ksize = ksize
self.stride = stride
self.pad = ksize // 2
self.oup_main = oup - self.channels
assert self.oup_main > 0
self.branch_main = nn.Sequential(*self._decode_point_depth_conv(sequence))
if stride == 2:
self.branch_proj = nn.Sequential(
# dw
nn.Conv2d(self.channels, self.channels, ksize, stride, self.pad,
groups=self.channels, bias=False),
nn.BatchNorm2d(self.channels, affine=False),
# pw-linear
nn.Conv2d(self.channels, self.channels, 1, 1, 0, bias=False),
nn.BatchNorm2d(self.channels, affine=False),
nn.ReLU(inplace=True)
)
def forward(self, x):
if self.stride == 2:
x_proj, x = self.branch_proj(x), x
else:
x_proj, x = self._channel_shuffle(x)
return torch.cat((x_proj, self.branch_main(x)), 1)
def _decode_point_depth_conv(self, sequence):
result = []
first_depth = first_point = True
pc = c = self.channels
for i, token in enumerate(sequence):
# compute output channels of this conv
if i + 1 == len(sequence):
assert token == "p", "Last conv must be point-wise conv."
c = self.oup_main
elif token == "p" and first_point:
c = self.mid_channels
if token == "d":
# depth-wise conv
assert pc == c, "Depth-wise conv must not change channels."
result.append(nn.Conv2d(pc, c, self.ksize, self.stride if first_depth else 1, self.pad,
groups=c, bias=False))
result.append(nn.BatchNorm2d(c, affine=False))
first_depth = False
elif token == "p":
# point-wise conv
result.append(nn.Conv2d(pc, c, 1, 1, 0, bias=False))
result.append(nn.BatchNorm2d(c, affine=False))
result.append(nn.ReLU(inplace=True))
first_point = False
else:
raise ValueError("Conv sequence must be d and p.")
pc = c
return result
def _channel_shuffle(self, x):
bs, num_channels, height, width = x.data.size()
assert (num_channels % 4 == 0)
x = x.reshape(bs * num_channels // 2, 2, height * width)
x = x.permute(1, 0, 2)
x = x.reshape(2, -1, num_channels // 2, height, width)
return x[0], x[1]
class ShuffleXceptionBlock(ShuffleNetBlock):
def __init__(self, inp, oup, mid_channels, stride):
super().__init__(inp, oup, mid_channels, 3, stride, "dpdpdp")
authorName: unknown
experimentName: SPOS Search
trialConcurrency: 4
maxExecDuration: 7d
maxTrialNum: 99999
trainingServicePlatform: local
searchSpacePath: nni_auto_gen_search_space.json
useAnnotation: false
tuner:
codeDir: .
classFileName: tuner.py
className: EvolutionWithFlops
trial:
command: python tester.py --imagenet-dir /path/to/your/imagenet --spos-prep
codeDir: .
gpuNum: 1
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import nvidia.dali.ops as ops
import nvidia.dali.types as types
import torch.utils.data
from nvidia.dali.pipeline import Pipeline
from nvidia.dali.plugin.pytorch import DALIClassificationIterator
class HybridTrainPipe(Pipeline):
def __init__(self, batch_size, num_threads, device_id, data_dir, crop, seed=12, local_rank=0, world_size=1,
spos_pre=False):
super(HybridTrainPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
color_space_type = types.BGR if spos_pre else types.RGB
self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size, random_shuffle=True)
self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
self.res = ops.RandomResizedCrop(device="gpu", size=crop,
interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
self.twist = ops.ColorTwist(device="gpu")
self.jitter_rng = ops.Uniform(range=[0.6, 1.4])
self.cmnp = ops.CropMirrorNormalize(device="gpu",
output_dtype=types.FLOAT,
output_layout=types.NCHW,
image_type=color_space_type,
mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
self.coin = ops.CoinFlip(probability=0.5)
def define_graph(self):
rng = self.coin()
self.jpegs, self.labels = self.input(name="Reader")
images = self.decode(self.jpegs)
images = self.res(images)
images = self.twist(images, saturation=self.jitter_rng(),
contrast=self.jitter_rng(), brightness=self.jitter_rng())
output = self.cmnp(images, mirror=rng)
return [output, self.labels]
class HybridValPipe(Pipeline):
def __init__(self, batch_size, num_threads, device_id, data_dir, crop, size, seed=12, local_rank=0, world_size=1,
spos_pre=False, shuffle=False):
super(HybridValPipe, self).__init__(batch_size, num_threads, device_id, seed=seed + device_id)
color_space_type = types.BGR if spos_pre else types.RGB
self.input = ops.FileReader(file_root=data_dir, shard_id=local_rank, num_shards=world_size,
random_shuffle=shuffle)
self.decode = ops.ImageDecoder(device="mixed", output_type=color_space_type)
self.res = ops.Resize(device="gpu", resize_shorter=size,
interp_type=types.INTERP_LINEAR if spos_pre else types.INTERP_TRIANGULAR)
self.cmnp = ops.CropMirrorNormalize(device="gpu",
output_dtype=types.FLOAT,
output_layout=types.NCHW,
crop=(crop, crop),
image_type=color_space_type,
mean=0. if spos_pre else [0.485 * 255, 0.456 * 255, 0.406 * 255],
std=1. if spos_pre else [0.229 * 255, 0.224 * 255, 0.225 * 255])
def define_graph(self):
self.jpegs, self.labels = self.input(name="Reader")
images = self.decode(self.jpegs)
images = self.res(images)
output = self.cmnp(images)
return [output, self.labels]
class ClassificationWrapper:
def __init__(self, loader, size):
self.loader = loader
self.size = size
def __iter__(self):
return self
def __next__(self):
data = next(self.loader)
return data[0]["data"], data[0]["label"].view(-1).long().cuda(non_blocking=True)
def __len__(self):
return self.size
def get_imagenet_iter_dali(split, image_dir, batch_size, num_threads, crop=224, val_size=256,
spos_preprocessing=False, seed=12, shuffle=False, device_id=None):
world_size, local_rank = 1, 0
if device_id is None:
device_id = torch.cuda.device_count() - 1 # use last gpu
if split == "train":
pipeline = HybridTrainPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
data_dir=os.path.join(image_dir, "train"), seed=seed,
crop=crop, world_size=world_size, local_rank=local_rank,
spos_pre=spos_preprocessing)
elif split == "val":
pipeline = HybridValPipe(batch_size=batch_size, num_threads=num_threads, device_id=device_id,
data_dir=os.path.join(image_dir, "val"), seed=seed,
crop=crop, size=val_size, world_size=world_size, local_rank=local_rank,
spos_pre=spos_preprocessing, shuffle=shuffle)
else:
raise AssertionError
pipeline.build()
num_samples = pipeline.epoch_size("Reader")
return ClassificationWrapper(
DALIClassificationIterator(pipeline, size=num_samples, fill_last_batch=split == "train",
auto_reset=True), (num_samples + batch_size - 1) // batch_size)
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import os
import pickle
import re
import torch
import torch.nn as nn
from nni.nas.pytorch import mutables
from blocks import ShuffleNetBlock, ShuffleXceptionBlock
class ShuffleNetV2OneShot(nn.Module):
block_keys = [
'shufflenet_3x3',
'shufflenet_5x5',
'shufflenet_7x7',
'xception_3x3',
]
def __init__(self, input_size=224, first_conv_channels=16, last_conv_channels=1024, n_classes=1000,
op_flops_path="./data/op_flops_dict.pkl"):
super().__init__()
assert input_size % 32 == 0
with open(os.path.join(os.path.dirname(__file__), op_flops_path), "rb") as fp:
self._op_flops_dict = pickle.load(fp)
self.stage_blocks = [4, 4, 8, 4]
self.stage_channels = [64, 160, 320, 640]
self._parsed_flops = dict()
self._input_size = input_size
self._feature_map_size = input_size
self._first_conv_channels = first_conv_channels
self._last_conv_channels = last_conv_channels
self._n_classes = n_classes
# building first layer
self.first_conv = nn.Sequential(
nn.Conv2d(3, first_conv_channels, 3, 2, 1, bias=False),
nn.BatchNorm2d(first_conv_channels, affine=False),
nn.ReLU(inplace=True),
)
self._feature_map_size //= 2
p_channels = first_conv_channels
features = []
for num_blocks, channels in zip(self.stage_blocks, self.stage_channels):
features.extend(self._make_blocks(num_blocks, p_channels, channels))
p_channels = channels
self.features = nn.Sequential(*features)
self.conv_last = nn.Sequential(
nn.Conv2d(p_channels, last_conv_channels, 1, 1, 0, bias=False),
nn.BatchNorm2d(last_conv_channels, affine=False),
nn.ReLU(inplace=True),
)
self.globalpool = nn.AvgPool2d(self._feature_map_size)
self.dropout = nn.Dropout(0.1)
self.classifier = nn.Sequential(
nn.Linear(last_conv_channels, n_classes, bias=False),
)
self._initialize_weights()
def _make_blocks(self, blocks, in_channels, channels):
result = []
for i in range(blocks):
stride = 2 if i == 0 else 1
inp = in_channels if i == 0 else channels
oup = channels
base_mid_channels = channels // 2
mid_channels = int(base_mid_channels) # prepare for scale
choice_block = mutables.LayerChoice([
ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=3, stride=stride),
ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=5, stride=stride),
ShuffleNetBlock(inp, oup, mid_channels=mid_channels, ksize=7, stride=stride),
ShuffleXceptionBlock(inp, oup, mid_channels=mid_channels, stride=stride)
])
result.append(choice_block)
# find the corresponding flops
flop_key = (inp, oup, mid_channels, self._feature_map_size, self._feature_map_size, stride)
self._parsed_flops[choice_block.key] = [
self._op_flops_dict["{}_stride_{}".format(k, stride)][flop_key] for k in self.block_keys
]
if stride == 2:
self._feature_map_size //= 2
return result
def forward(self, x):
bs = x.size(0)
x = self.first_conv(x)
x = self.features(x)
x = self.conv_last(x)
x = self.globalpool(x)
x = self.dropout(x)
x = x.contiguous().view(bs, -1)
x = self.classifier(x)
return x
def get_candidate_flops(self, candidate):
conv1_flops = self._op_flops_dict["conv1"][(3, self._first_conv_channels,
self._input_size, self._input_size, 2)]
# Should use `last_conv_channels` here, but megvii insists that it's `n_classes`. Keeping it.
# https://github.com/megvii-model/SinglePathOneShot/blob/36eed6cf083497ffa9cfe7b8da25bb0b6ba5a452/src/Supernet/flops.py#L313
rest_flops = self._op_flops_dict["rest_operation"][(self.stage_channels[-1], self._n_classes,
self._feature_map_size, self._feature_map_size, 1)]
total_flops = conv1_flops + rest_flops
for k, m in candidate.items():
parsed_flops_dict = self._parsed_flops[k]
if isinstance(m, dict): # to be compatible with classical nas format
total_flops += parsed_flops_dict[m["_idx"]]
else:
total_flops += parsed_flops_dict[torch.max(m, 0)[1]]
return total_flops
def _initialize_weights(self):
for name, m in self.named_modules():
if isinstance(m, nn.Conv2d):
if 'first' in name:
nn.init.normal_(m.weight, 0, 0.01)
else:
nn.init.normal_(m.weight, 0, 1.0 / m.weight.shape[1])
if m.bias is not None:
nn.init.constant_(m.bias, 0)
elif isinstance(m, nn.BatchNorm2d):
if m.weight is not None:
nn.init.constant_(m.weight, 1)
if m.bias is not None:
nn.init.constant_(m.bias, 0.0001)
nn.init.constant_(m.running_mean, 0)
elif isinstance(m, nn.BatchNorm1d):
nn.init.constant_(m.weight, 1)
if m.bias is not None:
nn.init.constant_(m.bias, 0.0001)
nn.init.constant_(m.running_mean, 0)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, 0, 0.01)
if m.bias is not None:
nn.init.constant_(m.bias, 0)
def load_and_parse_state_dict(filepath="./data/checkpoint-150000.pth.tar"):
checkpoint = torch.load(filepath, map_location=torch.device("cpu"))
result = dict()
for k, v in checkpoint["state_dict"].items():
if k.startswith("module."):
k = k[len("module."):]
k = re.sub(r"^(features.\d+).(\d+)", "\\1.choices.\\2", k)
result[k] = v
return result
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import argparse
import logging
import random
import numpy as np
import torch
import torch.nn as nn
from dataloader import get_imagenet_iter_dali
from nni.nas.pytorch.fixed import apply_fixed_architecture
from nni.nas.pytorch.utils import AverageMeterGroup
from torch.utils.tensorboard import SummaryWriter
from network import ShuffleNetV2OneShot
from utils import CrossEntropyLabelSmooth, accuracy
logger = logging.getLogger("nni.spos.scratch")
def train(epoch, model, criterion, optimizer, loader, writer, args):
model.train()
meters = AverageMeterGroup()
cur_lr = optimizer.param_groups[0]["lr"]
for step, (x, y) in enumerate(loader):
cur_step = len(loader) * epoch + step
optimizer.zero_grad()
logits = model(x)
loss = criterion(logits, y)
loss.backward()
optimizer.step()
metrics = accuracy(logits, y)
metrics["loss"] = loss.item()
meters.update(metrics)
writer.add_scalar("lr", cur_lr, global_step=cur_step)
writer.add_scalar("loss/train", loss.item(), global_step=cur_step)
writer.add_scalar("acc1/train", metrics["acc1"], global_step=cur_step)
writer.add_scalar("acc5/train", metrics["acc5"], global_step=cur_step)
if step % args.log_frequency == 0 or step + 1 == len(loader):
logger.info("Epoch [%d/%d] Step [%d/%d] %s", epoch + 1,
args.epochs, step + 1, len(loader), meters)
logger.info("Epoch %d training summary: %s", epoch + 1, meters)
def validate(epoch, model, criterion, loader, writer, args):
model.eval()
meters = AverageMeterGroup()
with torch.no_grad():
for step, (x, y) in enumerate(loader):
logits = model(x)
loss = criterion(logits, y)
metrics = accuracy(logits, y)
metrics["loss"] = loss.item()
meters.update(metrics)
if step % args.log_frequency == 0 or step + 1 == len(loader):
logger.info("Epoch [%d/%d] Validation Step [%d/%d] %s", epoch + 1,
args.epochs, step + 1, len(loader), meters)
writer.add_scalar("loss/test", meters.loss.avg, global_step=epoch)
writer.add_scalar("acc1/test", meters.acc1.avg, global_step=epoch)
writer.add_scalar("acc5/test", meters.acc5.avg, global_step=epoch)
logger.info("Epoch %d validation: top1 = %f, top5 = %f", epoch + 1, meters.acc1.avg, meters.acc5.avg)
if __name__ == "__main__":
parser = argparse.ArgumentParser("SPOS Training From Scratch")
parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
parser.add_argument("--tb-dir", type=str, default="runs")
parser.add_argument("--architecture", type=str, default="architecture_final.json")
parser.add_argument("--workers", type=int, default=12)
parser.add_argument("--batch-size", type=int, default=1024)
parser.add_argument("--epochs", type=int, default=240)
parser.add_argument("--learning-rate", type=float, default=0.5)
parser.add_argument("--momentum", type=float, default=0.9)
parser.add_argument("--weight-decay", type=float, default=4E-5)
parser.add_argument("--label-smooth", type=float, default=0.1)
parser.add_argument("--log-frequency", type=int, default=10)
parser.add_argument("--lr-decay", type=str, default="linear")
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--spos-preprocessing", default=False, action="store_true")
parser.add_argument("--label-smoothing", type=float, default=0.1)
args = parser.parse_args()
torch.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
np.random.seed(args.seed)
random.seed(args.seed)
torch.backends.cudnn.deterministic = True
model = ShuffleNetV2OneShot()
model.cuda()
apply_fixed_architecture(model, args.architecture)
if torch.cuda.device_count() > 1: # exclude last gpu, saving for data preprocessing on gpu
model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
momentum=args.momentum, weight_decay=args.weight_decay)
if args.lr_decay == "linear":
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer,
lambda step: (1.0 - step / args.epochs)
if step <= args.epochs else 0,
last_epoch=-1)
elif args.lr_decay == "cosine":
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, args.epochs, 1E-3)
else:
raise ValueError("'%s' not supported." % args.lr_decay)
writer = SummaryWriter(log_dir=args.tb_dir)
train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing)
val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing)
for epoch in range(args.epochs):
train(epoch, model, criterion, optimizer, train_loader, writer, args)
validate(epoch, model, criterion, val_loader, writer, args)
scheduler.step()
writer.close()
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import argparse
import logging
import random
import numpy as np
import torch
import torch.nn as nn
from nni.nas.pytorch.callbacks import LRSchedulerCallback
from nni.nas.pytorch.callbacks import ModelCheckpoint
from nni.nas.pytorch.spos import SPOSSupernetTrainingMutator, SPOSSupernetTrainer
from dataloader import get_imagenet_iter_dali
from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy
logger = logging.getLogger("nni.spos.supernet")
if __name__ == "__main__":
parser = argparse.ArgumentParser("SPOS Supernet Training")
parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
parser.add_argument("--load-checkpoint", action="store_true", default=False)
parser.add_argument("--spos-preprocessing", action="store_true", default=False,
help="When true, image values will range from 0 to 255 and use BGR "
"(as in original repo).")
parser.add_argument("--workers", type=int, default=4)
parser.add_argument("--batch-size", type=int, default=768)
parser.add_argument("--epochs", type=int, default=120)
parser.add_argument("--learning-rate", type=float, default=0.5)
parser.add_argument("--momentum", type=float, default=0.9)
parser.add_argument("--weight-decay", type=float, default=4E-5)
parser.add_argument("--label-smooth", type=float, default=0.1)
parser.add_argument("--log-frequency", type=int, default=10)
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--label-smoothing", type=float, default=0.1)
args = parser.parse_args()
torch.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
np.random.seed(args.seed)
random.seed(args.seed)
torch.backends.cudnn.deterministic = True
model = ShuffleNetV2OneShot()
if args.load_checkpoint:
if not args.spos_preprocessing:
logger.warning("You might want to use SPOS preprocessing if you are loading their checkpoints.")
model.load_state_dict(load_and_parse_state_dict())
model.cuda()
if torch.cuda.device_count() > 1: # exclude last gpu, saving for data preprocessing on gpu
model = nn.DataParallel(model, device_ids=list(range(0, torch.cuda.device_count() - 1)))
mutator = SPOSSupernetTrainingMutator(model, flops_func=model.module.get_candidate_flops,
flops_lb=290E6, flops_ub=360E6)
criterion = CrossEntropyLabelSmooth(1000, args.label_smoothing)
optimizer = torch.optim.SGD(model.parameters(), lr=args.learning_rate,
momentum=args.momentum, weight_decay=args.weight_decay)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer,
lambda step: (1.0 - step / args.epochs)
if step <= args.epochs else 0,
last_epoch=-1)
train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing)
valid_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing)
trainer = SPOSSupernetTrainer(model, criterion, accuracy, optimizer,
args.epochs, train_loader, valid_loader,
mutator=mutator, batch_size=args.batch_size,
log_frequency=args.log_frequency, workers=args.workers,
callbacks=[LRSchedulerCallback(scheduler),
ModelCheckpoint("./checkpoints")])
trainer.train()
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import argparse
import logging
import random
import time
from itertools import cycle
import nni
import numpy as np
import torch
import torch.nn as nn
from nni.nas.pytorch.classic_nas import get_and_apply_next_architecture
from nni.nas.pytorch.utils import AverageMeterGroup
from dataloader import get_imagenet_iter_dali
from network import ShuffleNetV2OneShot, load_and_parse_state_dict
from utils import CrossEntropyLabelSmooth, accuracy
logger = logging.getLogger("nni.spos.tester")
def retrain_bn(model, criterion, max_iters, log_freq, loader):
with torch.no_grad():
logger.info("Clear BN statistics...")
for m in model.modules():
if isinstance(m, nn.BatchNorm2d):
m.running_mean = torch.zeros_like(m.running_mean)
m.running_var = torch.ones_like(m.running_var)
logger.info("Train BN with training set (BN sanitize)...")
model.train()
meters = AverageMeterGroup()
for step in range(max_iters):
inputs, targets = next(loader)
logits = model(inputs)
loss = criterion(logits, targets)
metrics = accuracy(logits, targets)
metrics["loss"] = loss.item()
meters.update(metrics)
if step % log_freq == 0 or step + 1 == max_iters:
logger.info("Train Step [%d/%d] %s", step + 1, max_iters, meters)
def test_acc(model, criterion, log_freq, loader):
logger.info("Start testing...")
model.eval()
meters = AverageMeterGroup()
start_time = time.time()
with torch.no_grad():
for step, (inputs, targets) in enumerate(loader):
logits = model(inputs)
loss = criterion(logits, targets)
metrics = accuracy(logits, targets)
metrics["loss"] = loss.item()
meters.update(metrics)
if step % log_freq == 0 or step + 1 == len(loader):
logger.info("Valid Step [%d/%d] time %.3fs acc1 %.4f acc5 %.4f loss %.4f",
step + 1, len(loader), time.time() - start_time,
meters.acc1.avg, meters.acc5.avg, meters.loss.avg)
return meters.acc1.avg
def evaluate_acc(model, criterion, args, loader_train, loader_test):
acc_before = test_acc(model, criterion, args.log_frequency, loader_test)
nni.report_intermediate_result(acc_before)
retrain_bn(model, criterion, args.train_iters, args.log_frequency, loader_train)
acc = test_acc(model, criterion, args.log_frequency, loader_test)
assert isinstance(acc, float)
nni.report_intermediate_result(acc)
nni.report_final_result(acc)
if __name__ == "__main__":
parser = argparse.ArgumentParser("SPOS Candidate Tester")
parser.add_argument("--imagenet-dir", type=str, default="./data/imagenet")
parser.add_argument("--checkpoint", type=str, default="./data/checkpoint-150000.pth.tar")
parser.add_argument("--spos-preprocessing", action="store_true", default=False,
help="When true, image values will range from 0 to 255 and use BGR "
"(as in original repo).")
parser.add_argument("--seed", type=int, default=42)
parser.add_argument("--workers", type=int, default=6)
parser.add_argument("--train-batch-size", type=int, default=128)
parser.add_argument("--train-iters", type=int, default=200)
parser.add_argument("--test-batch-size", type=int, default=512)
parser.add_argument("--log-frequency", type=int, default=10)
args = parser.parse_args()
    # using a fixed set of images improves the performance
torch.manual_seed(args.seed)
torch.cuda.manual_seed_all(args.seed)
np.random.seed(args.seed)
random.seed(args.seed)
torch.backends.cudnn.deterministic = True
assert torch.cuda.is_available()
model = ShuffleNetV2OneShot()
criterion = CrossEntropyLabelSmooth(1000, 0.1)
get_and_apply_next_architecture(model)
model.load_state_dict(load_and_parse_state_dict(filepath=args.checkpoint))
model.cuda()
train_loader = get_imagenet_iter_dali("train", args.imagenet_dir, args.train_batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing,
seed=args.seed, device_id=0)
val_loader = get_imagenet_iter_dali("val", args.imagenet_dir, args.test_batch_size, args.workers,
spos_preprocessing=args.spos_preprocessing, shuffle=True,
seed=args.seed, device_id=0)
train_loader = cycle(train_loader)
evaluate_acc(model, criterion, args, train_loader, val_loader)