"git@developer.sourcefind.cn:xdb4_94051/vllm.git" did not exist on "3ec8c25cd07c4a3d747b846ece8e305a7fb44349"
Unverified commit e93d2c25 authored by liuzhe-lz, committed by GitHub

Merge model compression dev branch to master (#1571)

* [Proposal] demo compressor (#1402)

model compression

* update doc for model compression (#1509)

* Update Overview.md

* Change Doc (#1510)

* refactor compression sdk (#1562)

* refactor compression sdk

* bugfix

* bugfix

* update ut

* Sync model compression doc and implementation (#1575)

* update doc

* formatting

* bugfix

* add import to examples
parent 3274ca30
...@@ -10,6 +10,11 @@ jobs:
   steps:
   - script: python3 -m pip install --upgrade pip setuptools --user
     displayName: 'Install python tools'
   - script: |
       python3 -m pip install torch==0.4.1 --user
       python3 -m pip install torchvision==0.2.1 --user
       python3 -m pip install tensorflow==1.12.0 --user
     displayName: 'Install dependencies for integration'
   - script: |
       source install.sh
     displayName: 'Install nni toolkit via source code'
...@@ -50,6 +55,11 @@ jobs:
   steps:
   - script: python3 -m pip install --upgrade pip setuptools
     displayName: 'Install python tools'
   - script: |
       python3 -m pip install torch==0.4.1 --user
       python3 -m pip install torchvision==0.2.1 --user
       python3 -m pip install tensorflow --user
     displayName: 'Install dependencies for integration'
   - script: |
       source install.sh
     displayName: 'Install nni toolkit via source code'
......
# Automatic Model Compression on NNI
TBD.
# Compressor
NNI provides an easy-to-use toolkit to help users design and use compression algorithms. It supports Tensorflow and PyTorch with a unified interface. To compress a model, users only need to add a few lines to their code. Several popular model compression algorithms are built into NNI. Users can further use NNI's auto tuning power to find the best compressed model, as detailed in [Auto Model Compression](./AutoCompression.md). Users can also easily customize new compression algorithms using NNI's interface; refer to the tutorial [here](#customize-new-compression-algorithms).
## Supported algorithms
We have provided two naive compression algorithms and four popular ones for users: three pruning algorithms and three quantization algorithms in total:
|Name|Brief Introduction of Algorithm|
|---|---|
| [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
| [AGP Pruner](./Pruner.md#agp-pruner) | To prune, or not to prune: exploring the efficacy of pruning for model compression. [Reference Paper](https://arxiv.org/abs/1710.01878)|
| [Sensitivity Pruner](./Pruner.md#sensitivity-pruner) | Learning both Weights and Connections for Efficient Neural Networks. [Reference Paper](https://arxiv.org/abs/1506.02626)|
| [Naive Quantizer](./Quantizer.md#naive-quantizer) | Quantize weights to default 8 bits |
| [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
| [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
## Usage of built-in compression algorithms
We use a simple example to show how to modify your trial code in order to apply the compression algorithms. Suppose you want to prune all weights to 80% sparsity with Level Pruner; you can add the following lines to your code before training your model (the complete code is [here](https://github.com/microsoft/nni/tree/master/examples/model_compress)).
Tensorflow code
```python
from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model)
```
You can use other compression algorithms in the `nni.compression` package. The algorithms are implemented in both PyTorch and Tensorflow, under `nni.compression.torch` and `nni.compression.tensorflow` respectively. You can refer to [Pruner](./Pruner.md) and [Quantizer](./Quantizer.md) for detailed descriptions of the supported algorithms.
The function call `pruner(model)` receives the user-defined model (in Tensorflow the model can be obtained with `tf.get_default_graph()`, while in PyTorch it is the defined model object) and modifies it by inserting masks. Then, when you run the model, the masks take effect. The masks can be adjusted at runtime by the algorithms.
When instantiating a compression algorithm, a `config_list` is passed in. We describe how to write this configuration below.
### User configuration for a compression algorithm
When compressing a model, users may want to specify the sparsity ratio, to specify different ratios for different types of operations, to exclude certain types of operations, or to compress only certain types of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a python `list` object, where each element is a `dict` object. In each `dict`, there are some keys commonly supported by NNI compression:
* __op_types__: This specifies which types of operations to compress. 'default' means following the algorithm's default setting.
* __op_names__: This specifies which operations to compress by name. If this field is omitted, operations are not filtered by name.
* __exclude__: Default is False. If this field is True, the operations matching the specified types and names are excluded from compression.
There are also other keys in the `dict`, but they are specific to each compression algorithm. For example, the pruners below use a `sparsity` key, while the quantizers use a `q_bits` key.
The `dict`s in the `list` are applied one by one; that is, the configuration in a later `dict` overwrites the configuration in earlier ones for operations that fall within the scope of both.
A simple example of configuration is shown below:
```python
[
{
'sparsity': 0.8,
'op_types': 'default'
},
{
'sparsity': 0.6,
'op_names': ['op_name1', 'op_name2']
},
{
'exclude': True,
'op_names': ['op_name3']
}
]
```
This configuration compresses operations covered by the algorithm's default setting with sparsity 0.8, uses sparsity 0.6 for `op_name1` and `op_name2`, and excludes `op_name3` from compression.
### Other APIs
Some compression algorithms use epochs to control the progress of compression, and some algorithms need to do something after every minibatch. Therefore, we provide two additional APIs for users to invoke. One is `update_epoch`, which you can use as follows:
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
The other is `step`, which can be called with `pruner.step()` after each minibatch. Note that not all algorithms need these two APIs; for those that do not, calling them is allowed but has no effect.
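For example, a typical PyTorch training loop that drives these two APIs could look like the following. This is a minimal sketch, not a complete program: `model`, `train_loader`, `optimizer`, and `criterion` are assumed to be defined elsewhere.
```python
from nni.compression.torch import AGP_Pruner

# Minimal sketch of where update_epoch()/step() fit into a PyTorch training loop.
# `model`, `train_loader`, `optimizer`, and `criterion` are assumed to be defined elsewhere.
config_list = [{
    'initial_sparsity': 0,
    'final_sparsity': 0.8,
    'start_epoch': 1,
    'end_epoch': 10,
    'frequency': 1,
    'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
pruner(model)  # insert masks into the model

for epoch in range(10):
    for data, target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
        pruner.step()           # per-minibatch hook; a no-op for algorithms that ignore it
    pruner.update_epoch(epoch)  # let the pruner track training progress
```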
__[TODO]__ The last API is for users to export the compressed model. You will get a compressed model when you finish the training using this API. It also exports another file storing the values of masks.
## Customize new compression algorithms
To simplify writing a new compression algorithm, we designed programming interfaces that are simple yet flexible. There are separate interfaces for pruners and quantizers.
### Pruning algorithm
If you want to write a new pruning algorithm, you can write a class that inherits `nni.compression.tensorflow.Pruner` or `nni.compression.torch.Pruner` depending on which framework you use. Then, override the member functions with the logic of your algorithm.
```python
# This is writing a pruner in tensorflow.
# For writing a pruner in PyTorch, you can simply replace
# nni.compression.tensorflow.Pruner with
# nni.compression.torch.Pruner
class YourPruner(nni.compression.tensorflow.Pruner):
    def __init__(self, config_list):
        # we suggest using the NNI-defined spec for config
        super().__init__(config_list)

    def bind_model(self, model):
        # this func can be used to remember the model or its weights
        # in member variables, for getting their values during training
        pass

    def calc_mask(self, weight, config, **kwargs):
        # weight is the target weight tensor
        # config is the selected dict object in config_list for this layer
        # kwargs contains op, op_type, and op_name
        # design your mask and return it
        return your_mask

    # note: for the pytorch version, there is no sess in the input arguments
    def update_epoch(self, epoch_num, sess):
        pass

    # note: for the pytorch version, there is no sess in the input arguments
    def step(self, sess):
        # can do some processing based on the model or weights bound
        # in the func bind_model
        pass
```
For the simplest algorithm, you only need to override `calc_mask`. It receives each layer's weight and its selected configuration, as well as op information. You generate the mask for the weight in this function and return it; NNI then applies the mask for you.
Some algorithms generate masks based on training progress, i.e., the epoch number. We provide `update_epoch` so that the pruner can be aware of the training progress.
Some algorithms may need global information for generating masks, for example, all weights of the model (for statistics) or the model optimizer's state. NNI supports this requirement through `bind_model`. `bind_model` receives the complete model, so it can record any information it cares about (e.g., references to weights). Then `step` can process or update that information according to the algorithm. You can refer to the [source code of the built-in algorithms](https://github.com/microsoft/nni/tree/master/src/sdk/pynni/nni/compressors) for example implementations.
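As a concrete illustration, a minimal custom pruner for PyTorch that only overrides `calc_mask` could look like the sketch below. The `MedianPruner` class and its keep-above-the-median rule are illustrative examples, not part of NNI.
```python
import torch
from nni.compression.torch import Pruner

class MedianPruner(Pruner):
    """Illustrative pruner: masks out weights whose magnitude is below the per-layer median."""

    def __init__(self, config_list):
        super().__init__(config_list)

    def calc_mask(self, weight, config, **kwargs):
        # Keep weights whose absolute value is above the median of this tensor.
        threshold = weight.abs().median()
        return torch.gt(weight.abs(), threshold).type(weight.type())
```
Such a pruner is applied exactly like the built-in ones, e.g. `MedianPruner(config_list)(model)`.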
### Quantization algorithm
The interface for customizing a quantization algorithm is similar to that for pruning algorithms. The only difference is that `calc_mask` is replaced with `quantize_weight`, which directly returns the quantized weights rather than a mask, because for quantization the quantized weights cannot be obtained by applying a mask.
```python
# This is writing a Quantizer in tensorflow.
# For writing a Quantizer in PyTorch, you can simply replace
# nni.compression.tensorflow.Quantizer with
# nni.compression.torch.Quantizer
class YourQuantizer(nni.compression.tensorflow.Quantizer):
    def __init__(self, config_list):
        # we suggest using the NNI-defined spec for config
        super().__init__(config_list)

    def bind_model(self, model):
        # this func can be used to remember the model or its weights
        # in member variables, for getting their values during training
        pass

    def quantize_weight(self, weight, config, **kwargs):
        # weight is the target weight tensor
        # config is the selected dict object in config_list for this layer
        # kwargs contains op, op_type, and op_name
        # design your quantizer and return the new weight
        return new_weight

    # note: for the pytorch version, there is no sess in the input arguments
    def update_epoch(self, epoch_num, sess):
        pass

    # note: for the pytorch version, there is no sess in the input arguments
    def step(self, sess):
        # can do some processing based on the model or weights bound
        # in the func bind_model
        pass

    # you can also design your own methods
    def your_method(self, your_input):
        # your code
        pass
```
__[TODO]__ Will add another member function `quantize_layer_output`, as some quantization algorithms also quantize layers' output.
### Usage of user customized compression algorithm
__[TODO]__ ...
Pruner on NNI Compressor
===
## Level Pruner
This is a basic pruner: you can set a target sparsity level (expressed as a fraction; 0.6 means 60% of the weights will be pruned).
We first sort the weights in the specified layer by their absolute values, and then mask to zero the smallest-magnitude weights until the desired sparsity level is reached.
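The mask computation boils down to a magnitude threshold. The following standalone PyTorch sketch illustrates the idea (it mirrors the built-in implementation but is not itself part of the NNI API):
```python
import torch

# Standalone illustration of level pruning at 60% sparsity.
weight = torch.randn(4, 4)
sparsity = 0.6
k = int(weight.numel() * sparsity)  # number of weights to prune
threshold = torch.topk(weight.abs().view(-1), k, largest=False).values.max()
mask = torch.gt(weight.abs(), threshold).type(weight.type())
pruned_weight = weight * mask       # the smallest 60% of weights are set to zero
```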
### Usage
Tensorflow code
```python
from nni.compression.tensorflow import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model_graph)
```
PyTorch code
```python
from nni.compression.torch import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': 'default' }]
pruner = LevelPruner(config_list)
pruner(model)
```
#### User configuration for Level Pruner
* **sparsity:** The target sparsity that the specified operations will be compressed to.
***
## AGP Pruner
In [To prune, or not to prune: exploring the efficacy of pruning for model compression](https://arxiv.org/abs/1710.01878), the authors Michael Zhu and Suyog Gupta provide an algorithm to prune the weights gradually.
>We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps, starting at training step t0 and with pruning frequency ∆t:
![](../../img/agp_pruner.png)
>The binary weight masks are updated every ∆t steps as the network is trained to gradually increase the sparsity of the network while allowing the network training steps to recover from any pruning-induced loss in accuracy. In our experience, varying the pruning frequency ∆t between 100 and 1000 training steps had a negligible impact on the final model quality. Once the model achieves the target sparsity sf , the weight masks are no longer updated. The intuition behind this sparsity function in equation
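In code, the schedule quoted above is a cubic interpolation between the initial and final sparsity. The following standalone sketch, written per epoch to match NNI's built-in implementation, is an illustration only:
```python
def agp_target_sparsity(epoch, initial_sparsity, final_sparsity,
                        start_epoch, end_epoch, frequency):
    """Cubic sparsity schedule from Zhu & Gupta (2017), evaluated per epoch."""
    if epoch >= end_epoch:
        return final_sparsity
    span = ((end_epoch - start_epoch - 1) // frequency) * frequency
    progress = (epoch - start_epoch) / span
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3

# Sparsity grows from 0 toward 0.8 over epochs 1..10.
print([round(agp_target_sparsity(e, 0, 0.8, 1, 10, 1), 3) for e in range(1, 11)])
```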
### Usage
You can prune all weights from 0% to 80% sparsity over 10 epochs with the code below.
First, you should import the pruner and add masks to the model.
Tensorflow code
```python
from nni.compression.tensorflow import AGP_Pruner
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import AGP_Pruner
config_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
'op_types': 'default'
}]
pruner = AGP_Pruner(config_list)
pruner(model)
```
Second, you should add the code below to update the epoch number whenever you finish an epoch in your training code.
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
You can refer to the example code for more information.
#### User configuration for AGP Pruner
* **initial_sparsity:** The sparsity at which the compressor starts compressing.
* **final_sparsity:** The sparsity at which the compressor finishes compressing.
* **start_epoch:** The epoch number at which the compressor starts compressing.
* **end_epoch:** The epoch number at which the compressor finishes compressing.
* **frequency:** The compressor updates the masks once every *frequency* epochs.
***
## Sensitivity Pruner
In [Learning both Weights and Connections for Efficient Neural Networks](https://arxiv.org/abs/1506.02626), the authors Song Han et al. provide an algorithm to find the sensitivity of each layer and set a per-layer pruning threshold.
>We used the sensitivity results to find each layer’s threshold: for example, the smallest threshold was applied to the most sensitive layer, which is the first convolutional layer... The pruning threshold is chosen as a quality parameter multiplied by the standard deviation of a layer’s weights
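In other words, each layer's threshold is the configured quality parameter multiplied by the standard deviation of that layer's weights, and weights whose magnitude falls below the threshold are masked. The following standalone sketch illustrates the rule described in the quote (it is not the NNI implementation itself):
```python
import torch

# Standalone illustration of the sensitivity-based threshold.
weight = torch.randn(256, 256)
quality_parameter = 0.8                        # the 'sparsity' value in the config
threshold = quality_parameter * weight.std()   # per-layer threshold
mask = torch.gt(weight.abs(), threshold).type(weight.type())
print('fraction of weights kept:', mask.mean().item())
```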
### Usage
You can prune weights step by step toward a target sparsity with Sensitivity Pruner using the code below.
Tensorflow code
```python
from nni.compression.tensorflow import SensitivityPruner
config_list = [{ 'sparsity':0.8, 'op_types': 'default' }]
pruner = SensitivityPruner(config_list)
pruner(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import SensitivityPruner
config_list = [{ 'sparsity':0.8, 'op_types': 'default' }]
pruner = SensitivityPruner(config_list)
pruner(model)
```
As with AGP Pruner, you should update the mask information every epoch by adding the code below.
Tensorflow code
```python
pruner.update_epoch(epoch, sess)
```
PyTorch code
```python
pruner.update_epoch(epoch)
```
You can refer to the example code for more information.
#### User configuration for Sensitivity Pruner
* **sparsity:** The target sparsity that the specified operations will be compressed to.
***
Quantizer on NNI Compressor
===
## Naive Quantizer
We provide Naive Quantizer to quantize weights to 8 bits by default. You can use it to test the quantization workflow without any configuration.
### Usage
Tensorflow code
```python
nni.compression.tensorflow.NaiveQuantizer()(model_graph)
```
PyTorch code
```python
nni.compression.torch.NaiveQuantizer()(model)
```
***
## QAT Quantizer
In [Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf), the authors Benoit Jacob and Skirmantas Kligys provide an algorithm to quantize the model during training.
>We propose an approach that simulates quantization effects in the forward pass of training. Backpropagation still happens as usual, and all weights and biases are stored in floating point so that they can be easily nudged by small amounts. The forward propagation pass however simulates quantized inference as it will happen in the inference engine, by implementing in floating-point arithmetic the rounding behavior of the quantization scheme
>* Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer, the batch normalization parameters are “folded into” the weights before quantization.
>* Activations are quantized at points where they would be during inference, e.g. after the activation function is applied to a convolutional or fully connected layer’s output, or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.
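Conceptually, the weight path of this scheme is an affine "fake quantization" applied in the forward pass: the weight range [a, b] is divided into 2^q_bits - 1 steps, each value is rounded to the nearest step, and the result is kept in floating point. The following standalone sketch mirrors NNI's built-in QAT quantizer for weights, but is not the NNI API itself:
```python
import torch

def simulate_quantization(weight, q_bits=8):
    """Simulated (fake) quantization of a weight tensor, as in QAT's forward pass."""
    a, b = weight.min(), weight.max()        # quantization range
    n = 2 ** q_bits
    scale = (b - a) / (n - 1)                # step size of the uniform grid
    q = torch.round((weight - a) / scale)    # integer grid index
    return q * scale + a                     # back to floating point

print(simulate_quantization(torch.randn(3, 3), q_bits=8))
```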
### Usage
You can quantize your model to 8 bits with the code below before your training code.
Tensorflow code
```python
from nni.compression.tensorflow import QAT_Quantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = QAT_Quantizer(config_list)
quantizer(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import QAT_Quantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = QAT_Quantizer(config_list)
quantizer(model)
```
You can refer to the example code for more information.
#### User configuration for QAT Quantizer
* **q_bits:** The bit width that the specified operations will be quantized to.
***
## DoReFa Quantizer
In [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160), the authors Shuchang Zhou and Yuxin Wu provide an algorithm named DoReFa to quantize weights, activations, and gradients during training.
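For weights, DoReFa first squashes them into [0, 1] via tanh, quantizes them on a uniform grid, and then maps the result back to [-1, 1]. The following standalone sketch mirrors NNI's built-in DoReFa quantizer for weights, but is not the NNI API itself:
```python
import torch

def dorefa_quantize_weight(weight, q_bits=8):
    """DoReFa-style weight quantization into [-1, 1] (illustration only)."""
    w = weight.tanh()
    w = w / (2 * w.abs().max()) + 0.5   # map into [0, 1]
    scale = 2 ** q_bits - 1
    w = torch.round(w * scale) / scale  # quantize on a uniform grid
    return 2 * w - 1                    # map back into [-1, 1]

print(dorefa_quantize_weight(torch.randn(3, 3), q_bits=2))
```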
### Usage
To use DoReFa Quantizer, you can add the code below before your training code.
Tensorflow code
```python
from nni.compression.tensorflow import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = DoReFaQuantizer(config_list)
quantizer(tf.get_default_graph())
```
PyTorch code
```python
from nni.compression.torch import DoReFaQuantizer
config_list = [{ 'q_bits': 8, 'op_types': 'default' }]
quantizer = DoReFaQuantizer(config_list)
quantizer(model)
```
You can refer to the example code for more information.
#### User configuration for DoReFa Quantizer
* **q_bits:** The bit width that the specified operations will be quantized to.
AGPruner:
config:
-
start_epoch: 1
end_epoch: 10
frequency: 1
initial_sparsity: 0.05
final_sparsity: 0.8
op_type: 'default'
from nni.compression.tensorflow import AGP_Pruner
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
def weight_variable(shape):
return tf.Variable(tf.truncated_normal(shape, stddev = 0.1))
def bias_variable(shape):
return tf.Variable(tf.constant(0.1, shape = shape))
def conv2d(x_input, w_matrix):
return tf.nn.conv2d(x_input, w_matrix, strides = [ 1, 1, 1, 1 ], padding = 'SAME')
def max_pool(x_input, pool_size):
size = [ 1, pool_size, pool_size, 1 ]
return tf.nn.max_pool(x_input, ksize = size, strides = size, padding = 'SAME')
class Mnist:
def __init__(self):
images = tf.placeholder(tf.float32, [ None, 784 ], name = 'input_x')
labels = tf.placeholder(tf.float32, [ None, 10 ], name = 'input_y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
self.images = images
self.labels = labels
self.keep_prob = keep_prob
self.train_step = None
self.accuracy = None
self.w1 = None
self.b1 = None
self.fcw1 = None
self.cross = None
with tf.name_scope('reshape'):
x_image = tf.reshape(images, [ -1, 28, 28, 1 ])
with tf.name_scope('conv1'):
w_conv1 = weight_variable([ 5, 5, 1, 32 ])
self.w1 = w_conv1
b_conv1 = bias_variable([ 32 ])
self.b1 = b_conv1
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
with tf.name_scope('pool1'):
h_pool1 = max_pool(h_conv1, 2)
with tf.name_scope('conv2'):
w_conv2 = weight_variable([ 5, 5, 32, 64 ])
b_conv2 = bias_variable([ 64 ])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
with tf.name_scope('pool2'):
h_pool2 = max_pool(h_conv2, 2)
with tf.name_scope('fc1'):
w_fc1 = weight_variable([ 7 * 7 * 64, 1024 ])
self.fcw1 = w_fc1
b_fc1 = bias_variable([ 1024 ])
h_pool2_flat = tf.reshape(h_pool2, [ -1, 7 * 7 * 64 ])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
with tf.name_scope('dropout'):
h_fc1_drop = tf.nn.dropout(h_fc1, 0.5)
with tf.name_scope('fc2'):
w_fc2 = weight_variable([ 1024, 10 ])
b_fc2 = bias_variable([ 10 ])
y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2
with tf.name_scope('loss'):
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = labels, logits = y_conv))
self.cross = cross_entropy
with tf.name_scope('adam_optimizer'):
self.train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(labels, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
def main():
tf.set_random_seed(0)
data = input_data.read_data_sets('data', one_hot = True)
model = Mnist()
'''you can change this to SensitivityPruner to implement it
pruner = SensitivityPruner(configure_list)
'''
configure_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
        'op_types': 'default'
}]
pruner = AGP_Pruner(configure_list)
# if you want to load from yaml file
# configure_file = nni.compressors.tf_compressor._nnimc_tf._tf_default_load_configure_file('configure_example.yaml','AGPruner')
# configure_list = configure_file.get('config',[])
# pruner.load_configure(configure_list)
# you can also handle it yourself and input an configure list in json
pruner(tf.get_default_graph())
# you can also use compress(model) or compress_default_graph() for tensorflow compressor
# pruner.compress(tf.get_default_graph())
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_idx in range(2000):
batch = data.train.next_batch(2000)
model.train_step.run(feed_dict = {
model.images: batch[0],
model.labels: batch[1],
model.keep_prob: 0.5
})
if batch_idx % 10 == 0:
test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
pruner.update_epoch(batch_idx / 10,sess)
print('test accuracy', test_acc)
test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
print('final result is', test_acc)
main()
from nni.compression.tensorflow import QAT_Quantizer
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
def weight_variable(shape):
return tf.Variable(tf.truncated_normal(shape, stddev = 0.1))
def bias_variable(shape):
return tf.Variable(tf.constant(0.1, shape = shape))
def conv2d(x_input, w_matrix):
return tf.nn.conv2d(x_input, w_matrix, strides = [ 1, 1, 1, 1 ], padding = 'SAME')
def max_pool(x_input, pool_size):
size = [ 1, pool_size, pool_size, 1 ]
return tf.nn.max_pool(x_input, ksize = size, strides = size, padding = 'SAME')
class Mnist:
def __init__(self):
images = tf.placeholder(tf.float32, [ None, 784 ], name = 'input_x')
labels = tf.placeholder(tf.float32, [ None, 10 ], name = 'input_y')
keep_prob = tf.placeholder(tf.float32, name='keep_prob')
self.images = images
self.labels = labels
self.keep_prob = keep_prob
self.train_step = None
self.accuracy = None
self.w1 = None
self.b1 = None
self.fcw1 = None
self.cross = None
with tf.name_scope('reshape'):
x_image = tf.reshape(images, [ -1, 28, 28, 1 ])
with tf.name_scope('conv1'):
w_conv1 = weight_variable([ 5, 5, 1, 32 ])
self.w1 = w_conv1
b_conv1 = bias_variable([ 32 ])
self.b1 = b_conv1
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)
with tf.name_scope('pool1'):
h_pool1 = max_pool(h_conv1, 2)
with tf.name_scope('conv2'):
w_conv2 = weight_variable([ 5, 5, 32, 64 ])
b_conv2 = bias_variable([ 64 ])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)
with tf.name_scope('pool2'):
h_pool2 = max_pool(h_conv2, 2)
with tf.name_scope('fc1'):
w_fc1 = weight_variable([ 7 * 7 * 64, 1024 ])
self.fcw1 = w_fc1
b_fc1 = bias_variable([ 1024 ])
h_pool2_flat = tf.reshape(h_pool2, [ -1, 7 * 7 * 64 ])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
with tf.name_scope('dropout'):
h_fc1_drop = tf.nn.dropout(h_fc1, 0.5)
with tf.name_scope('fc2'):
w_fc2 = weight_variable([ 1024, 10 ])
b_fc2 = bias_variable([ 10 ])
y_conv = tf.matmul(h_fc1_drop, w_fc2) + b_fc2
with tf.name_scope('loss'):
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels = labels, logits = y_conv))
self.cross = cross_entropy
with tf.name_scope('adam_optimizer'):
self.train_step = tf.train.AdamOptimizer(0.0001).minimize(cross_entropy)
with tf.name_scope('accuracy'):
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(labels, 1))
self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
def main():
tf.set_random_seed(0)
data = input_data.read_data_sets('data', one_hot = True)
model = Mnist()
'''you can change this to DoReFaQuantizer to implement it
DoReFaQuantizer(configure_list).compress(tf.get_default_graph())
'''
    configure_list = [{'q_bits': 8, 'op_types': 'default'}]
quantizer = QAT_Quantizer(configure_list)
quantizer(tf.get_default_graph())
# you can also use compress(model) or compress_default_graph()
# method like QATquantizer(q_bits = 8).compress_default_graph()
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
for batch_idx in range(2000):
batch = data.train.next_batch(2000)
model.train_step.run(feed_dict = {
model.images: batch[0],
model.labels: batch[1],
model.keep_prob: 0.5
})
if batch_idx % 10 == 0:
test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
print('test accuracy', test_acc)
test_acc = model.accuracy.eval(feed_dict = {
model.images: data.test.images,
model.labels: data.test.labels,
model.keep_prob: 1.0
})
print('final result is', test_acc)
main()
from nni.compression.torch import AGP_Pruner
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4 * 4 * 50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim = 1)
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction = 'sum').item()
pred = output.argmax(dim = 1, keepdim = True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%)\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device('cpu')
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train = True, download = True, transform = trans),
batch_size = 64, shuffle = True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train = False, transform = trans),
batch_size = 1000, shuffle = True)
model = Mnist()
'''you can change this to SensitivityPruner to implement it
pruner = SensitivityPruner(configure_list)
'''
configure_list = [{
'initial_sparsity': 0,
'final_sparsity': 0.8,
'start_epoch': 1,
'end_epoch': 10,
'frequency': 1,
        'op_types': 'default'
}]
pruner = AGP_Pruner(configure_list)
pruner(model)
# you can also use compress(model) method
# like that pruner.compress(model)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.5)
for epoch in range(10):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
pruner.update_epoch(epoch)
main()
from nni.compression.torch import QAT_Quantizer
import torch
import torch.nn.functional as F
from torchvision import datasets, transforms
class Mnist(torch.nn.Module):
def __init__(self):
super().__init__()
self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
self.fc2 = torch.nn.Linear(500, 10)
def forward(self, x):
x = F.relu(self.conv1(x))
x = F.max_pool2d(x, 2, 2)
x = F.relu(self.conv2(x))
x = F.max_pool2d(x, 2, 2)
x = x.view(-1, 4 * 4 * 50)
x = F.relu(self.fc1(x))
x = self.fc2(x)
return F.log_softmax(x, dim = 1)
def train(model, device, train_loader, optimizer):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 100 == 0:
print('{:2.0f}% Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
def test(model, device, test_loader):
model.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
data, target = data.to(device), target.to(device)
output = model(data)
test_loss += F.nll_loss(output, target, reduction = 'sum').item()
pred = output.argmax(dim = 1, keepdim = True)
correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset)
print('Loss: {} Accuracy: {}%)\n'.format(
test_loss, 100 * correct / len(test_loader.dataset)))
def main():
torch.manual_seed(0)
device = torch.device('cpu')
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train = True, download = True, transform = trans),
batch_size = 64, shuffle = True)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('data', train = False, transform = trans),
batch_size = 1000, shuffle = True)
model = Mnist()
'''you can change this to DoReFaQuantizer to implement it
DoReFaQuantizer(configure_list).compress(model)
'''
    configure_list = [{'q_bits': 8, 'op_types': 'default'}]
quantizer = QAT_Quantizer(configure_list)
quantizer(model)
# you can also use compress(model) method
    # like that quantizer.compress(model)
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01, momentum = 0.5)
for epoch in range(10):
print('# Epoch {} #'.format(epoch))
train(model, device, train_loader, optimizer)
test(model, device, test_loader)
main()
from .compressor import LayerInfo, Compressor, Pruner, Quantizer
from .builtin_pruners import *
from .builtin_quantizers import *
import logging
import tensorflow as tf
from .compressor import Pruner
__all__ = [ 'LevelPruner', 'AGP_Pruner', 'SensitivityPruner' ]
_logger = logging.getLogger(__name__)
class LevelPruner(Pruner):
def __init__(self, config_list):
"""
Configure Args:
sparsity
"""
super().__init__(config_list)
def calc_mask(self, weight, config, **kwargs):
threshold = tf.contrib.distributions.percentile(tf.abs(weight), config['sparsity'] * 100)
return tf.cast(tf.math.greater(tf.abs(weight), threshold), weight.dtype)
class AGP_Pruner(Pruner):
"""
An automated gradual pruning algorithm that prunes the smallest magnitude
weights to achieve a preset level of network sparsity.
Michael Zhu and Suyog Gupta, "To prune, or not to prune: exploring the
efficacy of pruning for model compression", 2017 NIPS Workshop on Machine
Learning of Phones and other Consumer Devices,
https://arxiv.org/pdf/1710.01878.pdf
"""
def __init__(self, config_list):
"""
Configure Args
initial_sparsity:
final_sparsity: you should make sure initial_sparsity <= final_sparsity
            start_epoch: epoch number at which mask updates begin
            end_epoch: epoch number at which mask updates stop
            frequency: if you want to update the mask every 2 epochs, set it to 2
"""
super().__init__(config_list)
self.now_epoch = tf.Variable(0)
self.assign_handler = []
def calc_mask(self, weight, config, **kwargs):
target_sparsity = self.compute_target_sparsity(config)
threshold = tf.contrib.distributions.percentile(weight, target_sparsity * 100)
# stop gradient in case gradient change the mask
mask = tf.stop_gradient(tf.cast(tf.math.greater(weight, threshold), weight.dtype))
self.assign_handler.append(tf.assign(weight, weight * mask))
return mask
def compute_target_sparsity(self, config):
end_epoch = config.get('end_epoch', 1)
start_epoch = config.get('start_epoch', 0)
freq = config.get('frequency', 1)
final_sparsity = config.get('final_sparsity', 0)
initial_sparsity = config.get('initial_sparsity', 0)
if end_epoch <= start_epoch or initial_sparsity >= final_sparsity:
_logger.warning('your end epoch <= start epoch or initial_sparsity >= final_sparsity')
return final_sparsity
now_epoch = tf.minimum(self.now_epoch, tf.constant(end_epoch))
span = int(((end_epoch - start_epoch-1)//freq)*freq)
assert span > 0
base = tf.cast(now_epoch - start_epoch, tf.float32) / span
target_sparsity = (final_sparsity +
(initial_sparsity - final_sparsity)*
(tf.pow(1.0 - base, 3)))
return target_sparsity
def update_epoch(self, epoch, sess):
sess.run(self.assign_handler)
sess.run(tf.assign(self.now_epoch, int(epoch)))
class SensitivityPruner(Pruner):
"""
Use algorithm from "Learning both Weights and Connections for Efficient Neural Networks"
https://arxiv.org/pdf/1506.02626v3.pdf
I.e.: "The pruning threshold is chosen as a quality parameter multiplied
by the standard deviation of a layers weights."
"""
def __init__(self, config_list):
"""
Configure Args:
sparsity: chosen pruning sparsity
"""
super().__init__(config_list)
self.layer_mask = {}
self.assign_handler = []
def calc_mask(self, weight, config, op_name, **kwargs):
target_sparsity = config['sparsity'] * tf.math.reduce_std(weight)
mask = tf.get_variable(op_name + '_mask', initializer=tf.ones(weight.shape), trainable=False)
self.layer_mask[op_name] = mask
weight_assign_handler = tf.assign(weight, mask*weight)
# use control_dependencies so that weight_assign_handler will be executed before mask_update_handler
with tf.control_dependencies([weight_assign_handler]):
threshold = tf.contrib.distributions.percentile(weight, target_sparsity * 100)
# stop gradient in case gradient change the mask
new_mask = tf.stop_gradient(tf.cast(tf.math.greater(weight, threshold), weight.dtype))
mask_update_handler = tf.assign(mask, new_mask)
self.assign_handler.append(mask_update_handler)
return mask
def update_epoch(self, epoch, sess):
sess.run(self.assign_handler)
import logging
import tensorflow as tf
from .compressor import Quantizer
__all__ = [ 'NaiveQuantizer', 'QAT_Quantizer', 'DoReFaQuantizer' ]
_logger = logging.getLogger(__name__)
class NaiveQuantizer(Quantizer):
"""
quantize weight to 8 bits
"""
def __init__(self, config_list):
super().__init__(config_list)
self.layer_scale = { }
def quantize_weight(self, weight, config, op_name, **kwargs):
new_scale = tf.reduce_max(tf.abs(weight)) / 127
scale = tf.maximum(self.layer_scale.get(op_name, tf.constant(0.0)), new_scale)
self.layer_scale[op_name] = scale
orig_type = weight.dtype
return tf.cast(tf.cast(weight / scale, tf.int8), orig_type) * scale
class QAT_Quantizer(Quantizer):
"""
    Quantizer using the QAT (quantization-aware training) scheme, as defined in:
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
"""
def __init__(self, config_list):
"""
Configure Args:
q_bits
"""
super().__init__(config_list)
def quantize_weight(self, weight, config, **kwargs):
a = tf.stop_gradient(tf.reduce_min(weight))
b = tf.stop_gradient(tf.reduce_max(weight))
n = tf.cast(2 ** config['q_bits'], tf.float32)
        scale = (b - a) / (n - 1)
        # use gradient_override_map to change round to identity for the gradient
with tf.get_default_graph().gradient_override_map({'Round': 'Identity'}):
qw = tf.round((weight-a)/scale)*scale +a
return qw
class DoReFaQuantizer(Quantizer):
"""
Quantizer using the DoReFa scheme, as defined in:
Zhou et al., DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
(https://arxiv.org/abs/1606.06160)
"""
def __init__(self, config_list):
"""
Configure Args:
q_bits
"""
super().__init__(config_list)
def quantize_weight(self, weight, config, **kwargs):
a = tf.math.tanh(weight)
b = a/(2*tf.reduce_max(tf.abs(weight))) + 0.5
scale = pow(2, config['q_bits'] - 1)
        # use gradient_override_map to change round to identity for the gradient
with tf.get_default_graph().gradient_override_map({'Round': 'Identity'}):
qw = tf.round(b*scale)/scale
r_qw = 2 * qw - 1
return r_qw
import tensorflow as tf
import logging
from . import default_layers
_logger = logging.getLogger(__name__)
class LayerInfo:
def __init__(self, op):
self.op = op
self.name = op.name
self.type = op.type
class Compressor:
"""
Abstract base TensorFlow compressor
"""
def __init__(self, config_list):
self._bound_model = None
self._config_list = config_list
def __call__(self, model):
self.compress(model)
return model
def compress(self, model):
"""
Compress given graph with algorithm implemented by subclass.
This will edit the graph.
"""
assert self._bound_model is None, "Each NNI compressor instance can only compress one model"
self._bound_model = model
self.bind_model(model)
for op in model.get_operations():
layer = LayerInfo(op)
config = self._select_config(layer)
if config is not None:
self._instrument_layer(layer, config)
def compress_default_graph(self):
"""
Compress the default graph with algorithm implemented by subclass.
This will edit the graph.
"""
self.compress(tf.get_default_graph())
def bind_model(self, model):
"""
This method is called when a model is bound to the compressor.
Users can optionally overload this method to do model-specific initialization.
It is guaranteed that only one model will be bound to each compressor instance.
"""
pass
def update_epoch(self, epoch, sess):
"""
if user want to update mask every epoch, user can override this method
"""
pass
def step(self, sess):
"""
if user want to update mask every step, user can override this method
"""
pass
def _instrument_layer(self, layer, config):
raise NotImplementedError()
def _select_config(self, layer):
ret = None
for config in self._config_list:
op_types = config.get('op_types')
if op_types == 'default':
op_types = default_layers.op_weight_index.keys()
if op_types and layer.type not in op_types:
continue
if config.get('op_names') and layer.name not in config['op_names']:
continue
ret = config
if ret is None or ret.get('exclude'):
return None
return ret
class Pruner(Compressor):
"""
Abstract base TensorFlow pruner
"""
def __init__(self, config_list):
super().__init__(config_list)
def calc_mask(self, weight, config, op, op_type, op_name):
"""
Pruners should overload this method to provide mask for weight tensors.
The mask must have the same shape and type comparing to the weight.
It will be applied with `multiply()` operation.
This method works as a subgraph which will be inserted into the bound model.
"""
raise NotImplementedError("Pruners must overload calc_mask()")
def _instrument_layer(self, layer, config):
"""
it seems the graph editor can only swap edges of nodes or remove all edges from a node
it cannot remove one edge from a node, nor can it assign a new edge to a node
we assume there is a proxy operation between the weight and the Conv2D layer
this is true as long as the weight is `tf.Value`
not sure what will happen if the weight is calculated from other operations
"""
weight_index = _detect_weight_index(layer)
if weight_index is None:
_logger.warning('Failed to detect weight for layer {}'.format(layer.name))
return
weight_op = layer.op.inputs[weight_index].op
weight = weight_op.inputs[0]
mask = self.calc_mask(weight, config, op=layer.op, op_type=layer.type, op_name=layer.name)
new_weight = weight * mask
tf.contrib.graph_editor.swap_outputs(weight_op, new_weight.op)
class Quantizer(Compressor):
"""
Abstract base TensorFlow quantizer
"""
def __init__(self, config_list):
super().__init__(config_list)
def quantize_weight(self, weight, config, op, op_type, op_name):
raise NotImplementedError("Quantizer must overload quantize_weight()")
def _instrument_layer(self, layer, config):
weight_index = _detect_weight_index(layer)
if weight_index is None:
_logger.warning('Failed to detect weight for layer {}'.format(layer.name))
return
weight_op = layer.op.inputs[weight_index].op
weight = weight_op.inputs[0]
new_weight = self.quantize_weight(weight, config, op=layer.op, op_type=layer.type, op_name=layer.name)
tf.contrib.graph_editor.swap_outputs(weight_op, new_weight.op)
def _detect_weight_index(layer):
index = default_layers.op_weight_index.get(layer.type)
if index is not None:
return index
weight_indices = [ i for i, op in enumerate(layer.op.inputs) if op.name.endswith('Variable/read') ]
if len(weight_indices) == 1:
return weight_indices[0]
return None
op_weight_index = {
'Conv2D': None,
'Conv3D': None,
'DepthwiseConv2dNative': None,
'Mul': None,
'MatMul': None,
}
from .compressor import LayerInfo, Compressor, Pruner, Quantizer
from .builtin_pruners import *
from .builtin_quantizers import *
import logging
import torch
from .compressor import Pruner
__all__ = [ 'LevelPruner', 'AGP_Pruner', 'SensitivityPruner' ]
logger = logging.getLogger('torch pruner')
class LevelPruner(Pruner):
"""Prune to an exact pruning level specification
"""
def __init__(self, config_list):
"""
we suggest user to use json configure list, like [{},{}...], to set configure
format :
[
{
'sparsity': 0,
'support_type': 'default'
},
{
'sparsity': 50,
'support_op': conv1
}
]
if you want input multiple configure from file, you'd better use load_configure_file(path) to load
"""
super().__init__(config_list)
def calc_mask(self, weight, config, **kwargs):
w_abs = weight.abs()
k = int(weight.numel() * config['sparsity'])
if k == 0:
return torch.ones(weight.shape)
threshold = torch.topk(w_abs.view(-1), k, largest = False).values.max()
return torch.gt(w_abs, threshold).type(weight.type())
class AGP_Pruner(Pruner):
"""
An automated gradual pruning algorithm that prunes the smallest magnitude
weights to achieve a preset level of network sparsity.
Michael Zhu and Suyog Gupta, "To prune, or not to prune: exploring the
efficacy of pruning for model compression", 2017 NIPS Workshop on Machine
Learning of Phones and other Consumer Devices,
https://arxiv.org/pdf/1710.01878.pdf
"""
def __init__(self, config_list):
"""
Configure Args
initial_sparsity
final_sparsity: you should make sure initial_sparsity <= final_sparsity
            start_epoch: epoch number at which mask updates begin
            end_epoch: epoch number at which mask updates stop, you should make sure start_epoch <= end_epoch
            frequency: if you want to update the mask every 2 epochs, set it to 2
"""
super().__init__(config_list)
self.mask_list = {}
self.now_epoch = 1
def calc_mask(self, weight, config, op_name, **kwargs):
mask = self.mask_list.get(op_name, torch.ones(weight.shape))
target_sparsity = self.compute_target_sparsity(config)
k = int(weight.numel() * target_sparsity)
if k == 0 or target_sparsity >= 1 or target_sparsity <= 0:
return mask
        # if we want to generate a new mask, we should apply the existing mask to the weight first
w_abs = weight.abs()*mask
threshold = torch.topk(w_abs.view(-1), k, largest = False).values.max()
new_mask = torch.gt(w_abs, threshold).type(weight.type())
self.mask_list[op_name] = new_mask
return new_mask
def compute_target_sparsity(self, config):
end_epoch = config.get('end_epoch', 1)
start_epoch = config.get('start_epoch', 1)
freq = config.get('frequency', 1)
final_sparsity = config.get('final_sparsity', 0)
initial_sparsity = config.get('initial_sparsity', 0)
if end_epoch <= start_epoch or initial_sparsity >= final_sparsity:
logger.warning('your end epoch <= start epoch or initial_sparsity >= final_sparsity')
return final_sparsity
if end_epoch <= self.now_epoch:
return final_sparsity
span = ((end_epoch - start_epoch-1)//freq)*freq
assert span > 0
target_sparsity = (final_sparsity +
(initial_sparsity - final_sparsity)*
(1.0 - ((self.now_epoch - start_epoch)/span))**3)
return target_sparsity
def update_epoch(self, epoch):
if epoch > 0:
self.now_epoch = epoch
class SensitivityPruner(Pruner):
"""
Use algorithm from "Learning both Weights and Connections for Efficient Neural Networks"
https://arxiv.org/pdf/1506.02626v3.pdf
I.e.: "The pruning threshold is chosen as a quality parameter multiplied
by the standard deviation of a layers weights."
"""
def __init__(self, config_list):
"""
configure Args:
sparsity: chosen pruning sparsity
"""
super().__init__(config_list)
self.mask_list = {}
def calc_mask(self, weight, config, op_name, **kwargs):
mask = self.mask_list.get(op_name, torch.ones(weight.shape))
        # if we want to generate a new mask, we should apply the existing mask to the weight first
weight = weight*mask
target_sparsity = config['sparsity'] * torch.std(weight).item()
k = int(weight.numel() * target_sparsity)
if k == 0:
return mask
w_abs = weight.abs()
threshold = torch.topk(w_abs.view(-1), k, largest = False).values.max()
new_mask = torch.gt(w_abs, threshold).type(weight.type())
self.mask_list[op_name] = new_mask
return new_mask
import logging
import torch
from .compressor import Quantizer
__all__ = [ 'NaiveQuantizer', 'QAT_Quantizer', 'DoReFaQuantizer' ]
logger = logging.getLogger(__name__)
class NaiveQuantizer(Quantizer):
"""
quantize weight to 8 bits
"""
def __init__(self, config_list):
super().__init__(config_list)
self.layer_scale = {}
def quantize_weight(self, weight, config, op_name, **kwargs):
new_scale = weight.abs().max() / 127
scale = max(self.layer_scale.get(op_name, 0), new_scale)
self.layer_scale[op_name] = scale
orig_type = weight.type() # TODO: user layer
return weight.div(scale).type(torch.int8).type(orig_type).mul(scale)
class QAT_Quantizer(Quantizer):
"""
    Quantizer using the QAT (quantization-aware training) scheme, as defined in:
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
"""
def __init__(self, config_list):
"""
Configure Args:
q_bits
"""
super().__init__(config_list)
def quantize_weight(self, weight, config, **kwargs):
if config['q_bits'] <= 1:
return weight
a = torch.min(weight)
b = torch.max(weight)
n = pow(2, config['q_bits'])
scale = (b-a)/(n-1)
zero_point = a
out = torch.round((weight - zero_point)/scale)
out = out*scale + zero_point
orig_type = weight.dtype
return out.type(orig_type)
class DoReFaQuantizer(Quantizer):
"""
Quantizer using the DoReFa scheme, as defined in:
Zhou et al., DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
(https://arxiv.org/abs/1606.06160)
"""
def __init__(self, config_list):
"""
configure Args:
q_bits
"""
super().__init__(config_list)
def quantize_weight(self, weight, config, **kwargs):
out = weight.tanh()
out = out /( 2 * out.abs().max()) + 0.5
out = self.quantize(out, config['q_bits'])
out = 2 * out -1
return out
def quantize(self, input_ri, q_bits):
scale = pow(2, q_bits)-1
output = torch.round(input_ri*scale)/scale
return output