Unverified Commit 6e09c2c1 authored by J-shang, committed by GitHub

[Doc] update compression tutorials (#4646)

parent a4d8a4ea
@@ -15,7 +15,7 @@ logger = logging.getLogger(__name__)
class DoReFaQuantizer(Quantizer):
    r"""
    Quantizer using the DoReFa scheme, as defined in:
-   `DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients <https://arxiv.org/abs/1606.06160>`__\ ,
+   `DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients <https://arxiv.org/abs/1606.06160>`__,
    authors Shuchang Zhou and Yuxin Wu provide an algorithm named DoReFa to quantize the weight, activation and gradients with training.

    Parameters
...
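The hunk above only reformats the paper reference, but tutorial readers may want to see how the documented quantizer is typically driven. The sketch below is an illustration, not part of this commit: the import path follows NNI's 2.x compression API, and the toy model, optimizer, and config_list keys are assumptions; check the installed version's documentation for the exact signatures.

```python
import torch
import torch.nn as nn
# Assumed NNI 2.x import path; adjust if your installed version differs.
from nni.algorithms.compression.pytorch.quantization import DoReFaQuantizer

# A toy model used only for illustration.
model = nn.Sequential(nn.Conv2d(1, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 26 * 26, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Quantize Conv2d weights to 8 bit with the DoReFa scheme.
config_list = [{
    'quant_types': ['weight'],
    'quant_bits': {'weight': 8},
    'op_types': ['Conv2d'],
}]

quantizer = DoReFaQuantizer(model, config_list, optimizer)
quantizer.compress()
# ...then run the usual training loop; weights are quantized during training.
```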
@@ -109,8 +109,8 @@ def update_ema(biased_ema, value, decay):
class QAT_Quantizer(Quantizer):
    r"""
    Quantizer defined in:
-   Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
-   http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf
+   `Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
+   <http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf>`__
    Authors Benoit Jacob and Skirmantas Kligys provide an algorithm to quantize the model with training.
@@ -123,10 +123,11 @@ class QAT_Quantizer(Quantizer):
    by implementing in floating-point arithmetic the rounding behavior of the quantization scheme:

    * Weights are quantized before they are convolved with the input. If batch normalization (see [17]) is used for the layer,
      the batch normalization parameters are “folded into” the weights before quantization.
    * Activations are quantized at points where they would be during inference,
      e.g. after the activation function is applied to a convolutional or fully connected layer’s output,
      or after a bypass connection adds or concatenates the outputs of several layers together such as in ResNets.

    Parameters
    ----------
@@ -184,7 +185,7 @@ class QAT_Quantizer(Quantizer):
    dummy_input = torch.randn(1, 1, 28, 28)
    # pass the dummy_input to the quantizer
-   quantizer = QAT_Quantizer(model, config_list, dummy_input=dummy_input)
+   quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input=dummy_input)
    The quantizer will automatically detect Conv-BN patterns and simulate batch normalization folding process in the training
...
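To put the updated call above in context, here is a hedged end-to-end sketch. The constructor call mirrors the new line in this hunk (optimizer passed positionally, dummy_input used to trace Conv-BN patterns for folding); the surrounding model, optimizer, and config_list are illustrative assumptions based on NNI's 2.x API, not part of the commit.

```python
import torch
import torch.nn as nn
# Assumed NNI 2.x import path; adjust for your installed version.
from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

# A toy Conv-BN model so the quantizer has a Conv-BN pattern to detect and fold.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Quantize weights and outputs of Conv2d layers to 8 bit.
config_list = [{
    'quant_types': ['weight', 'output'],
    'quant_bits': {'weight': 8, 'output': 8},
    'op_types': ['Conv2d'],
}]

# dummy_input lets the quantizer trace the model and find Conv-BN pairs to fold.
dummy_input = torch.randn(1, 1, 28, 28)
quantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input=dummy_input)
quantizer.compress()
```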
@@ -148,18 +148,18 @@ class LevelPruner(BasicPruner):
        operation as an example, the weight tensor will be split into sub-blocks whose shape is aligned to
        balance_gran, and fine-grained pruning is then applied inside each sub-block. This sparsity
        pattern has a better chance of achieving a good trade-off between model performance and hardware
-       acceleration. For further information, please refer to the related paper 'Balanced Sparsity for
-       Efficient DNN Inference on GPU'(https://arxiv.org/pdf/1811.00206.pdf).
+       acceleration. For further information, please refer to the related paper `Balanced Sparsity for
+       Efficient DNN Inference on GPU <https://arxiv.org/pdf/1811.00206.pdf>`__.
    balance_gran : list
        balance_gran is for the special sparse pattern balanced sparsity. The default value is None, which means pruning
        without balance awareness, i.e. normal fine-grained pruning.
        If a list of int is passed, LevelPruner will prune the model at the granularity of multi-dimensional blocks.
        Note that the length of balance_gran should be no larger than the dimension of the pruned tensor.
        For instance, in a Linear operation, the length of balance_gran should be equal to or smaller than two, since
        the pruned weight has two dimensions. If balance_gran = [5, 5] and sparsity = 0.6, the pruner will
        divide the pruned parameters into blocks with tile size (5, 5); each block has 5 * 5 values,
        of which 10 are kept after pruning. Fine-grained pruning is applied at the granularity of blocks,
        so that each block keeps the same number of non-zero values after pruning. Such a pruning method "balances"
        the non-zero values in the tensor, which creates the chance for better hardware acceleration.
        Note: If the length of the given balance_gran is smaller than the length of the pruned tensor shape, it will be made up
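A minimal sketch of the balanced-sparsity setting described in this hunk. The `mode='balance'` argument, the toy model, and the config_list below are assumptions made for illustration (based on NNI's v2 pruning API that these docstrings belong to); only the balance_gran behavior itself is documented above.

```python
import torch.nn as nn
# v2 pruning import path used by these docstrings; adjust if your NNI version differs.
from nni.compression.pytorch.pruning import LevelPruner

# A toy model whose 2-D Linear weights will be pruned block-wise.
model = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))

# 60% sparsity on Linear weights.
config_list = [{'sparsity': 0.6, 'op_types': ['Linear']}]

# With balance_gran=[5, 5], each (5, 5) tile keeps the same number of
# non-zero values (here 10 of 25), as explained in the docstring above.
pruner = LevelPruner(model, config_list, mode='balance', balance_gran=[5, 5])
masked_model, masks = pruner.compress()
```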
@@ -290,7 +290,7 @@ class L1NormPruner(NormPruner):
    i.e., compute the l1 norm of the filters in a convolution layer as metric values,
    and compute the l1 norm of the weight by rows in a linear layer as metric values.
-   For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__\.
+   For more details, please refer to `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__.

    In addition, the L1 norm pruner also supports dependency-aware mode.
...
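For completeness, a hedged sketch of the dependency-aware mode mentioned above. The model and config_list are illustrative assumptions; the `mode='dependency_aware'` and `dummy_input` arguments follow NNI's v2 pruner API, where a dummy input is needed to trace channel dependencies (e.g. add/concat connections) between layers.

```python
import torch
import torch.nn as nn
# v2 pruning import path used by these docstrings; adjust if your NNI version differs.
from nni.compression.pytorch.pruning import L1NormPruner

# A toy convolutional model used only for illustration.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))

# Prune 50% of the filters in every Conv2d layer by L1 norm.
config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

# Dependency-aware mode traces the model with a dummy input so that layers
# with channel dependencies are pruned consistently.
dummy_input = torch.rand(1, 3, 32, 32)
pruner = L1NormPruner(model, config_list, mode='dependency_aware', dummy_input=dummy_input)
masked_model, masks = pruner.compress()
```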