Unverified commit b7d5aac7 authored by chenbohua3, committed by GitHub

Update doc for dtype & scheme customization (#4283)

parent fa339ca3
@@ -102,6 +102,64 @@ The quantizer will automatically detect Conv-BN patterns and simulate batch norm
graph. Note that when the quantization aware training process is finished, the folded weight/bias would be restored after calling
`quantizer.export_model`.

Quantization dtype and scheme customization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Different backends on different devices use different quantization strategies, i.e. different dtypes (int or uint) and
schemes (per-tensor or per-channel, symmetric or affine). The QAT quantizer supports customization of mainstream dtypes and schemes.
There are two ways to set them. One way is to set them globally through the function `set_quant_scheme_dtype`:

.. code-block:: python

    from nni.compression.pytorch.quantization.settings import set_quant_scheme_dtype

    # This will set the quantization of all 'input' to 'per_tensor_affine' and 'uint'
    set_quant_scheme_dtype('input', 'per_tensor_affine', 'uint')

    # This will set the quantization of all 'output' to 'per_tensor_symmetric' and 'int'
    set_quant_scheme_dtype('output', 'per_tensor_symmetric', 'int')

    # This will set the quantization of all 'weight' to 'per_channel_symmetric' and 'int'
    set_quant_scheme_dtype('weight', 'per_channel_symmetric', 'int')
The other way is more fine-grained: you can customize the dtype and scheme per entry in the quantization config list, like:

.. code-block:: python

    config_list = [{
        'quant_types': ['weight'],
        'quant_bits': 8,
        'op_types': ['Conv2d', 'Linear'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_channel_symmetric'
    }, {
        'quant_types': ['output'],
        'quant_bits': 8,
        'quant_start_step': 7000,
        'op_types': ['ReLU6'],
        'quant_dtype': 'uint',
        'quant_scheme': 'per_tensor_affine'
    }]
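To make the per-layer lookup concrete, here is a minimal plain-Python sketch (not NNI internals; `resolve_config` is a hypothetical helper) of how a quantizer could match a layer's op type against the entries of such a config list:

.. code-block:: python

    # Hypothetical helper (not NNI code): resolve which config entry applies
    # to a layer by matching its op type, the way 'op_types' is used above.
    def resolve_config(config_list, op_type):
        for cfg in config_list:
            if op_type in cfg.get('op_types', []):
                return cfg
        return None  # no entry matches: the layer is left unquantized

    config_list = [{
        'quant_types': ['weight'], 'quant_bits': 8,
        'op_types': ['Conv2d', 'Linear'],
        'quant_dtype': 'int', 'quant_scheme': 'per_channel_symmetric'
    }, {
        'quant_types': ['output'], 'quant_bits': 8, 'quant_start_step': 7000,
        'op_types': ['ReLU6'],
        'quant_dtype': 'uint', 'quant_scheme': 'per_tensor_affine'
    }]

    print(resolve_config(config_list, 'Conv2d')['quant_scheme'])  # per_channel_symmetric

Note that a real quantizer also honors `op_names` and defaults; this sketch only illustrates the `op_types` matching.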

Multi-GPU training
^^^^^^^^^^^^^^^^^^^

The QAT quantizer natively supports multi-GPU training (DataParallel and DistributedDataParallel). Note that the quantizer
instantiation should happen before you wrap your model with DataParallel or DistributedDataParallel. For example:

.. code-block:: python

    from torch.nn.parallel import DistributedDataParallel as DDP
    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

    model = define_your_model()

    model = QAT_Quantizer(model, **other_params)  # <--- QAT_Quantizer instantiation

    model = DDP(model)

    for i in range(epochs):
        train(model)
        eval(model)
----

LSQ Quantizer
@@ -85,12 +85,16 @@ Step1. Write configuration

        'quant_bits': {
            'weight': 8,
        }, # you can just use `int` here because all `quant_types` share same bits length, see config for `ReLU6` below.
        'op_types': ['Conv2d', 'Linear'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_channel_symmetric'
    }, {
        'quant_types': ['output'],
        'quant_bits': 8,
        'quant_start_step': 7000,
        'op_types': ['ReLU6'],
        'quant_dtype': 'uint',
        'quant_scheme': 'per_tensor_affine'
    }]

The specification of configuration can be found `here <./Tutorial.rst#quantization-specific-keys>`__.
@@ -66,7 +66,7 @@ to the weight parameter of modules. 'input' means applying quantization operatio

bits length of quantization, key is the quantization type, value is the quantization bits length, eg.

.. code-block:: python

    {
        quant_bits: {

@@ -77,36 +77,102 @@ bits length of quantization, key is the quantization type, value is the quantiza

when the value is int type, all quantization types share same bits length. eg.

.. code-block:: python

    {
        quant_bits: 8, # weight or output quantization are all 8 bits
    }
* **quant_dtype** : str or dict of {str : str}

  quantization dtype, used to determine the range of quantized values. Two choices can be used:

  - int: the range is signed
  - uint: the range is unsigned

  There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization dtype, eg.

  .. code-block:: python

      {
          'quant_dtype': {
              'weight': 'int',
              'output': 'uint',
          },
      }

  The other is a plain str value, in which case all quantization types share the same dtype. eg.

  .. code-block:: python

      {
          'quant_dtype': 'int', # the dtype of both weight and output quantization is 'int'
      }

  In total there are two kinds of `quant_dtype` you can set: 'int' and 'uint'.
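The range implied by each dtype at a given bit width can be sketched as below (`quant_range` is a hypothetical helper for illustration, not part of NNI's API):

.. code-block:: python

    # Hypothetical helper (not NNI API): the integer range implied by
    # quant_dtype at a given bit width.
    def quant_range(quant_dtype, quant_bits):
        if quant_dtype == 'int':   # signed range
            return -(2 ** (quant_bits - 1)), 2 ** (quant_bits - 1) - 1
        if quant_dtype == 'uint':  # unsigned range
            return 0, 2 ** quant_bits - 1
        raise ValueError(f"unknown quant_dtype: {quant_dtype}")

    print(quant_range('int', 8))   # (-128, 127)
    print(quant_range('uint', 8))  # (0, 255)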
* **quant_scheme** : str or dict of {str : str}

  quantization scheme, used to determine the quantization manner. Four choices can be used:

  - per_tensor_affine: per tensor, asymmetric quantization
  - per_tensor_symmetric: per tensor, symmetric quantization
  - per_channel_affine: per channel, asymmetric quantization
  - per_channel_symmetric: per channel, symmetric quantization

  There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization scheme, eg.

  .. code-block:: python

      {
          'quant_scheme': {
              'weight': 'per_channel_symmetric',
              'output': 'per_tensor_affine',
          },
      }

  The other is a plain str value, in which case all quantization types share the same quant_scheme. eg.

  .. code-block:: python

      {
          'quant_scheme': 'per_channel_symmetric', # the quant_scheme of both weight and output quantization is 'per_channel_symmetric'
      }

  In total there are four kinds of `quant_scheme` you can set: 'per_tensor_affine', 'per_tensor_symmetric', 'per_channel_affine' and 'per_channel_symmetric'.
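The four schemes differ in whether the quantization parameters are computed over the whole tensor or per output channel, and in whether a zero point shifts the range. A minimal NumPy sketch of these differences (`qparams` is a hypothetical helper for illustration, not NNI internals):

.. code-block:: python

    import numpy as np

    # Hypothetical helper (not NNI internals): compute scale/zero_point for a
    # weight tensor under each of the four schemes, at the given bit width.
    def qparams(w, scheme, bits=8):
        # per-tensor reduces over everything; per-channel keeps axis 0 (output channels)
        axis = None if scheme.startswith('per_tensor') else tuple(range(1, w.ndim))
        if scheme.endswith('symmetric'):
            # symmetric: zero_point is 0, scale covers the largest magnitude
            scale = np.abs(w).max(axis=axis) / (2 ** (bits - 1) - 1)
            zero_point = np.zeros_like(scale)
        else:
            # affine (asymmetric): scale covers [min, max], zero_point shifts the range
            w_min, w_max = w.min(axis=axis), w.max(axis=axis)
            scale = (w_max - w_min) / (2 ** bits - 1)
            zero_point = np.round(-w_min / scale)
        return scale, zero_point

    w = np.random.randn(4, 3, 3, 3)  # e.g. a Conv2d weight with 4 output channels
    s_t, _ = qparams(w, 'per_tensor_symmetric')
    s_c, _ = qparams(w, 'per_channel_symmetric')
    print(np.shape(s_t), s_c.shape)  # one scale for the tensor vs one per output channel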
The following example shows a more complete ``config_list``\ , it uses ``op_names`` (or ``op_types``\ ) to specify the target layers along with the quantization bits for those layers.

.. code-block:: python

    config_list = [{
        'quant_types': ['weight'],
        'quant_bits': 8,
        'op_names': ['conv1'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_channel_symmetric'
    },
    {
        'quant_types': ['weight'],
        'quant_bits': 4,
        'quant_start_step': 0,
        'op_names': ['conv2'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_tensor_symmetric'
    },
    {
        'quant_types': ['weight'],
        'quant_bits': 3,
        'op_names': ['fc1'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_tensor_symmetric'
    },
    {
        'quant_types': ['weight'],
        'quant_bits': 2,
        'op_names': ['fc2'],
        'quant_dtype': 'int',
        'quant_scheme': 'per_channel_symmetric'
    }]

In this example, 'op_names' specifies the names of the layers, and the four layers will be quantized with different quant_bits.