update doc for dtype & scheme customization (#4283)

Unverified Commit b7d5aac7 authored Nov 03, 2021 by chenbohua3 Committed by GitHub Nov 03, 2021
3 changed files
--- a/docs/en_US/Compression/Quantizer.rst
+++ b/docs/en_US/Compression/Quantizer.rst
@@ -102,6 +102,64 @@ The quantizer will automatically detect Conv-BN patterns and simulate batch norm
 graph. Note that when the quantization aware training process is finished, the folded weight/bias would be restored after calling
 `quantizer.export_model`.
+Quantization dtype and scheme customization
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Different backends on different devices use different quantization strategies, that is, different dtypes (int or uint) and
+schemes (per-tensor or per-channel, symmetric or affine). QAT quantizer supports customization of the mainstream dtypes and schemes.
+There are two ways to set them. One is to set them globally through the function ``set_quant_scheme_dtype``:
+.. code-block:: python
+    from nni.compression.pytorch.quantization.settings import set_quant_scheme_dtype
+    # Quantize every 'input' with the 'per_tensor_affine' scheme and the 'uint' dtype
+    set_quant_scheme_dtype('input', 'per_tensor_affine', 'uint')
+    # Quantize every 'output' with the 'per_tensor_symmetric' scheme and the 'int' dtype
+    set_quant_scheme_dtype('output', 'per_tensor_symmetric', 'int')
+    # Quantize every 'weight' with the 'per_channel_symmetric' scheme and the 'int' dtype
+    set_quant_scheme_dtype('weight', 'per_channel_symmetric', 'int')
+The other way is more fine-grained: set the dtype and scheme separately in each entry of the quantization config list:
+.. code-block:: python
+    config_list = [{
+        'quant_types': ['weight'],
+        'quant_bits': 8,
+        'op_types': ['Conv2d', 'Linear'],
+        'quant_dtype': 'int',
+        'quant_scheme': 'per_channel_symmetric'
+    }, {
+        'quant_types': ['output'],
+        'quant_bits': 8,
+        'quant_start_step': 7000,
+        'op_types': ['ReLU6'],
+        'quant_dtype': 'uint',
+        'quant_scheme': 'per_tensor_affine'
+    }]
+Multi-GPU training
+^^^^^^^^^^^^^^^^^^^
+QAT quantizer natively supports multi-GPU training (DataParallel and DistributedDataParallel). Note that the quantizer
+must be instantiated before you wrap your model with DataParallel or DistributedDataParallel. For example:
+.. code-block:: python
+    from torch.nn.parallel import DistributedDataParallel as DDP
+    from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer
+    model = define_your_model()
+    model = QAT_Quantizer(model, **other_params)  # <--- QAT_Quantizer instantiation
+    model = DDP(model)
+    for i in range(epochs):
+        train(model)
+        eval(model)
 ----
 LSQ Quantizer
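
To make the dtype and scheme terminology in the Quantizer.rst section above concrete, here is a minimal sketch of what each choice means numerically. It uses the common textbook formulas for uniform quantization and is not taken from NNI's implementation. All function names (``quant_range``, ``affine_params``, ``symmetric_params``, ``per_channel_symmetric``) are hypothetical, and some backends narrow the signed range (for example to -127..127), which this sketch does not do.

```python
from typing import List, Tuple


def quant_range(dtype: str, bits: int) -> Tuple[int, int]:
    """Integer range for a dtype and bit width (textbook convention, an assumption)."""
    if dtype == 'uint':
        return 0, 2 ** bits - 1
    if dtype == 'int':
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError(f"unknown dtype: {dtype!r}")


def affine_params(x_min: float, x_max: float, qmin: int, qmax: int) -> Tuple[float, int]:
    """Affine (asymmetric) mapping: a scale plus a zero point."""
    # Widen the real range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    # Fall back to 1.0 to avoid a zero scale when the data is all zeros.
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = round(qmin - x_min / scale)
    return scale, zero_point


def symmetric_params(x_min: float, x_max: float, qmax: int) -> Tuple[float, int]:
    """Symmetric mapping: the zero point is fixed at 0."""
    scale = max(abs(x_min), abs(x_max)) / qmax or 1.0
    return scale, 0


def per_channel_symmetric(weight: List[List[float]], qmax: int) -> List[Tuple[float, int]]:
    """One (scale, zero_point) pair per output channel, here one per row."""
    return [symmetric_params(min(row), max(row), qmax) for row in weight]


# 'uint' + 'per_tensor_affine', e.g. for a ReLU6 output in [0, 6]
qmin, qmax = quant_range('uint', 8)
act_scale, act_zero_point = affine_params(0.0, 6.0, qmin, qmax)

# 'int' + 'per_channel_symmetric', e.g. for a weight with two output channels
_, int_qmax = quant_range('int', 8)
weight_params = per_channel_symmetric([[-1.0, 0.5], [0.25, 2.0]], int_qmax)
```

A per-tensor scheme yields a single pair for the whole tensor, while a per-channel scheme yields one pair per channel, which is why ``weight_params`` is a list.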

--- a/docs/en_US/Compression/QuickStart.rst
+++ b/docs/en_US/Compression/QuickStart.rst
@@ -85,12 +85,16 @@ Step1. Write configuration
       'quant_bits': {
           'weight': 8,
       }, # you can just use `int` here because all `quan_types` share same bits length, see config for `ReLu6` below.
-       'op_types':['Conv2d', 'Linear']
+       'op_types':['Conv2d', 'Linear'],
+       'quant_dtype': 'int',
+       'quant_scheme': 'per_channel_symmetric'
   }, {
       'quant_types': ['output'],
       'quant_bits': 8,
       'quant_start_step': 7000,
-       'op_types':['ReLU6']
+       'op_types':['ReLU6'],
+       'quant_dtype': 'uint',
+       'quant_scheme': 'per_tensor_affine'
   }]
 The specification of configuration can be found `here <./Tutorial.rst#quantization-specific-keys>`__.

--- a/docs/en_US/Compression/Tutorial.rst
+++ b/docs/en_US/Compression/Tutorial.rst
@@ -66,7 +66,7 @@ to the weight parameter of modules. 'input' means applying quantization operatio
 bits length of quantization, key is the quantization type, value is the quantization bits length, eg. 
-.. code-block:: bash
+.. code-block:: python
   {
      quant_bits: {
@@ -77,36 +77,102 @@ bits length of quantization, key is the quantization type, value is the quantiza
 when the value is int type, all quantization types share same bits length. eg. 
-.. code-block:: bash
+.. code-block:: python
   {
      quant_bits: 8, # weight or output quantization are all 8 bits
   }
+* **quant_dtype** : str or dict of {str : str}
+Quantization dtype, which determines the range of the quantized values. Two choices are available:
+- int: the range is signed
+- uint: the range is unsigned
+There are two ways to set it. One is a dict whose keys are quantization types and whose values are quantization dtypes, e.g.
+.. code-block:: python
+   {
+      'quant_dtype': {
+         'weight': 'int',
+         'output': 'uint',
+      },
+   }
+The other is a single str value, in which case all quantization types share the same dtype, e.g.
+.. code-block:: python
+   {
+      'quant_dtype': 'int', # weight and output quantization both use the 'int' dtype
+   }
+Only two values are valid for ``quant_dtype``: 'int' and 'uint'.
+* **quant_scheme** : str or dict of {str : str}
+Quantization scheme, which determines how quantization is performed. Four choices are available:
+- per_tensor_affine: per tensor, asymmetric quantization
+- per_tensor_symmetric: per tensor, symmetric quantization
+- per_channel_affine: per channel, asymmetric quantization
+- per_channel_symmetric: per channel, symmetric quantization
+There are two ways to set it. One is a dict whose keys are quantization types and whose values are quantization schemes, e.g.
+.. code-block:: python
+   {
+      'quant_scheme': {
+         'weight': 'per_channel_symmetric',
+         'output': 'per_tensor_affine',
+      },
+   }
+The other is a single str value, in which case all quantization types share the same quant_scheme, e.g.
+.. code-block:: python
+   {
+      'quant_scheme': 'per_channel_symmetric', # weight and output quantization both use 'per_channel_symmetric'
+   }
+Only four values are valid for ``quant_scheme``: 'per_tensor_affine', 'per_tensor_symmetric', 'per_channel_affine' and 'per_channel_symmetric'.
 The following example shows a more complete ``config_list``\ , it uses ``op_names`` (or ``op_types``\ ) to specify the target layers along with the quantization bits for those layers.
-.. code-block:: bash
+.. code-block:: python
   config_list = [{
      'quant_types': ['weight'],
      'quant_bits': 8,
-      'op_names': ['conv1']
+      'op_names': ['conv1'],
+      'quant_dtype': 'int',
+      'quant_scheme': 'per_channel_symmetric'
   },
   {
      'quant_types': ['weight'],
      'quant_bits': 4,
      'quant_start_step': 0,
-      'op_names': ['conv2']
+      'op_names': ['conv2'],
+      'quant_dtype': 'int',
+      'quant_scheme': 'per_tensor_symmetric'
   },
   {
      'quant_types': ['weight'],
      'quant_bits': 3,
-      'op_names': ['fc1']
+      'op_names': ['fc1'],
+      'quant_dtype': 'int',
+      'quant_scheme': 'per_tensor_symmetric'
   },
   {
      'quant_types': ['weight'],
      'quant_bits': 2,
-      'op_names': ['fc2']
+      'op_names': ['fc2'],
+      'quant_dtype': 'int',
+      'quant_scheme': 'per_channel_symmetric'
   }]
 In this example, 'op_names' is the name of layer and four layers will be quantized to different quant_bits.
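
Since ``quant_dtype`` and ``quant_scheme`` each accept either a single str or a dict keyed by quantization type, the sketch below shows one way the two forms could be resolved to a concrete value per quantization type. The helper ``resolve_setting`` and the constants ``VALID_DTYPES`` and ``VALID_SCHEMES`` are hypothetical names introduced for illustration. This is not how NNI itself parses the config, only a reading of the key descriptions above.

```python
VALID_DTYPES = {'int', 'uint'}
VALID_SCHEMES = {
    'per_tensor_affine', 'per_tensor_symmetric',
    'per_channel_affine', 'per_channel_symmetric',
}


def resolve_setting(config: dict, key: str, quant_type: str, valid: set) -> str:
    """Return the value of `key` that applies to `quant_type` (hypothetical helper)."""
    value = config[key]
    if isinstance(value, dict):
        # dict form: one value per quantization type
        value = value[quant_type]
    # str form: the same value is shared by every quantization type
    if value not in valid:
        raise ValueError(f"invalid {key} {value!r}; expected one of {sorted(valid)}")
    return value


config = {
    'quant_types': ['weight', 'output'],
    'quant_bits': 8,
    'op_types': ['Conv2d'],
    'quant_dtype': {'weight': 'int', 'output': 'uint'},  # dict form
    'quant_scheme': 'per_tensor_affine',                 # str form
}

# Map each quantization type to its (dtype, scheme) pair.
resolved = {
    quant_type: (
        resolve_setting(config, 'quant_dtype', quant_type, VALID_DTYPES),
        resolve_setting(config, 'quant_scheme', quant_type, VALID_SCHEMES),
    )
    for quant_type in config['quant_types']
}
```

Validating against the allowed values at this point surfaces a misspelled dtype or scheme when the config is read, not later during training.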