"git@developer.sourcefind.cn:gaoqiong/composable_kernel.git" did not exist on "d5ee4350a4ad44996efa0b29ccc8b9c9d8f8bf8b"
Unverified Commit cc58a81d authored by lin bin's avatar lin bin Committed by GitHub
Browse files

Add quantized model export description (#3192)

Some compression algorithms use epochs to control the progress of compression (e.g., AGP), while others need to do something after each minibatch.
``update_epoch`` should be invoked in every epoch, while ``step`` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs; please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
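
For the algorithms that do use them, the two calls slot into an ordinary training loop. Below is a minimal sketch (not part of the original tutorial); ``pruner``, ``model``, ``optimizer`` and ``train_loader`` are assumed to be defined as in the examples above.

.. code-block:: python

   for epoch in range(10):
       # tell the compressor that a new epoch starts
       pruner.update_epoch(epoch)
       for data, target in train_loader:
           optimizer.zero_grad()
           loss = torch.nn.functional.nll_loss(model(data), target)
           loss.backward()
           optimizer.step()
           # tell the compressor that a minibatch has finished
           pruner.step()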

Export Pruned Model
^^^^^^^^^^^^^^^^^^^

If you are pruning your model, you can easily export the pruned model using the following API. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``, which can be loaded by ``torch.load('model.pth')``. In this exported ``model.pth``, the masked weights are zero.

.. code-block:: python

   pruner.export_model(model_path='model.pth', mask_path='mask.pth', onnx_path='model.onnx', input_shape=[1, 1, 28, 28])
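
As a quick check (a sketch, not part of the original text), you can load the exported ``state_dict`` back and verify that the masked weights are indeed zero. The key ``'conv1.weight'`` below is hypothetical and depends on your model definition.

.. code-block:: python

   import torch

   state_dict = torch.load('model.pth')
   # 'conv1.weight' is a hypothetical key; use a layer name from your own model
   sparsity = (state_dict['conv1.weight'] == 0).float().mean().item()
   print('sparsity of conv1: {:.2%}'.format(sparsity))
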
Export Quantized Model
^^^^^^^^^^^^^^^^^^^^^^

You can export the quantized model directly with the ``torch.save`` API, and the quantized model can be loaded by ``torch.load`` without any extra modification. The following example shows the normal procedure of saving and loading a quantized model, and of getting the related parameters in QAT.

.. code-block:: python

   import torch
   # NOTE: the import path of QAT_Quantizer depends on the NNI version;
   # refer to the quantization docs for the path matching your install
   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   # ``Mnist``, ``train`` and ``train_loader`` are assumed to be defined
   # as in the MNIST example used throughout this tutorial
   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

   # Init model and quantize it by using NNI QAT
   model = Mnist()
   configure_list = [...]
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
   quantizer = QAT_Quantizer(model, configure_list, optimizer)
   quantizer.compress()
   model.to(device)

   # Quantization-aware training
   for epoch in range(40):
       print('# Epoch {} #'.format(epoch))
       train(model, quantizer, device, train_loader, optimizer)

   # Save the quantized model generated by the NNI QAT algorithm
   torch.save(model.state_dict(), "quantized_model.pkt")

   # Simulate the model loading procedure:
   # a new model has to be initialized and compressed before loading
   qmodel_load = Mnist()
   optimizer = torch.optim.SGD(qmodel_load.parameters(), lr=0.01, momentum=0.5)
   quantizer = QAT_Quantizer(qmodel_load, configure_list, optimizer)
   quantizer.compress()

   # Load the quantized model
   qmodel_load.load_state_dict(torch.load("quantized_model.pkt"))

   # Get scale, zero_point and weight of conv1 in the loaded model
   conv1 = qmodel_load.conv1
   scale = conv1.module.scale
   zero_point = conv1.module.zero_point
   weight = conv1.module.weight
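
The recovered ``scale`` and ``zero_point`` describe the affine mapping used during simulated quantization. As an illustration only (assuming an unsigned 8-bit range; the exact range depends on your configuration), the quantize/dequantize round trip looks like this:

.. code-block:: python

   # quantize: map float weights to integers (assuming an 8-bit unsigned range)
   q_weight = torch.clamp(torch.round(weight / scale + zero_point), 0, 255)
   # dequantize: map back to floats; the difference is the quantization error
   dq_weight = (q_weight - zero_point) * scale
   print('max quantization error:', (dq_weight - weight).abs().max().item())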

If you want to actually speed up the compressed model, please refer to `NNI model speedup <./ModelSpeedup.rst>`__ for details.