"git@developer.sourcefind.cn:gaoqiong/composable_kernel.git" did not exist on "d5ee4350a4ad44996efa0b29ccc8b9c9d8f8bf8b"
Unverified Commit cc58a81d authored by lin bin's avatar lin bin Committed by GitHub
Browse files

Add quantized model export description (#3192)

Some compression algorithms use epochs to control the progress of compression (e.g., AGP), while others need to do something after each minibatch.
``update_epoch`` should be invoked in every epoch, while ``step`` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs; please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
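
For the algorithms that do use them, the two calls slot into an ordinary training loop. Below is a minimal sketch (not part of the original tutorial); ``pruner``, ``model``, ``optimizer`` and ``train_loader`` are assumed to be defined as in the examples above.

.. code-block:: python

   for epoch in range(10):
       # tell the compressor that a new epoch starts
       pruner.update_epoch(epoch)
       for data, target in train_loader:
           optimizer.zero_grad()
           loss = torch.nn.functional.nll_loss(model(data), target)
           loss.backward()
           optimizer.step()
           # tell the compressor that a minibatch has finished
           pruner.step()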

Export Pruned Model
^^^^^^^^^^^^^^^^^^^

If you are pruning your model, you can easily export the pruned model using the following API. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``, which can be loaded by ``torch.load('model.pth')``. In this exported ``model.pth``, the masked weights are zero.

.. code-block:: python

   pruner.export_model(model_path='model.pth', mask_path='mask.pth', onnx_path='model.onnx', input_shape=[1, 1, 28, 28])
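
As a quick check (a sketch, not part of the original text), you can load the exported ``state_dict`` back and verify that the masked weights are indeed zero. The key ``'conv1.weight'`` below is hypothetical and depends on your model definition.

.. code-block:: python

   import torch

   state_dict = torch.load('model.pth')
   # 'conv1.weight' is a hypothetical key; use a layer name from your own model
   sparsity = (state_dict['conv1.weight'] == 0).float().mean().item()
   print('sparsity of conv1: {:.2%}'.format(sparsity))
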
Export Quantized Model
^^^^^^^^^^^^^^^^^^^^^^

You can export the quantized model directly with the ``torch.save`` API, and the quantized model can be loaded by ``torch.load`` without any extra modification. The following example shows the normal procedure of saving and loading a quantized model, and of getting the related parameters in QAT.

.. code-block:: python

   import torch
   # NOTE: the import path of QAT_Quantizer depends on the NNI version;
   # refer to the quantization docs for the path matching your install
   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   # ``Mnist``, ``train`` and ``train_loader`` are assumed to be defined
   # as in the MNIST example used throughout this tutorial
   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

   # Init model and quantize it by using NNI QAT
   model = Mnist()
   configure_list = [...]
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
   quantizer = QAT_Quantizer(model, configure_list, optimizer)
   quantizer.compress()
   model.to(device)

   # Quantization-aware training
   for epoch in range(40):
       print('# Epoch {} #'.format(epoch))
       train(model, quantizer, device, train_loader, optimizer)

   # Save the quantized model generated by the NNI QAT algorithm
   torch.save(model.state_dict(), "quantized_model.pkt")

   # Simulate the model loading procedure:
   # a new model has to be initialized and compressed before loading
   qmodel_load = Mnist()
   optimizer = torch.optim.SGD(qmodel_load.parameters(), lr=0.01, momentum=0.5)
   quantizer = QAT_Quantizer(qmodel_load, configure_list, optimizer)
   quantizer.compress()

   # Load the quantized model
   qmodel_load.load_state_dict(torch.load("quantized_model.pkt"))

   # Get scale, zero_point and weight of conv1 in the loaded model
   conv1 = qmodel_load.conv1
   scale = conv1.module.scale
   zero_point = conv1.module.zero_point
   weight = conv1.module.weight
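
The recovered ``scale`` and ``zero_point`` describe the affine mapping used during simulated quantization. As an illustration only (assuming an unsigned 8-bit range; the exact range depends on your configuration), the quantize/dequantize round trip looks like this:

.. code-block:: python

   # quantize: map float weights to integers (assuming an 8-bit unsigned range)
   q_weight = torch.clamp(torch.round(weight / scale + zero_point), 0, 255)
   # dequantize: map back to floats; the difference is the quantization error
   dq_weight = (q_weight - zero_point) * scale
   print('max quantization error:', (dq_weight - weight).abs().max().item())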

If you want to actually speed up the compressed model, please refer to `NNI model speedup <./ModelSpeedup.rst>`__ for details.