{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Quantization Quickstart\n\nHere is a four-minute video to get you started with model quantization.\n\n..  youtube:: MSfV7AyfiA4\n    :align: center\n\nQuantization reduces model size and speeds up inference by reducing the number of bits required to represent weights or activations.\n\nNNI supports both post-training quantization algorithms and quantization-aware training algorithms.\nHere we use `QAT_Quantizer` as an example to show how to use quantization in NNI.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Preparation\n\nIn this tutorial, we use a simple model and pre-train it on the MNIST dataset.\nIf you are familiar with defining and training a model in PyTorch, you can skip directly to `Quantizing Model`_.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\n\nfrom scripts.compression_mnist_model import TorchModel, trainer, evaluator, device, test_trt\n\n# define the model\nmodel = TorchModel().to(device)\n\n# define the optimizer and criterion for pre-training\n\noptimizer = SGD(model.parameters(), 1e-2)\ncriterion = F.nll_loss\n\n# pre-train and evaluate the model on MNIST dataset\nfor epoch in range(3):\n    trainer(model, optimizer, criterion)\n    evaluator(model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Quantizing Model\n\nInitialize a `config_list`.\nFor details on how to write a ``config_list``, please refer to the :doc:`compression config specification <../compression/compression_config_list>`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
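        "config_list = [{\n    # quantize the inputs and weights of every Conv2d layer to 8 bits\n    'quant_types': ['input', 'weight'],\n    'quant_bits': {'input': 8, 'weight': 8},\n    'op_types': ['Conv2d']\n}, {\n    # quantize the outputs of every ReLU layer to 8 bits\n    'quant_types': ['output'],\n    'quant_bits': {'output': 8},\n    'op_types': ['ReLU']\n}, {\n    # quantize the inputs and weights of the layers named fc1 and fc2 to 8 bits\n    'quant_types': ['input', 'weight'],\n    'quant_bits': {'input': 8, 'weight': 8},\n    'op_names': ['fc1', 'fc2']\n}]"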
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Fine-tune the model using QAT.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer\n\n# dummy input with the shape of an MNIST batch: (batch, channel, height, width)\ndummy_input = torch.rand(32, 1, 28, 28).to(device)\nquantizer = QAT_Quantizer(model, config_list, optimizer, dummy_input)\n# wrap the layers selected by config_list for simulated quantization\nquantizer.compress()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The model is now wrapped, and the quantization targets (the 'quant_types' setting in `config_list`)\nwill be quantized and dequantized in the wrapped layers to simulate quantization.\nQAT is a quantization-aware training algorithm; it updates the scale and zero point during training.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "for epoch in range(3):\n    trainer(model, optimizer, criterion)\n    evaluator(model)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Export the model and get the calibration_config.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "model_path = \"./log/mnist_model.pth\"\ncalibration_path = \"./log/mnist_calibration.pth\"\ncalibration_config = quantizer.export_model(model_path, calibration_path)\n\nprint(\"calibration_config: \", calibration_config)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Build a TensorRT engine to get a real speedup. For more information about speedup, please refer to :doc:`quantization_speedup`.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from nni.compression.pytorch.quantization_speedup import ModelSpeedupTensorRT\ninput_shape = (32, 1, 28, 28)\nengine = ModelSpeedupTensorRT(model, input_shape, config=calibration_config, batchsize=32)\nengine.compress()\ntest_trt(engine)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}