"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nIt usually has following paths:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the model\n#. Pruning the model aware training -> Fine-tuning the model\n#. Pruning the model -> Pre-training the compact model\n\nNNI supports the above three modes and mainly focuses on the pruning stage.\nFollow this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nIn this tutorial, we use a simple model and pre-train on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Pruning Model`_.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\n\nfrom scripts.compression_mnist_model import TorchModel, trainer, evaluator, device\n\n# define the model\nmodel = TorchModel().to(device)\n\n# define the optimizer and criterion for pre-training\n\noptimizer = SGD(model.parameters(), 1e-2)\ncriterion = F.nll_loss\n\n# pre-train and evaluate the model on MNIST dataset\nfor epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pruning Model\n\nUsing L1NormPruner pruning the model and generating the masks.\nUsually, pruners require original model and ``config_list`` as parameters.\nDetailed about how to write ``config_list`` please refer ...\n\nThis `config_list` means all layers whose type is `Linear` or `Conv2d` will be pruned,\nexcept the layer named `fc3`, because `fc3` is `exclude`.\nThe final sparsity ratio for each layer is 50%. The layer named `fc3` will not be pruned.\n\n"
"Pruners usually require `model` and `config_list` as input arguments.\n\n"
]
},
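{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# a config_list sketch matching the description above, assuming the NNI v2\n# pruning config format ('sparsity_per_layer', 'op_types', 'op_names', 'exclude')\nconfig_list = [{\n    'sparsity_per_layer': 0.5,\n    'op_types': ['Linear', 'Conv2d']\n}, {\n    'exclude': True,\n    'op_names': ['fc3']\n}]"
]
},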
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner\n\npruner = L1NormPruner(model, config_list)\n# show the wrapped model structure\nprint(model)\n# compress the model and generate the masks\n_, masks = pruner.compress()\n# show the masks sparsity\nfor name, mask in masks.items():\n print(name, ' sparsity: ', '{:.2}'.format(mask['weight'].sum() / mask['weight'].numel()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Speed up the original model with masks, note that `ModelSpeedup` requires an unwrapped model.\nThe model becomes smaller after speed-up,\nand reaches a higher sparsity ratio because `ModelSpeedup` will propagate the masks across layers.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# need to unwrap the model, if the model is wrapped before speed up\npruner._unwrap_model()\n\n# speed up the model\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"the model will become real smaller after speed up\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tuning Compacted Model\nNote that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning.\nBecause speed up will replace the masked big layers with dense small ones.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"optimizer = SGD(model.parameters(), 1e-2)\nfor epoch in range(3):\n trainer(model, optimizer, criterion)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Speed Up Model with Mask\n\n## Introduction\n\nPruning algorithms usually use weight masks to simulate the real pruning. Masks can be used\nto check model performance of a specific pruning (or sparsity), but there is no real speedup.\nSince model speedup is the ultimate goal of model pruning, we try to provide a tool to users\nto convert a model to a smaller one based on user provided masks (the masks come from the\npruning algorithms).\n\nThere are two types of pruning. One is fine-grained pruning, it does not change the shape of weights,\nand input/output tensors. Sparse kernel is required to speed up a fine-grained pruned layer.\nThe other is coarse-grained pruning (e.g., channels), shape of weights and input/output tensors usually change due to such pruning.\nTo speed up this kind of pruning, there is no need to use sparse kernel, just replace the pruned layer with smaller one.\nSince the support of sparse kernels in community is limited,\nwe only support the speedup of coarse-grained pruning and leave the support of fine-grained pruning in future.\n\n## Design and Implementation\n\nTo speed up a model, the pruned layers should be replaced, either replaced with smaller layer for coarse-grained mask,\nor replaced with sparse kernel for fine-grained mask. Coarse-grained mask usually changes the shape of weights or input/output tensors,\nthus, we should do shape inference to check are there other unpruned layers should be replaced as well due to shape change.\nTherefore, in our design, there are two main steps: first, do shape inference to find out all the modules that should be replaced;\nsecond, replace the modules.\n\nThe first step requires topology (i.e., connections) of the model, we use ``jit.trace`` to obtain the model graph for PyTorch.\nThe new shape of module is auto-inference by NNI, the unchanged parts of outputs during forward and inputs during backward are prepared for reduct.\nFor each type of module, we should prepare a function for module replacement.\nThe module replacement function returns a newly created module which is smaller.\n\n## Usage\n"
]
},
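{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustration of the module replacement step, the following cell sketches what a replacement\nfunction for ``Linear`` could look like. This is a simplified, hypothetical example, not NNI's internal API:\nit takes 1-D keep-masks over the input and output features, builds a smaller ``nn.Linear``,\nand copies over the surviving weights.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn as nn\n\n\ndef replace_linear(linear: nn.Linear, in_mask: torch.Tensor, out_mask: torch.Tensor) -> nn.Linear:\n    # hypothetical helper for illustration only: in_mask / out_mask are 1-D\n    # tensors in which 1 means the feature is kept and 0 means it is pruned\n    in_idx = torch.nonzero(in_mask, as_tuple=True)[0]\n    out_idx = torch.nonzero(out_mask, as_tuple=True)[0]\n    new_linear = nn.Linear(len(in_idx), len(out_idx), bias=linear.bias is not None)\n    # weight has shape (out_features, in_features); keep only the surviving rows/columns\n    new_linear.weight.data = linear.weight.data[out_idx][:, in_idx]\n    if linear.bias is not None:\n        new_linear.bias.data = linear.bias.data[out_idx]\n    return new_linear"
]
},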
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate a mask for the model at first.\nWe usually use a NNI pruner to generate the masks then use ``ModelSpeedup`` to compact the model.\nBut in fact ``ModelSpeedup`` is a relatively independent tool, so you can use it independently.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nfrom scripts.compression_mnist_model import TorchModel, device\n\nmodel = TorchModel().to(device)\n# masks = {layer_name: {'weight': weight_mask, 'bias': bias_mask}}\nconv1_mask = torch.ones_like(model.conv1.weight.data)\n# mask the first three output channels in conv1\nconv1_mask[0: 3] = 0\nmasks = {'conv1': {'weight': conv1_mask}}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Show the original model structure.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"print(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Roughly test the original model inference speed.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import time\nstart = time.time()\nmodel(torch.rand(128, 1, 28, 28).to(device))\nprint('Original Model - Elapsed Time : ', time.time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Speed up the model and show the model structure after speed up.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"from nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()\nprint(model)"
]
},
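{
"cell_type": "markdown",
"metadata": {},
"source": [
"Roughly test the sped-up model inference speed, mirroring the rough timing of the original model above.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# same rough timing as for the original model; reuses `time` imported earlier\nstart = time.time()\nmodel(torch.rand(128, 1, 28, 28).to(device))\nprint('Speedup Model - Elapsed Time : ', time.time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [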
"\n# Quantization Quickstart\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nIn this tutorial, we use a simple model and pre-train on MNIST dataset.\nIf you are familiar with defining a model and training in pytorch, you can skip directly to `Quantizing Model`_.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torch.nn.functional as F\nfrom torch.optim import SGD\n\nfrom scripts.compression_mnist_model import TorchModel, trainer, evaluator, device\n\n# define the model\nmodel = TorchModel().to(device)\n\n# define the optimizer and criterion for pre-training\n\noptimizer = SGD(model.parameters(), 1e-2)\ncriterion = F.nll_loss\n\n# pre-train and evaluate the model on MNIST dataset\nfor epoch in range(3):\n trainer(model, optimizer, criterion)\n evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Quantizing Model\n\nInitialize a `config_list`.\n\n"