"src/targets/gpu/vscode:/vscode.git/clone" did not exist on "2f2757ac05545b6f90a085051ea0ef918fdf4fc9"
Unverified Commit a911b856 authored by Yuge Zhang's avatar Yuge Zhang Committed by GitHub
Browse files

Resolve conflicts for #4760 (#4762)

parent 14d2966b
.. 98b0285bbfe1a01c90b9ba6a9b0d6caa
Quick Start
===========

.. toctree::
   :hidden:

   Notebook Example <compression_pipeline_example>

Model compression usually consists of three stages: 1) pre-training a model, 2) compressing the model, and 3) fine-tuning the compressed model. NNI mainly focuses on the second stage and provides easy-to-use APIs for model compression. Following this guide, you will quickly learn how to compress a model with NNI. For a deeper understanding of the model compression modules in NNI, please refer to the `Tutorial <./Tutorial.rst>`__.

A complete model compression pipeline running in a Jupyter notebook is provided as an `example <./compression_pipeline_example.rst>`__; refer to the :githublink:`code <examples/notebooks/compression_pipeline_example.ipynb>`.

Model Pruning
-------------

Here we use the `level pruner <../Compression/Pruner.rst#level-pruner>`__ as an example to illustrate the usage of model pruning in NNI.

Step 1. Write the configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Write a configuration to specify the layers to prune. The following configuration prunes all ``default`` ops with a sparsity of 0.5 and keeps the other layers unchanged.

.. code-block:: python

   config_list = [{
       'sparsity': 0.5,
       'op_types': ['default'],
   }]

The specification of the configuration can be found `here <./Tutorial.rst#specify-the-configuration>`__. Note that different pruners may define their own configuration fields. Refer to each pruner's `usage <./Pruner.rst>`__ for details and adjust the configuration accordingly.
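
If you are unsure which ``op_types`` or ``op_names`` your model exposes, a quick way to list them is to iterate over the model's modules. This is a minimal sketch using plain PyTorch (not an NNI API); ``model`` is assumed to be your ``torch.nn.Module``:

.. code-block:: python

   # list every candidate op_name / op_type, so the config_list
   # can reference them precisely
   for name, module in model.named_modules():
       if name:  # skip the unnamed root module
           print(name, type(module).__name__)
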
Step 2. Choose a pruner to compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, instantiate the pruner with the model and the configuration as arguments, then invoke ``compress()`` to compress the model. Note that some algorithms may inspect gradients during training, so we may also define a trainer, an optimizer, and a criterion and pass them to the pruner.

.. code-block:: python

   from nni.algorithms.compression.pytorch.pruning import LevelPruner

   pruner = LevelPruner(model, config_list)
   model = pruner.compress()

Then, train the model as usual (e.g., with SGD); pruning is transparent during the training process. Some pruners (e.g., L1FilterPruner, FPGMPruner) prune once at the beginning, and the following training can be seen as fine-tuning. Other pruners (e.g., AGPPruner) prune the model iteratively and update the masks step by step during training.

If a pruner prunes iteratively, or training or inference is needed during pruning, the fine-tuning logic needs to be passed to the pruner. For example:

.. code-block:: python

   from nni.algorithms.compression.pytorch.pruning import AGPPruner

   pruner = AGPPruner(model, config_list, optimizer, trainer, criterion, num_iterations=10, epochs_per_iteration=1, pruning_algorithm='level')
   model = pruner.compress()
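
The ``trainer`` and ``criterion`` passed to the pruner are ordinary callables. Below is a minimal sketch of what they may look like, following the ``trainer(model, optimizer, criterion, epoch)`` convention used in NNI's notebook example; ``train_loader`` is an assumed, user-defined DataLoader:

.. code-block:: python

   import torch

   criterion = torch.nn.NLLLoss()
   optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

   def trainer(model, optimizer, criterion, epoch):
       # one training epoch; the pruner invokes this callable itself
       model.train()
       for data, target in train_loader:
           optimizer.zero_grad()
           loss = criterion(model(data), target)
           loss.backward()
           optimizer.step()
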
Step 3. Export the compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After training, you can export the model weights and the generated masks to files. Exporting an ONNX model is also supported.

.. code-block:: python

   pruner.export_model(model_path='pruned_vgg19_cifar10.pth', mask_path='mask_vgg19_cifar10.pth')

Refer to the :githublink:`mnist example <examples/model_compress/pruning/naive_prune_torch.py>` for the complete code.

More examples of pruning algorithms can be found in :githublink:`basic_pruners_torch <examples/model_compress/pruning/basic_pruners_torch.py>` and :githublink:`auto_pruners_torch <examples/model_compress/pruning/auto_pruners_torch.py>`.

Model Quantization
------------------

Here we use the `QAT Quantizer <../Compression/Quantizer.rst#qat-quantizer>`__ as an example to illustrate the usage of quantization in NNI.

Step 1. Write the configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   config_list = [{
       'quant_types': ['weight', 'input'],
       'quant_bits': {
           'weight': 8,
           'input': 8,
       }, # a plain `int` also works here, since all `quant_types` share the same bit length; see the `ReLU6` configuration below.
       'op_types': ['Conv2d', 'Linear'],
       'quant_dtype': 'int',
       'quant_scheme': 'per_channel_symmetric'
   }, {
       'quant_types': ['output'],
       'quant_bits': 8,
       'quant_start_step': 7000,
       'op_types': ['ReLU6'],
       'quant_dtype': 'uint',
       'quant_scheme': 'per_tensor_affine'
   }]

The specification of the configuration can be found `here <./Tutorial.rst#quantization-specific-keys>`__.

Step 2. Choose a quantizer to compress the model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   quantizer = QAT_Quantizer(model, config_list)
   quantizer.compress()
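
QAT collects quantization statistics (e.g., the observed ranges of weights and activations) during training, so after calling ``compress()`` the model is typically trained for a few more epochs before exporting. A minimal sketch, assuming a ``trainer``/``evaluator`` pair like the ones in the pruning example above:

.. code-block:: python

   # fine-tune for a few epochs so that QAT can calibrate
   # the quantization parameters
   for epoch in range(3):
       trainer(model, optimizer, criterion, epoch)
       evaluator(model)
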
Step 3. Export the compression result
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After training and calibration, you can export the model weights and the generated calibration parameters to files. Exporting an ONNX model is also supported.

.. code-block:: python

   calibration_config = quantizer.export_model(model_path, calibration_path, onnx_path, input_shape, device)

Refer to the :githublink:`mnist example <examples/model_compress/quantization/QAT_torch_quantizer.py>` for the complete code.

Congratulations! You have compressed your first model with NNI. To learn more about model compression in NNI, please refer to the `Tutorial <./Tutorial.rst>`__.

Tutorial
========

.. contents::

In this tutorial, we explain the usage of model compression in NNI in more detail.

Setup compression goal
----------------------

Specify the configuration
^^^^^^^^^^^^^^^^^^^^^^^^^

Users can specify the configuration (i.e., ``config_list``\ ) for a compression algorithm. For example, when compressing a model, users may want to specify the sparsity ratio, specify different ratios for different types of operations, exclude certain types of operations, or compress only a certain type of operations. For users to express these kinds of requirements, we define a configuration specification. It can be seen as a Python ``list`` object, where each element is a ``dict`` object.

The ``dict``\ s in the ``list`` are applied one by one; that is, the configurations in a latter ``dict`` overwrite the configurations in former ones for the operations that are within the scope of both of them.

There are different keys in a ``dict``. Some of them are common keys supported by all the compression algorithms:

* **op_types**\ : This is to specify what types of operations to be compressed. 'default' means following the algorithm's default setting. All supported module types are defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for PyTorch.
* **op_names**\ : This is to specify by name what operations to be compressed. If this field is omitted, operations will not be filtered by it.
* **exclude**\ : Default is False. If this field is True, the operations with the specified types and names will be excluded from the compression.

Some other keys are often specific to a certain algorithm; users can refer to `pruning algorithms <./Pruner.rst>`__ and `quantization algorithms <./Quantizer.rst>`__ for the keys allowed by each algorithm.

To prune all ``Conv2d`` layers with a sparsity of 0.6, the configuration can be written as:

.. code-block:: python

   [{
       'sparsity': 0.6,
       'op_types': ['Conv2d']
   }]

To control the sparsity of specific layers, the configuration can be written as:

.. code-block:: python

   [{
       'sparsity': 0.8,
       'op_types': ['default']
   },
   {
       'sparsity': 0.6,
       'op_names': ['op_name1', 'op_name2']
   },
   {
       'exclude': True,
       'op_names': ['op_name3']
   }]

This means: follow the algorithm's default setting for compressed operations with sparsity 0.8, but use sparsity 0.6 for ``op_name1`` and ``op_name2``, and do not compress ``op_name3``.
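
The overriding behaviour can be pictured as a small resolver that walks the list in order and keeps the last matching entry for each operation. The sketch below is purely illustrative, not an NNI API:

.. code-block:: python

   def resolve_config(op_name, op_type, config_list):
       """Return the last dict in config_list matching this operation."""
       matched = None
       for cfg in config_list:
           # 'default' stands for the algorithm's default op set;
           # treated here as match-all for simplicity
           type_ok = 'op_types' not in cfg or op_type in cfg['op_types'] \
                     or 'default' in cfg['op_types']
           name_ok = 'op_names' not in cfg or op_name in cfg['op_names']
           if type_ok and name_ok:
               matched = None if cfg.get('exclude') else cfg
       return matched

With the configuration above, ``op_name1`` resolves to the sparsity-0.6 entry, while ``op_name3`` resolves to ``None`` and stays uncompressed.
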
Quantization specific keys
^^^^^^^^^^^^^^^^^^^^^^^^^^

Besides the keys explained above, if you use quantization algorithms you need to specify more keys in ``config_list``\ , which are explained below.

* **quant_types** : list of str

  The types of quantization you want to apply; currently 'weight', 'input' and 'output' are supported. 'weight' means applying quantization to the weight parameters of a module. 'input' means applying quantization to the input of a module's forward method. 'output' means applying quantization to the output of a module's forward method, which is often called 'activation' in some papers.

* **quant_bits** : int or dict of {str : int}

  The bit length of quantization; the key is the quantization type and the value is the bit length, e.g.

  .. code-block:: python

     {
         'quant_bits': {
             'weight': 8,
             'output': 4,
         },
     }

  When the value is of type int, all quantization types share the same bit length, e.g.

  .. code-block:: python

     {
         'quant_bits': 8, # weight and output are both quantized to 8 bits
     }

* **quant_dtype** : str or dict of {str : str}

  The quantization dtype, used to determine the range of quantized values. Two choices are available:

  - 'int': the range is signed
  - 'uint': the range is unsigned
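
  For a given bit width, the two dtypes simply pick different integer ranges. A small illustrative helper (not part of NNI):

  .. code-block:: python

     def quant_range(bits, dtype):
         # 'int'  -> signed range,   e.g. 8 bits: [-128, 127]
         # 'uint' -> unsigned range, e.g. 8 bits: [0, 255]
         if dtype == 'int':
             return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
         return 0, 2 ** bits - 1
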
  There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization dtype, e.g.

  .. code-block:: python

     {
         'quant_dtype': {
             'weight': 'int',
             'output': 'uint',
         },
     }

  The other is a plain str, in which case all quantization types share the same dtype, e.g.

  .. code-block:: python

     {
         'quant_dtype': 'int', # the dtype of both weight and output quantization is 'int'
     }

* **quant_scheme** : str or dict of {str : str}

  The quantization scheme, used to determine the quantization manner. Four choices are available:

  - 'per_tensor_affine': per-tensor, asymmetric quantization
  - 'per_tensor_symmetric': per-tensor, symmetric quantization
  - 'per_channel_affine': per-channel, asymmetric quantization
  - 'per_channel_symmetric': per-channel, symmetric quantization
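
  The affine/symmetric distinction determines how the scale and zero point are derived from the observed tensor range. A simplified per-tensor sketch (illustrative only, not the exact NNI implementation):

  .. code-block:: python

     import torch

     def qparams(x, bits=8, scheme='per_tensor_affine'):
         qmin, qmax = 0, 2 ** bits - 1
         if scheme == 'per_tensor_symmetric':
             # symmetric: zero point fixed at mid-range, scale from max |x|
             scale = x.abs().max() / ((qmax - qmin) / 2)
             zero_point = (qmax + qmin + 1) // 2
         else:
             # affine: scale and zero point cover the full [min, max] range
             scale = (x.max() - x.min()) / (qmax - qmin)
             zero_point = qmin - torch.round(x.min() / scale)
         return scale, zero_point

  The ``per_channel_*`` variants compute the same statistics independently for each output channel instead of once for the whole tensor.
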
  There are two ways to set it. One is a dict whose key is the quantization type and whose value is the quantization scheme, e.g.

  .. code-block:: python

     {
         'quant_scheme': {
             'weight': 'per_channel_symmetric',
             'output': 'per_tensor_affine',
         },
     }

  The other is a plain str, in which case all quantization types share the same quant_scheme, e.g.

  .. code-block:: python

     {
         'quant_scheme': 'per_channel_symmetric', # the quant_scheme of both weight and output quantization is 'per_channel_symmetric'
     }

The following example shows a more complete ``config_list``\ ; it uses ``op_names`` (or ``op_types``\ ) to specify the target layers along with the quantization bits for those layers.

.. code-block:: python

   config_list = [{
       'quant_types': ['weight'],
       'quant_bits': 8,
       'op_names': ['conv1'],
       'quant_dtype': 'int',
       'quant_scheme': 'per_channel_symmetric'
   },
   {
       'quant_types': ['weight'],
       'quant_bits': 4,
       'quant_start_step': 0,
       'op_names': ['conv2'],
       'quant_dtype': 'int',
       'quant_scheme': 'per_tensor_symmetric'
   },
   {
       'quant_types': ['weight'],
       'quant_bits': 3,
       'op_names': ['fc1'],
       'quant_dtype': 'int',
       'quant_scheme': 'per_tensor_symmetric'
   },
   {
       'quant_types': ['weight'],
       'quant_bits': 2,
       'op_names': ['fc2'],
       'quant_dtype': 'int',
       'quant_scheme': 'per_channel_symmetric'
   }]

In this example, ``op_names`` specifies the layers by name, and the four layers will be quantized with different ``quant_bits``.
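
Such a ``config_list`` is passed to the quantizer's constructor in the usual way; a brief sketch (assuming ``model`` contains modules named ``conv1``, ``conv2``, ``fc1`` and ``fc2``, and ``optimizer`` has been defined):

.. code-block:: python

   from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer

   quantizer = QAT_Quantizer(model, config_list, optimizer)
   quantizer.compress()
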
Export compression result
-------------------------

Export the pruned model
^^^^^^^^^^^^^^^^^^^^^^^

If you are pruning your model, you can easily export the pruned model using the following API. The ``state_dict`` of the sparse model weights will be stored in ``model.pth``\ , which can be loaded by ``torch.load('model.pth')``. Note that the exported ``model.pth`` has the same parameters as the original model, except that the masked weights are zero. ``mask_dict`` stores the binary masks produced by the pruning algorithm, which can be further used to speed up the model.

.. code-block:: python

   # export model weights and mask
   pruner.export_model(model_path='model.pth', mask_path='mask.pth')

   # apply mask to model; mask_file is the path of the exported mask, e.g. 'mask.pth'
   from nni.compression.pytorch import apply_compression_results
   apply_compression_results(model, mask_file, device)

To export the model in ``onnx`` format (``input_shape`` needs to be specified):

.. code-block:: python

   pruner.export_model(model_path='model.pth', mask_path='mask.pth', onnx_path='model.onnx', input_shape=[1, 1, 28, 28])

Export the quantized model
^^^^^^^^^^^^^^^^^^^^^^^^^^
You can export the quantized model directly with the ``torch.save`` API, and the quantized model can be loaded by ``torch.load`` without any extra modification. The following example shows the normal procedure of saving and loading a quantized model and getting related parameters in QAT.

.. code-block:: python

   # Save the quantized model generated by the NNI QAT algorithm
   torch.save(model.state_dict(), "quantized_model.pth")

   # Simulate the model loading procedure:
   # a new model has to be initialized and compressed before loading
   qmodel_load = Mnist()
   optimizer = torch.optim.SGD(qmodel_load.parameters(), lr=0.01, momentum=0.5)
   quantizer = QAT_Quantizer(qmodel_load, config_list, optimizer)
   quantizer.compress()

   # Load the quantized model
   qmodel_load.load_state_dict(torch.load("quantized_model.pth"))

   # Get the scale, zero_point and weight of conv1 in the loaded model
   conv1 = qmodel_load.conv1
   scale = conv1.module.scale
   zero_point = conv1.module.zero_point
   weight = conv1.module.weight

Speed up the model
------------------
Masks do not provide a real speedup of your model. The model should be sped up based on the exported masks; thus, we provide an API to speed up your model as shown below. After invoking ``apply_compression_results`` on your model, your model becomes a smaller one with shorter inference latency.

.. code-block:: python

   from nni.compression.pytorch import apply_compression_results, ModelSpeedup

   dummy_input = torch.randn(config['input_shape']).to(device)
   m_speedup = ModelSpeedup(model, dummy_input, masks_file, device)
   m_speedup.speedup_model()
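
To verify the effect, you can time the model before and after the speedup. A minimal sketch:

.. code-block:: python

   import time
   import torch

   def measure_latency(model, dummy_input, n_runs=32):
       # average wall-clock time of a forward pass
       model.eval()
       with torch.no_grad():
           start = time.time()
           for _ in range(n_runs):
               model(dummy_input)
       return (time.time() - start) / n_runs

   print('average latency: {:.4f}s'.format(measure_latency(model, dummy_input)))
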
Please refer to `here <ModelSpeedup.rst>`__ for a detailed description. The example code for model speedup can be found :githublink:`here <examples/model_compress/pruning/model_speedup.py>`.

Control the Fine-tuning process
-------------------------------
Enhance the fine-tuning process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Knowledge distillation effectively learns a small student model from a large teacher model. Users can enhance the fine-tuning process by utilizing knowledge distillation to improve the performance of the compressed model. Example code can be found :githublink:`here <examples/model_compress/pruning/finetune_kd_torch.py>`.
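
A minimal sketch of such a distillation-enhanced fine-tuning loop is shown below; the temperature ``T``, the weighting ``alpha``, and the ``teacher``/``train_loader`` objects are illustrative assumptions, not NNI APIs:

.. code-block:: python

   import torch
   import torch.nn.functional as F

   def train_with_distillation(student, teacher, optimizer, T=4.0, alpha=0.9):
       student.train()
       teacher.eval()
       for data, target in train_loader:
           optimizer.zero_grad()
           s_logits = student(data)
           with torch.no_grad():
               t_logits = teacher(data)
           # soft-label loss against the teacher, plus the usual hard-label loss
           kd_loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                              F.softmax(t_logits / T, dim=1),
                              reduction='batchmean') * (T * T)
           loss = alpha * kd_loss + (1 - alpha) * F.cross_entropy(s_logits, target)
           loss.backward()
           optimizer.step()
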
.. acd3f66ad7c2d82b950568efcba1f175
Advanced Usage
==============

.. toctree::
   :maxdepth: 2

   Framework <./Framework>
   Customize Compression Algorithm <./CustomizeCompressor>
   Automatic Model Compression (Beta) <./AutoCompression>

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. Prepare model"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn.functional as F\n",
"\n",
"class NaiveModel(torch.nn.Module):\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)\n",
" self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)\n",
" self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)\n",
" self.fc2 = torch.nn.Linear(500, 10)\n",
" self.relu1 = torch.nn.ReLU6()\n",
" self.relu2 = torch.nn.ReLU6()\n",
" self.relu3 = torch.nn.ReLU6()\n",
" self.max_pool1 = torch.nn.MaxPool2d(2, 2)\n",
" self.max_pool2 = torch.nn.MaxPool2d(2, 2)\n",
"\n",
" def forward(self, x):\n",
" x = self.relu1(self.conv1(x))\n",
" x = self.max_pool1(x)\n",
" x = self.relu2(self.conv2(x))\n",
" x = self.max_pool2(x)\n",
" x = x.view(-1, x.size()[1:].numel())\n",
" x = self.relu3(self.fc1(x))\n",
" x = self.fc2(x)\n",
" return F.log_softmax(x, dim=1)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# define model, optimizer, criterion, data_loader, trainer, evaluator.\n",
"\n",
"import torch.optim as optim\n",
"from torchvision import datasets, transforms\n",
"from torch.optim.lr_scheduler import StepLR\n",
"\n",
"device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"\n",
"model = NaiveModel().to(device)\n",
"\n",
"optimizer = optim.Adadelta(model.parameters(), lr=1)\n",
"\n",
"criterion = torch.nn.NLLLoss()\n",
"\n",
"transform=transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])\n",
"train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)\n",
"test_dataset = datasets.MNIST('./data', train=False, transform=transform)\n",
"train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64)\n",
"test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=1000)\n",
"\n",
"def trainer(model, optimizer, criterion, epoch):\n",
" model.train()\n",
" for batch_idx, (data, target) in enumerate(train_loader):\n",
" data, target = data.to(device), target.to(device)\n",
" optimizer.zero_grad()\n",
" output = model(data)\n",
" loss = criterion(output, target)\n",
" loss.backward()\n",
" optimizer.step()\n",
" if batch_idx % 100 == 0:\n",
" print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n",
" epoch, batch_idx * len(data), len(train_loader.dataset),\n",
" 100. * batch_idx / len(train_loader), loss.item()))\n",
"\n",
"def evaluator(model):\n",
" model.eval()\n",
" test_loss = 0\n",
" correct = 0\n",
" with torch.no_grad():\n",
" for data, target in test_loader:\n",
" data, target = data.to(device), target.to(device)\n",
" output = model(data)\n",
" test_loss += F.nll_loss(output, target, reduction='sum').item()\n",
" pred = output.argmax(dim=1, keepdim=True)\n",
" correct += pred.eq(target.view_as(pred)).sum().item()\n",
"\n",
" test_loss /= len(test_loader.dataset)\n",
" acc = 100 * correct / len(test_loader.dataset)\n",
"\n",
" print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n",
" test_loss, correct, len(test_loader.dataset), acc))\n",
"\n",
" return acc"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 2.313423\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.091786\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.087317\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.036397\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.008173\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.047565\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.122448\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.036732\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.150135\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.109684\n",
"\n",
"Test set: Average loss: 0.0457, Accuracy: 9857/10000 (99%)\n",
"\n",
"Train Epoch: 1 [0/60000 (0%)]\tLoss: 0.020650\n",
"Train Epoch: 1 [6400/60000 (11%)]\tLoss: 0.091525\n",
"Train Epoch: 1 [12800/60000 (21%)]\tLoss: 0.019602\n",
"Train Epoch: 1 [19200/60000 (32%)]\tLoss: 0.027827\n",
"Train Epoch: 1 [25600/60000 (43%)]\tLoss: 0.019414\n",
"Train Epoch: 1 [32000/60000 (53%)]\tLoss: 0.007640\n",
"Train Epoch: 1 [38400/60000 (64%)]\tLoss: 0.051296\n",
"Train Epoch: 1 [44800/60000 (75%)]\tLoss: 0.012038\n",
"Train Epoch: 1 [51200/60000 (85%)]\tLoss: 0.121057\n",
"Train Epoch: 1 [57600/60000 (96%)]\tLoss: 0.015796\n",
"\n",
"Test set: Average loss: 0.0302, Accuracy: 9902/10000 (99%)\n",
"\n",
"Train Epoch: 2 [0/60000 (0%)]\tLoss: 0.009903\n",
"Train Epoch: 2 [6400/60000 (11%)]\tLoss: 0.062256\n",
"Train Epoch: 2 [12800/60000 (21%)]\tLoss: 0.013844\n",
"Train Epoch: 2 [19200/60000 (32%)]\tLoss: 0.014133\n",
"Train Epoch: 2 [25600/60000 (43%)]\tLoss: 0.001051\n",
"Train Epoch: 2 [32000/60000 (53%)]\tLoss: 0.006128\n",
"Train Epoch: 2 [38400/60000 (64%)]\tLoss: 0.032162\n",
"Train Epoch: 2 [44800/60000 (75%)]\tLoss: 0.007687\n",
"Train Epoch: 2 [51200/60000 (85%)]\tLoss: 0.092295\n",
"Train Epoch: 2 [57600/60000 (96%)]\tLoss: 0.006266\n",
"\n",
"Test set: Average loss: 0.0259, Accuracy: 9920/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# pre-train model for 3 epoches.\n",
"\n",
"scheduler = StepLR(optimizer, step_size=1, gamma=0.7)\n",
"\n",
"for epoch in range(0, 3):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)\n",
" scheduler.step()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"op_name: \n",
"op_type: <class '__main__.NaiveModel'>\n",
"\n",
"op_name: conv1\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: conv2\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: fc1\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: fc2\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: relu1\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu2\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu3\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: max_pool1\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n",
"op_name: max_pool2\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n"
]
},
{
"data": {
"text/plain": [
"[None, None, None, None, None, None, None, None, None, None]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# show all op_name and op_type in the model.\n",
"\n",
"[print('op_name: {}\\nop_type: {}\\n'.format(name, type(module))) for name, module in model.named_modules()]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([20, 1, 5, 5])\n"
]
}
],
"source": [
"# show the weight size of `conv1`.\n",
"\n",
"print(model.conv1.weight.data.size())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[ 1.5338e-01, -1.1766e-01, -2.6654e-01, -2.9445e-02, -1.4650e-01],\n",
" [-1.8796e-01, -2.9882e-01, 6.9725e-02, 2.1561e-01, 6.5688e-02],\n",
" [ 1.5274e-01, -9.8471e-03, 3.2303e-01, 1.3472e-03, 1.7235e-01],\n",
" [ 1.1804e-01, 2.2535e-01, -8.3370e-02, -3.4553e-02, -1.2529e-01],\n",
" [-6.6012e-02, -2.0272e-02, -1.8797e-01, -4.6882e-02, -8.3206e-02]]],\n",
"\n",
"\n",
" [[[-1.2112e-01, 7.0756e-02, 5.0446e-02, 1.5156e-01, -2.7929e-02],\n",
" [-1.9744e-01, -2.1336e-03, 7.2534e-02, 6.2336e-02, 1.6039e-01],\n",
" [-6.7510e-02, 1.4636e-01, 7.1972e-02, -8.9118e-02, -4.0895e-02],\n",
" [ 2.9499e-02, 2.0788e-01, -1.4989e-01, 1.1668e-01, -2.8503e-01],\n",
" [ 8.1894e-02, -1.4489e-01, -4.2038e-02, -1.2794e-01, -5.0379e-02]]],\n",
"\n",
"\n",
" [[[ 3.8332e-02, -1.4270e-01, -1.9585e-01, 2.2653e-01, 1.0104e-01],\n",
" [-2.7956e-03, -1.4108e-01, -1.4694e-01, -1.3525e-01, 2.6959e-01],\n",
" [ 1.9522e-01, -1.2281e-01, -1.9173e-01, -1.8910e-02, 3.1572e-03],\n",
" [-1.0580e-01, -2.5239e-02, -5.8266e-02, -6.5815e-02, 6.6433e-02],\n",
" [ 8.9601e-02, 7.1189e-02, -2.4255e-01, 1.5746e-01, -1.4708e-01]]],\n",
"\n",
"\n",
" [[[-1.1963e-01, -1.7243e-01, -3.5174e-02, 1.4651e-01, -1.1675e-01],\n",
" [-1.3518e-01, 1.2830e-02, 7.7188e-02, 2.1060e-01, 4.0924e-02],\n",
" [-4.3364e-02, -1.9579e-01, -3.6559e-02, -6.9803e-02, 1.2380e-01],\n",
" [ 7.7321e-02, 3.7590e-02, 8.2935e-02, 2.2878e-01, 2.7859e-03],\n",
" [-1.3601e-01, -2.1167e-01, -2.3195e-01, -1.2524e-01, 1.0073e-01]]],\n",
"\n",
"\n",
" [[[-2.7300e-01, 6.8470e-02, 2.8405e-02, -4.5879e-03, -1.3735e-01],\n",
" [-8.9789e-02, -2.0209e-03, 5.0950e-03, 2.1633e-01, 2.5554e-01],\n",
" [ 5.4389e-02, 1.2262e-01, -1.5514e-01, -1.0416e-01, 1.3606e-01],\n",
" [-1.6794e-01, -2.8876e-02, 2.5900e-02, -2.4261e-02, 1.0923e-01],\n",
" [ 5.2524e-03, -4.4625e-02, -2.1327e-01, -1.7211e-01, -4.4819e-04]]],\n",
"\n",
"\n",
" [[[ 7.2378e-02, 1.5122e-01, -1.2964e-01, 4.9105e-02, -2.1639e-01],\n",
" [ 3.6547e-02, -1.5518e-02, 3.2059e-02, -3.2820e-02, 6.1231e-02],\n",
" [ 1.2514e-01, 8.0623e-02, 1.2686e-02, -1.0074e-01, 2.2836e-02],\n",
" [-2.6842e-02, 2.5578e-02, -2.5877e-01, -1.7808e-01, 7.6966e-02],\n",
" [-4.2424e-02, 4.7006e-02, -1.5486e-02, -4.2686e-02, 4.8482e-02]]],\n",
"\n",
"\n",
" [[[ 1.3081e-01, 9.9530e-02, -1.4729e-01, -1.7665e-01, -1.9757e-01],\n",
" [ 9.6603e-02, 2.2783e-02, 7.8402e-02, -2.8679e-02, 8.5252e-02],\n",
" [-1.5310e-02, 1.1605e-01, -5.8300e-02, 2.4563e-02, 1.7488e-01],\n",
" [ 6.5576e-02, -1.6325e-01, -1.1318e-01, -2.9251e-02, 6.2352e-02],\n",
" [-1.9084e-03, -1.4005e-01, -1.2363e-01, -9.7985e-02, -2.0562e-01]]],\n",
"\n",
"\n",
" [[[ 4.0772e-02, -8.2086e-02, -2.7555e-01, -3.2547e-01, -1.2226e-01],\n",
" [-5.9877e-02, 9.8567e-02, 2.5186e-01, -1.0280e-01, -2.3416e-01],\n",
" [ 8.5760e-02, 1.0896e-01, 1.4898e-01, 2.1579e-01, 8.5297e-02],\n",
" [ 5.4720e-02, -1.7226e-01, -7.2518e-02, 6.7099e-03, -1.6011e-03],\n",
" [-8.9944e-02, 1.7404e-01, -3.6985e-02, 1.8602e-01, 7.2353e-02]]],\n",
"\n",
"\n",
" [[[ 1.6276e-02, -9.6439e-02, -9.6085e-02, -2.4267e-01, -1.8521e-01],\n",
" [ 6.3310e-02, 1.7866e-01, 1.1694e-01, -1.4464e-01, -2.7711e-01],\n",
" [-2.4514e-02, 2.2222e-01, 2.1053e-01, -1.4271e-01, 8.7045e-02],\n",
" [-1.9207e-01, -5.4719e-02, -5.7775e-03, -1.0034e-05, -1.0923e-01],\n",
" [-2.4006e-02, 2.3780e-02, 1.8988e-01, 2.4734e-01, 4.8097e-02]]],\n",
"\n",
"\n",
" [[[ 1.1335e-01, -5.8451e-02, 5.2440e-02, -1.3223e-01, -2.5534e-02],\n",
" [ 9.1323e-02, -6.0707e-02, 2.3524e-01, 2.4992e-01, 8.7842e-02],\n",
" [ 2.9002e-02, 3.5379e-02, -5.9689e-02, -2.8363e-03, 1.8618e-01],\n",
" [-2.9671e-01, 8.1830e-03, 1.1076e-01, -5.4118e-02, -6.1685e-02],\n",
" [-1.7580e-01, -3.4534e-01, -3.9250e-01, -2.7569e-01, -2.6131e-01]]],\n",
"\n",
"\n",
" [[[ 1.1586e-01, -7.5997e-02, -1.4614e-01, 4.8750e-02, 1.8097e-01],\n",
" [-6.7027e-02, -1.4901e-01, -1.5614e-02, -1.0379e-02, 9.5526e-02],\n",
" [-3.2333e-02, -1.5107e-01, -1.9498e-01, 1.0083e-01, 2.2328e-01],\n",
" [-2.0692e-01, -6.3798e-02, -1.2524e-01, 1.9549e-01, 1.9682e-01],\n",
" [-2.1494e-01, 1.0475e-01, -2.4858e-02, -9.7831e-02, 1.1551e-01]]],\n",
"\n",
"\n",
" [[[ 6.3785e-02, -1.8044e-01, -1.0190e-01, -1.3588e-01, 8.5433e-02],\n",
" [ 2.0675e-01, 3.3238e-02, 9.2437e-02, 1.1799e-01, 2.1111e-01],\n",
" [-5.2138e-02, 1.5790e-01, 1.8151e-01, 8.0470e-02, 1.0131e-01],\n",
" [-4.4786e-02, 1.1771e-01, 2.1706e-02, -1.2563e-01, -2.1142e-01],\n",
" [-2.3589e-01, -2.1154e-01, -1.7890e-01, -2.7769e-01, -1.2512e-01]]],\n",
"\n",
"\n",
" [[[ 1.9133e-01, 2.4711e-01, 1.0413e-01, -1.9187e-01, -3.0991e-01],\n",
" [-1.2382e-01, 8.3641e-03, -5.6734e-02, 5.8376e-02, 2.2880e-02],\n",
" [-3.1734e-01, -1.0637e-02, -5.5974e-02, 1.0676e-01, -1.1080e-02],\n",
" [-2.2980e-01, 2.0486e-01, 1.0147e-01, 1.4484e-01, 5.2265e-02],\n",
" [ 7.4410e-02, 2.2806e-02, 8.5137e-02, -2.1809e-01, 3.1704e-02]]],\n",
"\n",
"\n",
" [[[-1.1006e-01, -2.5311e-01, 1.8925e-02, 1.0399e-02, 1.1951e-01],\n",
" [-2.1116e-01, 1.8409e-01, 3.2172e-02, 1.5962e-01, -7.9457e-02],\n",
" [ 1.1059e-01, 9.1966e-02, 1.0777e-01, -9.9132e-02, -4.4586e-02],\n",
" [-8.7919e-02, -3.7283e-02, 9.1275e-02, -3.7412e-02, 3.8875e-02],\n",
" [-4.3558e-02, 1.6196e-01, -4.7944e-03, -1.7560e-02, -1.2593e-01]]],\n",
"\n",
"\n",
" [[[ 7.6976e-02, -3.8627e-02, 1.2610e-01, 1.1994e-01, 2.1706e-03],\n",
" [ 7.4357e-02, 6.7929e-02, 3.1386e-02, 1.4606e-01, 2.1429e-01],\n",
" [-2.6569e-01, -4.2631e-04, -3.6654e-02, -3.0967e-02, -9.4961e-02],\n",
" [-2.0192e-01, -3.5423e-01, -2.5246e-01, -3.5092e-01, -2.4159e-01],\n",
" [ 1.7636e-02, 1.3744e-01, -1.0306e-01, 8.8370e-02, 7.3258e-02]]],\n",
"\n",
"\n",
" [[[ 2.0016e-01, 1.0956e-01, -5.9223e-02, 6.4871e-03, -2.4165e-01],\n",
" [ 5.6283e-02, 1.7276e-01, -2.2316e-01, -1.6699e-01, -7.0742e-02],\n",
" [ 2.6179e-01, -2.5102e-01, -2.0774e-01, -9.6413e-02, 3.4367e-02],\n",
" [-9.1882e-02, -2.9195e-01, -8.7432e-02, 1.0144e-01, -2.0559e-02],\n",
" [-2.5668e-01, -9.8016e-02, 1.1103e-01, -3.0233e-02, 1.1076e-01]]],\n",
"\n",
"\n",
" [[[ 1.0027e-03, -5.7955e-02, -2.1339e-01, -1.6729e-01, -2.0870e-01],\n",
" [ 4.2464e-02, 2.3177e-01, -6.1459e-02, -1.0905e-01, 1.7613e-02],\n",
" [-1.2282e-01, 2.1762e-01, -1.3553e-02, 2.7476e-01, 1.6703e-01],\n",
" [-5.6282e-02, 1.2731e-02, 1.0944e-01, -1.7347e-01, 4.4497e-02],\n",
" [ 5.7346e-02, -5.4657e-02, 4.8718e-02, -2.6221e-02, -2.6933e-02]]],\n",
"\n",
"\n",
" [[[ 6.7697e-02, 1.5692e-01, 2.7050e-01, 1.5936e-02, 1.7659e-01],\n",
" [-2.8899e-02, -1.4866e-01, 3.1838e-02, 1.0903e-01, 1.2292e-01],\n",
" [-1.3608e-01, -4.3198e-03, -9.8925e-02, -4.5599e-02, 1.3452e-01],\n",
" [-5.1435e-02, -2.3815e-01, -2.4151e-01, -4.8556e-02, 1.3825e-01],\n",
" [-1.2823e-01, 8.9324e-03, -1.5313e-01, -2.2933e-01, -3.4081e-02]]],\n",
"\n",
"\n",
" [[[-1.8396e-01, -6.8774e-03, -1.6675e-01, 7.1980e-03, 1.9922e-02],\n",
" [ 1.3416e-01, -1.1450e-01, -1.5277e-01, -6.5713e-02, -9.5435e-02],\n",
" [ 1.5406e-01, -9.1235e-02, -1.0880e-01, -7.1603e-02, -9.5575e-02],\n",
" [ 2.1772e-01, 8.4073e-02, -2.5264e-01, -2.1428e-01, 1.9537e-01],\n",
" [ 1.3124e-01, 7.9532e-02, -2.4044e-01, -1.5717e-01, 1.6562e-01]]],\n",
"\n",
"\n",
" [[[ 1.1849e-01, -5.0517e-03, -1.8900e-01, 1.8093e-02, 6.4660e-02],\n",
" [-1.5309e-01, -2.0106e-01, -8.6551e-02, 5.2692e-03, 1.5448e-01],\n",
" [-3.0727e-01, 4.9703e-02, -4.7637e-02, 2.9111e-01, -1.3173e-01],\n",
" [-8.5167e-02, -1.3540e-01, 2.9235e-01, 3.7895e-03, -9.4651e-02],\n",
" [-6.0694e-02, 9.6936e-02, 1.0533e-01, -6.1769e-02, -1.8086e-01]]]],\n",
" device='cuda:0')\n"
]
}
],
"source": [
"# show the weight of `conv1`.\n",
"\n",
"print(model.conv1.weight.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Prepare config_list for pruning"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# we will prune 50% weights in `conv1`.\n",
"\n",
"config_list = [{\n",
" 'sparsity': 0.5,\n",
" 'op_types': ['Conv2d'],\n",
" 'op_names': ['conv1']\n",
"}]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Choose a pruner and pruning"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# use l1filter pruner to prune the model\n",
"\n",
"from nni.algorithms.compression.pytorch.pruning import L1FilterPruner\n",
"\n",
"# Note that if you use a compressor that need you to pass a optimizer,\n",
"# you need a new optimizer instead of you have used above, because NNI might modify the optimizer.\n",
"# And of course this modified optimizer can not be used in finetuning.\n",
"pruner = L1FilterPruner(model, config_list)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"op_name: \n",
"op_type: <class '__main__.NaiveModel'>\n",
"\n",
"op_name: conv1\n",
"op_type: <class 'nni.compression.pytorch.compressor.PrunerModuleWrapper'>\n",
"\n",
"op_name: conv1.module\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: conv2\n",
"op_type: <class 'torch.nn.modules.conv.Conv2d'>\n",
"\n",
"op_name: fc1\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: fc2\n",
"op_type: <class 'torch.nn.modules.linear.Linear'>\n",
"\n",
"op_name: relu1\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu2\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: relu3\n",
"op_type: <class 'torch.nn.modules.activation.ReLU6'>\n",
"\n",
"op_name: max_pool1\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n",
"op_name: max_pool2\n",
"op_type: <class 'torch.nn.modules.pooling.MaxPool2d'>\n",
"\n"
]
},
{
"data": {
"text/plain": [
"[None, None, None, None, None, None, None, None, None, None, None]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we can find the `conv1` has been wrapped, the origin `conv1` changes to `conv1.module`.\n",
"# the weight of conv1 will modify by `weight * mask` in `forward()`. The initial mask is a `ones_like(weight)` tensor.\n",
"\n",
"[print('op_name: {}\\nop_type: {}\\n'.format(name, type(module))) for name, module in model.named_modules()]"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NaiveModel(\n",
" (conv1): PrunerModuleWrapper(\n",
" (module): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# compress the model, the mask will be updated.\n",
"\n",
"pruner.compress()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([20, 1, 5, 5])\n"
]
}
],
"source": [
"# show the mask size of `conv1`\n",
"\n",
"print(model.conv1.weight_mask.size())"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]],\n",
"\n",
"\n",
" [[[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]]],\n",
"\n",
"\n",
" [[[0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0.]]]], device='cuda:0')\n"
]
}
],
"source": [
"# show the mask of `conv1`\n",
"\n",
"print(model.conv1.weight_mask)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[[[ 1.5338e-01, -1.1766e-01, -2.6654e-01, -2.9445e-02, -1.4650e-01],\n",
" [-1.8796e-01, -2.9882e-01, 6.9725e-02, 2.1561e-01, 6.5688e-02],\n",
" [ 1.5274e-01, -9.8471e-03, 3.2303e-01, 1.3472e-03, 1.7235e-01],\n",
" [ 1.1804e-01, 2.2535e-01, -8.3370e-02, -3.4553e-02, -1.2529e-01],\n",
" [-6.6012e-02, -2.0272e-02, -1.8797e-01, -4.6882e-02, -8.3206e-02]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 3.8332e-02, -1.4270e-01, -1.9585e-01, 2.2653e-01, 1.0104e-01],\n",
" [-2.7956e-03, -1.4108e-01, -1.4694e-01, -1.3525e-01, 2.6959e-01],\n",
" [ 1.9522e-01, -1.2281e-01, -1.9173e-01, -1.8910e-02, 3.1572e-03],\n",
" [-1.0580e-01, -2.5239e-02, -5.8266e-02, -6.5815e-02, 6.6433e-02],\n",
" [ 8.9601e-02, 7.1189e-02, -2.4255e-01, 1.5746e-01, -1.4708e-01]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 4.0772e-02, -8.2086e-02, -2.7555e-01, -3.2547e-01, -1.2226e-01],\n",
" [-5.9877e-02, 9.8567e-02, 2.5186e-01, -1.0280e-01, -2.3416e-01],\n",
" [ 8.5760e-02, 1.0896e-01, 1.4898e-01, 2.1579e-01, 8.5297e-02],\n",
" [ 5.4720e-02, -1.7226e-01, -7.2518e-02, 6.7099e-03, -1.6011e-03],\n",
" [-8.9944e-02, 1.7404e-01, -3.6985e-02, 1.8602e-01, 7.2353e-02]]],\n",
"\n",
"\n",
" [[[ 1.6276e-02, -9.6439e-02, -9.6085e-02, -2.4267e-01, -1.8521e-01],\n",
" [ 6.3310e-02, 1.7866e-01, 1.1694e-01, -1.4464e-01, -2.7711e-01],\n",
" [-2.4514e-02, 2.2222e-01, 2.1053e-01, -1.4271e-01, 8.7045e-02],\n",
" [-1.9207e-01, -5.4719e-02, -5.7775e-03, -1.0034e-05, -1.0923e-01],\n",
" [-2.4006e-02, 2.3780e-02, 1.8988e-01, 2.4734e-01, 4.8097e-02]]],\n",
"\n",
"\n",
" [[[ 1.1335e-01, -5.8451e-02, 5.2440e-02, -1.3223e-01, -2.5534e-02],\n",
" [ 9.1323e-02, -6.0707e-02, 2.3524e-01, 2.4992e-01, 8.7842e-02],\n",
" [ 2.9002e-02, 3.5379e-02, -5.9689e-02, -2.8363e-03, 1.8618e-01],\n",
" [-2.9671e-01, 8.1830e-03, 1.1076e-01, -5.4118e-02, -6.1685e-02],\n",
" [-1.7580e-01, -3.4534e-01, -3.9250e-01, -2.7569e-01, -2.6131e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 6.3785e-02, -1.8044e-01, -1.0190e-01, -1.3588e-01, 8.5433e-02],\n",
" [ 2.0675e-01, 3.3238e-02, 9.2437e-02, 1.1799e-01, 2.1111e-01],\n",
" [-5.2138e-02, 1.5790e-01, 1.8151e-01, 8.0470e-02, 1.0131e-01],\n",
" [-4.4786e-02, 1.1771e-01, 2.1706e-02, -1.2563e-01, -2.1142e-01],\n",
" [-2.3589e-01, -2.1154e-01, -1.7890e-01, -2.7769e-01, -1.2512e-01]]],\n",
"\n",
"\n",
" [[[ 1.9133e-01, 2.4711e-01, 1.0413e-01, -1.9187e-01, -3.0991e-01],\n",
" [-1.2382e-01, 8.3641e-03, -5.6734e-02, 5.8376e-02, 2.2880e-02],\n",
" [-3.1734e-01, -1.0637e-02, -5.5974e-02, 1.0676e-01, -1.1080e-02],\n",
" [-2.2980e-01, 2.0486e-01, 1.0147e-01, 1.4484e-01, 5.2265e-02],\n",
" [ 7.4410e-02, 2.2806e-02, 8.5137e-02, -2.1809e-01, 3.1704e-02]]],\n",
"\n",
"\n",
" [[[-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 7.6976e-02, -3.8627e-02, 1.2610e-01, 1.1994e-01, 2.1706e-03],\n",
" [ 7.4357e-02, 6.7929e-02, 3.1386e-02, 1.4606e-01, 2.1429e-01],\n",
" [-2.6569e-01, -4.2631e-04, -3.6654e-02, -3.0967e-02, -9.4961e-02],\n",
" [-2.0192e-01, -3.5423e-01, -2.5246e-01, -3.5092e-01, -2.4159e-01],\n",
" [ 1.7636e-02, 1.3744e-01, -1.0306e-01, 8.8370e-02, 7.3258e-02]]],\n",
"\n",
"\n",
" [[[ 2.0016e-01, 1.0956e-01, -5.9223e-02, 6.4871e-03, -2.4165e-01],\n",
" [ 5.6283e-02, 1.7276e-01, -2.2316e-01, -1.6699e-01, -7.0742e-02],\n",
" [ 2.6179e-01, -2.5102e-01, -2.0774e-01, -9.6413e-02, 3.4367e-02],\n",
" [-9.1882e-02, -2.9195e-01, -8.7432e-02, 1.0144e-01, -2.0559e-02],\n",
" [-2.5668e-01, -9.8016e-02, 1.1103e-01, -3.0233e-02, 1.1076e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00],\n",
" [ 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [ 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00, -0.0000e+00]]],\n",
"\n",
"\n",
" [[[-1.8396e-01, -6.8774e-03, -1.6675e-01, 7.1980e-03, 1.9922e-02],\n",
" [ 1.3416e-01, -1.1450e-01, -1.5277e-01, -6.5713e-02, -9.5435e-02],\n",
" [ 1.5406e-01, -9.1235e-02, -1.0880e-01, -7.1603e-02, -9.5575e-02],\n",
" [ 2.1772e-01, 8.4073e-02, -2.5264e-01, -2.1428e-01, 1.9537e-01],\n",
" [ 1.3124e-01, 7.9532e-02, -2.4044e-01, -1.5717e-01, 1.6562e-01]]],\n",
"\n",
"\n",
" [[[ 0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, -0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, -0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00],\n",
" [-0.0000e+00, 0.0000e+00, 0.0000e+00, -0.0000e+00, -0.0000e+00]]]],\n",
" device='cuda:0')\n"
]
}
],
"source": [
"# use a dummy input to apply the sparsify.\n",
"\n",
"model(torch.rand(1, 1, 28, 28).to(device))\n",
"\n",
"# the weights of `conv1` have been sparsified.\n",
"\n",
"print(model.conv1.module.weight.data)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:05] INFO (nni.compression.pytorch.compressor/MainThread) Model state_dict saved to pruned_naive_mnist_l1filter.pth\n",
"[2021-07-26 22:26:05] INFO (nni.compression.pytorch.compressor/MainThread) Mask dict saved to mask_naive_mnist_l1filter.pth\n"
]
}
],
"source": [
"# export the sparsified model state to './pruned_naive_mnist_l1filter.pth'.\n",
"# export the mask to './mask_naive_mnist_l1filter.pth'.\n",
"\n",
"pruner.export_model(model_path='pruned_naive_mnist_l1filter.pth', mask_path='mask_naive_mnist_l1filter.pth')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4. Speed Up"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NaiveModel(\n",
" (conv1): Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))\n",
" (conv2): Conv2d(20, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")\n"
]
}
],
"source": [
"# If you use a wrapped model, don't forget to unwrap it.\n",
"\n",
"pruner._unwrap_model()\n",
"\n",
"# the model has been unwrapped.\n",
"\n",
"print(model)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"<ipython-input-1-0f2a9eb92f42>:22: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n",
" x = x.view(-1, x.size()[1:].numel())\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) start to speed up the model\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) {'conv1': 1, 'conv2': 1}\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) dim0 sparsity: 0.500000\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) dim1 sparsity: 0.000000\n",
"[2021-07-26 22:26:18] INFO (FixMaskConflict/MainThread) Dectected conv prune dim\" 0\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) infer module masks...\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.jit_translate/MainThread) View Module output size: [-1, 800]\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for relu3\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for fc2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update mask for .aten::log_softmax.10\n",
"[2021-07-26 22:26:18] ERROR (nni.compression.pytorch.speedup.jit_translate/MainThread) aten::log_softmax is not Supported! Please report an issue at https://github.com/microsoft/nni. Thanks~\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for .aten::log_softmax.10\n",
"[2021-07-26 22:26:18] WARNING (nni.compression.pytorch.speedup.compressor/MainThread) Note: .aten::log_softmax.10 does not have corresponding mask inference object\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for fc2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the fc2\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu3\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu3\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the fc1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the .aten::view.9\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the max_pool2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the conv2\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the max_pool1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the relu1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update indirect sparsity for conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Update the indirect sparsity for the conv1\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) resolve the mask conflict\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace compressed modules...\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: conv1, op_type: Conv2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu1, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: max_pool1, op_type: MaxPool2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: conv2, op_type: Conv2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu2, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: max_pool2, op_type: MaxPool2d)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Warning: cannot replace (name: .aten::view.9, op_type: aten::view) which is func type\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: fc1, op_type: Linear)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compress_modules/MainThread) replace linear with new in_features: 800, out_features: 500\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: relu3, op_type: ReLU6)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) replace module (name: fc2, op_type: Linear)\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compress_modules/MainThread) replace linear with new in_features: 500, out_features: 10\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) Warning: cannot replace (name: .aten::log_softmax.10, op_type: aten::log_softmax) which is func type\n",
"[2021-07-26 22:26:18] INFO (nni.compression.pytorch.speedup.compressor/MainThread) speedup done\n"
]
}
],
"source": [
"from nni.compression.pytorch import ModelSpeedup\n",
"\n",
"m_speedup = ModelSpeedup(model, dummy_input=torch.rand(10, 1, 28, 28).to(device), masks_file='mask_naive_mnist_l1filter.pth')\n",
"m_speedup.speedup_model()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NaiveModel(\n",
" (conv1): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))\n",
" (conv2): Conv2d(10, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")\n"
]
}
],
"source": [
"# the `conv1` has been replace from `Conv2d(1, 20, kernel_size=(5, 5), stride=(1, 1))` to `Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))`\n",
"# and the following layer `conv2` has also changed because the input channel of `conv2` should aware the output channel of `conv1`.\n",
"\n",
"print(model)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 0.306930\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.045807\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.049293\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.031464\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.005392\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.005652\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.040619\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.016515\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.092886\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.041380\n",
"\n",
"Test set: Average loss: 0.0257, Accuracy: 9917/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# finetune the model to recover the accuracy.\n",
"\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
"\n",
"for epoch in range(0, 1):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5. Prepare config_list for quantization"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"config_list = [{\n",
" 'quant_types': ['weight', 'input'],\n",
" 'quant_bits': {'weight': 8, 'input': 8},\n",
" 'op_names': ['conv1', 'conv2']\n",
"}]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6. Choose a quantizer and quantizing"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"NaiveModel(\n",
" (conv1): QuantizerModuleWrapper(\n",
" (module): Conv2d(1, 10, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (conv2): QuantizerModuleWrapper(\n",
" (module): Conv2d(10, 50, kernel_size=(5, 5), stride=(1, 1))\n",
" )\n",
" (fc1): Linear(in_features=800, out_features=500, bias=True)\n",
" (fc2): Linear(in_features=500, out_features=10, bias=True)\n",
" (relu1): ReLU6()\n",
" (relu2): ReLU6()\n",
" (relu3): ReLU6()\n",
" (max_pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (max_pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
")"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from nni.algorithms.compression.pytorch.quantization import QAT_Quantizer\n",
"\n",
"quantizer = QAT_Quantizer(model, config_list, optimizer)\n",
"quantizer.compress()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train Epoch: 0 [0/60000 (0%)]\tLoss: 0.004960\n",
"Train Epoch: 0 [6400/60000 (11%)]\tLoss: 0.036269\n",
"Train Epoch: 0 [12800/60000 (21%)]\tLoss: 0.018744\n",
"Train Epoch: 0 [19200/60000 (32%)]\tLoss: 0.021916\n",
"Train Epoch: 0 [25600/60000 (43%)]\tLoss: 0.003095\n",
"Train Epoch: 0 [32000/60000 (53%)]\tLoss: 0.003947\n",
"Train Epoch: 0 [38400/60000 (64%)]\tLoss: 0.032094\n",
"Train Epoch: 0 [44800/60000 (75%)]\tLoss: 0.017358\n",
"Train Epoch: 0 [51200/60000 (85%)]\tLoss: 0.083886\n",
"Train Epoch: 0 [57600/60000 (96%)]\tLoss: 0.040433\n",
"\n",
"Test set: Average loss: 0.0247, Accuracy: 9917/10000 (99%)\n",
"\n"
]
}
],
"source": [
"# finetune the model for calibration.\n",
"\n",
"for epoch in range(0, 1):\n",
" trainer(model, optimizer, criterion, epoch)\n",
" evaluator(model)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2021-07-26 22:34:41] INFO (nni.compression.pytorch.compressor/MainThread) Model state_dict saved to quantized_naive_mnist_l1filter.pth\n",
"[2021-07-26 22:34:41] INFO (nni.compression.pytorch.compressor/MainThread) Mask dict saved to calibration_naive_mnist_l1filter.pth\n"
]
},
{
"data": {
"text/plain": [
"{'conv1': {'weight_bit': 8,\n",
" 'tracked_min_input': -0.42417848110198975,\n",
" 'tracked_max_input': 2.8212687969207764},\n",
" 'conv2': {'weight_bit': 8,\n",
" 'tracked_min_input': 0.0,\n",
" 'tracked_max_input': 4.246923446655273}}"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# export the sparsified model state to './quantized_naive_mnist_l1filter.pth'.\n",
"# export the calibration config to './calibration_naive_mnist_l1filter.pth'.\n",
"\n",
"quantizer.export_model(model_path='quantized_naive_mnist_l1filter.pth', calibration_path='calibration_naive_mnist_l1filter.pth')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 7. Speed Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# speed up with tensorRT\n",
"\n",
"engine = ModelSpeedupTensorRT(model, (32, 1, 28, 28), config=calibration_config, batchsize=32)\n",
"engine.compress()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
#################
Pruning
#################
Pruning is a common technique to compress neural network models.
Pruning methods explore the redundancy in the model weights (parameters) and try to remove or prune the redundant and uncritical weights.
The redundant elements are pruned from the model: their values are zeroed out, and we make sure they do not take part in the back-propagation process.
From the perspective of pruning granularity, fine-grained pruning (or unstructured pruning) refers to pruning each individual weight separately,
while coarse-grained pruning (or structured pruning) prunes an entire group of weights, such as a convolutional filter.
NNI provides multiple unstructured and structured pruning algorithms.
It supports TensorFlow and PyTorch with a unified interface.
Users only need to add several lines to their code to prune their models.
For structured filter pruning, NNI also provides a dependency-aware mode. In the dependency-aware mode, the
filter pruner obtains a better speed gain after the speedup.
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruners <Pruner>
Dependency Aware Mode <DependencyAware>
Model Speedup <ModelSpeedup>
#################
Quantization
#################
Quantization compresses a model by reducing the number of bits required to represent weights or activations,
which can reduce the computation and the inference time. In the context of deep neural networks, the dominant
data format for model weights is the 32-bit float. Many research works have shown that weights and activations
can be represented using 8-bit integers without a significant loss in accuracy. Whether even lower bit widths,
such as 4/2/1 bits, can represent the weights is also a very active research direction.
A quantizer is a quantization algorithm implemented in NNI. NNI provides several quantizers, listed below. You can also
create your own quantizer using NNI's model compression interface. The example below illustrates the basic idea of 8-bit quantization.
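As a minimal, framework-agnostic sketch of the idea (per-tensor affine quantization to 8-bit unsigned integers; the names and formula follow the standard scheme, not NNI's internals):

.. code-block:: python

    import torch

    def affine_quantize(x: torch.Tensor, bits: int = 8):
        """Per-tensor affine (asymmetric) quantization to unsigned integers.
        Assumes x is not constant (max > min)."""
        qmin, qmax = 0, 2 ** bits - 1
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = torch.round(-x.min() / scale)
        q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
        return q, scale, zero_point

    x = torch.randn(4, 4)
    q, scale, zero_point = affine_quantize(x)
    x_hat = (q - zero_point) * scale  # dequantized values approximate x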
.. toctree::
:maxdepth: 2
Quantizers <Quantizer>
Quantization Speedup <QuantizationSpeedup>
Pruning V2
==========
Pruning V2 is a refactoring of the old version and provides more powerful functionality.
Compared with the old version, the iterative pruning process is detached from the pruner: the pruner is only responsible for pruning and generating the masks once.
What's more, pruning V2 unifies the pruning process and allows a freer combination of pruning components.
The task generator only cares about the pruning effect that should be achieved in each round, and uses a config list to express how to prune in the next step.
The pruner is reset with the model and config list given by the task generator, and then generates the masks for the current step.
For a clearer view of the structure, please refer to the figure below.
.. image:: ../../img/pruning_process.png
   :target: ../../img/pruning_process.png
   :alt:
In V2, a pruning process is usually driven by a pruning scheduler, which contains a specific pruner and a task generator.
But users can also use a pruner directly, as in pruning V1. The overall loop can be sketched as follows.
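This is only a conceptual sketch of the V2 round-based pruning loop, written in plain Python to show how the components interact; ``task_generator``, ``make_pruner``, ``finetuner``, and ``evaluator`` are illustrative names, not NNI APIs.

.. code-block:: python

    def iterative_pruning(model, task_generator, make_pruner, finetuner, evaluator):
        """Drive several pruning rounds; in each round the pruner prunes only once."""
        best_score, best_masks = float('-inf'), None
        for config_list in task_generator:            # one config list per round
            pruner = make_pruner(model, config_list)  # pruner is reset every round
            model, masks = pruner.compress()          # generate masks once
            finetuner(model)                          # recover accuracy
            score = evaluator(model)
            if score > best_score:
                best_score, best_masks = score, masks
        return model, best_masks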
For details, please refer to the following tutorials:
.. toctree::
:maxdepth: 2
Pruning Algorithms <v2_pruning_algo>
Pruning Scheduler <v2_scheduler>
Pruning Config List <v2_pruning_config_list>
Supported Pruning Algorithms in NNI
===================================
NNI provides several pruning algorithms reproduced from papers. In pruning v2, NNI splits the pruning algorithms into more detailed components.
This means users can freely combine components from different algorithms,
or easily replace a step of an original algorithm with a component of their own implementation to build their own pruning algorithm.
Right now, the algorithms that describe how to generate masks in one step are implemented as pruners,
and the algorithms that describe how to schedule sparsity across iterations are implemented as iterative pruners.
**Pruner**
* `Level Pruner <#level-pruner>`__
* `L1 Norm Pruner <#l1-norm-pruner>`__
* `L2 Norm Pruner <#l2-norm-pruner>`__
* `FPGM Pruner <#fpgm-pruner>`__
* `Slim Pruner <#slim-pruner>`__
* `Activation APoZ Rank Pruner <#activation-apoz-rank-pruner>`__
* `Activation Mean Rank Pruner <#activation-mean-rank-pruner>`__
* `Taylor FO Weight Pruner <#taylor-fo-weight-pruner>`__
* `ADMM Pruner <#admm-pruner>`__
* `Movement Pruner <#movement-pruner>`__
**Iterative Pruner**
* `Linear Pruner <#linear-pruner>`__
* `AGP Pruner <#agp-pruner>`__
* `Lottery Ticket Pruner <#lottery-ticket-pruner>`__
* `Simulated Annealing Pruner <#simulated-annealing-pruner>`__
* `Auto Compress Pruner <#auto-compress-pruner>`__
* `AMC Pruner <#amc-pruner>`__
Level Pruner
------------
This is a basic pruner, which some papers call magnitude pruning or fine-grained pruning.
It masks the weights with the smallest absolute values in each specified layer, by a ratio configured in the config list.
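To make the masking rule concrete, here is a minimal PyTorch sketch of how a magnitude mask can be computed for a single weight tensor; it mirrors the idea, not NNI's internal implementation.

.. code-block:: python

    import torch

    def level_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        """0/1 mask zeroing the `sparsity` fraction of smallest-magnitude weights."""
        k = int(weight.numel() * sparsity)
        if k == 0:
            return torch.ones_like(weight)
        threshold = weight.abs().flatten().kthvalue(k).values
        return (weight.abs() > threshold).float()

    w = torch.randn(8, 4)
    mask = level_mask(w, sparsity=0.5)
    pruned_w = w * mask  # masked entries no longer contribute to the forward pass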
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LevelPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['default'] }]
pruner = LevelPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/level_pruning_torch.py <examples/model_compress/pruning/v2/level_pruning_torch.py>`
User configuration for Level Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LevelPruner
L1 Norm Pruner
--------------
L1 norm pruner computes the L1 norm of the layer weight along the first dimension,
then prunes the weight blocks on this dimension with the smallest L1 norm values.
That is, it computes the L1 norm of each filter in a convolution layer, or of each row of the weight in a linear layer, as the metric value.
For more details, please refer to `Pruning Filters for Efficient ConvNets <https://arxiv.org/abs/1608.08710>`__.
In addition, L1 norm pruner also supports the dependency-aware mode.
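The metric itself is simple; the following sketch shows how per-filter L1 norms could be computed for a convolution weight (illustrative, not NNI's implementation).

.. code-block:: python

    import torch

    conv_weight = torch.randn(16, 3, 3, 3)  # (out_channels, in_channels, kH, kW)
    # L1 norm of each filter: sum of absolute values over all dims except dim 0.
    filter_l1 = conv_weight.abs().sum(dim=(1, 2, 3))
    # Prune the filters with the smallest L1 norms, e.g. at 50% sparsity:
    num_prune = conv_weight.size(0) // 2
    prune_idx = filter_l1.argsort()[:num_prune]
    mask = torch.ones(conv_weight.size(0))
    mask[prune_idx] = 0.0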
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L1NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L1NormPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/norm_pruning_torch.py <examples/model_compress/pruning/v2/norm_pruning_torch.py>`
User configuration for L1 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L1NormPruner
L2 Norm Pruner
--------------
L2 norm pruner is a variant of the L1 norm pruner. It uses the L2 norm as the metric to determine which weight blocks should be pruned.
The L2 norm pruner also supports the dependency-aware mode.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import L2NormPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = L2NormPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/norm_pruning_torch.py <examples/model_compress/pruning/v2/norm_pruning_torch.py>`
User configuration for L2 Norm Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.L2NormPruner
FPGM Pruner
-----------
FPGM pruner prunes the weight blocks along the first dimension that are closest to the geometric median of all blocks in the layer.
FPGM chooses the weight blocks whose contribution is the most replaceable.
For more details, please refer to `Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration <https://arxiv.org/abs/1811.00250>`__.
FPGM pruner also supports the dependency-aware mode.
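For intuition, a filter near the geometric median can be approximated by the other filters, so it contributes the least new information. A minimal sketch of this criterion (illustrative, not NNI's implementation):

.. code-block:: python

    import torch

    conv_weight = torch.randn(16, 3, 3, 3)   # (out_channels, in_channels, kH, kW)
    flat = conv_weight.flatten(start_dim=1)  # one row per filter
    # The filter with the smallest total distance to all other filters is
    # the closest to the geometric median, i.e. the most replaceable.
    total_dist = torch.cdist(flat, flat).sum(dim=1)
    prune_idx = total_dist.argsort()[:8]     # prune the most replaceable half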
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import FPGMPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = FPGMPruner(model, config_list)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/fpgm_pruning_torch.py <examples/model_compress/pruning/v2/fpgm_pruning_torch.py>`
User configuration for FPGM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.FPGMPruner
Slim Pruner
-----------
Slim pruner adds a sparsity regularization on the scaling factors of batch normalization (BN) layers during training to identify unimportant channels.
The channels with small scaling-factor values will be pruned.
For more details, please refer to `Learning Efficient Convolutional Networks through Network Slimming <https://arxiv.org/abs/1708.06519>`__.
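The core training-time ingredient is an L1 penalty on the BN scale parameters added to the task loss. A minimal sketch (the penalty weight ``1e-4`` is an illustrative value, not an NNI default):

.. code-block:: python

    import torch
    import torch.nn as nn

    def bn_sparsity_loss(model: nn.Module, penalty: float = 1e-4) -> torch.Tensor:
        """L1 regularization on BN scaling factors, as in Network Slimming."""
        reg = torch.zeros(())
        for m in model.modules():
            if isinstance(m, nn.BatchNorm2d):
                reg = reg + m.weight.abs().sum()
        return penalty * reg

    # inside the training loop:
    # loss = criterion(model(x), y) + bn_sparsity_loss(model)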
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import SlimPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['BatchNorm2d'] }]
pruner = SlimPruner(model, config_list, trainer, traced_optimizer, criterion, training_epochs=1)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/slim_pruning_torch.py <examples/model_compress/pruning/v2/slim_pruning_torch.py>`
User configuration for Slim Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SlimPruner
Activation APoZ Rank Pruner
---------------------------
Activation APoZ rank pruner prunes along the first weight dimension to reach a preset level of network sparsity,
ranking channels by the importance criterion ``APoZ`` (Average Percentage of Zeros), calculated from the output activations of convolution layers; a channel whose activations are mostly zero is considered unimportant.
The pruning criterion ``APoZ`` is explained in the paper `Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures <https://arxiv.org/abs/1607.03250>`__.
The APoZ is defined as:
:math:`APoZ_{c}^{(i)} = APoZ\left(O_{c}^{(i)}\right)=\frac{\sum_{k}^{N} \sum_{j}^{M} f\left(O_{c, j}^{(i)}(k)=0\right)}{N \times M}`
Activation APoZ rank pruner also supports dependency-aware mode.
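The criterion is easy to compute from collected activations; a minimal sketch (illustrative, not NNI's implementation):

.. code-block:: python

    import torch

    # activations of a conv layer after ReLU, shape (N, C, H, W)
    activations = torch.relu(torch.randn(32, 16, 8, 8))
    # APoZ per channel: fraction of zero entries over batch and spatial dims.
    apoz = (activations == 0).float().mean(dim=(0, 2, 3))
    # Channels with the highest APoZ are the least important.
    least_important = apoz.argsort(descending=True)[:8]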
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ActivationAPoZRankPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationAPoZRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/activation_pruning_torch.py <examples/model_compress/pruning/v2/activation_pruning_torch.py>`
User configuration for Activation APoZ Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationAPoZRankPruner
Activation Mean Rank Pruner
---------------------------
Activation mean rank pruner prunes along the first weight dimension to reach a preset level of network sparsity,
ranking channels by the importance criterion ``mean activation``, calculated from the output activations of convolution layers; channels with the smallest mean activation are pruned first.
The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__.
Activation mean rank pruner also supports the dependency-aware mode.
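Analogously to APoZ, the mean-activation criterion averages each channel's activations over the batch and spatial dimensions; a minimal sketch (illustrative, not NNI's implementation):

.. code-block:: python

    import torch

    activations = torch.relu(torch.randn(32, 16, 8, 8))  # (N, C, H, W)
    mean_act = activations.mean(dim=(0, 2, 3))           # one value per channel
    prune_idx = mean_act.argsort()[:8]  # channels with the smallest mean activation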
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ActivationMeanRankPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ActivationMeanRankPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/activation_pruning_torch.py <examples/model_compress/pruning/v2/activation_pruning_torch.py>`
User configuration for Activation Mean Rank Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ActivationMeanRankPruner
Taylor FO Weight Pruner
-----------------------
Taylor FO weight pruner prunes along the first weight dimension,
based on an estimated importance calculated from the first-order Taylor expansion on the weights, to achieve a preset level of network sparsity.
The estimated importance is defined in the paper `Importance Estimation for Neural Network Pruning <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__.
:math:`\widehat{\mathcal{I}}_{\mathcal{S}}^{(1)}(\mathbf{W}) \triangleq \sum_{s \in \mathcal{S}} \mathcal{I}_{s}^{(1)}(\mathbf{W})=\sum_{s \in \mathcal{S}}\left(g_{s} w_{s}\right)^{2}`
Taylor FO weight pruner also supports dependency-aware mode.
What's more, we provide a global-sort mode for this pruner, which is aligned with the paper's implementation.
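Following the formula above, the importance of a structure :math:`\mathcal{S}` (e.g., a filter) is the sum of :math:`(g_{s} w_{s})^{2}` over its elements. A minimal sketch (illustrative, not NNI's implementation):

.. code-block:: python

    import torch

    def taylor_fo_importance(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
        """Per-filter importance: sum of (g_s * w_s)^2 over each filter."""
        return (grad * weight).pow(2).flatten(start_dim=1).sum(dim=1)

    w = torch.randn(16, 3, 3, 3, requires_grad=True)
    loss = (w.sum() - 1.0).pow(2)  # toy loss, only to obtain gradients
    loss.backward()
    importance = taylor_fo_importance(w.detach(), w.grad)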
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import TaylorFOWeightPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = TaylorFOWeightPruner(model, config_list, trainer, traced_optimizer, criterion, training_batches=20)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/taylorfo_pruning_torch.py <examples/model_compress/pruning/v2/taylorfo_pruning_torch.py>`
User configuration for Taylor FO Weight Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.TaylorFOWeightPruner
ADMM Pruner
-----------
Alternating Direction Method of Multipliers (ADMM) is a mathematical optimization technique
that decomposes the original nonconvex problem into two subproblems, which can be solved iteratively.
In the weight pruning problem, these two subproblems are solved via 1) a gradient descent algorithm and 2) a Euclidean projection, respectively.
During the process of solving these two subproblems, the weights of the original model are changed.
A fine-grained pruning is then applied to prune the model according to the given config list.
This solution framework applies both to unstructured and to different variations of structured pruning schemes.
For more details, please refer to `A Systematic DNN Weight Pruning Framework using Alternating Direction Method of Multipliers <https://arxiv.org/abs/1804.03294>`__.
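For intuition, when the constraint set is "at most k nonzero weights", the Euclidean projection subproblem has a closed form: keep the k largest-magnitude entries and zero the rest. A minimal sketch (illustrative, not NNI's implementation):

.. code-block:: python

    import torch

    def euclidean_projection(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        """Project onto the set of tensors with at most (1 - sparsity) nonzeros."""
        k = int(weight.numel() * (1.0 - sparsity))
        flat = weight.flatten()
        projected = torch.zeros_like(flat)
        if k > 0:
            topk = flat.abs().topk(k).indices
            projected[topk] = flat[topk]
        return projected.view_as(weight)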
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import ADMMPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = ADMMPruner(model, config_list, trainer, traced_optimizer, criterion, iterations=10, training_epochs=1)
masked_model, masks = pruner.compress()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/admm_pruning_torch.py <examples/model_compress/pruning/v2/admm_pruning_torch.py>`
User configuration for ADMM Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.ADMMPruner
Movement Pruner
---------------
Movement pruner is an implementation of movement pruning.
This is a "fine-pruning" algorithm, which means the masks may change during each fine-tuning step.
At each step, every weight element is scored by the negative of the sum of the products of the weight and its gradient.
This means the weight elements moving towards zero accumulate negative scores, while the weight elements moving away from zero accumulate positive scores.
The weight elements with low scores will be masked during inference.
The following figure from the paper shows how weights are pruned by movement pruning.
.. image:: ../../img/movement_pruning.png
:target: ../../img/movement_pruning.png
:alt:
For more details, please refer to `Movement Pruning: Adaptive Sparsity by Fine-Tuning <https://arxiv.org/abs/2005.07683>`__.
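A toy sketch of the score accumulation during fine-tuning (the loss, shapes, and the 90% threshold are illustrative; this is not NNI's implementation):

.. code-block:: python

    import torch

    weight = torch.randn(16, 16, requires_grad=True)
    scores = torch.zeros_like(weight)

    for _ in range(100):  # fine-tuning steps
        loss = (weight @ torch.randn(16)).pow(2).mean()  # toy loss
        loss.backward()
        with torch.no_grad():
            scores -= weight.grad * weight  # accumulate movement scores
            weight.grad.zero_()

    # Mask the elements with the lowest accumulated scores (keep the top 10%).
    threshold = scores.flatten().kthvalue(int(scores.numel() * 0.9)).values
    mask = (scores > threshold).float()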
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import MovementPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{'op_types': ['Linear'], 'op_partial_names': ['bert.encoder'], 'sparsity': 0.9}]
pruner = MovementPruner(model, config_list, trainer, traced_optimizer, criterion, 10, 3000, 27000)
masked_model, masks = pruner.compress()
User configuration for Movement Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.MovementPruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Dataset
- Remaining Weights
- MaP acc.(paper/ours)
- MvP acc.(paper/ours)
* - Bert base
- MNLI - Dev
- 10%
- 77.8% / 73.6%
- 79.3% / 78.8%
Linear Pruner
-------------
Linear pruner is an iterative pruner; it increases the sparsity evenly from zero over the iterations.
For example, if the final sparsity is set to 0.5 and the number of iterations is 5, the sparsity used in each iteration is ``[0, 0.1, 0.2, 0.3, 0.4, 0.5]``.
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LinearPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LinearPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for Linear Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LinearPruner
AGP Pruner
----------
This is an iterative pruner, in which the sparsity is increased from an initial sparsity value :math:`s_{i}` (usually 0) to a final sparsity value :math:`s_{f}` over a span of :math:`n` pruning iterations,
starting at training step :math:`t_{0}` and with pruning frequency :math:`\Delta t`:
:math:`s_{t}=s_{f}+\left(s_{i}-s_{f}\right)\left(1-\frac{t-t_{0}}{n \Delta t}\right)^{3} \text { for } t \in\left\{t_{0}, t_{0}+\Delta t, \ldots, t_{0} + n \Delta t\right\}`
For more details please refer to `To prune, or not to prune: exploring the efficacy of pruning for model compression <https://arxiv.org/abs/1710.01878>`__.
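The schedule can be evaluated directly from the formula; a minimal sketch (parameter names are illustrative):

.. code-block:: python

    def agp_sparsity(t: int, s_i: float, s_f: float, t0: int, n: int, dt: int) -> float:
        """AGP schedule: cubic interpolation from s_i to s_f."""
        assert t0 <= t <= t0 + n * dt
        return s_f + (s_i - s_f) * (1.0 - (t - t0) / (n * dt)) ** 3

    # Sparsity grows quickly at first, then flattens out near s_f:
    schedule = [round(agp_sparsity(t, 0.0, 0.8, 0, 10, 1), 3) for t in range(11)]
    # -> [0.0, 0.217, 0.39, 0.526, 0.627, 0.7, 0.749, 0.778, 0.794, 0.799, 0.8]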
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import AGPPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = AGPPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for AGP Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AGPPruner
Lottery Ticket Pruner
---------------------
`The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks <https://arxiv.org/abs/1803.03635>`__,
by Jonathan Frankle and Michael Carbin, provides comprehensive measurement and analysis,
and articulates the *lottery ticket hypothesis*\ : dense, randomly-initialized, feed-forward networks contain subnetworks (*winning tickets*\ ) that
-- when trained in isolation -- reach test accuracy comparable to the original network in a similar number of iterations.
In this paper, the authors use the following process to prune a model, called *iterative pruning*:
#. Randomly initialize a neural network :math:`f(x; \theta_0)` (where :math:`\theta_0 \sim \mathcal{D}_{\theta}`).
#. Train the network for :math:`j` iterations, arriving at parameters :math:`\theta_j`.
#. Prune :math:`p\%` of the parameters in :math:`\theta_j`, creating a mask :math:`m`.
#. Reset the remaining parameters to their values in :math:`\theta_0`, creating the winning ticket :math:`f(x; m \odot \theta_0)`.
#. Repeat steps 2, 3, and 4.
If the configured final sparsity is :math:`P` (e.g., 0.8) and the pruning runs for :math:`n` iterative rounds,
each round prunes a fraction :math:`1-(1-P)^{1/n}` of the weights that survived the previous round.
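The per-round rate follows directly from compounding; a minimal sketch:

.. code-block:: python

    def per_round_prune_rate(final_sparsity: float, rounds: int) -> float:
        """Fraction of surviving weights pruned each round so that the
        compounded sparsity reaches `final_sparsity` after `rounds` rounds."""
        return 1.0 - (1.0 - final_sparsity) ** (1.0 / rounds)

    rate = per_round_prune_rate(0.8, 5)  # ~0.275 per round
    remaining = 1.0
    for _ in range(5):
        remaining *= 1.0 - rate
    print(1.0 - remaining)               # ~0.8 overall sparsity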
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import LotteryTicketPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = LotteryTicketPruner(model, config_list, pruning_algorithm='l1', total_iteration=10, finetuner=finetuner, reset_weight=True)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/iterative_pruning_torch.py <examples/model_compress/pruning/v2/iterative_pruning_torch.py>`
User configuration for Lottery Ticket Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.LotteryTicketPruner
Simulated Annealing Pruner
--------------------------
We implement a guided heuristic search method: the Simulated Annealing (SA) algorithm. As mentioned in the paper, this method enhances guided search with prior experience.
The enhanced SA technique is based on the observation that a DNN layer with a larger number of weights often tolerates a higher degree of compression with less impact on overall accuracy. The procedure is as follows (a compact code sketch is given after the paper reference below):
* Randomly initialize a pruning rate distribution (sparsities).
* While current_temperature > stop_temperature:
  #. Generate a perturbation of the current distribution.
  #. Perform a fast evaluation on the perturbed distribution.
  #. Accept the perturbation according to the performance and the acceptance probability; if not accepted, return to step 1.
  #. Cool down: current_temperature <- current_temperature * cool_down_rate.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
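The following sketch shows the generic SA loop over sparsity distributions; ``evaluate`` and ``perturb`` are illustrative callables, and the temperature defaults are illustrative values rather than NNI defaults:

.. code-block:: python

    import math
    import random

    def simulated_annealing(evaluate, perturb, init_sparsities,
                            start_temp=100.0, stop_temp=20.0, cool_down_rate=0.9):
        """Search per-layer sparsities; `evaluate` returns a score (higher is better)."""
        current, current_score = init_sparsities, evaluate(init_sparsities)
        best, best_score = current, current_score
        temperature = start_temp
        while temperature > stop_temp:
            candidate = perturb(current)
            score = evaluate(candidate)
            # Always accept improvements; accept worse candidates with a
            # probability that shrinks as the temperature cools down.
            if score > current_score or \
                    random.random() < math.exp((score - current_score) / temperature):
                current, current_score = candidate, score
                if score > best_score:
                    best, best_score = candidate, score
            temperature *= cool_down_rate
        return best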
Usage
^^^^^^
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import SimulatedAnnealingPruner
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
pruner = SimulatedAnnealingPruner(model, config_list, pruning_algorithm='l1', evaluator=evaluator, cool_down_rate=0.9, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
For detailed example please refer to :githublink:`examples/model_compress/pruning/v2/simulated_anealing_pruning_torch.py <examples/model_compress/pruning/v2/simulated_anealing_pruning_torch.py>`
User configuration for Simulated Annealing Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.SimulatedAnnealingPruner
Auto Compress Pruner
--------------------
For a total iteration number :math:`N`, AutoCompressPruner prunes the model that survived the previous iteration by a fixed sparsity ratio (e.g., :math:`1-{(1-0.8)}^{(1/N)}`) in each iteration, to achieve the overall sparsity (e.g., :math:`0.8`):
.. code-block:: bash
   1. Generate a sparsities distribution using SimulatedAnnealingPruner
   2. Perform ADMM-based pruning to generate the pruning result for the next iteration.
For more details, please refer to `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <https://arxiv.org/abs/1907.03141>`__.
Usage
^^^^^^
.. code-block:: python
import nni
from nni.algorithms.compression.v2.pytorch.pruning import AutoCompressPruner
# make sure you have used nni.trace to wrap the optimizer class before initializing it
traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())
config_list = [{ 'sparsity': 0.8, 'op_types': ['Conv2d'] }]
admm_params = {
'trainer': trainer,
'traced_optimizer': traced_optimizer,
'criterion': criterion,
'iterations': 10,
'training_epochs': 1
}
sa_params = {
'evaluator': evaluator
}
pruner = AutoCompressPruner(model, config_list, 10, admm_params, sa_params, finetuner=finetuner)
pruner.compress()
_, model, masks, _, _ = pruner.get_best_result()
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/auto_compress_pruner.py>`.
User configuration for Auto Compress Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AutoCompressPruner
AMC Pruner
----------
AMC pruner leverages reinforcement learning to provide the model compression policy.
According to the authors, this learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio,
better preserving the accuracy, and freeing human labor.
For more details, please refer to `AMC: AutoML for Model Compression and Acceleration on Mobile Devices <https://arxiv.org/pdf/1802.03494.pdf>`__.
Usage
^^^^^
PyTorch code
.. code-block:: python
from nni.algorithms.compression.v2.pytorch.pruning import AMCPruner
config_list = [{'op_types': ['Conv2d'], 'total_sparsity': 0.5, 'max_sparsity_per_layer': 0.8}]
pruner = AMCPruner(400, model, config_list, dummy_input, evaluator, finetuner=finetuner)
pruner.compress()
The full script can be found :githublink:`here <examples/model_compress/pruning/v2/amc_pruning_torch.py>`.
User configuration for AMC Pruner
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**PyTorch**
.. autoclass:: nni.algorithms.compression.v2.pytorch.pruning.AMCPruner
Retiarii API Reference
======================
.. contents::
Inline Mutation APIs
--------------------
.. autoclass:: nni.retiarii.nn.pytorch.LayerChoice
:members:
.. autoclass:: nni.retiarii.nn.pytorch.InputChoice
:members:
.. autoclass:: nni.retiarii.nn.pytorch.ValueChoice
:members:
.. autoclass:: nni.retiarii.nn.pytorch.ChosenInputs
:members:
.. autoclass:: nni.retiarii.nn.pytorch.Repeat
:members:
.. autoclass:: nni.retiarii.nn.pytorch.Cell
:members:
Graph Mutation APIs
-------------------
.. autoclass:: nni.retiarii.Mutator
:members:
.. autoclass:: nni.retiarii.Model
:members:
.. autoclass:: nni.retiarii.Graph
:members:
.. autoclass:: nni.retiarii.Node
:members:
.. autoclass:: nni.retiarii.Edge
:members:
.. autoclass:: nni.retiarii.Operation
:members:
Evaluators
----------
.. autoclass:: nni.retiarii.evaluator.FunctionalEvaluator
:members:
.. autoclass:: nni.retiarii.evaluator.pytorch.lightning.LightningModule
:members:
.. autoclass:: nni.retiarii.evaluator.pytorch.lightning.Classification
:members:
.. autoclass:: nni.retiarii.evaluator.pytorch.lightning.Regression
:members:
Exploration Strategies
----------------------
.. automodule:: nni.retiarii.strategy
:members:
:imported-members:
Retiarii Experiments
--------------------
.. autoclass:: nni.retiarii.experiment.pytorch.RetiariiExperiment
:members:
.. autoclass:: nni.retiarii.experiment.pytorch.RetiariiExeConfig
:members:
CGO Execution
-------------
.. autofunction:: nni.retiarii.evaluator.pytorch.cgo.evaluator.MultiModelSupervisedLearningModule
.. autofunction:: nni.retiarii.evaluator.pytorch.cgo.evaluator.Classification
.. autofunction:: nni.retiarii.evaluator.pytorch.cgo.evaluator.Regression
One-shot Implementation
-----------------------
.. automodule:: nni.retiarii.oneshot
:members:
:imported-members:
.. automodule:: nni.retiarii.oneshot.pytorch
:members:
:imported-members:
Utilities
---------
.. autofunction:: nni.retiarii.basic_unit
.. autofunction:: nni.retiarii.model_wrapper
.. autofunction:: nni.retiarii.fixed_arch
Citations
---------
.. bibliography::
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example Usages of NAS Benchmarks"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import pprint\n",
"import time\n",
"\n",
"from nni.nas.benchmarks.nasbench101 import query_nb101_trial_stats\n",
"from nni.nas.benchmarks.nasbench201 import query_nb201_trial_stats\n",
"from nni.nas.benchmarks.nds import query_nds_trial_stats\n",
"\n",
"ti = time.time()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NAS-Bench-101"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:\n",
"\n",
"![nas-101](../../img/nas-bench-101-example.png)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"arch = {\n",
" 'op1': 'conv3x3-bn-relu',\n",
" 'op2': 'maxpool3x3',\n",
" 'op3': 'conv3x3-bn-relu',\n",
" 'op4': 'conv3x3-bn-relu',\n",
" 'op5': 'conv1x1-bn-relu',\n",
" 'input1': [0],\n",
" 'input2': [1],\n",
" 'input3': [2],\n",
" 'input4': [0],\n",
" 'input5': [0, 3, 4],\n",
" 'input6': [2, 5]\n",
"}\n",
"for t in query_nb101_trial_stats(arch, 108, include_intermediates=True):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An architecture of NAS-Bench-101 could be trained more than once. Each element of the returned generator is a dict which contains one of the training results of this trial config (architecture + hyper-parameters) including train/valid/test accuracy, training time, number of epochs, etc. The results of NAS-Bench-201 and NDS follow similar formats."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NAS-Bench-201"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:\n",
"\n",
"![nas-201](../../img/nas-bench-201-example.png)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"arch = {\n",
" '0_1': 'avg_pool_3x3',\n",
" '0_2': 'conv_1x1',\n",
" '1_2': 'skip_connect',\n",
" '0_3': 'conv_1x1',\n",
" '1_3': 'skip_connect',\n",
" '2_3': 'skip_connect'\n",
"}\n",
"for t in query_nb201_trial_stats(arch, 200, 'cifar100'):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Intermediate results are also available."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"for t in query_nb201_trial_stats(arch, None, 'imagenet16-120', include_intermediates=True):\n",
" print(t['config'])\n",
" print('Intermediates:', len(t['intermediates']))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NDS"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the following architecture as an example:<br>\n",
"![nds](../../img/nas-bench-nds-example.png)\n",
"\n",
"Here, `bot_muls`, `ds`, `num_gs`, `ss` and `ws` stand for \"bottleneck multipliers\", \"depths\", \"number of groups\", \"strides\" and \"widths\" respectively."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_spec = {\n",
" 'bot_muls': [0.0, 0.25, 0.25, 0.25],\n",
" 'ds': [1, 16, 1, 4],\n",
" 'num_gs': [1, 2, 1, 2],\n",
" 'ss': [1, 1, 2, 2],\n",
" 'ws': [16, 64, 128, 16]\n",
"}\n",
"# Use none as a wildcard\n",
"for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10'):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_spec = {\n",
" 'bot_muls': [0.0, 0.25, 0.25, 0.25],\n",
" 'ds': [1, 16, 1, 4],\n",
" 'num_gs': [1, 2, 1, 2],\n",
" 'ss': [1, 1, 2, 2],\n",
" 'ws': [16, 64, 128, 16]\n",
"}\n",
"for t in query_nds_trial_stats('residual_bottleneck', None, None, model_spec, None, 'cifar10', include_intermediates=True):\n",
" pprint.pprint(t['intermediates'][:10])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"model_spec = {'ds': [1, 12, 12, 12], 'ss': [1, 1, 2, 2], 'ws': [16, 24, 24, 40]}\n",
"for t in query_nds_trial_stats('residual_basic', 'resnet', 'random', model_spec, {}, 'cifar10'):\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# get the first one\n",
"pprint.pprint(next(query_nds_trial_stats('vanilla', None, None, None, None, None)))"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# count number\n",
"model_spec = {'num_nodes_normal': 5, 'num_nodes_reduce': 5, 'depth': 12, 'width': 32, 'aux': False, 'drop_prob': 0.0}\n",
"cell_spec = {\n",
" 'normal_0_op_x': 'avg_pool_3x3',\n",
" 'normal_0_input_x': 0,\n",
" 'normal_0_op_y': 'conv_7x1_1x7',\n",
" 'normal_0_input_y': 1,\n",
" 'normal_1_op_x': 'sep_conv_3x3',\n",
" 'normal_1_input_x': 2,\n",
" 'normal_1_op_y': 'sep_conv_5x5',\n",
" 'normal_1_input_y': 0,\n",
" 'normal_2_op_x': 'dil_sep_conv_3x3',\n",
" 'normal_2_input_x': 2,\n",
" 'normal_2_op_y': 'dil_sep_conv_3x3',\n",
" 'normal_2_input_y': 2,\n",
" 'normal_3_op_x': 'skip_connect',\n",
" 'normal_3_input_x': 4,\n",
" 'normal_3_op_y': 'dil_sep_conv_3x3',\n",
" 'normal_3_input_y': 4,\n",
" 'normal_4_op_x': 'conv_7x1_1x7',\n",
" 'normal_4_input_x': 2,\n",
" 'normal_4_op_y': 'sep_conv_3x3',\n",
" 'normal_4_input_y': 4,\n",
" 'normal_concat': [3, 5, 6],\n",
" 'reduce_0_op_x': 'avg_pool_3x3',\n",
" 'reduce_0_input_x': 0,\n",
" 'reduce_0_op_y': 'dil_sep_conv_3x3',\n",
" 'reduce_0_input_y': 1,\n",
" 'reduce_1_op_x': 'sep_conv_3x3',\n",
" 'reduce_1_input_x': 0,\n",
" 'reduce_1_op_y': 'sep_conv_3x3',\n",
" 'reduce_1_input_y': 0,\n",
" 'reduce_2_op_x': 'skip_connect',\n",
" 'reduce_2_input_x': 2,\n",
" 'reduce_2_op_y': 'sep_conv_7x7',\n",
" 'reduce_2_input_y': 0,\n",
" 'reduce_3_op_x': 'conv_7x1_1x7',\n",
" 'reduce_3_input_x': 4,\n",
" 'reduce_3_op_y': 'skip_connect',\n",
" 'reduce_3_input_y': 4,\n",
" 'reduce_4_op_x': 'conv_7x1_1x7',\n",
" 'reduce_4_input_x': 0,\n",
" 'reduce_4_op_y': 'conv_7x1_1x7',\n",
" 'reduce_4_input_y': 5,\n",
" 'reduce_concat': [3, 6]\n",
"}\n",
"\n",
"for t in query_nds_trial_stats('nas_cell', None, None, model_spec, cell_spec, 'cifar10'):\n",
" assert t['config']['model_spec'] == model_spec\n",
" assert t['config']['cell_spec'] == cell_spec\n",
" pprint.pprint(t)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# count number\n",
"print('NDS (amoeba) count:', len(list(query_nds_trial_stats(None, 'amoeba', None, None, None, None, None))))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## NLP"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"metadata": false
}
},
"source": [
"Use the following two architectures as examples. \n",
"The arch in the paper is called \"receipe\" with nested variable, and now it is nunested in the benchmarks for NNI.\n",
"An arch has multiple Node, Node_input_n and Node_op, you can refer to doc for more details.\n",
"\n",
"arch1 : <img src=\"../../img/nas-bench-nlp-example1.jpeg\" width=400 height=300 /> \n",
"\n",
"\n",
"arch2 : <img src=\"../../img/nas-bench-nlp-example2.jpeg\" width=400 height=300 /> \n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{'config': {'arch': {'h_new_0_input_0': 'node_3',\n 'h_new_0_input_1': 'node_2',\n 'h_new_0_input_2': 'node_1',\n 'h_new_0_op': 'blend',\n 'node_0_input_0': 'x',\n 'node_0_input_1': 'h_prev_0',\n 'node_0_op': 'linear',\n 'node_1_input_0': 'node_0',\n 'node_1_op': 'activation_tanh',\n 'node_2_input_0': 'h_prev_0',\n 'node_2_input_1': 'node_1',\n 'node_2_input_2': 'x',\n 'node_2_op': 'linear',\n 'node_3_input_0': 'node_2',\n 'node_3_op': 'activation_leaky_relu'},\n 'dataset': 'ptb',\n 'id': 20003},\n 'id': 16291,\n 'test_loss': 4.680262297102549,\n 'train_loss': 4.132040537087838,\n 'training_time': 177.05208373069763,\n 'val_loss': 4.707944253177966}\n"
]
}
],
"source": [
"import pprint\n",
"from nni.nas.benchmarks.nlp import query_nlp_trial_stats\n",
"\n",
"arch1 = {'h_new_0_input_0': 'node_3', 'h_new_0_input_1': 'node_2', 'h_new_0_input_2': 'node_1', 'h_new_0_op': 'blend', 'node_0_input_0': 'x', 'node_0_input_1': 'h_prev_0', 'node_0_op': 'linear','node_1_input_0': 'node_0', 'node_1_op': 'activation_tanh', 'node_2_input_0': 'h_prev_0', 'node_2_input_1': 'node_1', 'node_2_input_2': 'x', 'node_2_op': 'linear', 'node_3_input_0': 'node_2', 'node_3_op': 'activation_leaky_relu'}\n",
"for i in query_nlp_trial_stats(arch=arch1, dataset=\"ptb\"):\n",
" pprint.pprint(i)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[{'current_epoch': 46,\n 'id': 1796,\n 'test_loss': 6.233430054978619,\n 'train_loss': 6.4866799231542664,\n 'training_time': 146.5680329799652,\n 'val_loss': 6.326836978687959},\n {'current_epoch': 47,\n 'id': 1797,\n 'test_loss': 6.2402057403023825,\n 'train_loss': 6.485401405247535,\n 'training_time': 146.05511450767517,\n 'val_loss': 6.3239741605870865},\n {'current_epoch': 48,\n 'id': 1798,\n 'test_loss': 6.351145308363877,\n 'train_loss': 6.611281181173992,\n 'training_time': 145.8849437236786,\n 'val_loss': 6.436160816865809},\n {'current_epoch': 49,\n 'id': 1799,\n 'test_loss': 6.227155079159031,\n 'train_loss': 6.473414458249545,\n 'training_time': 145.51414465904236,\n 'val_loss': 6.313294354607077}]\n"
]
}
],
"source": [
"arch2 = {\"h_new_0_input_0\":\"node_0\",\"h_new_0_input_1\":\"node_1\",\"h_new_0_op\":\"elementwise_sum\",\"node_0_input_0\":\"x\",\"node_0_input_1\":\"h_prev_0\",\"node_0_op\":\"linear\",\"node_1_input_0\":\"node_0\",\"node_1_op\":\"activation_tanh\"}\n",
"for i in query_nlp_trial_stats(arch=arch2, dataset='wikitext-2', include_intermediates=True):\n",
" pprint.pprint(i['intermediates'][45:49])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"pycharm": {},
"tags": []
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Elapsed time: 5.60982608795166 seconds\n"
]
}
],
"source": [
"print('Elapsed time: ', time.time() - ti, 'seconds')"
]
}
],
"metadata": {
"file_extension": ".py",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"name": "python",
"version": "3.8.5-final"
},
"mimetype": "text/x-python",
"name": "python",
"npconvert_exporter": "python",
"orig_nbformat": 2,
"pygments_lexer": "ipython3",
"version": 3
},
"nbformat": 4,
"nbformat_minor": 2
}
DARTS
=====
Introduction
------------
The paper `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Their method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent.
The authors' code optimizes the network weights and architecture weights alternately in mini-batches. They further explore the possibility of using second-order optimization (unroll) instead of first-order optimization to improve performance.
The implementation on NNI is based on the `official implementation <https://github.com/quark0/darts>`__ and a `popular 3rd-party repo <https://github.com/khanrc/pt.darts>`__. DARTS on NNI is designed to be general for arbitrary search spaces. A CNN search space tailored to CIFAR10, the same as in the original paper, is implemented as a use case of DARTS.
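For intuition, the continuous relaxation replaces the categorical choice among candidate operations with a softmax-weighted sum, so the architecture weights can be optimized by gradient descent together with the network weights. A minimal sketch of such a mixed operation (illustrative, not the internals of NNI's ``DartsTrainer``):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MixedOp(nn.Module):
        """Continuous relaxation of a categorical op choice, DARTS-style."""
        def __init__(self, ops):
            super().__init__()
            self.ops = nn.ModuleList(ops)
            # Architecture parameters: one logit per candidate operation.
            self.alpha = nn.Parameter(torch.zeros(len(ops)))

        def forward(self, x):
            weights = F.softmax(self.alpha, dim=0)
            return sum(w * op(x) for w, op in zip(weights, self.ops))

    mixed = MixedOp([nn.Conv2d(3, 3, 3, padding=1), nn.MaxPool2d(3, 1, 1), nn.Identity()])
    out = mixed(torch.randn(1, 3, 8, 8))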
Reproduction Results
--------------------
The above-mentioned example is meant to reproduce the results in the paper; we do experiments with first- and second-order optimization. Due to the time limit, we retrain *only the best architecture* derived from the search phase, and we repeat the experiment *only once*. Our results are currently on par with the results reported in the paper. We will add more results later when they are ready.
.. list-table::
:header-rows: 1
:widths: auto
* -
- In paper
- Reproduction
* - First order (CIFAR10)
- 3.00 +/- 0.14
- 2.78
* - Second order (CIFAR10)
- 2.76 +/- 0.09
- 2.80
Examples
--------
CNN Search Space
^^^^^^^^^^^^^^^^
:githublink:`Example code <examples/nas/oneshot/darts>`
.. code-block:: bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/oneshot/darts
python3 search.py
# train the best architecture
python3 retrain.py --arc-checkpoint ./checkpoints/epoch_49.json
Reference
---------
PyTorch
^^^^^^^
.. autoclass:: nni.retiarii.oneshot.pytorch.DartsTrainer
:noindex:
Limitations
-----------
* DARTS doesn't support DataParallel and needs to be customized in order to support DistributedDataParallel.
ENAS
====
Introduction
------------
The paper `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__ uses parameter sharing between child models to accelerate the NAS process. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile, the model corresponding to the selected subgraph is trained to minimize a canonical cross-entropy loss.
The implementation on NNI is based on the `official implementation in Tensorflow <https://github.com/melodyguan/enas>`__, including a general-purpose reinforcement-learning controller and a trainer that alternately trains the target network and this controller. Following the paper, we have also implemented the macro and micro search spaces on CIFAR10 to demonstrate how to use these trainers. Since the code to train from scratch on NNI is not ready yet, reproduction results are currently unavailable.
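To illustrate the policy-gradient part, here is a heavily simplified REINFORCE sketch: the real ENAS controller is an LSTM emitting one decision per layer, whereas the "controller" below is just a table of logits, and the reward is a placeholder; all names are illustrative.

.. code-block:: python

    import torch
    import torch.nn as nn

    num_layers, num_choices = 6, 4
    logits = nn.Parameter(torch.zeros(num_layers, num_choices))  # stand-in controller
    optimizer = torch.optim.Adam([logits], lr=3e-4)

    def sample_architecture():
        dist = torch.distributions.Categorical(logits=logits)
        choices = dist.sample()                   # one op choice per layer
        return choices, dist.log_prob(choices).sum()

    def validation_reward(choices):
        return torch.rand(())                     # placeholder for real accuracy

    choices, log_prob = sample_architecture()
    reward = validation_reward(choices)
    loss = -log_prob * reward                     # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()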
Examples
--------
CIFAR10 Macro/Micro Search Space
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:githublink:`Example code <examples/nas/oneshot/enas>`
.. code-block:: bash
# In case NNI code is not cloned. If the code is cloned already, ignore this line and enter code folder.
git clone https://github.com/Microsoft/nni.git
# search the best architecture
cd examples/nas/oneshot/enas
# search in macro search space
python3 search.py --search-for macro
# search in micro search space
python3 search.py --search-for micro
# view more options for search
python3 search.py -h
Reference
---------
PyTorch
^^^^^^^
.. autoclass:: nni.retiarii.oneshot.pytorch.EnasTrainer
:noindex:
Exploration Strategies for Multi-trial NAS
==========================================
Usage of Exploration Strategy
-----------------------------
To use an exploration strategy, users simply instantiate an exploration strategy and pass the instantiated object to ``RetiariiExperiment``. Below is a simple example.
.. code-block:: python
import nni.retiarii.strategy as strategy
exploration_strategy = strategy.Random(dedup=True) # dedup=False if deduplication is not wanted
Supported Exploration Strategies
--------------------------------
NNI provides the following exploration strategies for multi-trial NAS.
.. list-table::
:header-rows: 1
:widths: auto
* - Name
- Brief Introduction of Algorithm
* - `Random Strategy <./ApiReference.rst#nni.retiarii.strategy.Random>`__
- Randomly sampling new model(s) from user defined model space. (``nni.retiarii.strategy.Random``)
* - `Grid Search <./ApiReference.rst#nni.retiarii.strategy.GridSearch>`__
- Sampling new model(s) from user defined model space using grid search algorithm. (``nni.retiarii.strategy.GridSearch``)
* - `Regularized Evolution <./ApiReference.rst#nni.retiarii.strategy.RegularizedEvolution>`__
- Generating new model(s) from generated models using `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__ . (``nni.retiarii.strategy.RegularizedEvolution``)
* - `TPE Strategy <./ApiReference.rst#nni.retiarii.strategy.TPEStrategy>`__
- Sampling new model(s) from user defined model space using `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__ . (``nni.retiarii.strategy.TPEStrategy``)
* - `RL Strategy <./ApiReference.rst#nni.retiarii.strategy.PolicyBasedRL>`__
- It uses `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from user defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
Customize Exploration Strategy
------------------------------
If users want to implement a new exploration strategy, they can easily customize one following the interface provided by NNI. Specifically, users should inherit the base strategy class ``BaseStrategy``, then implement the member function ``run``. This member function takes ``base_model`` and ``applied_mutators`` as its input arguments. It can simply apply the user-specified mutators in ``applied_mutators`` onto ``base_model`` to generate a new model. When a mutator is applied, it should be bound with a sampler (e.g., ``RandomSampler``). Every sampler implements the ``choice`` function, which chooses value(s) from candidate values. The ``choice`` functions invoked in mutators are executed with the sampler.
Below is a very simple random strategy, which makes the choices completely random.
.. code-block:: python

    import logging
    import random
    import time

    from nni.retiarii import Sampler

    # BaseStrategy, query_available_resources and submit_models are provided by
    # NNI's Retiarii strategy/execution modules.

    _logger = logging.getLogger(__name__)

    class RandomSampler(Sampler):
        def choice(self, candidates, mutator, model, index):
            return random.choice(candidates)

    class RandomStrategy(BaseStrategy):
        def __init__(self):
            self.random_sampler = RandomSampler()

        def run(self, base_model, applied_mutators):
            _logger.info('strategy start...')
            while True:
                avail_resource = query_available_resources()
                if avail_resource > 0:
                    model = base_model
                    _logger.info('apply mutators...')
                    _logger.info('mutators: %s', str(applied_mutators))
                    for mutator in applied_mutators:
                        mutator.bind_sampler(self.random_sampler)
                        model = mutator.apply(model)
                    # run models
                    submit_models(model)
                else:
                    time.sleep(2)
You can see that this strategy does not know the search space beforehand; it passively makes decisions every time ``choice`` is invoked from mutators. If a strategy wants to know the whole search space before making any decision (e.g., TPE, SMAC), it can use the ``dry_run`` function provided by ``Mutator`` to obtain the space. An example strategy can be found :githublink:`here <nni/retiarii/strategy/tpe_strategy.py>`.
After generating a new model, the strategy can use our provided APIs (e.g., ``submit_models``, ``is_stopped_exec``) to submit the model and get its reported results. More APIs can be found in `API References <./ApiReference.rst>`__.
FBNet
======
.. note:: This one-shot NAS is still implemented under NNI NAS 1.0, and will `be migrated to Retiarii framework in v2.4 <https://github.com/microsoft/nni/issues/3814>`__.
For the mobile application of facial landmark detection, based on the basic architecture of the PFLD model, we have applied FBNet (block-wise DNAS) to design a concise model with a trade-off between latency and accuracy. References are listed below:
* `FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search <https://arxiv.org/abs/1812.03443>`__
* `PFLD: A Practical Facial Landmark Detector <https://arxiv.org/abs/1902.10859>`__
FBNet is a block-wise differentiable NAS method (block-wise DNAS), where the best candidate building blocks can be chosen by Gumbel-Softmax random sampling and differentiable training. At each layer (or stage) to be searched, the diverse candidate blocks are placed side by side (similar in effect to structural re-parameterization), leading to sufficient pre-training of the supernet. The pre-trained supernet is further sampled for finetuning of the subnet, to achieve better performance. A sketch of the Gumbel-Softmax block selection is given after the figure below.
.. image:: ../../img/fbnet.png
:target: ../../img/fbnet.png
:alt:
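A minimal sketch of Gumbel-Softmax block mixing (illustrative, not NNI's FBNet modules):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SearchableStage(nn.Module):
        """Mix candidate blocks with Gumbel-Softmax weights (block-wise DNAS)."""
        def __init__(self, blocks):
            super().__init__()
            self.blocks = nn.ModuleList(blocks)
            self.theta = nn.Parameter(torch.zeros(len(blocks)))  # block logits

        def forward(self, x, temperature=1.0):
            # Differentiable sampling over the candidate blocks.
            weights = F.gumbel_softmax(self.theta, tau=temperature)
            return sum(w * block(x) for w, block in zip(weights, self.blocks))

    stage = SearchableStage([nn.Conv2d(8, 8, 3, padding=1),
                             nn.Conv2d(8, 8, 5, padding=2),
                             nn.Identity()])
    out = stage(torch.randn(1, 8, 16, 16))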
PFLD is a lightweight facial landmark model for real-time applications. The architecture of PFLD is first simplified for acceleration, by using the stem block of PeleeNet, average pooling with depthwise convolution, and the eSE module.
To achieve a better trade-off between latency and accuracy, FBNet is further applied to the simplified PFLD to search for the best block at each specific layer. The search space is based on the FBNet space and optimized for mobile deployment by using average pooling with depthwise convolution, the eSE module, etc.
Experiments
------------
To verify the effectiveness of FBNet applied on PFLD, we choose the open source dataset with 106 landmark points as the benchmark:
* `Grand Challenge of 106-Point Facial Landmark Localization <https://arxiv.org/abs/1905.03469>`__
The baseline model is denoted as MobileNet-V3 PFLD (`Reference baseline <https://github.com/Hsintao/pfld_106_face_landmarks>`__), and the searched model is denoted as Subnet. The experimental results are listed as below, where the latency is tested on Qualcomm 625 CPU (ARMv8):
.. list-table::
:header-rows: 1
:widths: auto
* - Model
- Size
- Latency
- Validation NME
* - MobileNet-V3 PFLD
- 1.01MB
- 10ms
- 6.22%
* - Subnet
- 693KB
- 1.60ms
- 5.58%
Example
--------
`Example code <https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/pfld>`__
Please run the following scripts in the example directory.
The Python dependencies used here are listed below:
.. code-block:: bash
numpy==1.18.5
opencv-python==4.5.1.48
torch==1.6.0
torchvision==0.7.0
onnx==1.8.1
onnx-simplifier==0.3.5
onnxruntime==1.7.0
Data Preparation
-----------------
First, download the `106points dataset <https://drive.google.com/file/d/1I7QdnLxAlyG2Tq3L66QYzGhiBEoVfzKo/view?usp=sharing>`__ to the path ``./data/106points``. The dataset includes the train set and test set:
.. code-block:: bash
./data/106points/train_data/imgs
./data/106points/train_data/list.txt
./data/106points/test_data/imgs
./data/106points/test_data/list.txt
Quick Start
-----------
1. Search
^^^^^^^^^^
Based on the architecture of the simplified PFLD, the multi-stage search space and the hyper-parameters for searching should first be configured to construct the supernet, for example:
.. code-block:: python

    from lib.builder import search_space
    from lib.ops import PRIMITIVES
    from lib.supernet import PFLDInference, AuxiliaryNet
    from nni.algorithms.nas.pytorch.fbnet import LookUpTable, NASConfig

    # configuration of hyper-parameters
    # search_space defines the multi-stage search space
    nas_config = NASConfig(
        model_dir="./ckpt_save",
        nas_lr=0.01,
        mode="mul",
        alpha=0.25,
        beta=0.6,
        search_space=search_space,
    )
    # lookup table to manage the information
    lookup_table = LookUpTable(config=nas_config, primitives=PRIMITIVES)
    # create the supernet
    pfld_backbone = PFLDInference(lookup_table)
After creating the supernet with the specified search space and hyper-parameters, we can run the command below to start searching and training the supernet:
.. code-block:: bash
python train.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points"
The validation accuracy will be shown during training, and the model with the best accuracy will be saved as ``./ckpt_save/supernet/checkpoint_best.pth``.
2. Finetune
^^^^^^^^^^^^
After pre-training the supernet, we can run the command below to sample the subnet and conduct finetuning:
.. code-block:: bash
python retrain.py --dev_id "0,1" --snapshot "./ckpt_save" --data_root "./data/106points" \
--supernet "./ckpt_save/supernet/checkpoint_best.pth"
The validation accuracy will be shown during training, and the model with the best accuracy will be saved as ``./ckpt_save/subnet/checkpoint_best.pth``.
3. Export
^^^^^^^^^^
After finetuning the subnet, we can run the command below to export the ONNX model:
.. code-block:: bash
python export.py --supernet "./ckpt_save/supernet/checkpoint_best.pth" \
--resume "./ckpt_save/subnet/checkpoint_best.pth"
The ONNX model is saved as ``./output/subnet.onnx``; it can be further converted to a mobile inference engine by using `MNN <https://github.com/alibaba/MNN>`__.
The checkpoints of the pre-trained supernet and subnet are offered below:
* `Supernet <https://drive.google.com/file/d/1TCuWKq8u4_BQ84BWbHSCZ45N3JGB9kFJ/view?usp=sharing>`__
* `Subnet <https://drive.google.com/file/d/160rkuwB7y7qlBZNM3W_T53cb6MQIYHIE/view?usp=sharing>`__
* `ONNX model <https://drive.google.com/file/d/1s-v-aOiMv0cqBspPVF3vSGujTbn_T_Uo/view?usp=sharing>`__
Hypermodules
============
A hypermodule is a (PyTorch) module that contains many architecture/hyperparameter candidates for that module. By using hypermodules in a user-defined model, NNI helps users automatically find the best architecture/hyperparameter of the hypermodules for this model. This follows the design philosophy of Retiarii that users write a DNN model as a space.
Several hypermodules have been proposed in the NAS community, such as AutoActivation and AutoDropout. Some of them are implemented in the Retiarii framework.
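As a usage sketch, a hypermodule drops into a model like any other module; this assumes ``AutoActivation`` can be constructed with default arguments (see the API reference below for the exact signature):

.. code-block:: python

    import torch
    import torch.nn as nn
    from nni.retiarii.nn.pytorch import AutoActivation

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(32, 32)
            # The activation function itself becomes part of the search space.
            self.act = AutoActivation()

        def forward(self, x):
            return self.act(self.fc(x))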
.. autoclass:: nni.retiarii.nn.pytorch.AutoActivation
:members:
Mutation Primitives
===================
.. TODO: this file will be merged with API reference in future.
To make users easily express a model space within their PyTorch/TensorFlow model, NNI provides some inline mutation APIs as shown below.
We show the most common use case here. For advanced usages, please see `reference <./ApiReference.rst>`__.
.. note:: We are actively adding more mutation primitives. If you have any suggestions, feel free to `ask here <https://github.com/microsoft/nni/issues>`__.
``nn.LayerChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.LayerChoice`
It allows users to put several candidate operations (e.g., PyTorch modules); one of them is chosen in each explored model.
.. code-block:: python
# import nni.retiarii.nn.pytorch as nn
# declared in `__init__` method
self.layer = nn.LayerChoice([
ops.PoolBN('max', channels, 3, stride, 1),
ops.SepConv(channels, channels, 3, stride, 1),
nn.Identity()
])
# invoked in `forward` method
out = self.layer(x)
``nn.InputChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.InputChoice`
It is mainly for choosing (or trying) different connections. It takes several tensors and chooses ``n_chosen`` tensors from them.
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # declared in `__init__` method
   self.input_switch = nn.InputChoice(n_chosen=1)
   # invoked in `forward` method, choose one from the three
   out = self.input_switch([tensor1, tensor2, tensor3])
``nn.ValueChoice``
""""""""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.ValueChoice`
It is for choosing one value from some candidate values. The most common use cases are:
* Used as input arguments of :class:`nni.retiarii.basic_unit` (i.e., modules in ``nni.retiarii.nn.pytorch`` and user-defined modules decorated with ``@basic_unit``).
* Used as input arguments of evaluator (*new in v2.7*).
Examples are as follows:
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # used in `__init__` method
   self.conv = nn.Conv2d(XX, XX, kernel_size=nn.ValueChoice([1, 3, 5]))
   self.op = MyOp(nn.ValueChoice([0, 1]), nn.ValueChoice([-1, 1]))

   # used in evaluator
   def train_and_evaluate(model_cls, learning_rate):
       ...

   self.evaluator = FunctionalEvaluator(train_and_evaluate, learning_rate=nn.ValueChoice([1e-3, 1e-2, 1e-1]))
Value choices support arithmetic operators, which are particularly useful when searching for a network width multiplier:
.. code-block:: python

   # init
   scale = nn.ValueChoice([1.0, 1.5, 2.0])
   self.conv1 = nn.Conv2d(3, round(scale * 16))
   self.conv2 = nn.Conv2d(round(scale * 16), round(scale * 64))
   self.conv3 = nn.Conv2d(round(scale * 64), round(scale * 256))

   # forward
   return self.conv3(self.conv2(self.conv1(x)))
Or when kernel size and padding are coupled so as to keep the output size constant:
.. code-block:: python

   # init
   ks = nn.ValueChoice([3, 5, 7])
   self.conv = nn.Conv2d(3, 16, kernel_size=ks, padding=(ks - 1) // 2)

   # forward
   return self.conv(x)
Or when the outputs of several layers are concatenated before a final layer:
.. code-block:: python

   # init
   self.linear1 = nn.Linear(3, nn.ValueChoice([1, 2, 3], label='a'))
   self.linear2 = nn.Linear(3, nn.ValueChoice([4, 5, 6], label='b'))
   self.final = nn.Linear(nn.ValueChoice([1, 2, 3], label='a') + nn.ValueChoice([4, 5, 6], label='b'), 2)

   # forward
   return self.final(torch.cat([self.linear1(x), self.linear2(x)], 1))
Some advanced operators are also provided, such as ``nn.ValueChoice.max`` and ``nn.ValueChoice.cond``. See reference of :class:`nni.retiarii.nn.pytorch.ValueChoice` for more details.
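As a hedged sketch of what such an operator enables (assuming ``nn.ValueChoice.max`` computes the maximum over its arguments, as the name suggests; consult the API reference for the exact semantics):

.. code-block:: python

   # init: choose a hidden size, but never let it drop below the input width (64 here)
   hidden = nn.ValueChoice.max(nn.ValueChoice([32, 128], label='hidden'), 64)
   self.fc = nn.Linear(64, hidden)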
.. tip::

   All the APIs have an optional argument called ``label``; mutations with the same label will share the same choice. A typical example:

   .. code-block:: python

      self.net = nn.Sequential(
          nn.Linear(10, nn.ValueChoice([32, 64, 128], label='hidden_dim')),
          nn.Linear(nn.ValueChoice([32, 64, 128], label='hidden_dim'), 3)
      )
.. warning::

   It looks as if a specific candidate has been chosen (e.g., the way you can put a ``ValueChoice`` as a parameter of ``nn.Conv2d``), but in fact it is syntactic sugar: the basic units and evaluators do all the underlying work. That means you cannot assume that ``ValueChoice`` can be used in the same way as its candidates. For example, the following usage will NOT work:

   .. code-block:: python

      self.blocks = []
      for i in range(nn.ValueChoice([1, 2, 3])):
          self.blocks.append(Block())

      # NOTE: instead you should probably write
      # self.blocks = nn.Repeat(Block(), (1, 3))
``nn.Repeat``
"""""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.Repeat`
Repeat a block a variable number of times.
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # used in `__init__` method

   # Block() will be deep-copied and repeated 3 times
   self.blocks = nn.Repeat(Block(), 3)

   # Block() will be repeated 1, 2, or 3 times
   self.blocks = nn.Repeat(Block(), (1, 3))

   # Can be used together with layer choice.
   # With deep copy, the 3 layers will have the same label and thus share the choice.
   self.blocks = nn.Repeat(nn.LayerChoice([...]), (1, 3))

   # To make the three layer choices independent, use a factory function
   # that accepts an index (0, 1, 2, ...) and returns the module of the `index`-th layer.
   self.blocks = nn.Repeat(lambda index: nn.LayerChoice([...], label=f'layer{index}'), (1, 3))
``nn.Cell``
"""""""""""
API reference: :class:`nni.retiarii.nn.pytorch.Cell`
This cell structure is popularly used in the `NAS literature <https://arxiv.org/abs/1611.01578>`__. At a high level, the literature often uses the following glossary.
.. list-table::
   :widths: 25 75

   * - Cell
     - A cell consists of several nodes.
   * - Node
     - A node is the **sum** of several operators.
   * - Operator
     - Each operator is independently chosen from a list of user-specified candidate operators.
   * - Operator's input
     - Each operator has one input, chosen from previous nodes as well as predecessors.
   * - Predecessors
     - Input of the cell. A cell can have multiple predecessors. Predecessors are sent to the *preprocessor* for preprocessing.
   * - Cell's output
     - Output of the cell, usually a concatenation of several nodes (possibly all nodes) in the cell. The cell's output, along with the predecessors, is sent to the *postprocessor* for postprocessing.
   * - Preprocessor
     - Extra preprocessing of predecessors, usually used for shape alignment (e.g., when predecessors have different shapes). By default, does nothing.
   * - Postprocessor
     - Extra postprocessing of the cell's output, usually used to chain cells with multiple predecessors (e.g., the next cell wants to have the outputs of both this cell and the previous cell as its input). By default, directly uses this cell's output.
Example usages:
.. code-block:: python

   # import nni.retiarii.nn.pytorch as nn
   # used in `__init__` method

   # Choose between conv2d and maxpool2d.
   # The cell has 4 nodes, 1 op per node, and 2 predecessors.
   cell = nn.Cell([nn.Conv2d(32, 32, 3), nn.MaxPool2d(3)], 4, 1, 2)

   # forward
   cell([input1, input2])

   # Use `merge_op` to specify how to construct the output.
   # The output will then have a dynamic shape, depending on which inputs have been used in the cell.
   cell = nn.Cell([nn.Conv2d(32, 32, 3), nn.MaxPool2d(3)], 4, 1, 2, merge_op='loose_end')

   # The op candidates can be a callable that accepts the node index in the cell,
   # the op index in the node, and the input index.
   cell = nn.Cell([
       lambda node_index, op_index, input_index: nn.Conv2d(32, 32, 3, stride=2 if input_index < 1 else 1),
       ...
   ], 4, 1, 2)

   # predecessor example
   class Preprocessor(nn.Module):
       def __init__(self):
           super().__init__()
           self.conv1 = nn.Conv2d(16, 32, 1)
           self.conv2 = nn.Conv2d(64, 32, 1)

       def forward(self, x):
           return [self.conv1(x[0]), self.conv2(x[1])]

   cell = nn.Cell([nn.Conv2d(32, 32, 3), nn.MaxPool2d(3)], 4, 1, 2, preprocessor=Preprocessor())
   cell([torch.randn(1, 16, 48, 48), torch.randn(1, 64, 48, 48)])  # the two inputs will be sent to conv1 and conv2 respectively
One-shot NAS
============
Before reading this tutorial, we highly recommend that you first go through the tutorial on how to `define a model space <./QuickStart.rst#define-your-model-space>`__.
Model Search with One-shot Trainer
----------------------------------
With a defined model space, users can explore the space in two ways. One is using a strategy together with a single-architecture evaluator, as demonstrated `here <./QuickStart.rst#explore-the-defined-model-space>`__. The other is using a one-shot trainer, which consumes much less computational resource than the first approach. This tutorial focuses on the one-shot approach. Its principle is to combine all the models in a model space into one big model (usually called a super-model or super-graph); search, training, and testing are then all carried out by training and evaluating this big model.
We list the supported one-shot trainers here:
* DARTS trainer
* ENAS trainer
* ProxylessNAS trainer
* Single-path (random) trainer
See the `API reference <./ApiReference.rst>`__ for detailed usage. Here, we show an example of using the DARTS trainer manually.
.. code-block:: python

   from nni.retiarii.oneshot.pytorch import DartsTrainer

   trainer = DartsTrainer(
       model=model,
       loss=criterion,
       metrics=lambda output, target: accuracy(output, target, topk=(1,)),
       optimizer=optim,
       num_epochs=args.epochs,
       dataset=dataset_train,
       batch_size=args.batch_size,
       log_frequency=args.log_frequency,
       unrolled=args.unrolled
   )
   trainer.fit()
   final_architecture = trainer.export()
After the search is done, we can use the exported architecture to instantiate the full network for retraining. Here is an example:
.. code-block:: python

   from nni.retiarii import fixed_arch

   with fixed_arch('/path/to/checkpoint.json'):
       model = Model()
Retiarii for Neural Architecture Search
=======================================
.. attention:: NNI's latest NAS support is based on the Retiarii framework; users who are still on the `early NNI NAS v1.0 <https://nni.readthedocs.io/en/v2.2/nas.html>`__ should migrate their work to Retiarii as soon as possible.
.. contents::
Motivation
----------
Automatic neural architecture search is playing an increasingly important role in finding better models. Recent research has proven the feasibility of automatic NAS and has led to models that beat many manually designed and tuned models. Representative works include `NASNet <https://arxiv.org/abs/1707.07012>`__\ , `ENAS <https://arxiv.org/abs/1802.03268>`__\ , `DARTS <https://arxiv.org/abs/1806.09055>`__\ , `Network Morphism <https://arxiv.org/abs/1806.10282>`__\ , and `Evolution <https://arxiv.org/abs/1703.01041>`__. In addition, new innovations continue to emerge.
However, it is quite hard to apply existing NAS work to developing common DNN models. Therefore, we designed `Retiarii <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__, a novel NAS/HPO framework, and implemented it in NNI. It helps users easily construct a model space (also called a search space or tuning space) and utilize existing NAS algorithms. The framework also facilitates NAS innovation and can be used to design new NAS algorithms.
Overview
--------
There are three key characteristics of the Retiarii framework:
* Simple APIs are provided for defining a model search space within a PyTorch/TensorFlow model.
* SOTA NAS algorithms are built in for exploring the model search space.
* System-level optimizations are implemented to speed up the exploration.
There are two types of model space exploration approaches: **Multi-trial NAS** and **One-shot NAS**. Multi-trial NAS trains each sampled model in the model space independently, while one-shot NAS samples models from a super-model. After constructing the model space, users can use either approach to explore it.
Multi-trial NAS
---------------
Multi-trial NAS means each model sampled from the model space is trained independently. A typical multi-trial NAS is `NASNet <https://arxiv.org/abs/1707.07012>`__. The algorithm that samples models from the model space is called an exploration strategy. NNI supports the following exploration strategies for multi-trial NAS; a minimal usage sketch follows the table.
.. list-table::
   :header-rows: 1
   :widths: auto

   * - Exploration Strategy Name
     - Brief Introduction of Algorithm
   * - Random Strategy
     - Randomly sampling new model(s) from the user-defined model space. (``nni.retiarii.strategy.Random``)
   * - Grid Search
     - Sampling new model(s) from the user-defined model space using the grid search algorithm. (``nni.retiarii.strategy.GridSearch``)
   * - Regularized Evolution
     - Generating new model(s) from generated models using the `regularized evolution algorithm <https://arxiv.org/abs/1802.01548>`__. (``nni.retiarii.strategy.RegularizedEvolution``)
   * - TPE Strategy
     - Sampling new model(s) from the user-defined model space using the `TPE algorithm <https://papers.nips.cc/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf>`__. (``nni.retiarii.strategy.TPEStrategy``)
   * - RL Strategy
     - Using the `PPO algorithm <https://arxiv.org/abs/1707.06347>`__ to sample new model(s) from the user-defined model space. (``nni.retiarii.strategy.PolicyBasedRL``)
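As a minimal sketch of how a strategy is plugged into a multi-trial experiment, consider the following; ``MyModelSpace`` and ``evaluator`` are hypothetical placeholders for a user-defined model space and evaluator.

.. code-block:: python

   import nni.retiarii.strategy as strategy
   from nni.retiarii.experiment.pytorch import RetiariiExperiment, RetiariiExeConfig

   # `MyModelSpace()` and `evaluator` are hypothetical placeholders defined by the user
   search_strategy = strategy.Random(dedup=True)  # dedup=True avoids sampling duplicate models

   exp = RetiariiExperiment(MyModelSpace(), evaluator, [], search_strategy)
   exp_config = RetiariiExeConfig('local')
   exp_config.trial_concurrency = 2
   exp_config.max_trial_number = 20
   exp.run(exp_config, 8081)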
Please refer to `here <./multi_trial_nas.rst>`__ for detailed usage of multi-trial NAS.
One-shot NAS
------------
One-shot NAS means building the model space into a super-model, training the super-model with weight sharing, and then sampling models from the super-model to find the best one. `DARTS <https://arxiv.org/abs/1806.09055>`__ is a typical one-shot NAS algorithm.
Below are the supported one-shot NAS algorithms. More will be supported soon.
.. list-table::
   :header-rows: 1
   :widths: auto

   * - One-shot Algorithm Name
     - Brief Introduction of Algorithm
   * - `ENAS <ENAS.rst>`__
     - `Efficient Neural Architecture Search via Parameter Sharing <https://arxiv.org/abs/1802.03268>`__. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. It uses parameter sharing between child models to achieve fast speed and excellent performance.
   * - `DARTS <DARTS.rst>`__
     - `DARTS: Differentiable Architecture Search <https://arxiv.org/abs/1806.09055>`__ introduces a novel algorithm for differentiable network architecture search based on bilevel optimization.
   * - `SPOS <SPOS.rst>`__
     - `Single Path One-Shot Neural Architecture Search with Uniform Sampling <https://arxiv.org/abs/1904.00420>`__ constructs a simplified supernet trained with a uniform path sampling method and applies an evolutionary algorithm to efficiently search for the best-performing architectures.
   * - `ProxylessNAS <Proxylessnas.rst>`__
     - `ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware <https://arxiv.org/abs/1812.00332>`__. It removes proxies and directly learns architectures for large-scale target tasks and target hardware platforms.
Please refer to `here <one_shot_nas.rst>`__ for detailed usage of one-shot NAS algorithms.
Reference and Feedback
----------------------
* `Quick Start <./QuickStart.rst>`__
* `Construct Your Model Space <./construct_space.rst>`__
* `Retiarii: A Deep Learning Exploratory-Training Framework <https://www.usenix.org/system/files/osdi20-zhang_quanlu.pdf>`__
* `Report a bug <https://github.com/microsoft/nni/issues/new?template=bug-report.rst>`__ for this feature on GitHub
* `File a feature or improvement request <https://github.com/microsoft/nni/issues/new?template=enhancement.rst>`__ for this feature on GitHub