"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
"\n# Pruning Quickstart\n\nHere is a three-minute video to get you started with model pruning.\n\n.. youtube:: wKh51Jnr0a8\n :align: center\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
- /home/nishang/anaconda3/envs/MCM/lib/python3.9/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811803361/work/build/aten/src/ATen/core/TensorBody.h:417.)
+ /home/ningshang/anaconda3/envs/nni-dev/lib/python3.8/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:417.)
return self._grad
- .. GENERATED FROM PYTHON SOURCE LINES 98-99
+ .. GENERATED FROM PYTHON SOURCE LINES 103-104
the model will actually become smaller after speedup
- .. GENERATED FROM PYTHON SOURCE LINES 99-101
+ .. GENERATED FROM PYTHON SOURCE LINES 104-106
.. code-block:: default
...
...
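The elided block above is presumably where the tutorial invokes speedup. For reference, here is a minimal sketch of that step with NNI's ``ModelSpeedup`` utility; the MNIST-shaped dummy input is an assumption, so use whatever your model's ``forward()`` expects.

.. code-block:: python

    import torch
    from nni.compression.pytorch.speedup import ModelSpeedup

    # The pruner must release its wrappers before the graph can be rewritten.
    pruner._unwrap_model()

    # Replace the masked layers with physically smaller dense layers, so the
    # model really shrinks instead of just carrying zeroed-out weights.
    # The dummy input shape (a batch of MNIST images) is an assumption here.
    ModelSpeedup(model, torch.rand(3, 1, 28, 28), masks).speedup_model()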
@@ -301,14 +306,14 @@ the model will actually become smaller after speedup
- .. GENERATED FROM PYTHON SOURCE LINES 102-106
+ .. GENERATED FROM PYTHON SOURCE LINES 107-111
Fine-tuning Compacted Model
---------------------------
Note that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning, because speedup replaces the masked large layers with dense small ones.
- .. GENERATED FROM PYTHON SOURCE LINES 106-110
+ .. GENERATED FROM PYTHON SOURCE LINES 111-115
.. code-block:: default
...
...
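The elided block above presumably holds the fine-tuning code. A minimal sketch of the re-initialization the note describes; the SGD hyperparameters and the ``train``/``evaluate`` helpers are assumptions standing in for the tutorial's own training loop.

.. code-block:: python

    import torch

    # Speedup swapped in new, smaller layers, so any optimizer created before
    # it holds stale parameter references; build a fresh one over the
    # compacted model's parameters.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # Fine-tune with the usual loop; train() and evaluate() are hypothetical
    # helpers standing in for the tutorial's earlier training code.
    for epoch in range(3):
        train(model, optimizer)
        evaluate(model)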
@@ -326,7 +331,7 @@ because speedup replaces the masked large layers with dense small ones.
.. rst-class:: sphx-glr-timing
- **Total running time of the script:** ( 0 minutes 58.337 seconds)
+ **Total running time of the script:** ( 1 minutes 30.730 seconds)
"\n# Quantization Quickstart\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
"\n# Quantization Quickstart\n\nHere is a four-minute video to get you started with model quantization.\n\n.. youtube:: MSfV7AyfiA4\n :align: center\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"