"\n# Pruning Quickstart\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
"\n# Pruning Quickstart\n\nHere is a three-minute video to get you started with model pruning.\n\n.. youtube:: wKh51Jnr0a8\n :align: center\n\nModel pruning is a technique to reduce the model size and computation by reducing model weight size or intermediate state size.\nThere are three common practices for pruning a DNN model:\n\n#. Pre-training a model -> Pruning the model -> Fine-tuning the pruned model\n#. Pruning a model during training (i.e., pruning aware training) -> Fine-tuning the pruned model\n#. Pruning a model -> Training the pruned model from scratch\n\nNNI supports all of the above pruning practices by working on the key pruning stage.\nFollowing this tutorial for a quick look at how to use NNI to prune a model in a common practice.\n"
/home/ningshang/anaconda3/envs/nni-dev/lib/python3.8/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at aten/src/ATen/core/TensorBody.h:417.)
  return self._grad
.. GENERATED FROM PYTHON SOURCE LINES 103-104
the model will become truly smaller after speedup
.. GENERATED FROM PYTHON SOURCE LINES 104-106
.. code-block:: default
...
@@ -301,14 +306,14 @@ the model will become real smaller after speedup
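A hedged sketch of that speedup step, continuing from the ``pruner``, ``model``, and ``masks`` names used in the sketch above (the dummy-input shape matches that illustrative model, not necessarily yours):

.. code-block:: python

    import torch
    from nni.compression.pytorch.speedup import ModelSpeedup

    # The pruner's module wrappers must be removed before the model is traced.
    pruner._unwrap_model()

    # Replace the masked layers with physically smaller dense layers and
    # propagate the channel reductions through dependent layers.
    ModelSpeedup(model, torch.rand(3, 1, 28, 28), masks).speedup_model()
    print(model)  # Conv2d/Linear layers now report reduced channel counts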
.. GENERATED FROM PYTHON SOURCE LINES 107-111
Fine-tuning Compacted Model
---------------------------
Note that if the model has been sped up, you need to re-initialize the optimizer before fine-tuning,
because speedup replaces the masked large layers with smaller dense ones.
.. GENERATED FROM PYTHON SOURCE LINES 111-115
.. code-block:: default
...
@@ -326,7 +331,7 @@ Because speedup will replace the masked big layers with dense small ones.
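For instance, a minimal fine-tuning sketch after speedup (``train_loader`` is an assumed DataLoader, and the hyperparameters are placeholders):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    # An optimizer created before speedup still references the old, masked
    # parameters, so build a fresh one over the compacted model's parameters.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    model.train()
    for data, target in train_loader:  # train_loader: an assumed DataLoader
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)
        loss.backward()
        optimizer.step()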
.. rst-class:: sphx-glr-timing
**Total running time of the script:** ( 1 minutes 30.730 seconds)
"\n# Quantization Quickstart\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"
"\n# Quantization Quickstart\n\nHere is a four-minute video to get you started with model quantization.\n\n.. youtube:: MSfV7AyfiA4\n :align: center\n\nQuantization reduces model size and speeds up inference time by reducing the number of bits required to represent weights or activations.\n\nIn NNI, both post-training quantization algorithms and quantization-aware training algorithms are supported.\nHere we use `QAT_Quantizer` as an example to show the usage of quantization in NNI.\n"