Overview of NNI Model Compression
=================================
Deep neural networks (DNNs) have achieved great success in many tasks like computer vision, natural language processing, and speech processing.
However, typical neural networks are both computationally expensive and energy-intensive,
...
...
There are several core features supported by NNI model compression:
* Concise interface for users to customize their own compression algorithms.
Compression Pipeline
--------------------
.. image:: ../../img/compression_pipeline.png
:target: ../../img/compression_pipeline.png
...
...
If users want to apply both, a sequential mode is recommended as a common practice: pruning first, then quantization on the pruned model.
The interface and APIs are unified for both PyTorch and TensorFlow. Currently only the PyTorch version is supported; the TensorFlow version will be supported in the future.
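
As a sketch of that sequential practice (assuming a PyTorch ``model`` and ``device`` are already defined, and using ``L1NormPruner`` with a 50% ``Conv2d`` sparsity purely as placeholders), the flow looks roughly like this:

.. code-block:: python

    import torch
    from nni.compression.pytorch.pruning import L1NormPruner
    from nni.compression.pytorch.speedup import ModelSpeedup

    # 1. prune: generate masks that zero out 50% of the filters in every Conv2d layer
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()

    # 2. speedup: unwrap the model and physically remove the masked weights
    pruner._unwrap_model()
    ModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()

    # 3. quantize: apply a quantization algorithm (e.g. 8-bit QAT) to the pruned
    #    model and fine-tune it; see the quantization part below for a sketch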
Model Speedup
-------------
The final goal of model compression is to reduce inference latency and model size.
However, existing model compression algorithms mainly use simulation to check the performance (e.g., accuracy) of the compressed model.
Pruning algorithms compress the original network by removing redundant weights or channels of layers, which can reduce model complexity and mitigate the over-fitting issue.
NNI implements the main logic of each pruning algorithm as a pruner. All pruners are implemented as closely as possible to what is described in the corresponding paper (when one exists).
The following table provides a brief introduction to the pruners implemented in NNI; click a link in the table to view a more detailed introduction and use cases.
There are two kinds of pruners in NNI; please refer to `basic pruner <basic-pruner>`_ and `scheduled pruner <scheduled-pruner>`_ for details.
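
For example, a basic pruner takes the model and a config list describing where and how much to prune, and returns the masked model together with the masks (a minimal sketch; ``model`` and the layer name ``fc`` are placeholders):

.. code-block:: python

    from nni.compression.pytorch.pruning import L2NormPruner

    # prune 50% of the filters in every Conv2d layer, but leave the classifier alone
    config_list = [
        {'sparsity': 0.5, 'op_types': ['Conv2d']},
        {'exclude': True, 'op_names': ['fc']},
    ]

    pruner = L2NormPruner(model, config_list)
    masked_model, masks = pruner.compress()

    # the masks simulate pruning; print the remaining weight ratio of each layer
    for name, mask in masks.items():
        print(name, float(mask['weight'].sum() / mask['weight'].numel()))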
In addition, for the convolutional layers that have more than one filter group, the dependency-aware pruner will also try to prune the same number of channels for each filter group.
Overall, this pruner prunes the model according to the L1 norm of each filter and tries to meet the topological constraints (channel dependency, etc.) to improve the final speed gain after the speedup process.
In dependency-aware mode, the pruner can therefore achieve a better speed gain from model pruning.
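
A hedged sketch of enabling this mode, assuming the NNI 2.x pruner interface where dependency awareness is selected via a ``mode`` argument and a ``dummy_input`` is required to trace channel dependencies (please check the pruner reference for the exact signature; ``model``, ``device``, and the input shape are placeholders):

.. code-block:: python

    import torch
    from nni.compression.pytorch.pruning import L1NormPruner

    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]

    # dummy_input lets the pruner trace the graph and group layers with channel dependencies
    pruner = L1NormPruner(model, config_list, mode='dependency_aware',
                          dummy_input=torch.rand(1, 3, 224, 224).to(device))
    _, masks = pruner.compress()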
Quantization algorithms compress the original network by reducing the number of bits required to represent weights or activations, which can reduce computation and inference time.
NNI implements the main logic of each quantization algorithm as a quantizer. All quantizers are implemented as closely as possible to what is described in the corresponding paper (when one exists).
The following table provides a brief introduction to the quantizers implemented in NNI; click a link in the table to view a more detailed introduction and use cases.
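
For instance, a quantizer is constructed from the model, a config list saying which tensors to quantize and to how many bits, and (for training-aware algorithms such as QAT) the training optimizer. The import path and argument order below follow the NNI 2.x interface to the best of my knowledge and should be double-checked against the quantizer reference; ``model`` and ``optimizer`` are assumed to exist:

.. code-block:: python

    from nni.compression.pytorch.quantization import QAT_Quantizer

    # quantize the weights and outputs of all Conv2d layers to 8 bits (illustrative config)
    config_list = [{
        'quant_types': ['weight', 'output'],
        'quant_bits': {'weight': 8, 'output': 8},
        'op_types': ['Conv2d'],
    }]

    quantizer = QAT_Quantizer(model, config_list, optimizer)
    quantizer.compress()
    # afterwards, fine-tune the model as usual so the simulated quantization is trained in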
NNI provides an easy-to-use model compression framework to compress deep neural networks; the compressed networks typically have a much smaller model size and faster
inference speed without losing performance significantly. Model compression on NNI includes pruning algorithms and quantization algorithms. NNI provides many pruning and
quantization algorithms through the NNI trial SDK. Users can directly use them in their trial code and run the trial code without starting an NNI experiment. Users can also use the NNI model compression framework to customize their own pruning and quantization algorithms.
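
As a rough illustration of such customization (using the ``Pruner`` base class and ``calc_mask`` hook from earlier NNI releases; the exact base class and hook names have changed across NNI versions, so treat them as assumptions rather than the current API):

.. code-block:: python

    import torch
    from nni.compression.pytorch import Pruner


    class RandomPruner(Pruner):
        """Toy pruner that randomly keeps weights at the sparsity given in the config."""

        def calc_mask(self, wrapper, **kwargs):
            weight = wrapper.module.weight.data
            sparsity = wrapper.config['sparsity']
            # keep a random (1 - sparsity) fraction of the weights
            mask = (torch.rand_like(weight) >= sparsity).type_as(weight)
            return {'weight_mask': mask}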
A detailed description of model compression and its usage can be found :doc:`here <../compression/overview>`.
"# need to unwrap the model, if the model is wrapped before speedup\npruner._unwrap_model()\n\n# speedup the model\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
"# need to unwrap the model, if the model is wrapped before speedup\npruner._unwrap_model()\n\n# speedup the model, for more information about speedup, please refer :doc:`pruning_speedup`.\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
.. code-block:: none

    /home/nishang/anaconda3/envs/MCM/lib/python3.9/site-packages/torch/_tensor.py:1013: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /opt/conda/conda-bld/pytorch_1640811803361/work/build/aten/src/ATen/core/TensorBody.h:417.)
      return self._grad
...
...
The model will become really smaller after speedup.
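
To verify that on your own model, plain PyTorch is enough; the sketch below assumes ``model`` is the speeded-up network, ``device`` is defined, and reuses the ``(3, 1, 28, 28)`` dummy input from the example above:

.. code-block:: python

    import time
    import torch

    # model size: number of parameters left after speedup
    num_params = sum(p.numel() for p in model.parameters())
    print('parameters:', num_params)

    # rough latency measurement on the dummy input
    dummy_input = torch.rand(3, 1, 28, 28).to(device)
    model.eval()
    with torch.no_grad():
        start = time.time()
        for _ in range(100):
            model(dummy_input)
        print('average latency (s):', (time.time() - start) / 100)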