@@ -46,7 +46,7 @@ Furthermore, users can specify the sparsity values used to prune for each layer
the SensitivityAnalysis will prune 25%, 50%, and 75% of the weights for each layer in turn, recording the model's accuracy at each step (SensitivityAnalysis prunes only one layer at a time; the other layers keep their original weights). If sparsities is not set, SensitivityAnalysis uses numpy.arange(0.1, 1.0, 0.1) as the default sparsity values.
Users can also speed up the sensitivity analysis with the early_stop_mode and early_stop_value options. By default, SensitivityAnalysis tests the accuracy under all sparsities for each layer. When early_stop_mode and early_stop_value are set, the analysis for a layer stops as soon as the accuracy/loss meets the threshold set by early_stop_value. We support four early stop modes: minimize, maximize, dropped, raised.
minimize: The analysis stops when the validation metric returned by the val_func is lower than ``early_stop_value``.
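For example, a run with early stop might look like the following sketch (the model ``net``, the ``val_loader``, and the threshold values here are illustrative, not prescribed by NNI):

.. code-block:: python

    import torch
    from nni.compression.pytorch.utils.sensitivity_analysis import SensitivityAnalysis

    def val_func(model):
        # return a scalar validation metric, e.g. top-1 accuracy
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for data, label in val_loader:  # assumed to exist
                pred = model(data).argmax(dim=1)
                correct += (pred == label).sum().item()
                total += label.size(0)
        return correct / total

    s_analyzer = SensitivityAnalysis(
        model=net,
        val_func=val_func,
        sparsities=[0.25, 0.5, 0.75],
        early_stop_mode='minimize',   # stop scanning a layer once the metric
        early_stop_value=0.8,         # returned by val_func drops below 0.8
    )
    sensitivity = s_analyzer.analysis(val_args=[net])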
@@ -38,7 +38,7 @@ There are several core features supported by NNI model compression:
* Support many popular pruning and quantization algorithms.
* Automate the model pruning and quantization process with state-of-the-art strategies and NNI's auto-tuning power.
* Speed up a compressed model to give it lower inference latency and a smaller size.
* Provide friendly and easy-to-use compression utilities for users to dive into the compression process and results.
* Concise interface for users to customize their own compression algorithms.
...
@@ -54,7 +54,7 @@ If users want to apply both, a sequential mode is recommended as common practice
.. note::
Note that NNI pruners and quantizers do not physically compact the model; they only simulate the compression effect. In contrast, the NNI speedup tool can truly compress a model by changing its network architecture, thereby reducing latency.
To obtain a truly compact model, users should conduct :doc:`pruning speedup <../tutorials/pruning_speedup>` or :doc:`quantization speedup <../tutorials/quantization_speedup>`.
The interface and APIs are unified for both PyTorch and TensorFlow. Currently only the PyTorch version is supported; the TensorFlow version will be supported in the future.
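To illustrate the distinction made in the note above, a pruner only simulates compression by attaching masks (a minimal sketch, assuming a PyTorch ``model``; the config values are illustrative):

.. code-block:: python

    from nni.compression.pytorch.pruning import L1NormPruner

    # prune 50% of the weights in every Conv2d layer; this only attaches
    # masks, so layer shapes stay unchanged until the speedup tool is applied
    config_list = [{'sparsity': 0.5, 'op_types': ['Conv2d']}]
    pruner = L1NormPruner(model, config_list)
    _, masks = pruner.compress()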
...
@@ -131,7 +131,7 @@ Quantization algorithms compress the original network by reducing the number of
The final goal of model compression is to reduce inference latency and model size.
However, existing model compression algorithms mainly use simulation to check the performance (e.g., accuracy) of the compressed model.
For example, pruning algorithms use masks, and quantization algorithms still store quantized values in float32.
Given the output masks and quantization bits produced by those algorithms, NNI can truly speed up the model.
The following figure shows how NNI prunes and speeds up your models.
...
@@ -140,8 +140,8 @@ The following figure shows how NNI prunes and speeds up your models.
:scale: 40%
:alt:
The detailed tutorial of Speedup Model with Mask can be found :doc:`here <../tutorials/pruning_speedup>`.
The detailed tutorial of Speedup Model with Calibration Config can be found :doc:`here <../tutorials/quantization_speedup>`.
The experiment results are all collected with the default configuration of the pruners in NNI; that is, when we call a pruner class in NNI, we do not change any of its default arguments.
*
Both FLOPs and the number of parameters are counted with :githublink:`Model FLOPs/Parameters Counter <docs/en_US/Compression/CompressionUtils.md#model-flopsparameters-counter>` after :githublink:`model speedup <docs/en_US/Compression/ModelSpeedup.rst>`.
This avoids the potential issues of counting them on masked models (see the sketch below).
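For instance, the counting step might look like this sketch (the input shape is illustrative, and the import path is the one used by recent NNI versions):

.. code-block:: python

    import torch
    from nni.compression.pytorch.utils.counter import count_flops_params

    # count on the physically compacted (sped-up) model, not the masked one
    flops, params, _ = count_flops_params(model, torch.randn(1, 3, 224, 224))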
"Speedup the original model with masks, note that `ModelSpeedup` requires an unwrapped model.\nThe model becomes smaller after speed-up,\nand reaches a higher sparsity ratio because `ModelSpeedup` will propagate the masks across layers.\n\n"
"Speedup the original model with masks, note that `ModelSpeedup` requires an unwrapped model.\nThe model becomes smaller after speedup,\nand reaches a higher sparsity ratio because `ModelSpeedup` will propagate the masks across layers.\n\n"
]
},
{
...
@@ -109,14 +109,14 @@
},
"outputs": [],
"source": [
"# need to unwrap the model, if the model is wrapped before speedup\npruner._unwrap_model()\n\n# speedup the model\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
"# need to unwrap the model, if the model is wrapped before speedup\npruner._unwrap_model()\n\n# speedup the model\nfrom nni.compression.pytorch.speedup import ModelSpeedup\n\nModelSpeedup(model, torch.rand(3, 1, 28, 28).to(device), masks).speedup_model()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"the model will become real smaller after speedup\n\n"
"the model will become real smaller after speedup\n\n"
]
},
{
...
@@ -134,7 +134,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fine-tuning Compacted Model\nNote that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning.\nBecause speedup will replace the masked big layers with dense small ones.\n\n"
"## Fine-tuning Compacted Model\nNote that if the model has been sped up, you need to re-initialize a new optimizer for fine-tuning.\nBecause speedup will replace the masked big layers with dense small ones.\n\n"