Unverified commit 1e439e45, authored by kvartet and committed by GitHub

Fix bug in document conversion (#3203)

parent 9520f251
......@@ -60,8 +60,6 @@ The working directory of your assessor is ``<home>/nni-experiments/<experiment_i
For more detailed examples, see:
..
* :githublink:`medianstop-assessor <src/sdk/pynni/nni/medianstop_assessor>`
* :githublink:`curvefitting-assessor <src/sdk/pynni/nni/curvefitting_assessor>`
* :githublink:`medianstop-assessor <nni/algorithms/hpo/medianstop_assessor.py>`
* :githublink:`curvefitting-assessor <nni/algorithms/hpo/curvefitting_assessor/>`
......@@ -27,6 +27,8 @@ Step 1. Download ``bash-completion``
Here, {nni-version} should be replaced by the version of NNI, e.g., ``master``\ , ``v1.9``. You can also check the latest ``bash-completion`` script :githublink:`here <tools/bash-completion>`.
.. cannot find :githublink:`here <tools/bash-completion>`.
Step 2. Install the script
^^^^^^^^^^^^^^^^^^^^^^^^^^
......
......@@ -229,9 +229,9 @@ Storage performance
**Latency**\ : each IO request takes some time to complete; this is called the average latency. Several factors affect this time, including network connection quality and hard disk IO performance.
**IOPS**\ :** IO operations per second**\ , which means the amount of *read or write operations* that could be done in one seconds time.
**IOPS**\ : **IO operations per second**\ , the number of *read or write operations* that can be completed in one second.
**IO size**\ :** the size of each IO request**. Depending on the operating system and the application/service that needs disk access it will issue a request to read or write a certain amount of data at the same time.
**IO size**\ : **the size of each IO request**. Depending on the operating system and the application/service that needs disk access, a request will read or write a certain amount of data at a time.
**Throughput (in MB/s) = Average IO size x IOPS**
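As an illustrative calculation (the numbers are hypothetical): a disk sustaining 4,000 IOPS at an average IO size of 64 KB delivers roughly 64 KB × 4,000 = 256,000 KB/s, i.e. about 250 MB/s of throughput.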
......
......@@ -42,7 +42,7 @@ Experiment Result
For each dataset/model/pruner combination, we prune the model to different levels by setting a series of target sparsities for the pruner.
Here we plot both **Number of Weights - Performances** curve and** FLOPs - Performance** curve.
Here we plot both the **Number of Weights - Performance** curve and the **FLOPs - Performance** curve.
As a reference, we also plot the results reported in the paper `AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates <http://arxiv.org/abs/1907.03141>`__ for the models VGG16 and ResNet18 on CIFAR-10.
The experiment results are shown in the following figures:
......
......@@ -7,7 +7,7 @@ NNI review article from Zhihu: :raw-html:`<an open source project with highly re
The article is by an NNI user on the Zhihu forum. In the article, Garvin shared his experience of using NNI for Automatic Feature Engineering. We think this article is very useful for users who are interested in using NNI for feature engineering. With the author's permission, we translated the original article into English.
**原文(source)**\ : `如何看待微软最新发布的AutoML平台NNI?By Garvin Li <https://www.zhihu.com/question/297982959/answer/964961829?utm_source=wechat_session&utm_medium=social&utm_oi=28812108627968&from=singlemessage&isappinstalled=0>`__
**source**\ : `如何看待微软最新发布的AutoML平台NNI?(What do you think of Microsoft's newly released AutoML platform NNI?) By Garvin Li <https://www.zhihu.com/question/297982959/answer/964961829?utm_source=wechat_session&utm_medium=social&utm_oi=28812108627968&from=singlemessage&isappinstalled=0>`__
01 Overview of AutoML
---------------------
......@@ -24,7 +24,7 @@ Microsoft, to help users design and tune machine learning models, neural network
architectures, or a complex system’s parameters in an efficient and automatic
way.
Link:\ ` https://github.com/Microsoft/nni <https://github.com/Microsoft/nni>`__
Link: `https://github.com/Microsoft/nni <https://github.com/Microsoft/nni>`__
In general, most Microsoft tools share one prominent characteristic: the design is highly reasonable (regardless of the degree of technological innovation).
......@@ -51,7 +51,7 @@ NNI treats AutoFeatureENG as a two-steps-task, feature generation exploration an
04 Feature Exploration
----------------------
For feature derivation, NNI offers many operations which could automatically generate new features, which list \ `as following <https://github.com/SpongebBob/tabular_automl_NNI/blob/master/AutoFEOp.rst>`__\  :
For feature derivation, NNI offers many operations that can automatically generate new features, listed \ `as follows <https://github.com/SpongebBob/tabular_automl_NNI/blob/master/AutoFEOp.md>`__\ :
**count**\ : Count encoding is based on replacing categories with their counts computed on the train set, also named frequency encoding.
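As a minimal illustration of the idea (this is not NNI's internal implementation, and the column name is made up), count encoding takes a couple of lines of pandas:

.. code-block:: python

   import pandas as pd

   # A toy frame; 'city' stands in for any categorical column.
   df = pd.DataFrame({'city': ['a', 'b', 'a', 'c', 'a', 'b']})

   # Count encoding: replace each category with its frequency in the train set.
   df['city_count'] = df['city'].map(df['city'].value_counts())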
......@@ -111,7 +111,7 @@ To avoid feature explosion and overfitting, feature selection is necessary. In t
:alt:
If you have used **XGBoost** or** GBDT**\ , you would know the algorithm based on tree structure can easily calculate the importance of each feature on results. LightGBM is able to make feature selection naturally.
If you have used **XGBoost** or **GBDT**\ , you would know that tree-based algorithms can easily calculate the importance of each feature with respect to the results, so LightGBM can naturally be used for feature selection.
The issue is that the selected features might be applicable to *GBDT* (Gradient Boosting Decision Tree), but not to linear algorithms like *LR* (Logistic Regression).
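As a sketch of the tree-based selection idea described above, using LightGBM directly (the data is synthetic and the top-10 cut-off is arbitrary):

.. code-block:: python

   import lightgbm as lgb
   import numpy as np
   from sklearn.datasets import make_classification

   # Synthetic data standing in for the derived feature matrix.
   X, y = make_classification(n_samples=500, n_features=30, random_state=0)
   booster = lgb.train({'objective': 'binary', 'verbosity': -1},
                       lgb.Dataset(X, label=y), num_boost_round=50)

   # Rank features by gain-based importance and keep the indices of the top 10.
   top10 = np.argsort(booster.feature_importance(importance_type='gain'))[::-1][:10]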
......
......@@ -61,7 +61,7 @@ To avoid over-fitting in **CIFAR-10**\ , we also compare the models in the other
We do not change the default fine-tuning technique in their source code. To match each task, only the code for the input image shape and the number of outputs is changed.
Search phase time for all NAS methods is **two days** as well as the retrain time. Average results are reported based on** three repeat times**. Our evaluation machines have one Nvidia Tesla P100 GPU, 112GB of RAM and one 2.60GHz CPU (Intel E5-2690).
The search phase time for all NAS methods is **two days**, as is the retraining time. Average results are reported over **three repetitions**. Our evaluation machines have one Nvidia Tesla P100 GPU, 112GB of RAM and one 2.60GHz CPU (Intel E5-2690).
NAO requires too many computing resources, so we only use NAO-WS, which provides the pipeline script.
......
......@@ -176,8 +176,8 @@ Note: The total number of samples per test is 240 (ensure that the budget is equ
References
----------
[1] James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl. "Algorithms for Hyper-Parameter Optimization". `Link <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__
[1] James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl. `Algorithms for Hyper-Parameter Optimization. <https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf>`__
[2] Meng-Hiot Lim, Yew-Soon Ong. "Computational Intelligence in Expensive Optimization Problems". `Link <https://link.springer.com/content/pdf/10.1007%2F978-3-642-10701-6.pdf>`__
[2] Meng-Hiot Lim, Yew-Soon Ong. `Computational Intelligence in Expensive Optimization Problems. <https://link.springer.com/content/pdf/10.1007%2F978-3-642-10701-6.pdf>`__
[3] M. Jordan, J. Kleinberg, B. Scho¨lkopf. "Pattern Recognition and Machine Learning". `Link <http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf>`__
[3] M. Jordan, J. Kleinberg, B. Schölkopf. `Pattern Recognition and Machine Learning. <http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf>`__
......@@ -4,12 +4,12 @@ Automatically tuning SVD (NNI in Recommenders)
In this tutorial, we first introduce the GitHub repo `Recommenders <https://github.com/Microsoft/Recommenders>`__\ , a repository of examples and best practices for building recommendation systems, provided as Jupyter notebooks. It has various models that are popular and widely deployed in recommendation systems. To provide a complete end-to-end experience, each example is presented in five key tasks, as shown below:
* `Prepare Data <https://github.com/Microsoft/Recommenders/blob/master/notebooks/01_prepare_data/README.rst>`__\ : Preparing and loading data for each recommender algorithm.
* `Model <https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/README.rst>`__\ : Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (\ `ALS <https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS>`__\ ) or eXtreme Deep Factorization Machines (\ `xDeepFM <https://arxiv.org/abs/1803.05170>`__\ ).
* `Evaluate <https://github.com/Microsoft/Recommenders/blob/master/notebooks/03_evaluate/README.rst>`__\ : Evaluating algorithms with offline metrics.
* `Model Select and Optimize <https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/README.rst>`__\ : Tuning and optimizing hyperparameters for recommender models.
* `Operationalize <https://github.com/Microsoft/Recommenders/blob/master/notebooks/05_operationalize/README.rst>`__\ : Operationalizing models in a production environment on Azure.
* `Prepare Data <https://github.com/microsoft/recommenders/tree/master/examples/01_prepare_data>`__\ : Preparing and loading data for each recommender algorithm.
* Model (`collaborative filtering algorithms <https://github.com/microsoft/recommenders/tree/master/examples/02_model_collaborative_filtering>`__\ , `content-based filtering algorithms <https://github.com/microsoft/recommenders/tree/master/examples/02_model_content_based_filtering>`__\ , `hybrid algorithms <https://github.com/microsoft/recommenders/tree/master/examples/02_model_hybrid>`__\ ): Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (\ `ALS <https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/recommendation.html#ALS>`__\ ) or eXtreme Deep Factorization Machines (\ `xDeepFM <https://arxiv.org/abs/1803.05170>`__\ ).
* `Evaluate <https://github.com/microsoft/recommenders/tree/master/examples/03_evaluate>`__\ : Evaluating algorithms with offline metrics.
* `Model Select and Optimize <https://github.com/microsoft/recommenders/tree/master/examples/04_model_select_and_optimize>`__\ : Tuning and optimizing hyperparameters for recommender models.
* `Operationalize <https://github.com/microsoft/recommenders/tree/master/examples/05_operationalize>`__\ : Operationalizing models in a production environment on Azure.
The fourth task is tuning and optimizing the model's hyperparameters, this is where NNI could help. To give a concrete example that NNI tunes the models in Recommenders, let's demonstrate with the model `SVD <https://github.com/Microsoft/Recommenders/blob/master/notebooks/02_model/surprise_svd_deep_dive.ipynb>`__\ , and data Movielens100k. There are more than 10 hyperparameters to be tuned in this model.
The fourth task is tuning and optimizing the model's hyperparameters; this is where NNI can help. To give a concrete example of NNI tuning a model in Recommenders, let's demonstrate with the `SVD <https://github.com/microsoft/recommenders/blob/master/examples/02_model_collaborative_filtering/surprise_svd_deep_dive.ipynb>`__ model and the Movielens100k dataset. There are more than 10 hyperparameters to be tuned in this model.
`This Jupyter notebook <https://github.com/Microsoft/Recommenders/blob/master/notebooks/04_model_select_and_optimize/nni_surprise_svd.ipynb>`__ provided by Recommenders is a very detailed step-by-step tutorial for this example. It uses different built-in tuning algorithms in NNI, including ``Annealing``\ , ``SMAC``\ , ``Random Search``\ , ``TPE``\ , ``Hyperband``\ , ``Metis`` and ``Evolution``. Finally, the results of different tuning algorithms are compared. Please go through this notebook to learn how to use NNI to tune SVD model, then you could further use NNI to tune other models in Recommenders.
This `Jupyter notebook <https://github.com/microsoft/recommenders/blob/master/examples/04_model_select_and_optimize/nni_surprise_svd.ipynb>`__ provided by Recommenders is a very detailed step-by-step tutorial for this example. It uses different built-in tuning algorithms in NNI, including ``Annealing``\ , ``SMAC``\ , ``Random Search``\ , ``TPE``\ , ``Hyperband``\ , ``Metis`` and ``Evolution``\ , and finally compares the results of the different tuning algorithms. Please go through this notebook to learn how to use NNI to tune the SVD model; you can then use NNI to tune other models in Recommenders.
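For intuition, a search space covering three of SVD's hyperparameters (the Surprise library's ``n_factors``\ , ``lr_all`` and ``reg_all``) might look like the sketch below; the ranges are illustrative, not the ones used in the notebook:

.. code-block:: python

   # NNI search-space format: each entry declares a sampling _type and _value.
   search_space = {
       'n_factors': {'_type': 'choice', '_value': [50, 100, 150, 200]},
       'lr_all':    {'_type': 'loguniform', '_value': [0.001, 0.1]},
       'reg_all':   {'_type': 'uniform', '_value': [0.01, 0.5]},
   }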
......@@ -6,4 +6,4 @@ Automatically tuning SPTAG with NNI
This library assumes that the samples are represented as vectors and that the vectors can be compared by L2 distance or cosine distance. The vectors returned for a query vector are those with the smallest L2 or cosine distance to the query vector.
SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT) and balanced k-means tree and relative neighborhood graph (SPTAG-BKT). SPTAG-KDT is advantageous in index building cost, and SPTAG-BKT is advantageous in search accuracy in very high-dimensional data.
In SPTAG, there are tens of parameters that can be tuned for specified scenarios or datasets. NNI is a great tool for automatically tuning those parameters. The authors of SPTAG tried NNI for the auto tuning and found good-performing parameters easily, thus, they shared the practice of tuning SPTAG on NNI in their document `here <https://github.com/microsoft/SPTAG/blob/master/docs/Parameters.rst>`__. Please refer to it for detailed tutorial.
In SPTAG, there are tens of parameters that can be tuned for specific scenarios or datasets. NNI is a great tool for automatically tuning those parameters. The authors of SPTAG tried NNI for the auto-tuning and easily found good-performing parameters; they therefore shared their practice of tuning SPTAG with NNI in their documentation `here <https://github.com/microsoft/SPTAG/blob/master/docs/Parameters.md>`__. Please refer to it for a detailed tutorial.
......@@ -2,7 +2,7 @@
Automatic System Tuning
#######################
The performance of systems, such as database, tensor operator implementaion, often need to be tuned to adapt to specific hardware configuration, targeted workload, etc. Manually tuning a system is complicated and often requires detailed understanding of hardware and workload. NNI can make such tasks much easier and help system owners find the best configuration to the system automatically. The detailed design philosophy of automatic system tuning can be found in [this paper](https://dl.acm.org/doi/10.1145/3352020.3352031). The following are some typical cases that NNI can help.
The performance of systems, such as databases or tensor operator implementations, often needs to be tuned to adapt to specific hardware configurations, targeted workloads, etc. Manually tuning a system is complicated and often requires a detailed understanding of the hardware and workload. NNI can make such tasks much easier and help system owners automatically find the best configuration for the system. The detailed design philosophy of automatic system tuning can be found in this `paper <https://dl.acm.org/doi/10.1145/3352020.3352031>`__\ . The following are some typical cases where NNI can help.
.. toctree::
:maxdepth: 1
......
......@@ -15,7 +15,7 @@ You can easily compress a model with NNI compression. Take pruning for example,
pruner = LevelPruner(model, config_list)
pruner.compress()
The 'default' op_type stands for the module types defined in :githublink:`default_layers.py <src/sdk/pynni/nni/compression/pytorch/default_layers.py>` for pytorch.
The 'default' op_type stands for the module types defined in :githublink:`default_layers.py <nni/compression/pytorch/default_layers.py>` for PyTorch.
Therefore ``{ 'sparsity': 0.8, 'op_types': ['default'] }`` means that **all layers with the specified op_types will be compressed with the same 0.8 sparsity**. When ``pruner.compress()`` is called, the model is compressed with masks; after that you can fine-tune the model normally, and the **pruned (masked) weights won't be updated**.
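Putting the pieces together, a minimal end-to-end sketch might look like this (the toy model is made up, and the import path assumes the NNI version whose package layout is shown in this diff):

.. code-block:: python

   import torch
   from nni.algorithms.compression.pytorch.pruning import LevelPruner

   # A toy network standing in for a real pretrained model.
   model = torch.nn.Sequential(torch.nn.Linear(64, 32),
                               torch.nn.ReLU(),
                               torch.nn.Linear(32, 10))

   config_list = [{'sparsity': 0.8, 'op_types': ['default']}]
   pruner = LevelPruner(model, config_list)
   model = pruner.compress()  # masks are inserted; fine-tuning is up to the user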
......
......@@ -5,7 +5,7 @@ Customize New Compression Algorithm
In order to simplify the process of writing new compression algorithms, we have designed a simple and flexible programming interface that covers pruning and quantization. Below, we first demonstrate how to customize a new pruning algorithm and then demonstrate how to customize a new quantization algorithm.
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. Reference `Framework overview of model compression </Compression/Framework.html>`__
**Important Note** To better understand how to customize new pruning/quantization algorithms, users should first understand the framework that supports various pruning algorithms in NNI. See the `Framework overview of model compression <../Compression/Framework.rst>`__.
Customize a new pruning algorithm
---------------------------------
......@@ -28,7 +28,7 @@ An implementation of ``weight masker`` may look like this:
# mask = ...
return {'weight_mask': mask}
You can reference nni provided :githublink:`weight masker <src/sdk/pynni/nni/compression/pytorch/pruning/structured_pruning.py>` implementations to implement your own weight masker.
You can refer to the NNI-provided :githublink:`weight masker <nni/algorithms/compression/pytorch/pruning/structured_pruning.py>` implementations when implementing your own weight masker.
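For illustration, a magnitude-based masker might look like the sketch below. It assumes the ``calc_mask`` signature used by the linked implementations; the base-class import path may differ across NNI versions:

.. code-block:: python

   import torch
   # Assumption: WeightMasker is the base class used by the linked implementations.
   from nni.algorithms.compression.pytorch.pruning.weight_masker import WeightMasker

   class MyMasker(WeightMasker):
       def calc_mask(self, sparsity, wrapper, wrapper_idx=None):
           weight = wrapper.module.weight.data
           num_prune = int(weight.numel() * sparsity)
           if num_prune == 0:
               return {'weight_mask': torch.ones_like(weight)}
           # Threshold at the num_prune-th smallest absolute value,
           # then mask every weight at or below it.
           threshold = torch.topk(weight.abs().view(-1), num_prune, largest=False)[0].max()
           mask = torch.gt(weight.abs(), threshold).type_as(weight)
           return {'weight_mask': mask}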
A basic ``pruner`` looks like this:
......@@ -52,7 +52,7 @@ A basic ``pruner`` looks likes this:
wrapper.if_calculated = True
return masks
Reference nni provided :githublink:`pruner <src/sdk/pynni/nni/compression/pytorch/pruning/one_shot.py>` implementations to implement your own pruner class.
Refer to the NNI-provided :githublink:`pruner <nni/algorithms/compression/pytorch/pruning/one_shot.py>` implementations when implementing your own pruner class.
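Continuing the sketch, a one-shot pruner built on the masker above might look like this; the ``Pruner`` base-class import path and the masker constructor arguments are assumptions based on the linked implementations:

.. code-block:: python

   # Assumption: Pruner is the base class from NNI's compression framework.
   from nni.compression.pytorch.compressor import Pruner

   class MyPruner(Pruner):
       def __init__(self, model, config_list):
           super().__init__(model, config_list)
           self.masker = MyMasker(model, self)  # the masker sketched above

       def calc_mask(self, wrapper, wrapper_idx=None):
           if wrapper.if_calculated:
               return None  # one-shot: compute each layer's mask only once
           sparsity = wrapper.config['sparsity']
           masks = self.masker.calc_mask(sparsity=sparsity, wrapper=wrapper,
                                         wrapper_idx=wrapper_idx)
           if masks is not None:
               wrapper.if_calculated = True
           return masks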
----
......
......@@ -3,7 +3,7 @@ Dependency-aware Mode for Filter Pruning
Currently, we have several filter pruning algorithms for the convolutional layers: FPGM Pruner, L1Filter Pruner, L2Filter Pruner, Activation APoZ Rank Filter Pruner, Activation Mean Rank Filter Pruner, and Taylor FO On Weight Pruner. In these filter pruning algorithms, the pruner prunes each convolutional layer separately. While pruning a convolutional layer, the algorithm quantifies the importance of each filter based on some specific rule (such as the L1 norm) and prunes the less important filters.
As `dependency analysis utils <./CompressionUtils.md>`__ shows, if the output channels of two convolutional layers(conv1, conv2) are added together, then these two conv layers have channel dependency with each other(more details please see `Compression Utils <./CompressionUtils.rst>`__\ ). Take the following figure as an example.
As the `dependency analysis utils <./CompressionUtils.rst>`__ show, if the output channels of two convolutional layers (conv1, conv2) are added together, then these two conv layers have a channel dependency on each other (for more details, see `Compression Utils <./CompressionUtils.rst>`__\ ). Take the following figure as an example.
.. image:: ../../img/mask_conflict.jpg
......
......@@ -32,39 +32,39 @@ Pruning algorithms compress the original network by removing redundant weights o
* - Name
- Brief Introduction of Algorithm
* - `Level Pruner </Compression/Pruner.html#level-pruner>`__
* - `Level Pruner <Pruner.rst#level-pruner>`__
- Pruning the specified ratio on each weight based on absolute values of weights
* - `AGP Pruner </Compression/Pruner.html#agp-pruner>`__
* - `AGP Pruner <../Compression/Pruner.rst#agp-pruner>`__
- Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) `Reference Paper <https://arxiv.org/abs/1710.01878>`__
* - `Lottery Ticket Pruner </Compression/Pruner.html#lottery-ticket-hypothesis>`__
* - `Lottery Ticket Pruner <../Compression/Pruner.rst#lottery-ticket-hypothesis>`__
- The pruning process used by "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". It prunes a model iteratively. `Reference Paper <https://arxiv.org/abs/1803.03635>`__
* - `FPGM Pruner </Compression/Pruner.html#fpgm-pruner>`__
* - `FPGM Pruner <../Compression/Pruner.rst#fpgm-pruner>`__
- Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration `Reference Paper <https://arxiv.org/pdf/1811.00250.pdf>`__
* - `L1Filter Pruner </Compression/Pruner.html#l1filter-pruner>`__
* - `L1Filter Pruner <../Compression/Pruner.rst#l1filter-pruner>`__
- Pruning filters with the smallest L1 norm of weights in convolution layers (Pruning Filters for Efficient Convnets) `Reference Paper <https://arxiv.org/abs/1608.08710>`__
* - `L2Filter Pruner </Compression/Pruner.html#l2filter-pruner>`__
* - `L2Filter Pruner <../Compression/Pruner.rst#l2filter-pruner>`__
- Pruning filters with the smallest L2 norm of weights in convolution layers
* - `ActivationAPoZRankFilterPruner </Compression/Pruner.html#activationapozrankfilterpruner>`__
* - `ActivationAPoZRankFilterPruner <../Compression/Pruner.rst#activationapozrankfilter-pruner>`__
- Pruning filters based on the metric APoZ (average percentage of zeros) which measures the percentage of zeros in activations of (convolutional) layers. `Reference Paper <https://arxiv.org/abs/1607.03250>`__
* - `ActivationMeanRankFilterPruner </Compression/Pruner.html#activationmeanrankfilterpruner>`__
* - `ActivationMeanRankFilterPruner <../Compression/Pruner.rst#activationmeanrankfilter-pruner>`__
- Pruning filters based on the metric that calculates the smallest mean value of output activations
* - `Slim Pruner </Compression/Pruner.html#slim-pruner>`__
* - `Slim Pruner <../Compression/Pruner.rst#slim-pruner>`__
- Pruning channels in convolution layers by pruning scaling factors in BN layers (Learning Efficient Convolutional Networks through Network Slimming) `Reference Paper <https://arxiv.org/abs/1708.06519>`__
* - `TaylorFO Pruner </Compression/Pruner.html#taylorfoweightfilterpruner>`__
* - `TaylorFO Pruner <../Compression/Pruner.rst#taylorfoweightfilter-pruner>`__
- Pruning filters based on the first-order Taylor expansion on weights (Importance Estimation for Neural Network Pruning) `Reference Paper <http://jankautz.com/publications/Importance4NNPruning_CVPR19.pdf>`__
* - `ADMM Pruner </Compression/Pruner.html#admm-pruner>`__
* - `ADMM Pruner <../Compression/Pruner.rst#admm-pruner>`__
- Pruning based on ADMM optimization technique `Reference Paper <https://arxiv.org/abs/1804.03294>`__
* - `NetAdapt Pruner </Compression/Pruner.html#netadapt-pruner>`__
* - `NetAdapt Pruner <../Compression/Pruner.rst#netadapt-pruner>`__
- Automatically simplify a pretrained network to meet the resource budget by iterative pruning `Reference Paper <https://arxiv.org/abs/1804.03230>`__
* - `SimulatedAnnealing Pruner </Compression/Pruner.html#simulatedannealing-pruner>`__
* - `SimulatedAnnealing Pruner <../Compression/Pruner.rst#simulatedannealing-pruner>`__
- Automatic pruning with a guided heuristic search method, Simulated Annealing algorithm `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - `AutoCompress Pruner </Compression/Pruner.html#autocompress-pruner>`__
* - `AutoCompress Pruner <../Compression/Pruner.rst#autocompress-pruner>`__
- Automatic pruning by iteratively calling SimulatedAnnealing Pruner and ADMM Pruner `Reference Paper <https://arxiv.org/abs/1907.03141>`__
* - `AMC Pruner </Compression/Pruner.html#amc-pruner>`__
* - `AMC Pruner <../Compression/Pruner.rst#amc-pruner>`__
- AMC: AutoML for Model Compression and Acceleration on Mobile Devices `Reference Paper <https://arxiv.org/pdf/1802.03494.pdf>`__
You can refer to this :githublink:`benchmark <docs/en_US/CommunitySharings/ModelCompressionComparison.rst>` for the performance of these pruners on some benchmark problems.
You can refer to this `benchmark <../CommunitySharings/ModelCompressionComparison.rst>`__ for the performance of these pruners on some benchmark problems.
Quantization Algorithms
^^^^^^^^^^^^^^^^^^^^^^^
......@@ -77,13 +77,13 @@ Quantization algorithms compress the original network by reducing the number of
* - Name
- Brief Introduction of Algorithm
* - `Naive Quantizer </Compression/Quantizer.html#naive-quantizer>`__
* - `Naive Quantizer <../Compression/Quantizer.rst#naive-quantizer>`__
- Quantize weights to default 8 bits
* - `QAT Quantizer </Compression/Quantizer.html#qat-quantizer>`__
* - `QAT Quantizer <../Compression/Quantizer.rst#qat-quantizer>`__
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. `Reference Paper <http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf>`__
* - `DoReFa Quantizer </Compression/Quantizer.html#dorefa-quantizer>`__
* - `DoReFa Quantizer <../Compression/Quantizer.rst#dorefa-quantizer>`__
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. `Reference Paper <https://arxiv.org/abs/1606.06160>`__
* - `BNN Quantizer </Compression/Quantizer.html#bnn-quantizer>`__
* - `BNN Quantizer <../Compression/Quantizer.rst#bnn-quantizer>`__
- Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1. `Reference Paper <https://arxiv.org/abs/1602.02830>`__
......
......@@ -113,7 +113,7 @@ User configuration for Slim Pruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We implemented one of the experiments in `'Learning Efficient Convolutional Networks through Network Slimming' <https://arxiv.org/pdf/1708.06519.pdf>`__\ , we pruned $70\%$ channels in the **VGGNet** for CIFAR-10 in the paper, in which $88.5\%$ parameters are pruned. Our experiments results are as follows:
We implemented one of the experiments in `Learning Efficient Convolutional Networks through Network Slimming <https://arxiv.org/pdf/1708.06519.pdf>`__\ : we pruned ``70%`` of the channels in **VGGNet** for CIFAR-10, as in the paper, in which ``88.5%`` of the parameters are pruned. Our experiment results are as follows:
.. list-table::
:header-rows: 1
......@@ -182,7 +182,7 @@ User configuration for FPGM Pruner
L1Filter Pruner
---------------
This is an one-shot pruner, In `'PRUNING FILTERS FOR EFFICIENT CONVNETS' <https://arxiv.org/abs/1608.08710>`__\ , authors Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
This is a one-shot pruner proposed in `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__ by Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet and Hans Peter Graf.
.. image:: ../../img/l1filter_pruner.png
......@@ -232,7 +232,7 @@ User configuration for L1Filter Pruner
Reproduced Experiment
^^^^^^^^^^^^^^^^^^^^^
We implemented one of the experiments in `'PRUNING FILTERS FOR EFFICIENT CONVNETS' <https://arxiv.org/abs/1608.08710>`__ with **L1FilterPruner**\ , we pruned** VGG-16** for CIFAR-10 to** VGG-16-pruned-A** in the paper, in which $64\%$ parameters are pruned. Our experiments results are as follows:
We implemented one of the experiments in `PRUNING FILTERS FOR EFFICIENT CONVNETS <https://arxiv.org/abs/1608.08710>`__ with **L1FilterPruner**\ : we pruned **VGG-16** for CIFAR-10 to **VGG-16-pruned-A**, as in the paper, in which ``64%`` of the parameters are pruned. Our experiment results are as follows:
.. list-table::
:header-rows: 1
......@@ -330,7 +330,7 @@ User configuration for ActivationAPoZRankFilter Pruner
ActivationMeanRankFilter Pruner
-------------------------------
ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion ``mean activation`` calculated from the output activations of convolution layers to achieve a preset level of network sparsity. The pruning criterion ``mean activation`` is explained in section 2.2 of the paper\ `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__. Other pruning criteria mentioned in this paper will be supported in future release.
ActivationMeanRankFilterPruner is a pruner which prunes the filters with the smallest importance criterion ``mean activation``\ , calculated from the output activations of convolution layers, to achieve a preset level of network sparsity. The pruning criterion ``mean activation`` is explained in section 2.2 of the paper `Pruning Convolutional Neural Networks for Resource Efficient Inference <https://arxiv.org/abs/1611.06440>`__. Other pruning criteria mentioned in this paper will be supported in a future release.
We also provide a dependency-aware mode for this pruner to get a better speedup from the pruning. Please refer to `dependency-aware <./DependencyAware.rst>`__ for more details.
......
......@@ -71,7 +71,7 @@ You can view example for more information
User configuration for QAT Quantizer
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
common configuration needed by compression algorithms can be found at `Specification of ``config_list`` <./QuickStart.rst>`__.
Common configuration needed by compression algorithms can be found in the `Specification of config_list <./QuickStart.rst>`__.
Configuration needed by this algorithm:
......
......@@ -8,7 +8,7 @@ In this tutorial, we use the `first section <#quick-start-to-compress-a-model>`_
Quick Start to Compress a Model
-------------------------------
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms. The usage of them are the same, thus, here we use `slim pruner </Compression/Pruner.html#slim-pruner>`__ as an example to show the usage.
NNI provides very simple APIs for compressing a model. The compression includes pruning algorithms and quantization algorithms, and their usage is the same; thus, here we use `slim pruner <../Compression/Pruner.rst#slim-pruner>`__ as an example to show the usage.
Write configuration
^^^^^^^^^^^^^^^^^^^
......@@ -82,13 +82,13 @@ Tensorflow code
pruner = LevelPruner(tf.get_default_graph(), config_list)
pruner.compress()
You can use other compression algorithms in the package of ``nni.compression``. The algorithms are implemented in both PyTorch and TensorFlow (partial support on TensorFlow), under ``nni.compression.pytorch`` and ``nni.compression.tensorflow`` respectively. You can refer to `Pruner <./Pruner.md>`__ and `Quantizer <./Quantizer.md>`__ for detail description of supported algorithms. Also if you want to use knowledge distillation, you can refer to `KDExample <../TrialExample/KDExample.rst>`__
You can use other compression algorithms in the ``nni.compression`` package. The algorithms are implemented in both PyTorch and TensorFlow (partial support on TensorFlow), under ``nni.compression.pytorch`` and ``nni.compression.tensorflow`` respectively. You can refer to `Pruner <./Pruner.rst>`__ and `Quantizer <./Quantizer.rst>`__ for a detailed description of the supported algorithms. Also, if you want to use knowledge distillation, you can refer to `KDExample <../TrialExample/KDExample.rst>`__.
A compression algorithm is first instantiated with a ``config_list`` passed in. The specification of this ``config_list`` will be described later.
The function call ``pruner.compress()`` modifies the user-defined model (in TensorFlow the model can be obtained with ``tf.get_default_graph()``\ , while in PyTorch the model is the defined model class), inserting masks into it. Then, when you run the model, the masks take effect. The masks can be adjusted at runtime by the algorithms.
*Note that, ``pruner.compress`` simply adds masks on model weights, it does not include fine tuning logic. If users want to fine tune the compressed model, they need to write the fine tune logic by themselves after ``pruner.compress``.*
Note that ``pruner.compress`` simply adds masks to the model weights; it does not include fine-tuning logic. If users want to fine-tune the compressed model, they need to write the fine-tuning logic themselves after ``pruner.compress``.
Specification of ``config_list``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -104,7 +104,7 @@ There are different keys in a ``dict``. Some of them are common keys supported b
* **op_names**\ : This is to specify by name which operations to compress. If this field is omitted, operations will not be filtered by it.
* **exclude**\ : Default is False. If this field is True, it means the operations with specified types and names will be excluded from the compression.
Some other keys are often specific to a certain algorithms, users can refer to `pruning algorithms <./Pruner.md>`__ and `quantization algorithms <./Quantizer.rst>`__ for the keys allowed by each algorithm.
Some other keys are specific to a certain algorithm; users can refer to `pruning algorithms <./Pruner.rst>`__ and `quantization algorithms <./Quantizer.rst>`__ for the keys allowed by each algorithm.
A simple example of configuration is shown below:
......@@ -190,7 +190,7 @@ In this example, 'op_names' is the name of layer and four layers will be quantiz
APIs for Updating Fine Tuning Status
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some compression algorithms use epochs to control the progress of compression (e.g. `AGP </Compression/Pruner.html#agp-pruner>`__\ ), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: ``pruner.update_epoch(epoch)`` and ``pruner.step()``.
Some compression algorithms use epochs to control the progress of compression (e.g. `AGP <../Compression/Pruner.rst#agp-pruner>`__\ ), and some algorithms need to do something after every minibatch. Therefore, we provide another two APIs for users to invoke: ``pruner.update_epoch(epoch)`` and ``pruner.step()``.
``update_epoch`` should be invoked in every epoch, while ``step`` should be invoked after each minibatch. Note that most algorithms do not require calling the two APIs. Please refer to each algorithm's document for details. For the algorithms that do not need them, calling them is allowed but has no effect.
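As a sketch of where the two calls belong in a standard PyTorch training loop (``model``\ , ``optimizer``\ , ``criterion``\ , ``train_loader`` and ``num_epochs`` are assumed to be defined already):

.. code-block:: python

   for epoch in range(num_epochs):
       pruner.update_epoch(epoch)   # once per epoch, for epoch-driven algorithms
       for data, target in train_loader:
           optimizer.zero_grad()
           loss = criterion(model(data), target)
           loss.backward()
           optimizer.step()
           pruner.step()            # once per minibatch, for step-driven algorithms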
......
......@@ -8,8 +8,8 @@ format for model weights is 32-bit float, or FP32. Many research works have demo
can be represented using 8-bit integers without significant loss in accuracy. Even lower bit-widths, such as 4/2/1 bits,
are an active field of research.
A quantizer is a quantization algorithm implementation in NNI, NNI provides multiple quntizers as below. You can also
create your own quntizer using NNI model compression interface.
A quantizer is the implementation of a quantization algorithm in NNI; NNI provides multiple quantizers, as listed below. You can also create your own quantizer using the NNI model compression interface.
.. toctree::
:maxdepth: 2
......
......@@ -40,7 +40,7 @@ Then
You can also refer to the examples in ``/examples/feature_engineering/gbdt_selector/``.
**Requirement of ``fit`` FuncArgs**
**Requirement of fit FuncArgs**
*
......@@ -64,7 +64,7 @@ And you could reference the examples in ``/examples/feature_engineering/gbdt_sel
*
**num_boost_round** (int, required) - the number of boosting rounds. For details, see `here <https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html#lightgbm.train>`__.
**Requirement of ``get_selected_features`` FuncArgs**
**Requirement of get_selected_features FuncArgs**
* **topk** (int, required) - the top-k most important features you want to select.
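Putting the requirements together, usage might look like the sketch below; ``num_boost_round`` and ``topk`` come from the lists above, while the import path and the remaining keyword names are assumptions that may vary across NNI versions:

.. code-block:: python

   from nni.feature_engineering.gbdt_selector import GBDTSelector

   # X is a feature matrix and y the labels (hypothetical data).
   selector = GBDTSelector()
   selector.fit(X, y,
                lgb_params={'objective': 'binary', 'verbosity': -1},
                eval_ratio=0.3,
                early_stopping_rounds=10,
                importance_type='gain',
                num_boost_round=1000)
   print(selector.get_selected_features(topk=10))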