Merge branch 'master' of github.com:Microsoft/nni into dev-nas-refactor

594924a9 · quzha · d43fbe82 · 262fabf1 · 594924a9 · 594924a9
Commit 594924a9 authored Nov 18, 2019 by quzha
20 changed files
--- a/azure-pipelines.yml
+++ b/azure-pipelines.yml
@@ -126,18 +126,10 @@ jobs:
      cd test
      powershell.exe -file unittest.ps1
    displayName: 'unit test'
-  - script: |
-      cd test
-      python naive_test.py
-    displayName: 'Naive test'
  - script: |
      cd test
      python tuner_test.py
    displayName: 'Built-in tuners / assessors tests'
-  - script: |
-      cd test
-      python metrics_test.py
-    displayName: 'Trial job metrics test'
  - script: |
      cd test
      PATH=$HOME/.local/bin:$PATH python3 cli_test.py

--- a/docs/en_US/Compressor/Overview.md
+++ b/docs/en_US/Compressor/Overview.md
@@ -12,6 +12,7 @@ We have provided two naive compression algorithms and three popular ones for use
 |---|---|
 | [Level Pruner](./Pruner.md#level-pruner) | Pruning the specified ratio on each weight based on absolute values of weights |
 | [AGP Pruner](./Pruner.md#agp-pruner) | Automated gradual pruning (To prune, or not to prune: exploring the efficacy of pruning for model compression) [Reference Paper](https://arxiv.org/abs/1710.01878)|
+| [FPGM Pruner](./Pruner.md#fpgm-pruner) | Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [Reference Paper](https://arxiv.org/pdf/1811.00250.pdf)|
 | [Naive Quantizer](./Quantizer.md#naive-quantizer) |  Quantize weights to default 8 bits |
 | [QAT Quantizer](./Quantizer.md#qat-quantizer) | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. [Reference Paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)|
 | [DoReFa Quantizer](./Quantizer.md#dorefa-quantizer) | DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. [Reference Paper](https://arxiv.org/abs/1606.06160)|
@@ -180,12 +181,54 @@ class YourQuantizer(nni.compression.tensorflow.Quantizer):
    def quantize_weight(self, weight, config, **kwargs):
        """
-        weight is the target weight tensor
+        Quantizers should overload this method to quantize weight tensors.
-        config is the selected dict object in config_list for this layer
+        This method is effectively hooked to :meth:`forward` of the model.
-        kwargs contains op, op_types, and op_name
-        design your quantizer and return new weight
+        Parameters
+        ----------
+        weight : Tensor
+            weight that needs to be quantized
+        config : dict
+            the configuration for weight quantization
        """
+        # Put your code to generate `new_weight` here
        return new_weight
+    def quantize_output(self, output, config, **kwargs):
+        """
+        Quantizers should overload this method to quantize output.
+        This method is effectively hooked to :meth:`forward` of the model.
+        Parameters
+        ----------
+        output : Tensor
+            output that needs to be quantized
+        config : dict
+            the configuration for output quantization
+        """
+        # Put your code to generate `new_output` here
+        return new_output
+    def quantize_input(self, *inputs, config, **kwargs):
+        """
+        Quantizers should overload this method to quantize input.
+        This method is effectively hooked to :meth:`forward` of the model.
+        Parameters
+        ----------
+        inputs : Tensor
+            inputs that need to be quantized
+        config : dict
+            the configuration for inputs quantization
+        """
+        # Put your code to generate `new_input` here
+        return new_input
    # note for pytorch version, there is no sess in input arguments
    def update_epoch(self, epoch_num, sess):
@@ -200,8 +243,6 @@ class YourQuantizer(nni.compression.tensorflow.Quantizer):
        pass
 ```
-__[TODO]__ Will add another member function `quantize_layer_output`, as some quantization algorithms also quantize layers' output.
 ### Usage of user customized compression algorithm
 __[TODO]__ ...
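The `quantize_weight` stub above leaves the body that produces `new_weight` to the implementer. As an illustration only, the NumPy sketch below shows one common choice: symmetric, per-tensor 8-bit fake quantization (map weights to integer codes, then de-quantize back to floats). The helper `fake_quantize_8bit` is hypothetical and is not NNI's Naive Quantizer implementation.

```python
import numpy as np


def fake_quantize_8bit(weight: np.ndarray) -> np.ndarray:
    """Hypothetical helper: symmetric per-tensor 8-bit fake quantization.

    Each value is snapped to the nearest of 255 evenly spaced levels in
    [-max|w|, max|w|]; the de-quantized float array is returned so it can
    stand in for the original weight tensor.
    """
    max_abs = float(np.max(np.abs(weight)))
    if max_abs == 0.0:
        # an all-zero tensor is already exactly representable
        return weight.copy()
    scale = max_abs / 127.0  # size of one int8 step in the range [-127, 127]
    codes = np.clip(np.round(weight / scale), -127, 127)  # integer codes
    return codes * scale  # de-quantize back to float


w = np.array([-1.0, -0.5, 0.0, 0.3, 1.0])
new_w = fake_quantize_8bit(w)
```

The rounding error per element is at most half a quantization step (`scale / 2`); a real quantizer may instead use per-channel scales or asymmetric ranges.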
--- a/docs/en_US/Compressor/Pruner.md
+++ b/docs/en_US/Compressor/Pruner.md
@@ -92,3 +92,49 @@ You can view example for more information
 ***
+## FPGM Pruner
+FPGM Pruner is an implementation of the paper [Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration](https://arxiv.org/pdf/1811.00250.pdf).
+>Previous works utilized “smaller-norm-less-important” criterion to prune filters with smaller norm values in a convolutional neural network. In this paper, we analyze this norm-based criterion and point out that its effectiveness depends on two requirements that are not always met: (1) the norm deviation of the filters should be large; (2) the minimum norm of the filters should be small. To solve this problem, we propose a novel filter pruning method, namely Filter Pruning via Geometric Median (FPGM), to compress the model regardless of those two requirements. Unlike previous methods, FPGM compresses CNN models by pruning filters with redundancy, rather than those with “relatively less” importance.
+### Usage
+First, import the pruner and add masks to the model.
+TensorFlow code
+```python
+from nni.compression.tensorflow import FPGMPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2D']
+}]
+pruner = FPGMPruner(model, config_list)
+pruner.compress()
+```
+PyTorch code
+```python
+from nni.compression.torch import FPGMPruner
+config_list = [{
+    'sparsity': 0.5,
+    'op_types': ['Conv2d']
+}]
+pruner = FPGMPruner(model, config_list)
+pruner.compress()
+```
+Note: FPGM Pruner is used to prune convolutional layers within deep neural networks, therefore the `op_types` field supports only convolutional layers.
+Second, add the code below to update the epoch number at the beginning of each epoch.
+TensorFlow code
+```python
+pruner.update_epoch(epoch)
+```
+PyTorch code
+```python
+pruner.update_epoch(epoch)
+```
+See the examples `examples/model_compress/fpgm_tf_mnist.py` and `examples/model_compress/fpgm_torch_mnist.py` for more information.
+#### User configuration for FPGM Pruner
+* **sparsity:** The percentage of convolutional filters to be pruned, given as a fraction in [0, 1) (for example, 0.5 prunes half of them).
+***
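The FPGM criterion quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the NNI code: it assumes a PyTorch-style weight of shape (OUT, IN, H, W) and, like the pruners in this commit, treats each 2-D kernel `weight[out, in]` as one unit. Each kernel's Euclidean distances to all kernels are summed, and the `num_prune` kernels with the smallest sums (those closest to the geometric median, hence the most redundant) are masked. The function name `fpgm_mask` is hypothetical.

```python
import numpy as np


def fpgm_mask(weight, sparsity):
    """Hypothetical sketch of the FPGM criterion on an (OUT, IN, H, W) weight.

    Returns a 0/1 mask of the same shape in which the kernels with the
    smallest total distance to all other kernels are zeroed.
    """
    out_ch, in_ch = weight.shape[:2]
    kernels = weight.reshape(out_ch * in_ch, -1)  # one row per 2-D kernel
    num_prune = int(len(kernels) * sparsity)
    mask = np.ones_like(weight)
    if len(kernels) < 2 or num_prune < 1:
        return mask
    # pairwise Euclidean distances between all kernels, shape (N, N)
    diff = kernels[:, None, :] - kernels[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist_sum = dist.sum(axis=1)  # total distance of each kernel to the others
    # the kernels with the smallest sums sit closest to the geometric median
    for flat_idx in np.argsort(dist_sum, kind="stable")[:num_prune]:
        mask[np.unravel_index(flat_idx, (out_ch, in_ch))] = 0.0
    return mask


# five 1x1 kernels (OUT=5, IN=1) with values 0, 1, 2, 3, 100
w = np.array([0.0, 1.0, 2.0, 3.0, 100.0]).reshape(5, 1, 1, 1)
mask = fpgm_mask(w, sparsity=0.2)
print(mask[:, 0, 0, 0])  # [1. 1. 0. 1. 1.]
```

In this toy case the kernel with value 2 is pruned even though the kernel with value 0 has the smaller norm, which is the difference from a "smaller-norm-less-important" criterion.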
--- a/docs/en_US/Tutorial/Contributing.md
+++ b/docs/en_US/Tutorial/Contributing.md
@@ -40,6 +40,9 @@ A person looking to contribute can take up an issue by claiming it as a comment/
 ## Code Styles & Naming Conventions
 * We follow [PEP8](https://www.python.org/dev/peps/pep-0008/) for Python code and naming conventions, do try to adhere to the same when making a pull request or making a change. One can also take the help of linters such as `flake8` or `pylint`
 * We also follow [NumPy Docstring Style](https://www.sphinx-doc.org/en/master/usage/extensions/example_numpy.html#example-numpy) for Python Docstring Conventions. During the [documentation building](Contributing.md#documentation), we use [sphinx.ext.napoleon](https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html) to generate Python API documentation from Docstring.
+* For docstrings, please refer to the [numpydoc docstring guide](https://numpydoc.readthedocs.io/en/latest/format.html) and the [pandas docstring guide](https://python-sprints.github.io/pandas/guide/pandas_docstring.html).
+    * For function docstrings, **description**, **Parameters**, and **Returns**/**Yields** are mandatory.
+    * For class docstrings, **description** and **Attributes** are mandatory.
 ## Documentation
 Our documentation is built with [sphinx](http://sphinx-doc.org/), supporting [Markdown](https://guides.github.com/features/mastering-markdown/) and [reStructuredText](http://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) formats. All our documentation is placed under [docs/en_US](https://github.com/Microsoft/nni/tree/master/docs).
@@ -48,4 +51,4 @@ Our documentation is built with [sphinx](http://sphinx-doc.org/), supporting [Ma
 * For links, please consider using __relative paths__ first. However, if the documentation is written in Markdown format, and:
    * It's an image link which needs to be formatted with embedded HTML syntax, please use a global URL like `https://user-images.githubusercontent.com/44491713/51381727-e3d0f780-1b4f-11e9-96ab-d26b9198ba65.png`, which can be generated automatically by dragging the picture onto the [GitHub Issue](https://github.com/Microsoft/nni/issues/new) box.
    * It cannot be re-formatted by sphinx, such as source code, please use its global URL. For source code that links to our github repo, please use URLs rooted at `https://github.com/Microsoft/nni/tree/master/` ([mnist.py](https://github.com/Microsoft/nni/blob/master/examples/trials/mnist/mnist.py) for example).
\ No newline at end of file
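As an illustration of the docstring rules in Contributing.md (description, **Parameters**, and **Returns** for functions; description and **Attributes** for classes), here is a short example in NumPy docstring style. The class `RunningMean` is hypothetical and exists only to show the layout.

```python
class RunningMean:
    """
    Incrementally tracks the arithmetic mean of a stream of numbers.

    Attributes
    ----------
    count : int
        Number of values seen so far.
    mean : float
        Mean of the values seen so far (0.0 before any update).
    """

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        """
        Add one value to the running mean.

        Parameters
        ----------
        value : float
            The new observation.

        Returns
        -------
        float
            The updated mean.
        """
        self.count += 1
        # incremental update, so past values never need to be stored
        self.mean += (value - self.mean) / self.count
        return self.mean
```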
--- a/docs/en_US/sdk_reference.rst
+++ b/docs/en_US/sdk_reference.rst
@@ -24,10 +24,10 @@ Tuner
 ..  autoclass:: nni.evolution_tuner.evolution_tuner.EvolutionTuner
    :members:
-..  autoclass:: nni.smac_tuner.smac_tuner.SMACTuner
+..  autoclass:: nni.smac_tuner.SMACTuner
    :members:
-..  autoclass:: nni.gridsearch_tuner.gridsearch_tuner.GridSearchTuner
+..  autoclass:: nni.gridsearch_tuner.GridSearchTuner
    :members:
 ..  autoclass:: nni.networkmorphism_tuner.networkmorphism_tuner.NetworkMorphismTuner
@@ -36,9 +36,15 @@ Tuner
 ..  autoclass:: nni.metis_tuner.metis_tuner.MetisTuner
    :members:
+..  autoclass:: nni.ppo_tuner.PPOTuner
+    :members:
 ..  autoclass:: nni.batch_tuner.batch_tuner.BatchTuner
    :members:
+..  autoclass:: nni.gp_tuner.gp_tuner.GPTuner
+    :members:
 Assessor
 ------------------------
 ..  autoclass:: nni.assessor.Assessor

--- a/examples/model_compress/fpgm_tf_mnist.py
+++ b/examples/model_compress/fpgm_tf_mnist.py
+import tensorflow as tf
+from tensorflow import keras
+assert int(tf.__version__.split('.')[0]) >= 2, 'this example requires TensorFlow 2.x'
+import numpy as np
+from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
+from nni.compression.tensorflow import FPGMPruner
+def get_data():
+    (X_train_full, y_train_full), _ = keras.datasets.mnist.load_data()
+    X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
+    y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]
+    X_mean = X_train.mean(axis=0, keepdims=True)
+    X_std = X_train.std(axis=0, keepdims=True) + 1e-7
+    X_train = (X_train - X_mean) / X_std
+    X_valid = (X_valid - X_mean) / X_std
+    X_train = X_train[..., np.newaxis]
+    X_valid = X_valid[..., np.newaxis]
+    return X_train, X_valid, y_train, y_valid
+def get_model():
+    model = keras.models.Sequential([
+        Conv2D(filters=32, kernel_size=7, input_shape=[28, 28, 1], activation='relu', padding="SAME"),
+        MaxPooling2D(pool_size=2),
+        Conv2D(filters=64, kernel_size=3, activation='relu', padding="SAME"),
+        MaxPooling2D(pool_size=2),
+        Flatten(),
+        Dense(units=128, activation='relu'),
+        Dropout(0.5),
+        Dense(units=10, activation='softmax'),
+    ])
+    model.compile(loss="sparse_categorical_crossentropy",
+        optimizer=keras.optimizers.SGD(learning_rate=1e-3),
+        metrics=["accuracy"])
+    return model
+def main():
+    X_train, X_valid, y_train, y_valid = get_data()
+    model = get_model()
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['Conv2D']
+    }]
+    pruner = FPGMPruner(model, configure_list)
+    pruner.compress()
+    update_epoch_callback = keras.callbacks.LambdaCallback(on_epoch_begin=lambda epoch, logs: pruner.update_epoch(epoch))
+    model.fit(X_train, y_train, epochs=10, validation_data=(X_valid, y_valid), callbacks=[update_epoch_callback])
+if __name__ == '__main__':
+    main()
--- a/examples/model_compress/fpgm_torch_mnist.py
+++ b/examples/model_compress/fpgm_torch_mnist.py
+from nni.compression.torch import FPGMPruner
+import torch
+import torch.nn.functional as F
+from torchvision import datasets, transforms
+class Mnist(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.conv1 = torch.nn.Conv2d(1, 20, 5, 1)
+        self.conv2 = torch.nn.Conv2d(20, 50, 5, 1)
+        self.fc1 = torch.nn.Linear(4 * 4 * 50, 500)
+        self.fc2 = torch.nn.Linear(500, 10)
+    def forward(self, x):
+        x = F.relu(self.conv1(x))
+        x = F.max_pool2d(x, 2, 2)
+        x = F.relu(self.conv2(x))
+        x = F.max_pool2d(x, 2, 2)
+        x = x.view(-1, 4 * 4 * 50)
+        x = F.relu(self.fc1(x))
+        x = self.fc2(x)
+        return F.log_softmax(x, dim=1)
+    def _get_conv_weight_sparsity(self, conv_layer):
+        num_zero_filters = (conv_layer.weight.data.sum((2,3)) == 0).sum().item()
+        num_filters = conv_layer.weight.data.size(0) * conv_layer.weight.data.size(1)
+        return num_zero_filters, num_filters, float(num_zero_filters)/num_filters
+    def print_conv_filter_sparsity(self):
+        conv1_data = self._get_conv_weight_sparsity(self.conv1)
+        conv2_data = self._get_conv_weight_sparsity(self.conv2)
+        print('conv1: num zero filters: {}, num filters: {}, sparsity: {:.4f}'.format(conv1_data[0], conv1_data[1], conv1_data[2]))
+        print('conv2: num zero filters: {}, num filters: {}, sparsity: {:.4f}'.format(conv2_data[0], conv2_data[1], conv2_data[2]))
+def train(model, device, train_loader, optimizer):
+    model.train()
+    for batch_idx, (data, target) in enumerate(train_loader):
+        data, target = data.to(device), target.to(device)
+        optimizer.zero_grad()
+        output = model(data)
+        loss = F.nll_loss(output, target)
+        if batch_idx % 100 == 0:
+            print('{:2.0f}%  Loss {}'.format(100 * batch_idx / len(train_loader), loss.item()))
+            model.print_conv_filter_sparsity()
+        loss.backward()
+        optimizer.step()
+def test(model, device, test_loader):
+    model.eval()
+    test_loss = 0
+    correct = 0
+    with torch.no_grad():
+        for data, target in test_loader:
+            data, target = data.to(device), target.to(device)
+            output = model(data)
+            test_loss += F.nll_loss(output, target, reduction='sum').item()
+            pred = output.argmax(dim=1, keepdim=True)
+            correct += pred.eq(target.view_as(pred)).sum().item()
+    test_loss /= len(test_loader.dataset)
+    print('Loss: {}  Accuracy: {}%\n'.format(
+        test_loss, 100 * correct / len(test_loader.dataset)))
+def main():
+    torch.manual_seed(0)
+    device = torch.device('cpu')
+    trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
+    train_loader = torch.utils.data.DataLoader(
+        datasets.MNIST('data', train=True, download=True, transform=trans),
+        batch_size=64, shuffle=True)
+    test_loader = torch.utils.data.DataLoader(
+        datasets.MNIST('data', train=False, transform=trans),
+        batch_size=1000, shuffle=True)
+    model = Mnist()
+    model.print_conv_filter_sparsity()
+    # You can switch to LevelPruner instead, e.g.:
+    # pruner = LevelPruner(model, configure_list)
+    configure_list = [{
+        'sparsity': 0.5,
+        'op_types': ['Conv2d']
+    }]
+    pruner = FPGMPruner(model, configure_list)
+    pruner.compress()
+    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
+    for epoch in range(10):
+        pruner.update_epoch(epoch)
+        print('# Epoch {} #'.format(epoch))
+        train(model, device, train_loader, optimizer)
+        test(model, device, test_loader)
+if __name__ == '__main__':
+    main()
--- a/src/sdk/pynni/nni/compression/tensorflow/builtin_pruners.py
+++ b/src/sdk/pynni/nni/compression/tensorflow/builtin_pruners.py
 import logging
+import numpy as np
 import tensorflow as tf
 from .compressor import Pruner
-__all__ = ['LevelPruner', 'AGP_Pruner']
+__all__ = ['LevelPruner', 'AGP_Pruner', 'FPGMPruner']
 _logger = logging.getLogger(__name__)
@@ -98,3 +99,104 @@ class AGP_Pruner(Pruner):
        sess.run(tf.assign(self.now_epoch, int(epoch)))
        for k in self.if_init_list:
            self.if_init_list[k] = True
+class FPGMPruner(Pruner):
+    """
+    A filter pruner via geometric median.
+    "Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration",
+    https://arxiv.org/pdf/1811.00250.pdf
+    """
+    def __init__(self, model, config_list):
+        """
+        Parameters
+        ----------
+        model : keras model
+            the model user wants to compress
+        config_list : list
+            supported keys for each list item:
+                - sparsity: percentage of convolutional filters to be pruned.
+        """
+        super().__init__(model, config_list)
+        self.mask_dict = {}
+        self.assign_handler = []
+        self.epoch_pruned_layers = set()
+    def calc_mask(self, layer, config):
+        """
+        Supports Conv1D, Conv2D
+        filter dimensions for Conv1D:
+        LEN: filter length
+        IN: number of input channel
+        OUT: number of output channel
+        filter dimensions for Conv2D:
+        H: filter height
+        W: filter width
+        IN: number of input channel
+        OUT: number of output channel
+        Parameters
+        ----------
+        layer : LayerInfo
+            calculate mask for `layer`'s weight
+        config : dict
+            the configuration for generating the mask
+        """
+        weight = layer.weight
+        op_type = layer.type
+        op_name = layer.name
+        assert 0 <= config.get('sparsity') < 1
+        assert op_type in ['Conv1D', 'Conv2D']
+        assert op_type in config['op_types']
+        if layer.name in self.epoch_pruned_layers:
+            assert layer.name in self.mask_dict
+            return self.mask_dict.get(layer.name)
+        weight = tf.stop_gradient(tf.transpose(weight, [2, 3, 0, 1]))
+        masks = np.ones(weight.shape)
+        num_kernels = weight.shape[0] * weight.shape[1]
+        num_prune = int(num_kernels * config.get('sparsity'))
+        if num_kernels >= 2 and num_prune >= 1:
+            min_gm_idx = self._get_min_gm_kernel_idx(weight, num_prune)
+            for idx in min_gm_idx:
+                masks[tuple(idx)] = 0.
+        # transpose back to the Keras layout (H, W, IN, OUT) before caching,
+        # so that every return path yields a mask with the kernel's shape
+        masks = np.transpose(masks, [2, 3, 0, 1])
+        masks = tf.Variable(masks)
+        self.mask_dict.update({op_name: masks})
+        self.epoch_pruned_layers.add(layer.name)
+        return masks
+    def _get_min_gm_kernel_idx(self, weight, n):
+        assert len(weight.shape) >= 3
+        assert weight.shape[0] * weight.shape[1] >= 2
+        dist_list, idx_list = [], []
+        for in_i in range(weight.shape[0]):
+            for out_i in range(weight.shape[1]):
+                dist_sum = self._get_distance_sum(weight, in_i, out_i)
+                dist_list.append(dist_sum)
+                idx_list.append([in_i, out_i])
+        dist_tensor = tf.convert_to_tensor(dist_list)
+        idx_tensor = tf.constant(idx_list)
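+        # FPGM prunes the kernels closest to the geometric median, i.e. those with
+        # the smallest distance sums, so take top_k of the negated distances
+        _, idx = tf.math.top_k(-dist_tensor, k=n)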
+        return tf.gather(idx_tensor, idx)
+    def _get_distance_sum(self, weight, in_idx, out_idx):
+        w = tf.reshape(weight, (-1, weight.shape[-2], weight.shape[-1]))
+        anchor_w = tf.tile(tf.expand_dims(weight[in_idx, out_idx], 0), [w.shape[0], 1, 1])
+        x = w - anchor_w
+        x = tf.math.reduce_sum((x*x), (-2, -1))
+        x = tf.math.sqrt(x)
+        return tf.math.reduce_sum(x)
+    def update_epoch(self, epoch):
+        self.epoch_pruned_layers = set()
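The TensorFlow `calc_mask` moves the Keras kernel layout (H, W, IN, OUT) to (IN, OUT, H, W) with the permutation `[2, 3, 0, 1]`, computes the mask there, and applies the same permutation again to return to the Keras layout. That works because this permutation is its own inverse, which the small NumPy check below illustrates (the shapes are arbitrary and chosen only for the example).

```python
import numpy as np

perm = [2, 3, 0, 1]
kernel = np.random.default_rng(0).normal(size=(3, 5, 4, 8))  # (H, W, IN, OUT)

moved = np.transpose(kernel, perm)  # (IN, OUT, H, W): one 2-D kernel per [in, out]
restored = np.transpose(moved, perm)  # applying the permutation twice is the identity

# np.argsort(perm) gives the inverse permutation; here it equals perm itself
inverse = np.argsort(perm).tolist()
```

Because the permutation is an involution, no separate inverse needs to be stored alongside the cached mask.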
--- a/src/sdk/pynni/nni/compression/tensorflow/compressor.py
+++ b/src/sdk/pynni/nni/compression/tensorflow/compressor.py
 import logging
 import tensorflow as tf
 from . import default_layers
+tf.config.experimental_run_functions_eagerly(True)
 _logger = logging.getLogger(__name__)
 class LayerInfo:
-    def __init__(self, op, weight, weight_op):
+    def __init__(self, keras_layer):
-        self.op = op
+        self.keras_layer = keras_layer
-        self.name = op.name
+        self.name = keras_layer.name
-        self.type = op.type
+        self.type = default_layers.get_op_type(type(keras_layer))
-        self.weight = weight
+        self.weight_index = default_layers.get_weight_index(self.type)
-        self.weight_op = weight_op
+        if self.weight_index is not None:
+            self.weight = keras_layer.weights[self.weight_index]
+        self._call = None
 class Compressor:
    """
@@ -25,7 +27,7 @@ class Compressor:
        Parameters
        ----------
-        model : pytorch model
+        model : keras model
            the model user wants to compress
        config_list : list
            the configurations that users specify for compression
@@ -34,6 +36,21 @@ class Compressor:
        self.config_list = config_list
-        self.modules_to_compress = []
+        self.modules_to_compress = None
+    def detect_modules_to_compress(self):
+        """
+        Detect all modules that should be compressed, and save the result in `self.modules_to_compress`.
+        The model will be instrumented and user should never edit it after calling this method.
+        """
+        if self.modules_to_compress is None:
+            self.modules_to_compress = []
+            for keras_layer in self.bound_model.layers:
+                layer = LayerInfo(keras_layer)
+                config = self.select_config(layer)
+                if config is not None:
+                    self.modules_to_compress.append((layer, config))
+        return self.modules_to_compress
    def compress(self):
        """
        Compress the model with algorithm implemented by subclass.
@@ -41,19 +58,9 @@ class Compressor:
        The model will be instrumented and user should never edit it after calling this method.
        `self.modules_to_compress` records all the to-be-compressed layers
        """
-        for op in self.bound_model.get_operations():
+        modules_to_compress = self.detect_modules_to_compress()
-            weight_index = _detect_weight_index(op)
+        for layer, config in modules_to_compress:
-            if weight_index is None:
+            self._instrument_layer(layer, config)
-                _logger.warning('Failed to detect weight for layer %s', op.name)
-                return
-            weight_op = op.inputs[weight_index].op
-            weight = weight_op.inputs[0]
-            layer = LayerInfo(op, weight, weight_op)
-            config = self.select_config(layer)
-            if config is not None:
-                self._instrument_layer(layer, config)
-                self.modules_to_compress.append((layer, config))
        return self.bound_model
    def get_modules_to_compress(self):
@@ -74,7 +81,7 @@ class Compressor:
        Parameters
        ----------
-        layer : LayerInfo
+        layer: LayerInfo
            one layer
        Returns
@@ -84,11 +91,12 @@ class Compressor:
            not be compressed
        """
        ret = None
+        if layer.type is None:
+            return None
        for config in self.config_list:
-            op_types = config.get('op_types')
+            config = config.copy()
-            if op_types == 'default':
+            config['op_types'] = self._expand_config_op_types(config)
-                op_types = default_layers.op_weight_index.keys()
+            if layer.type not in config['op_types']:
-            if op_types and layer.type not in op_types:
                continue
            if config.get('op_names') and layer.name not in config['op_names']:
                continue
@@ -97,7 +105,7 @@ class Compressor:
            return None
        return ret
-    def update_epoch(self, epoch, sess):
+    def update_epoch(self, epoch):
        """
        If user want to update model every epoch, user can override this method.
        This method should be called at the beginning of each epoch
@@ -108,7 +116,7 @@ class Compressor:
            the current epoch number
        """
-    def step(self, sess):
+    def step(self):
        """
        If user want to update mask every step, user can override this method
        """
@@ -127,6 +135,18 @@ class Compressor:
        """
        raise NotImplementedError()
+    def _expand_config_op_types(self, config):
+        if config is None:
+            return []
+        op_types = []
+        for op_type in config.get('op_types', []):
+            if op_type == 'default':
+                op_types.extend(default_layers.default_layers)
+            else:
+                op_types.append(op_type)
+        return op_types
 class Pruner(Compressor):
    """
@@ -160,10 +180,17 @@ class Pruner(Compressor):
        config : dict
            the configuration for generating the mask
        """
-        mask = self.calc_mask(layer, config)
+        layer._call = layer.keras_layer.call
-        new_weight = layer.weight * mask
-        tf.contrib.graph_editor.swap_outputs(layer.weight_op, new_weight.op)
+        def new_call(*inputs):
+            weights = [x.numpy() for x in layer.keras_layer.weights]
+            mask = self.calc_mask(layer, config)
+            weights[layer.weight_index] = weights[layer.weight_index] * mask
+            layer.keras_layer.set_weights(weights)
+            ret = layer._call(*inputs)
+            return ret
+        layer.keras_layer.call = new_call
 class Quantizer(Compressor):
    """
@@ -172,23 +199,3 @@ class Quantizer(Compressor):
    def quantize_weight(self, weight, config, op, op_type, op_name):
        raise NotImplementedError("Quantizer must overload quantize_weight()")
-    def _instrument_layer(self, layer, config):
-        weight_index = _detect_weight_index(layer)
-        if weight_index is None:
-            _logger.warning('Failed to detect weight for layer %s', layer.name)
-            return
-        weight_op = layer.op.inputs[weight_index].op
-        weight = weight_op.inputs[0]
-        new_weight = self.quantize_weight(weight, config, op=layer.op, op_type=layer.type, op_name=layer.name)
-        tf.contrib.graph_editor.swap_outputs(weight_op, new_weight.op)
-def _detect_weight_index(layer):
-    index = default_layers.op_weight_index.get(layer.type)
-    if index is not None:
-        return index
-    weight_indices = [i for i, op in enumerate(layer.inputs) if op.name.endswith('Variable/read')]
-    if len(weight_indices) == 1:
-        return weight_indices[0]
-    return None
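The new `Pruner._instrument_layer` wraps a Keras layer's `call` so that, on every forward pass, the mask from `calc_mask` is multiplied into the weight (on NumPy copies, then written back with `set_weights`) before the original `call` runs. The framework-free sketch below mimics only that wrapping pattern; `ToyDense` and `instrument` are hypothetical stand-ins, not NNI or Keras code.

```python
import numpy as np


class ToyDense:
    """Hypothetical stand-in for a Keras layer computing y = x @ kernel."""

    def __init__(self, kernel):
        self.weights = [np.asarray(kernel, dtype=float)]

    def set_weights(self, weights):
        self.weights = [np.array(w, dtype=float) for w in weights]

    def call(self, x):
        return x @ self.weights[0]


def instrument(layer, calc_mask, weight_index=0):
    """Wrap layer.call so the mask is applied to the weight before each call."""
    original_call = layer.call  # keep the bound original, like layer._call

    def new_call(*inputs):
        weights = [w.copy() for w in layer.weights]
        mask = calc_mask(weights[weight_index])
        weights[weight_index] = weights[weight_index] * mask
        layer.set_weights(weights)  # masked values are written back permanently
        return original_call(*inputs)

    layer.call = new_call  # the instance attribute shadows the class method


layer = ToyDense([[1.0, 2.0], [3.0, 4.0]])
# hypothetical mask: zero every entry whose magnitude is below 2.5
instrument(layer, lambda w: (np.abs(w) >= 2.5).astype(float))
y = layer.call(np.array([1.0, 1.0]))
```

Writing the masked weight back means pruned entries stay zero even if a later mask would keep them; that mirrors the `set_weights` call in the diff.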
--- a/src/sdk/pynni/nni/compression/tensorflow/default_layers.py
+++ b/src/sdk/pynni/nni/compression/tensorflow/default_layers.py
-op_weight_index = {
+from tensorflow import keras
-    'Conv2D': None,
-    'Conv3D': None,
-    'DepthwiseConv2dNative': None,
-    'Mul': None,
+supported_layers = {
-    'MatMul': None,
+    keras.layers.Conv1D: ('Conv1D', 0),
+    keras.layers.Conv2D: ('Conv2D', 0),
+    keras.layers.Conv2DTranspose: ('Conv2DTranspose', 0),
+    keras.layers.Conv3D: ('Conv3D', 0),
+    keras.layers.Conv3DTranspose: ('Conv3DTranspose', 0),
+    keras.layers.ConvLSTM2D: ('ConvLSTM2D', 0),
+    keras.layers.Dense: ('Dense', 0),
+    keras.layers.Embedding: ('Embedding', 0),
+    keras.layers.GRU: ('GRU', 0),
+    keras.layers.LSTM: ('LSTM', 0),
 }
+default_layers = [x[0] for x in supported_layers.values()]
+def get_op_type(layer_type):
+    if layer_type in supported_layers:
+        return supported_layers[layer_type][0]
+    else:
+        return None
+def get_weight_index(op_type):
+    for k in supported_layers:
+        if supported_layers[k][0] == op_type:
+            return supported_layers[k][1]
+    return None
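`default_layers.py` maps each supported Keras layer class to an op-type name and a weight index, and `Compressor._expand_config_op_types` replaces the keyword `'default'` with every supported op type before `select_config` checks `layer.type` against the list. The dependency-free sketch below reproduces that lookup logic; strings stand in for the Keras classes, the table is abbreviated, and `matches` is a hypothetical condensation of the `op_types`/`op_names` checks, not NNI code.

```python
# Strings stand in for keras.layers classes (hypothetical, abbreviated table).
supported_layers = {
    "keras.layers.Conv2D": ("Conv2D", 0),
    "keras.layers.Dense": ("Dense", 0),
    "keras.layers.LSTM": ("LSTM", 0),
}
default_layers = [op_type for op_type, _ in supported_layers.values()]


def get_op_type(layer_type):
    entry = supported_layers.get(layer_type)
    return entry[0] if entry is not None else None  # unsupported layer -> None


def expand_op_types(config):
    """Replace the 'default' keyword by every supported op type."""
    op_types = []
    for op_type in config.get("op_types", []):
        if op_type == "default":
            op_types.extend(default_layers)
        else:
            op_types.append(op_type)
    return op_types


def matches(layer_type, layer_name, config):
    """Rough equivalent of the op_types / op_names checks in select_config."""
    op_type = get_op_type(layer_type)
    if op_type is None or op_type not in expand_op_types(config):
        return False
    return not config.get("op_names") or layer_name in config["op_names"]
```

Layers whose class is missing from the table get the type `None` and are skipped, which is why unsupported layers such as `Dropout` never receive a config.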
--- a/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
+++ b/src/sdk/pynni/nni/compression/torch/builtin_pruners.py
@@ -2,7 +2,7 @@ import logging
 import torch
 from .compressor import Pruner
-__all__ = ['LevelPruner', 'AGP_Pruner']
+__all__ = ['LevelPruner', 'AGP_Pruner', 'FPGMPruner']
 logger = logging.getLogger('torch pruner')
@@ -106,3 +106,125 @@ class AGP_Pruner(Pruner):
            self.now_epoch = epoch
            for k in self.if_init_list:
                self.if_init_list[k] = True
+class FPGMPruner(Pruner):
+    """
+    A filter pruner via geometric median.
+    "Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration",
+    https://arxiv.org/pdf/1811.00250.pdf
+    """
+    def __init__(self, model, config_list):
+        """
+        Parameters
+        ----------
+        model : pytorch model
+            the model user wants to compress
+        config_list : list
+            supported keys for each list item:
+                - sparsity: percentage of convolutional filters to be pruned.
+        """
+        super().__init__(model, config_list)
+        self.mask_dict = {}
+        self.epoch_pruned_layers = set()
+    def calc_mask(self, layer, config):
+        """
+        Supports Conv1d, Conv2d
+        filter dimensions for Conv1d:
+        OUT: number of output channel
+        IN: number of input channel
+        LEN: filter length
+        filter dimensions for Conv2d:
+        OUT: number of output channel
+        IN: number of input channel
+        H: filter height
+        W: filter width
+        Parameters
+        ----------
+        layer : LayerInfo
+            calculate mask for `layer`'s weight
+        config : dict
+            the configuration for generating the mask
+        """
+        weight = layer.module.weight.data
+        assert 0 <= config.get('sparsity') < 1
+        assert layer.type in ['Conv1d', 'Conv2d']
+        assert layer.type in config['op_types']
+        if layer.name in self.epoch_pruned_layers:
+            assert layer.name in self.mask_dict
+            return self.mask_dict.get(layer.name)
+        masks = torch.ones(weight.size()).type_as(weight)
+        try:
+            num_kernels = weight.size(0) * weight.size(1)
+            num_prune = int(num_kernels * config.get('sparsity'))
+            if num_kernels < 2 or num_prune < 1:
+                return masks
+            min_gm_idx = self._get_min_gm_kernel_idx(weight, num_prune)
+            for idx in min_gm_idx:
+                masks[idx] = 0.
+        finally:
+            self.mask_dict.update({layer.name: masks})
+            self.epoch_pruned_layers.add(layer.name)
+        return masks
+    def _get_min_gm_kernel_idx(self, weight, n):
+        assert len(weight.size()) in [3, 4]
+        dist_list = []
+        for out_i in range(weight.size(0)):
+            for in_i in range(weight.size(1)):
+                dist_sum = self._get_distance_sum(weight, out_i, in_i)
+                dist_list.append((dist_sum, (out_i, in_i)))
+        min_gm_kernels = sorted(dist_list, key=lambda x: x[0])[:n]
+        return [x[1] for x in min_gm_kernels]
+    def _get_distance_sum(self, weight, out_idx, in_idx):
+        """
+        Calculate the total distance between a specified filter (by out_idx and in_idx) and
+        all other filters.
+        Optimized version of the following naive implementation:
+        def _get_distance_sum(self, weight, out_idx, in_idx):
+            w = weight.view(-1, weight.size(-2), weight.size(-1))
+            dist_sum = 0.
+            for k in w:
+                dist_sum += torch.dist(k, weight[out_idx, in_idx], p=2)
+            return dist_sum
+        Parameters
+        ----------
+        weight: Tensor
+            convolutional filter weight
+        out_idx: int
+            output channel index of specified filter, this method calculates the total distance
+            between this specified filter and all other filters.
+        in_idx: int
+            input channel index of specified filter
+        Returns
+        -------
+        Tensor
+            The total distance, as a scalar (0-dim) tensor
+        """
+        logger.debug('weight size: %s', weight.size())
+        if len(weight.size()) == 4: # Conv2d
+            w = weight.view(-1, weight.size(-2), weight.size(-1))
+            anchor_w = weight[out_idx, in_idx].unsqueeze(0).expand(w.size(0), w.size(1), w.size(2))
+        elif len(weight.size()) == 3: # Conv1d
+            w = weight.view(-1, weight.size(-1))
+            anchor_w = weight[out_idx, in_idx].unsqueeze(0).expand(w.size(0), w.size(1))
+        else:
+            raise RuntimeError('unsupported layer type')
+        x = w - anchor_w
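+        # Reduce over the kernel axes only, so each kernel keeps its own distance
+        # (the 2-D Conv1d view must not be summed over both of its axes).
+        x = (x * x).sum(tuple(range(1, x.dim())))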
+        x = torch.sqrt(x)
+        return x.sum()
+    def update_epoch(self, epoch):
+        self.epoch_pruned_layers = set()
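The FPGM selection above ranks every (out_idx, in_idx) kernel by its summed Euclidean distance to all kernels and masks the n smallest, i.e. the kernels nearest the geometric median. The following is a minimal pure-Python sketch of that ranking, assuming the weight is given as nested lists rather than a torch tensor; the names `min_gm_kernel_indices` and `_flatten` are hypothetical and not part of NNI.

```python
import math
from itertools import product


def _flatten(kernel):
    """Flatten a (possibly nested) list kernel into a flat list of floats."""
    if isinstance(kernel, (int, float)):
        return [float(kernel)]
    return [v for item in kernel for v in _flatten(item)]


def min_gm_kernel_indices(weight, n):
    """Return the (out_idx, in_idx) pairs of the n kernels closest to the geometric median.

    weight is a nested list shaped [out][in][...kernel dims...]; each kernel is
    flattened first, so 1-D (Conv1d) and 2-D (Conv2d) kernels are handled alike.
    """
    kernels = {
        (o, i): _flatten(weight[o][i])
        for o, i in product(range(len(weight)), range(len(weight[0])))
    }
    # Total Euclidean distance from each kernel to every kernel (itself contributes 0).
    dist_sum = {
        idx: sum(math.dist(k, other) for other in kernels.values())
        for idx, k in kernels.items()
    }
    # Kernels with the smallest distance sum are the most "replaceable" ones.
    return sorted(dist_sum, key=dist_sum.get)[:n]
```

Flattening each kernel before measuring distances keeps one distance per kernel regardless of kernel rank, which is the property the vectorized torch version needs for both Conv1d and Conv2d weights.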
--- a/src/sdk/pynni/nni/compression/torch/compressor.py
+++ b/src/sdk/pynni/nni/compression/torch/compressor.py
@@ -32,7 +32,23 @@ class Compressor:
        """
        self.bound_model = model
        self.config_list = config_list
-        self.modules_to_compress = []
+        self.modules_to_compress = None
+    def detect_modules_to_compress(self):
+        """
+        Detect all modules that should be compressed, and save the result in `self.modules_to_compress`.
+        The model is not modified here; instrumentation happens in :meth:`compress`.
+        """
+        if self.modules_to_compress is None:
+            self.modules_to_compress = []
+            for name, module in self.bound_model.named_modules():
+                layer = LayerInfo(name, module)
+                config = self.select_config(layer)
+                if config is not None:
+                    self.modules_to_compress.append((layer, config))
+        return self.modules_to_compress
    def compress(self):
        """
@@ -41,12 +57,9 @@ class Compressor:
        The model will be instrumented and user should never edit it after calling this method.
        `self.modules_to_compress` records all the to-be-compressed layers
        """
-        for name, module in self.bound_model.named_modules():
+        modules_to_compress = self.detect_modules_to_compress()
-            layer = LayerInfo(name, module)
+        for layer, config in modules_to_compress:
-            config = self.select_config(layer)
+            self._instrument_layer(layer, config)
-            if config is not None:
-                self._instrument_layer(layer, config)
-                self.modules_to_compress.append((layer, config))
        return self.bound_model
    def get_modules_to_compress(self):
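Splitting detection out of `compress()` lets a caller inspect which layers will be compressed before the model is instrumented, and using `None` as the "not detected yet" sentinel distinguishes that state from "detected, nothing matched" (an empty list). A minimal sketch of the caching pattern, assuming a hypothetical `MiniCompressor` that works on plain `(name, op_type)` pairs instead of torch modules and `LayerInfo`:

```python
class MiniCompressor:
    """Hypothetical stand-in for Compressor: caches detection, then instruments."""

    def __init__(self, named_modules, config_list):
        self.named_modules = named_modules  # list of (name, op_type) pairs
        self.config_list = config_list
        self.modules_to_compress = None  # None means "not detected yet"
        self.scan_count = 0
        self.instrumented = []

    def select_config(self, op_type):
        for config in self.config_list:
            if op_type in config["op_types"]:
                return config
        return None

    def detect_modules_to_compress(self):
        # Scan the model only on the first call; later calls reuse the cached list.
        if self.modules_to_compress is None:
            self.scan_count += 1
            self.modules_to_compress = []
            for name, op_type in self.named_modules:
                config = self.select_config(op_type)
                if config is not None:
                    self.modules_to_compress.append((name, config))
        return self.modules_to_compress

    def compress(self):
        # Instrument exactly the layers that detection selected.
        for name, _config in self.detect_modules_to_compress():
            self.instrumented.append(name)
        return self.instrumented
```

Calling `detect_modules_to_compress()` and then `compress()` scans the modules once; with an empty-list sentinel, a model with no matching layers would be rescanned on every call.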
@@ -55,7 +68,7 @@ class Compressor:
        Returns
        -------
-        self.modules_to_compress : list
+        list
            a list of the layers, each of which is a tuple (`layer`, `config`),
            `layer` is `LayerInfo`, `config` is a `dict`
        """
@@ -72,12 +85,13 @@ class Compressor:
        Returns
        -------
-        ret : config or None
+        config or None
            the retrieved configuration for this layer, if None, this layer should
            not be compressed
        """
        ret = None
        for config in self.config_list:
+            config = config.copy()
            config['op_types'] = self._expand_config_op_types(config)
            if layer.type not in config['op_types']:
                continue
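The added `config = config.copy()` matters because `select_config` rebinds `config['op_types']`; without the copy, the expansion would be written back into the user's `config_list`. A small sketch of the effect, assuming a hypothetical `expand_op_types` that maps a `'default'` keyword to an illustrative list of op types (the real expansion in NNI may differ):

```python
DEFAULT_OPS = ["Conv2d", "Linear"]  # hypothetical expansion of the 'default' keyword


def expand_op_types(config):
    """Replace the 'default' keyword with a concrete list of op types."""
    expanded = []
    for op_type in config["op_types"]:
        expanded.extend(DEFAULT_OPS if op_type == "default" else [op_type])
    return expanded


def select_config(layer_type, config_list):
    """Return a per-layer config without mutating the caller's config_list."""
    for config in config_list:
        config = config.copy()  # shallow copy: rebinding a key won't touch the original
        config["op_types"] = expand_op_types(config)
        if layer_type in config["op_types"]:
            return config
    return None
```

A shallow copy is enough here because the key is rebound to a new list rather than mutated in place.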
@@ -206,6 +220,8 @@ class Pruner(Compressor):
        """
        assert model_path is not None, 'model_path must be specified'
        for name, m in self.bound_model.named_modules():
+            if name == "":
+                continue
            mask = self.mask_dict.get(name)
            if mask is not None:
                mask_sum = mask.sum().item()
@@ -238,26 +254,87 @@ class Quantizer(Compressor):
    """
    def quantize_weight(self, weight, config, op, op_type, op_name):
-        """user should know where dequantize goes and implement it in quantize method
+        """
-        we now do not provide dequantize method
+        Quantizers should overload this method to quantize weights.
+        This method is effectively hooked to :meth:`forward` of the model.
+        Parameters
+        ----------
+        weight : Tensor
+            weight that needs to be quantized
+        config : dict
+            the configuration for weight quantization
        """
        raise NotImplementedError("Quantizer must overload quantize_weight()")
+    def quantize_output(self, output, config, op, op_type, op_name):
+        """
+        Quantizers should overload this method to quantize outputs.
+        This method is effectively hooked to :meth:`forward` of the model.
+        Parameters
+        ----------
+        output : Tensor
+            output that needs to be quantized
+        config : dict
+            the configuration for output quantization
+        """
+        raise NotImplementedError("Quantizer must overload quantize_output()")
+    def quantize_input(self, *inputs, config, op, op_type, op_name):
+        """
+        Quantizers should overload this method to quantize inputs.
+        This method is effectively hooked to :meth:`forward` of the model.
+        Parameters
+        ----------
+        inputs : Tensor
+            inputs that need to be quantized
+        config : dict
+            the configuration for inputs quantization
+        """
+        raise NotImplementedError("Quantizer must overload quantize_input()")
    def _instrument_layer(self, layer, config):
+        """
+        Create a wrapper forward function to replace the original one.
+        Parameters
+        ----------
+        layer : LayerInfo
+            the layer to instrument with quantization
+        config : dict
+            the configuration for quantization
+        """
        assert layer._forward is None, 'Each model can only be compressed once'
-        if not _check_weight(layer.module):
+        assert "quant_types" in config, 'must provide quant_types in config'
-            _logger.warning('Module %s does not have parameter "weight"', layer.name)
+        assert isinstance(config["quant_types"], list), 'quant_types must be list type'
-            return
+        if 'weight' in config["quant_types"]:
+            if not _check_weight(layer.module):
+                _logger.warning('Module %s does not have parameter "weight"', layer.name)
        layer._forward = layer.module.forward
        def new_forward(*inputs):
-            weight = layer.module.weight.data
+            if 'input' in config["quant_types"]:
-            new_weight = self.quantize_weight(weight, config, op=layer.module, op_type=layer.type, op_name=layer.name)
+                inputs = self.quantize_input(inputs, config=config, op=layer.module, op_type=layer.type, op_name=layer.name)
-            layer.module.weight.data = new_weight
-            return layer._forward(*inputs)
+            if 'weight' in config["quant_types"] and _check_weight(layer.module):
+                weight = layer.module.weight.data
+                new_weight = self.quantize_weight(weight, config, op=layer.module, op_type=layer.type, op_name=layer.name)
+                layer.module.weight.data = new_weight
+                result = layer._forward(*inputs)
+                layer.module.weight.data = weight
+            else:
+                result = layer._forward(*inputs)
-        layer.module.forward = new_forward
+            if 'output' in config["quant_types"]:
+                result = self.quantize_output(result, config, op=layer.module, op_type=layer.type, op_name=layer.name)
+            return result
+        layer.module.forward = new_forward
 def _check_weight(module):
    try:
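The new `_instrument_layer` swaps in the quantized weight only for the duration of the wrapped forward call and then restores the full-precision weight, so training updates keep acting on the original values. A toy sketch of that wrapping pattern, assuming a hypothetical `Linear1D` layer whose weight is a Python list and a hypothetical rounding quantizer; the `try`/`finally` is an addition not present in the diff:

```python
class Linear1D:
    """Hypothetical toy layer: y = sum(w_i * x_i)."""

    def __init__(self, weight):
        self.weight = list(weight)

    def forward(self, x):
        return sum(w * xi for w, xi in zip(self.weight, x))


def round_to_step(values, step=0.5):
    """Toy quantizer: snap each value to a multiple of `step` (round() ties go to even)."""
    return [round(v / step) * step for v in values]


def instrument(layer, quant_types, step=0.5):
    """Wrap layer.forward so weights (and optionally outputs) are quantized per call."""
    original_forward = layer.forward

    def new_forward(x):
        if "weight" in quant_types:
            full_precision = layer.weight
            layer.weight = round_to_step(full_precision, step)
            try:
                result = original_forward(x)
            finally:
                # Restore full-precision weights so training updates are not lost.
                layer.weight = full_precision
        else:
            result = original_forward(x)
        if "output" in quant_types:
            result = round_to_step([result], step)[0]
        return result

    layer.forward = new_forward
    return layer
```

For example, with weights `[0.3, 0.9]` and weight quantization enabled, the forward pass uses `[0.5, 1.0]`, while `layer.weight` still reads `[0.3, 0.9]` afterwards.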

--- a/src/sdk/pynni/nni/curvefitting_assessor/curvefitting_assessor.py
+++ b/src/sdk/pynni/nni/curvefitting_assessor/curvefitting_assessor.py
@@ -29,13 +29,13 @@ class CurvefittingAssessor(Assessor):
    Parameters
    ----------
-    epoch_num: int
+    epoch_num : int
        The total number of epoch
-    optimize_mode: str
+    optimize_mode : str
        optimize mode, 'maximize' or 'minimize'
-    start_step: int
+    start_step : int
        only after receiving start_step number of reported intermediate results
-    threshold: float
+    threshold : float
        The threshold that we decide to early stop the worse performance curve.
    """
    def __init__(self, epoch_num=20, optimize_mode='maximize', start_step=6, threshold=0.95, gap=1):
@@ -70,9 +70,9 @@ class CurvefittingAssessor(Assessor):
        Parameters
        ----------
-        trial_job_id: int
+        trial_job_id : int
            trial job id
-        success: bool
+        success : bool
            True if succssfully finish the experiment, False otherwise
        """
        if success:
@@ -90,9 +90,9 @@ class CurvefittingAssessor(Assessor):
        Parameters
        ----------
-        trial_job_id: int
+        trial_job_id : int
            trial job id
-        trial_history: list
+        trial_history : list
            The history performance matrix of each trial
        Returns
@@ -105,7 +105,6 @@ class CurvefittingAssessor(Assessor):
        Exception
            unrecognize exception in curvefitting_assessor
        """
-        trial_job_id = trial_job_id
        self.trial_history = trial_history
        if not self.set_best_performance:
            return AssessResult.Good

--- a/src/sdk/pynni/nni/curvefitting_assessor/curvefunctions.py
+++ b/src/sdk/pynni/nni/curvefitting_assessor/curvefunctions.py
@@ -14,7 +14,9 @@
 # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+"""
+A family of functions used by CurvefittingAssessor
+"""
 import numpy as np
 all_models = {}
@@ -29,10 +31,10 @@ def vap(x, a, b, c):
    Parameters
    ----------
-    x: int
+    x : int
-    a: float
+    a : float
-    b: float
+    b : float
-    c: float
+    c : float
    Returns
    -------
@@ -50,10 +52,10 @@ def pow3(x, c, a, alpha):
    Parameters
    ----------
-    x: int
+    x : int
-    c: float
+    c : float
-    a: float
+    a : float
-    alpha: float
+    alpha : float
    Returns
    -------
@@ -71,9 +73,9 @@ def linear(x, a, b):
    Parameters
    ----------
-    x: int
+    x : int
-    a: float
+    a : float
-    b: float
+    b : float
    Returns
    -------
@@ -91,9 +93,9 @@ def logx_linear(x, a, b):
    Parameters
    ----------
-    x: int
+    x : int
-    a: float
+    a : float
-    b: float
+    b : float
    Returns
    -------
@@ -112,10 +114,10 @@ def dr_hill_zero_background(x, theta, eta, kappa):
    Parameters
    ----------
-    x: int
+    x : int
-    theta: float
+    theta : float
-    eta: float
+    eta : float
-    kappa: float
+    kappa : float
    Returns
    -------
@@ -133,10 +135,10 @@ def log_power(x, a, b, c):
    Parameters
    ----------
-    x: int
+    x : int
-    a: float
+    a : float
-    b: float
+    b : float
-    c: float
+    c : float
    Returns
    -------
@@ -154,11 +156,11 @@ def pow4(x, alpha, a, b, c):
    Parameters
    ----------
-    x: int
+    x : int
-    alpha: float
+    alpha : float
-    a: float
+    a : float
-    b: float
+    b : float
-    c: float
+    c : float
    Returns
    -------
@@ -177,11 +179,11 @@ def mmf(x, alpha, beta, kappa, delta):
    Parameters
    ----------
-    x: int
+    x : int
-    alpha: float
+    alpha : float
-    beta: float
+    beta : float
-    kappa: float
+    kappa : float
-    delta: float
+    delta : float
    Returns
    -------
@@ -199,11 +201,11 @@ def exp4(x, c, a, b, alpha):
    Parameters
    ----------
-    x: int
+    x : int
-    c: float
+    c : float
-    a: float
+    a : float
-    b: float
+    b : float
-    alpha: float
+    alpha : float
    Returns
    -------
@@ -221,9 +223,9 @@ def ilog2(x, c, a):
    Parameters
    ----------
-    x: int
+    x : int
-    c: float
+    c : float
-    a: float
+    a : float
    Returns
    -------
@@ -242,11 +244,11 @@ def weibull(x, alpha, beta, kappa, delta):
    Parameters
    ----------
-    x: int
+    x : int
-    alpha: float
+    alpha : float
-    beta: float
+    beta : float
-    kappa: float
+    kappa : float
-    delta: float
+    delta : float
    Returns
    -------
@@ -264,11 +266,11 @@ def janoschek(x, a, beta, k, delta):
    Parameters
    ----------
-    x: int
+    x : int
-    a: float
+    a : float
-    beta: float
+    beta : float
-    k: float
+    k : float
-    delta: float
+    delta : float
    Returns
    -------
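The docstring changes in this file adopt the numpydoc convention `name : type`, with a space on each side of the colon. A sketch of one curve function written in that style, assuming the pow3 form `c - a * x ** (-alpha)` commonly used for learning-curve extrapolation (the function bodies are not shown in this diff); the registration into `all_models` and `model_para_num` mirrors names that appear in these files but is illustrative:

```python
all_models = {}
model_para_num = {}


def pow3(x, c, a, alpha):
    """pow3 learning-curve model (assumed form: c - a * x ** (-alpha)).

    Parameters
    ----------
    x : int
        epoch index, must be positive
    c : float
        asymptote the curve approaches as x grows
    a : float
        scale of the initial gap below the asymptote
    alpha : float
        decay rate of that gap

    Returns
    -------
    float
        predicted metric at epoch x
    """
    return c - a * x ** (-alpha)


# Register the function and its parameter count so callers can look it up by name.
all_models["pow3"] = pow3
model_para_num["pow3"] = 3
```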

--- a/src/sdk/pynni/nni/curvefitting_assessor/model_factory.py
+++ b/src/sdk/pynni/nni/curvefitting_assessor/model_factory.py
@@ -40,7 +40,7 @@ class CurveModel:
    Parameters
    ----------
-    target_pos: int
+    target_pos : int
        The point we need to predict
    """
    def __init__(self, target_pos):
@@ -120,14 +120,14 @@ class CurveModel:
        Parameters
        ----------
-        model: string
+        model : string
            name of the curve function model
-        pos: int
+        pos : int
            the epoch number of the position you want to predict
        Returns
        -------
-        int:
+        int
            The expected matrix at pos
        """
        if model_para_num[model] == 2:
@@ -143,9 +143,9 @@ class CurveModel:
        Parameters
        ----------
-        pos: int
+        pos : int
            the epoch number of the position you want to predict
-        sample: list
+        sample : list
            sample is a (1 * NUM_OF_FUNCTIONS) matrix, representing{w1, w2, ... wk}
        Returns
@@ -165,7 +165,7 @@ class CurveModel:
        Parameters
        ----------
-        samples: list
+        samples : list
            a collection of sample, it's a (NUM_OF_INSTANCE * NUM_OF_FUNCTIONS) matrix,
            representing{{w11, w12, ..., w1k}, {w21, w22, ... w2k}, ...{wk1, wk2,..., wkk}}
@@ -187,7 +187,7 @@ class CurveModel:
        Parameters
        ----------
-        sample: list
+        sample : list
            sample is a (1 * NUM_OF_FUNCTIONS) matrix, representing{w1, w2, ... wk}
        Returns
@@ -206,9 +206,9 @@ class CurveModel:
        Parameters
        ----------
-        pos: int
+        pos : int
            the epoch number of the position you want to predict
-        sample: list
+        sample : list
            sample is a (1 * NUM_OF_FUNCTIONS) matrix, representing{w1, w2, ... wk}
        Returns
@@ -225,7 +225,7 @@ class CurveModel:
        Parameters
        ----------
-        sample: list
+        sample : list
            sample is a (1 * NUM_OF_FUNCTIONS) matrix, representing{w1, w2, ... wk}
        Returns
@@ -244,7 +244,7 @@ class CurveModel:
        Parameters
        ----------
-        samples: list
+        samples : list
            a collection of sample, it's a (NUM_OF_INSTANCE * NUM_OF_FUNCTIONS) matrix,
            representing{{w11, w12, ..., w1k}, {w21, w22, ... w2k}, ...{wk1, wk2,..., wkk}}
@@ -267,7 +267,7 @@ class CurveModel:
        Parameters
        ----------
-        samples: list
+        samples : list
            a collection of sample, it's a (NUM_OF_INSTANCE * NUM_OF_FUNCTIONS) matrix,
            representing{{w11, w12, ..., w1k}, {w21, w22, ... w2k}, ...{wk1, wk2,..., wkk}}
@@ -322,7 +322,7 @@ class CurveModel:
        Parameters
        ----------
-        trial_history: list
+        trial_history : list
            The history performance matrix of each trial.
        Returns
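The `CurveModel` docstrings describe a sample as a weight vector `{w1, w2, ... wk}` over the curve functions and `samples` as a matrix of such vectors. A minimal sketch of how such weights could turn into a prediction, assuming the combined model is a weighted sum of individual curves and that predictions are averaged over samples; the curve parameters below are hypothetical and already "fitted":

```python
import math

# Hypothetical curves with already-fitted parameters (fitting is out of scope here).
CURVES = [
    lambda x: 0.1 * x + 0.2,            # linear: a * x + b
    lambda x: 0.3 * math.log(x) + 0.1,  # logx_linear: a * log(x) + b
]


def f_comb(pos, sample):
    """Weighted combination sum_i w_i * f_i(pos) for one sample {w1, ..., wk}."""
    return sum(w * curve(pos) for w, curve in zip(sample, CURVES))


def predict(pos, samples):
    """Average the combined prediction over a collection of weight samples."""
    return sum(f_comb(pos, s) for s in samples) / len(samples)
```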

--- a/src/sdk/pynni/nni/gp_tuner/gp_tuner.py
+++ b/src/sdk/pynni/nni/gp_tuner/gp_tuner.py
@@ -17,9 +17,11 @@
 # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-'''
+"""
-gp_tuner.py
+GPTuner is a Bayesian Optimization method where Gaussian Process is used for modeling loss functions.
-'''
+See :class:`GPTuner` for details.
+"""
 import warnings
 import logging
@@ -38,18 +40,40 @@ logger = logging.getLogger("GP_Tuner_AutoML")
 class GPTuner(Tuner):
-    '''
+    """
-    GPTuner
+    GPTuner is a Bayesian Optimization method where Gaussian Process is used for modeling loss functions.
-    '''
+    Parameters
+    ----------
+    optimize_mode : str
+        optimize mode, 'maximize' or 'minimize', by default 'maximize'
+    utility : str
+        utility function (also called 'acquisition function') to use, which can be 'ei', 'ucb' or 'poi'. By default 'ei'.
+    kappa : float
+        value used by utility function 'ucb'. The bigger kappa is, the more exploratory the tuner will be. By default 5.
+    xi : float
+        used by utility functions 'ei' and 'poi'. The bigger xi is, the more exploratory the tuner will be. By default 0.
+    nu : float
+        used to specify Matern kernel. The smaller nu, the less smooth the approximated function is. By default 2.5.
+    alpha : float
+        Used to specify Gaussian Process Regressor. Larger values correspond to increased noise level in the observations.
+        By default 1e-6.
+    cold_start_num : int
+        Number of random explorations to perform before the Gaussian Process is used. By default 10.
+    selection_num_warm_up : int
+        Number of random points to evaluate for getting the point which maximizes the acquisition function. By default 100000.
+    selection_num_starting_points : int
+        Number of times to run L-BFGS-B from a random starting point after the warmup. By default 250.
+    """
    def __init__(self, optimize_mode="maximize", utility='ei', kappa=5, xi=0, nu=2.5, alpha=1e-6, cold_start_num=10,
                 selection_num_warm_up=100000, selection_num_starting_points=250):
-        self.optimize_mode = OptimizeMode(optimize_mode)
+        self._optimize_mode = OptimizeMode(optimize_mode)
        # utility function related
-        self.utility = utility
+        self._utility = utility
-        self.kappa = kappa
+        self._kappa = kappa
-        self.xi = xi
+        self._xi = xi
        # target space
        self._space = None
@@ -72,30 +96,23 @@ class GPTuner(Tuner):
        self._selection_num_starting_points = selection_num_starting_points
        # num of imported data
-        self.supplement_data_num = 0
+        self._supplement_data_num = 0
    def update_search_space(self, search_space):
-        """Update the self.bounds and self.types by the search_space.json
+        """
+        Update the self.bounds and self.types by the search_space.json file.
-        Parameters
+        Override of the abstract method in :class:`~nni.tuner.Tuner`.
-        ----------
-        search_space : dict
        """
        self._space = TargetSpace(search_space, self._random_state)
    def generate_parameters(self, parameter_id, **kwargs):
-        """Generate next parameter for trial
+        """
-        If the number of trial result is lower than cold start number,
+        Method which provides one set of hyper-parameters.
-        gp will first randomly generate some parameters.
+        If the number of trial results is lower than ``cold_start_num``, GPTuner will first generate parameters randomly.
-        Otherwise, choose the parameters by the Gussian Process Model
+        Otherwise, it chooses the parameters using the Gaussian Process model.
-        Parameters
+        Override of the abstract method in :class:`~nni.tuner.Tuner`.
-        ----------
-        parameter_id : int
-        Returns
-        -------
-        result : dict
        """
        if self._space.len() < self._cold_start_num:
            results = self._space.random_sample()
@@ -107,7 +124,7 @@ class GPTuner(Tuner):
                self._gp.fit(self._space.params, self._space.target)
            util = UtilityFunction(
-                kind=self.utility, kappa=self.kappa, xi=self.xi)
+                kind=self._utility, kappa=self._kappa, xi=self._xi)
            results = acq_max(
                f_acq=util.utility,
@@ -124,17 +141,13 @@ class GPTuner(Tuner):
        return results
    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
-        """Tuner receive result from trial.
+        """
+        Method invoked when a trial reports its final result.
-        Parameters
-        ----------
+        Override of the abstract method in :class:`~nni.tuner.Tuner`.
-        parameter_id : int
-        parameters : dict
-        value : dict/float
-            if value is dict, it should have "default" key.
        """
        value = extract_scalar_reward(value)
-        if self.optimize_mode == OptimizeMode.Minimize:
+        if self._optimize_mode == OptimizeMode.Minimize:
            value = -value
        logger.info("Received trial result.")
@@ -143,26 +156,27 @@ class GPTuner(Tuner):
        self._space.register(parameters, value)
    def import_data(self, data):
-        """Import additional data for tuning
+        """
-        Parameters
+        Import additional data for tuning.
-        ----------
-        data:
+        Override of the abstract method in :class:`~nni.tuner.Tuner`.
-            a list of dictionarys, each of which has at least two keys, 'parameter' and 'value'
        """
        _completed_num = 0
        for trial_info in data:
-            logger.info("Importing data, current processing progress %s / %s", _completed_num, len(data))
+            logger.info(
+                "Importing data, current processing progress %s / %s", _completed_num, len(data))
            _completed_num += 1
            assert "parameter" in trial_info
            _params = trial_info["parameter"]
            assert "value" in trial_info
            _value = trial_info['value']
            if not _value:
-                logger.info("Useless trial data, value is %s, skip this trial data.", _value)
+                logger.info(
+                    "Useless trial data, value is %s, skip this trial data.", _value)
                continue
-            self.supplement_data_num += 1
+            self._supplement_data_num += 1
            _parameter_id = '_'.join(
-                ["ImportData", str(self.supplement_data_num)])
+                ["ImportData", str(self._supplement_data_num)])
            self.receive_trial_result(
                parameter_id=_parameter_id, parameters=_params, value=_value)
        logger.info("Successfully import data to GP tuner.")
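`import_data` replays each imported record through `receive_trial_result` under a synthetic id (`ImportData_1`, `ImportData_2`, ...), and `receive_trial_result` negates the value in minimize mode so the Gaussian Process always maximizes. A sketch of that bookkeeping, assuming a hypothetical `MiniTuner` with no GP, and a hypothetical `extract_scalar` that follows the float-or-`{"default": ...}` contract:

```python
def extract_scalar(value):
    """Accept a float or a dict with a 'default' key."""
    return value["default"] if isinstance(value, dict) else value


class MiniTuner:
    """Hypothetical stand-in for GPTuner's bookkeeping (no Gaussian Process involved)."""

    def __init__(self, optimize_mode="maximize"):
        self._optimize_mode = optimize_mode
        self._supplement_data_num = 0
        self.history = {}

    def receive_trial_result(self, parameter_id, parameters, value):
        value = extract_scalar(value)
        if self._optimize_mode == "minimize":
            value = -value  # the GP always maximizes, so minimization flips the sign
        self.history[parameter_id] = (parameters, value)

    def import_data(self, data):
        for trial_info in data:
            params, value = trial_info["parameter"], trial_info["value"]
            if not value:
                continue  # skip empty results, as GPTuner.import_data does
            self._supplement_data_num += 1
            parameter_id = "_".join(["ImportData", str(self._supplement_data_num)])
            self.receive_trial_result(parameter_id, params, value)
```

Note that a truthiness check such as `if not _value` also discards a legitimate reward of exactly `0`.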
--- a/src/sdk/pynni/nni/gp_tuner/target_space.py
+++ b/src/sdk/pynni/nni/gp_tuner/target_space.py
@@ -17,39 +17,51 @@
 # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-'''
+"""
-target_space.py
+Tool class to hold the param-space coordinates (X) and target values (Y).
-'''
+"""
 import numpy as np
 import nni.parameter_expressions as parameter_expressions
 def _hashable(params):
-    """ ensure that an point is hashable by a python dict """
+    """
+    Transform list params to tuple format. Ensure that a point is hashable by a python dict.
+    Parameters
+    ----------
+    params : numpy array
+        array format of parameters
+    Returns
+    -------
+    tuple
+        tuple format of parameters
+    """
    return tuple(map(float, params))
 class TargetSpace():
    """
    Holds the param-space coordinates (X) and target values (Y)
+    Parameters
+    ----------
+    pbounds : dict
+        Dictionary with parameters names and legal values.
+    random_state : int, RandomState, or None
+        optionally specify a seed for a random number generator, by default None.
    """
    def __init__(self, pbounds, random_state=None):
-        """
+        self._random_state = random_state
-        Parameters
-        ----------
-        pbounds : dict
-            Dictionary with parameters names as keys and a tuple with minimum
-            and maximum values.
-        random_state : int, RandomState, or None
-            optionally specify a seed for a random number generator
-        """
-        self.random_state = random_state
        # Get the name of the parameters
        self._keys = sorted(pbounds)
        # Create an array with parameters bounds
        self._bounds = np.array(
            [item[1] for item in sorted(pbounds.items(), key=lambda x: x[0])]
@@ -71,54 +83,100 @@ class TargetSpace():
        self._cache = {}
    def __contains__(self, params):
-        '''
+        """
        check if a parameter is already registered
-        '''
+        Parameters
+        ----------
+        params : numpy array
+        Returns
+        -------
+        bool
+            True if the parameter is already registered, False otherwise
+        """
        return _hashable(params) in self._cache
    def len(self):
-        '''
+        """
        length of registered params and targets
-        '''
+        Returns
+        -------
+        int
+        """
        assert len(self._params) == len(self._target)
        return len(self._target)
    @property
    def params(self):
-        '''
+        """
-        params: numpy array
+        registered parameters
-        '''
+        Returns
+        -------
+        numpy array
+        """
        return self._params
    @property
    def target(self):
-        '''
+        """
-        target: numpy array
+        registered target values
-        '''
+        Returns
+        -------
+        numpy array
+        """
        return self._target
    @property
    def dim(self):
-        '''
+        """
-        dim: int
+        dimension of parameters
-            length of keys
-        '''
+        Returns
+        -------
+        int
+        """
        return len(self._keys)
    @property
    def keys(self):
-        '''
+        """
-        keys: numpy array
+        keys of parameters
-        '''
+        Returns
+        -------
+        numpy array
+        """
        return self._keys
    @property
    def bounds(self):
-        '''bounds'''
+        """
+        bounds of parameters
+        Returns
+        -------
+        numpy array
+        """
        return self._bounds
    def params_to_array(self, params):
-        ''' dict to array '''
+        """
+        Convert parameters from dict format to array format.
+        Parameters
+        ----------
+        params : dict
+            dict format of parameters
+        Returns
+        -------
+        numpy array
+            array format of parameters
+        """
        try:
            assert set(params) == set(self.keys)
        except AssertionError:
@@ -129,11 +187,20 @@ class TargetSpace():
        return np.asarray([params[key] for key in self.keys])
    def array_to_params(self, x):
-        '''
+        """
        array to dict
        maintain int type if the paramters is defined as int in search_space.json
-        '''
+        Parameters
+        ----------
+        x : numpy array
+            array format of parameters
+        Returns
+        -------
+        dict
+            dict format of parameters
+        """
        try:
            assert len(x) == len(self.keys)
        except AssertionError:
@@ -159,15 +226,15 @@ class TargetSpace():
        Parameters
        ----------
-        x : dict
+        params : dict
+            parameters
-        y : float
+        target : float
            target function value
        """
        x = self.params_to_array(params)
        if x in self:
-            #raise KeyError('Data point {} is not unique'.format(x))
            print('Data point {} is not unique'.format(x))
        # Insert data into unique dictionary
@@ -180,32 +247,43 @@ class TargetSpace():
        """
        Creates a random point within the bounds of the space.
+        Returns
+        -------
+        numpy array
+            one group of parameters
        """
        params = np.empty(self.dim)
        for col, _bound in enumerate(self._bounds):
            if _bound['_type'] == 'choice':
                params[col] = parameter_expressions.choice(
-                    _bound['_value'], self.random_state)
+                    _bound['_value'], self._random_state)
            elif _bound['_type'] == 'randint':
-                params[col] = self.random_state.randint(
+                params[col] = self._random_state.randint(
                    _bound['_value'][0], _bound['_value'][1], size=1)
            elif _bound['_type'] == 'uniform':
                params[col] = parameter_expressions.uniform(
-                    _bound['_value'][0], _bound['_value'][1], self.random_state)
+                    _bound['_value'][0], _bound['_value'][1], self._random_state)
            elif _bound['_type'] == 'quniform':
                params[col] = parameter_expressions.quniform(
-                    _bound['_value'][0], _bound['_value'][1], _bound['_value'][2], self.random_state)
+                    _bound['_value'][0], _bound['_value'][1], _bound['_value'][2], self._random_state)
            elif _bound['_type'] == 'loguniform':
                params[col] = parameter_expressions.loguniform(
-                    _bound['_value'][0], _bound['_value'][1], self.random_state)
+                    _bound['_value'][0], _bound['_value'][1], self._random_state)
            elif _bound['_type'] == 'qloguniform':
                params[col] = parameter_expressions.qloguniform(
-                    _bound['_value'][0], _bound['_value'][1], _bound['_value'][2], self.random_state)
+                    _bound['_value'][0], _bound['_value'][1], _bound['_value'][2], self._random_state)
        return params
    def max(self):
-        """Get maximum target value found and corresponding parametes."""
+        """
+        Get maximum target value found and its corresponding parameters.
+        Returns
+        -------
+        dict
+            target value and parameters, empty dict if nothing registered
+        """
        try:
            res = {
                'target': self.target.max(),
@@ -218,7 +296,14 @@ class TargetSpace():
        return res
    def res(self):
-        """Get all target values found and corresponding parametes."""
+        """
+        Get all target values found and corresponding parameters.
+        Returns
+        -------
+        list
+            a list of target values and their corresponding parameters
+        """
        params = [dict(zip(self.keys, p)) for p in self.params]
        return [
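`_hashable` exists because lists and numpy arrays cannot be dict keys; converting a point to a tuple of floats gives a stable key, and it also makes `[1, 2]` and `(1.0, 2.0)` map to the same entry. A sketch of the duplicate check built on it, assuming a hypothetical `PointCache` that keeps only the `_cache` part of `TargetSpace`:

```python
def _hashable(params):
    """Convert an array-like of numbers into a tuple of floats usable as a dict key."""
    return tuple(map(float, params))


class PointCache:
    """Hypothetical minimal version of TargetSpace's duplicate tracking."""

    def __init__(self):
        self._cache = {}

    def __contains__(self, params):
        return _hashable(params) in self._cache

    def register(self, params, target):
        """Store target for params; return True if the point was already registered."""
        duplicate = params in self
        self._cache[_hashable(params)] = target
        return duplicate
```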

--- a/src/sdk/pynni/nni/gp_tuner/util.py
+++ b/src/sdk/pynni/nni/gp_tuner/util.py
@@ -17,9 +17,9 @@
 # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-'''
+"""
-gp_tuner.py
+utility functions and classes for GPTuner
-'''
+"""
 import warnings
 import numpy as np
@@ -28,9 +28,21 @@ from scipy.optimize import minimize
 def _match_val_type(vals, bounds):
-    '''
+    """
-    Update values in the array, to match their corresponding type
+    Update values in the array to match their corresponding types, making sure each value is legal.
-    '''
+    Parameters
+    ----------
+    vals : numpy array
+        values of parameters
+    bounds : numpy array
+        list of dictionaries which store parameter names and legal values.
+    Returns
+    -------
+    vals_new : list
+        The closest legal value to the original value
+    """
    vals_new = []
    for i, bound in enumerate(bounds):
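`_match_val_type` maps a continuous optimizer output back to the closest legal value for each parameter. Its body is not shown in this diff, so the rules below are illustrative assumptions: `choice` picks the nearest numeric option, and `randint` rounds and clips to `[low, high - 1]`, matching numpy's exclusive upper bound as used by `random_sample` above.

```python
def match_val_type(vals, bounds):
    """Snap each value to a legal value for its parameter (illustrative rules only).

    - 'choice': the numeric option closest to the value
    - 'randint': the value rounded to an int and clipped to [low, high - 1]
    - anything else: the value unchanged
    """
    vals_new = []
    for val, bound in zip(vals, bounds):
        _type, _value = bound["_type"], bound["_value"]
        if _type == "choice":
            vals_new.append(min(_value, key=lambda option: abs(option - val)))
        elif _type == "randint":
            low, high = _value[0], _value[1]
            vals_new.append(min(max(int(round(val)), low), high - 1))
        else:
            vals_new.append(val)
    return vals_new
```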
@@ -52,32 +64,33 @@ def acq_max(f_acq, gp, y_max, bounds, space, num_warmup, num_starting_points):
    A function to find the maximum of the acquisition function
    It uses a combination of random sampling (cheap) and the 'L-BFGS-B'
-    optimization method. First by sampling `n_warmup` (1e5) points at random,
+    optimization method. First by sampling ``num_warmup`` points at random,
-    and then running L-BFGS-B from `n_iter` (250) random starting points.
+    and then running L-BFGS-B from ``num_starting_points`` random starting points.
    Parameters
    ----------
-    :param f_acq:
+    f_acq : UtilityFunction.utility
        The acquisition function object that return its point-wise value.
-    :param gp:
+    gp : GaussianProcessRegressor
        A gaussian process fitted to the relevant data.
-    :param y_max:
+    y_max : float
        The current maximum known value of the target function.
-    :param bounds:
+    bounds : numpy array
        The variables bounds to limit the search of the acq max.
-    :param num_warmup:
+    num_warmup : int
        number of times to randomly sample the aquisition function
-    :param num_starting_points:
+    num_starting_points : int
        number of times to run scipy.minimize
    Returns
    -------
-    :return: x_max, The arg max of the acquisition function.
+    numpy array
+        The parameters that maximize the acquisition function.
    """
    # Warm up with random points
@@ -117,36 +130,70 @@ def acq_max(f_acq, gp, y_max, bounds, space, num_warmup, num_starting_points):
 class UtilityFunction():
    """
-    An object to compute the acquisition functions.
+    A class to compute different acquisition function values.
+    Parameters
+    ----------
+    kind : string
+        name of the acquisition function to use: ``ucb``, ``ei`` or ``poi``
+    kappa : float
+        parameter used by the ``ucb`` acquisition function
+    xi : float
+        parameter used by the ``ei`` and ``poi`` acquisition functions
    """
    def __init__(self, kind, kappa, xi):
-        """
-        If UCB is to be used, a constant kappa is needed.
-        """
-        self.kappa = kappa
-        self.xi = xi
+        self._kappa = kappa
+        self._xi = xi
        if kind not in ['ucb', 'ei', 'poi']:
            err = "The utility function " \
                "{} has not been implemented, " \
                "please choose one of ucb, ei, or poi.".format(kind)
            raise NotImplementedError(err)
-        self.kind = kind
+        self._kind = kind
    def utility(self, x, gp, y_max):
-        '''return utility function'''
-        if self.kind == 'ucb':
-            return self._ucb(x, gp, self.kappa)
-        if self.kind == 'ei':
-            return self._ei(x, gp, y_max, self.xi)
-        if self.kind == 'poi':
-            return self._poi(x, gp, y_max, self.xi)
+        """
+        Return the value of the selected acquisition function at ``x``.
+        Parameters
+        ----------
+        x : numpy array
+            points at which to evaluate the acquisition function
+        gp : GaussianProcessRegressor
+            a gaussian process fitted to the observed data
+        y_max : float
+            maximum target value observed so far
+        Returns
+        -------
+        numpy array
+            The acquisition function values at ``x``, or None if ``kind`` is not ``ucb``, ``ei`` or ``poi``.
+        """
+        if self._kind == 'ucb':
+            return self._ucb(x, gp, self._kappa)
+        if self._kind == 'ei':
+            return self._ei(x, gp, y_max, self._xi)
+        if self._kind == 'poi':
+            return self._poi(x, gp, y_max, self._xi)
        return None
    @staticmethod
    def _ucb(x, gp, kappa):
+        """
+        Upper Confidence Bound (UCB) utility function
+        Parameters
+        ----------
+        x : numpy array
+            parameters
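+        gp : GaussianProcessRegressor
+            a gaussian process fitted to the observed data
+        kappa : float
+            weight of the predictive standard deviation; larger values favor exploration
+        Returns
+        -------
+        numpy array
+            ``mean + kappa * std`` at each point of ``x``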
+        """
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            mean, std = gp.predict(x, return_std=True)
@@ -155,6 +202,22 @@ class UtilityFunction():
    @staticmethod
    def _ei(x, gp, y_max, xi):
+        """
+        Expected Improvement (EI) utility function
+        Parameters
+        ----------
+        x : numpy array
+            parameters
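+        gp : GaussianProcessRegressor
+            a gaussian process fitted to the observed data
+        y_max : float
+            maximum target value observed so far
+        xi : float
+            margin added to ``y_max``; larger values favor exploration
+        Returns
+        -------
+        numpy array
+            expected improvement at each point of ``x``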
+        """
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            mean, std = gp.predict(x, return_std=True)
@@ -164,6 +227,22 @@ class UtilityFunction():
    @staticmethod
    def _poi(x, gp, y_max, xi):
+        """
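+        Probability Of Improvement (POI) utility function
+        Parameters
+        ----------
+        x : numpy array
+            parameters
+        gp : GaussianProcessRegressor
+            a gaussian process fitted to the observed data
+        y_max : float
+            maximum target value observed so far
+        xi : float
+            margin added to ``y_max``; larger values favor exploration
+        Returns
+        -------
+        numpy array
+            probability of improvement at each point of ``x``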
+        """
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            mean, std = gp.predict(x, return_std=True)

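The three acquisition functions documented in `UtilityFunction` follow the standard Gaussian-process formulas. The sketch below evaluates them for a single point from a predictive mean and standard deviation, such as one entry of `gp.predict(x, return_std=True)`, using only the standard library. The function names are hypothetical. The handling of a zero standard deviation, which uses the limiting value, is an assumption and not necessarily what `UtilityFunction` does.

```python
import math


def _norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ucb(mean, std, kappa):
    """Upper Confidence Bound: optimistic estimate mean + kappa * std."""
    return mean + kappa * std


def ei(mean, std, y_max, xi):
    """Expected Improvement over y_max + xi under a Gaussian posterior."""
    improvement = mean - y_max - xi
    if std <= 0.0:
        # Assumption: with no uncertainty, use the limit std -> 0.
        return max(improvement, 0.0)
    z = improvement / std
    return improvement * _norm_cdf(z) + std * _norm_pdf(z)


def poi(mean, std, y_max, xi):
    """Probability of Improvement: P(f(x) > y_max + xi)."""
    if std <= 0.0:
        # Assumption: with no uncertainty, improvement is either certain or impossible.
        return 1.0 if mean > y_max + xi else 0.0
    return _norm_cdf((mean - y_max - xi) / std)


mean, std = 0.50, 0.20  # e.g. one entry of gp.predict(x, return_std=True)
y_max = 0.45            # best target observed so far
print(ucb(mean, std, kappa=2.576))
print(ei(mean, std, y_max, xi=0.0))
print(poi(mean, std, y_max, xi=0.0))
```

Raising `kappa` (UCB) or `xi` (EI, POI) shifts the balance toward exploration: UCB rewards uncertainty directly, while EI and POI demand a larger margin over `y_max` before a point counts as an improvement.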
--- a/src/sdk/pynni/nni/gridsearch_tuner/__init__.py
+++ b/src/sdk/pynni/nni/gridsearch_tuner/__init__.py
+from .gridsearch_tuner import GridSearchTuner
\ No newline at end of file
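The grid search tuner exported here expands each hyperparameter into a finite list of candidates and then takes their Cartesian product. The standalone sketch below mirrors that logic with the standard library only:

* `parse_quniform` follows the `np.clip(np.arange(np.round(low/q), np.round(high/q)+1) * q, low, high)` expression in `_parse_quniform`.
* `parse_randint` excludes `high`.
* `expand_parameters` uses `itertools.product` instead of the tuner's recursive `_expand_parameters`.

All three names are illustrative, not NNI's API, and nested `choice` values are not handled.

```python
import itertools


def parse_quniform(low, high, q):
    """Multiples of q from round(low/q)*q to round(high/q)*q, clipped to [low, high]."""
    # Python's round(), like np.round, rounds exact halves to the nearest even integer.
    start, stop = round(low / q), round(high / q)
    return [min(max(k * q, low), high) for k in range(start, stop + 1)]


def parse_randint(low, high):
    """All integers in [low, high); high itself is excluded."""
    return list(range(low, high))


def expand_parameters(candidates):
    """Cartesian product of per-key candidate lists, first key varying slowest."""
    keys = list(candidates)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(candidates[k] for k in keys))]


space = {
    "units": parse_quniform(1, 10, 3),  # multiples of 3, first one clipped up to low
    "layers": parse_randint(1, 3),      # 3 is excluded
    "act": ["relu", "tanh"],            # a plain choice
}
configs = expand_parameters(space)
print(len(configs))
print(configs[0])
```

Note that `parse_quniform(1, 10, 3)` yields `[1, 3, 6, 9]`: the first value is `low` only because of clipping, and later values are multiples of `q` rather than `low + q`, `low + 2q`, and so on. Unlike the recursive version, `itertools.product` does not pop keys from its input dict.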
--- a/src/sdk/pynni/nni/gridsearch_tuner/gridsearch_tuner.py
+++ b/src/sdk/pynni/nni/gridsearch_tuner/gridsearch_tuner.py
@@ -17,10 +17,10 @@
 # NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
 # DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-'''
+"""
 gridsearch_tuner.py including:
    class GridSearchTuner
-'''
+"""
 import copy
 import logging
@@ -37,29 +37,40 @@ VALUE = '_value'
 logger = logging.getLogger('grid_search_AutoML')
 class GridSearchTuner(Tuner):
-    '''
+    """
    GridSearchTuner will search all the possible configures that the user define in the searchSpace.
-    The only acceptable types of search space are 'choice', 'quniform', 'randint'
-    Type 'choice' will select one of the options. Note that it can also be nested.
-    Type 'quniform' will receive three values [low, high, q], where [low, high] specifies a range and 'q' specifies the interval
-    It will be sampled in a way that the first sampled value is 'low',
+    The only acceptable types of search space are ``choice``, ``quniform``, ``randint``
+    Type ``choice`` will select one of the options. Note that it can also be nested.
+    Type ``quniform`` will receive three values [``low``, ``high``, ``q``],
+    where [``low``, ``high``] specifies a range and ``q`` specifies the interval.
+    It will be sampled in a way that the first sampled value is ``low``,
    and each of the following values is 'interval' larger than the value in front of it.
-    Type 'randint' gives all possible intergers in range[low, high). Note that 'high' is not included.
+    Type ``randint`` gives all possible integers in range [``low``, ``high``). Note that ``high`` is not included.
-    '''
+    """
    def __init__(self):
        self.count = -1
        self.expanded_search_space = []
        self.supplement_data = dict()
-    def json2parameter(self, ss_spec):
+    def _json2parameter(self, ss_spec):
-        '''
+        """
-        generate all possible configs for hyperparameters from hyperparameter space.
+        Generate all possible configs for hyperparameters from hyperparameter space.
-        ss_spec: hyperparameter space
-        '''
+        Parameters
+        ----------
+        ss_spec : dict or list
+            Hyperparameter space or the ``_value`` of a hyperparameter
+        Returns
+        -------
+        list
+            All the candidate choices of hyperparameters. For a single hyperparameter, a list of its
+            candidate values. For multiple hyperparameters (e.g., a search space), a list of dicts,
+            each of which is one complete configuration.
+        """
        if isinstance(ss_spec, dict):
            if '_type' in ss_spec.keys():
                _type = ss_spec['_type']
@@ -67,7 +78,7 @@ class GridSearchTuner(Tuner):
                chosen_params = list()
                if _type == 'choice':
                    for value in _value:
-                        choice = self.json2parameter(value)
+                        choice = self._json2parameter(value)
                        if isinstance(choice, list):
                            chosen_params.extend(choice)
                        else:
@@ -81,12 +92,12 @@ class GridSearchTuner(Tuner):
            else:
                chosen_params = dict()
                for key in ss_spec.keys():
-                    chosen_params[key] = self.json2parameter(ss_spec[key])
+                    chosen_params[key] = self._json2parameter(ss_spec[key])
-                return self.expand_parameters(chosen_params)
+                return self._expand_parameters(chosen_params)
        elif isinstance(ss_spec, list):
            chosen_params = list()
            for subspec in ss_spec[1:]:
-                choice = self.json2parameter(subspec)
+                choice = self._json2parameter(subspec)
                if isinstance(choice, list):
                    chosen_params.extend(choice)
                else:
@@ -97,27 +108,39 @@ class GridSearchTuner(Tuner):
        return chosen_params
    def _parse_quniform(self, param_value):
-        '''parse type of quniform parameter and return a list'''
+        """
+        Parse a ``quniform`` parameter and return its candidate values as a numpy array.
+        """
        low, high, q = param_value[0], param_value[1], param_value[2]
        return np.clip(np.arange(np.round(low/q), np.round(high/q)+1) * q, low, high)
    def _parse_randint(self, param_value):
-        '''parse type of randint parameter and return a list'''
+        """
+        Parse a ``randint`` parameter and return the list of integers in [low, high).
+        """
        return np.arange(param_value[0], param_value[1]).tolist()
-    def expand_parameters(self, para):
+    def _expand_parameters(self, para):
-        '''
+        """
        Enumerate all possible combinations of all parameters
-        para: {key1: [v11, v12, ...], key2: [v21, v22, ...], ...}
-        return: {{key1: v11, key2: v21, ...}, {key1: v11, key2: v22, ...}, ...}
-        '''
+        Parameters
+        ----------
+        para : dict
+            {key1: [v11, v12, ...], key2: [v21, v22, ...], ...}. Note that ``para`` is modified in place.
+        Returns
+        -------
+        list of dict
+            [{key1: v11, key2: v21, ...}, {key1: v11, key2: v22, ...}, ...]
+        """
        if len(para) == 1:
            for key, values in para.items():
                return list(map(lambda v: {key: v}, values))
        key = list(para)[0]
        values = para.pop(key)
-        rest_para = self.expand_parameters(para)
+        rest_para = self._expand_parameters(para)
        ret_para = list()
        for val in values:
            for config in rest_para:
@@ -126,12 +149,37 @@ class GridSearchTuner(Tuner):
        return ret_para
    def update_search_space(self, search_space):
-        '''
-        Check if the search space is valid and expand it: support only 'choice', 'quniform', randint'
-        '''
-        self.expanded_search_space = self.json2parameter(search_space)
+        """
+        Check whether the search space is valid and expand it. Only ``choice``, ``quniform`` and ``randint`` are supported.
+        Parameters
+        ----------
+        search_space : dict
+            See the search space spec (https://nni.readthedocs.io/en/latest/Tutorial/SearchSpaceSpec.html) for the format.
+        """
+        self.expanded_search_space = self._json2parameter(search_space)
    def generate_parameters(self, parameter_id, **kwargs):
+        """
+        Generate parameters for one trial.
+        Parameters
+        ----------
+        parameter_id : int
+            The id for the generated hyperparameter
+        **kwargs
+            Not used
+        Returns
+        -------
+        dict
+            One configuration from the expanded search space.
+        Raises
+        ------
+        NoMoreTrialError
+            If all the configurations have been sent, raise :class:`~nni.NoMoreTrialError`.
+        """
        self.count += 1
        while self.count <= len(self.expanded_search_space) - 1:
            _params_tuple = convert_dict2tuple(self.expanded_search_space[self.count])
@@ -142,15 +190,20 @@ class GridSearchTuner(Tuner):
        raise nni.NoMoreTrialError('no more parameters now.')
    def receive_trial_result(self, parameter_id, parameters, value, **kwargs):
+        """
+        Receive a trial's final performance result, reported by the trial through :func:`~nni.report_final_result`.
+        GridSearchTuner does not use trial results, so this method does nothing.
+        """
        pass
    def import_data(self, data):
-        """Import additional data for tuning
+        """
+        Import additional data for tuning
        Parameters
        ----------
-        data:
-            a list of dictionarys, each of which has at least two keys, 'parameter' and 'value'
+        data : list
+            A list of dictionaries, each of which has at least two keys, ``parameter`` and ``value``.
        """
        _completed_num = 0
        for trial_info in data: