"git@developer.sourcefind.cn:wangsen/mineru.git" did not exist on "69eb2c3bc60d887b075901c203a8e2630d862e20"
Unverified Commit 5f571327 authored by J-shang's avatar J-shang Committed by GitHub
Browse files

[Compression] Evaluator - step 3 Tutorial (#5016)

parent f77db747
Compression Evaluator
=====================
The ``Evaluator`` is used to package the training and evaluation process of the target model.
To explain why NNI needs an ``Evaluator``, let's first look at the general process of model compression in NNI.
In model pruning, some algorithms need to prune according to intermediate variables (gradients, activations, etc.) generated during the training process,
some need to gradually increase or adjust the sparsity of different layers during training,
and some need to adjust the pruning strategy according to the performance changes of the model during pruning.
In model quantization, NNI has quantization-aware training algorithms
that can adjust the scale and zero point required for model quantization from time to time during training,
and may achieve better performance compared to post-training quantization.
To better support the needs of the above algorithms while keeping the interface consistent,
NNI introduces the ``Evaluator`` as the carrier of the training and evaluation process.
.. note::
For users prior to NNI v2.8: NNI previously provided APIs like ``trainer``, ``traced_optimizer``, ``criterion``, ``finetuner``.
These APIs could be tedious in terms of user experience: users had to switch between the corresponding APIs whenever they changed compression algorithms.
``Evaluator`` is an alternative to the above interfaces; users only need to create the evaluator once, and it can be used with all compressors.
For users of native PyTorch, :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` requires the user to encapsulate the training process as a function that exposes the specified interface,
which adds some complexity. But don't worry, in most cases this requires only small changes to existing code.
For users of `PyTorchLightning <https://www.pytorchlightning.ai/>`__, :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` can be created with only a few lines of code based on your original Lightning code.
Here we give two examples of how to create an ``Evaluator`` for both native PyTorch and PyTorchLightning users.
TorchEvaluator
--------------
:class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` is for users who work in a native PyTorch environment (if you are using PyTorchLightning, please refer to `LightningEvaluator`_).
:class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` has six initialization parameters ``training_func``, ``optimizers``, ``criterion``, ``lr_schedulers``,
``dummy_input``, ``evaluating_func``.
* ``training_func`` is the training loop to train the compressed model.
It is a callable function with six input parameters ``model``, ``optimizers``,
``criterion``, ``lr_schedulers``, ``max_steps``, ``max_epochs``.
Please make sure each input argument of ``training_func`` is actually used;
in particular, ``max_steps`` and ``max_epochs`` must be able to correctly control the duration of training.
* ``optimizers`` is a single traced optimizer or a list of traced optimizers;
please make sure to wrap the ``Optimizer`` class with ``nni.trace`` before initializing it/them.
* ``criterion`` is a callable function to compute loss, it has two input parameters ``input`` and ``target``, and returns a tensor as loss.
* ``lr_schedulers`` is a single traced scheduler or a list of traced schedulers; same as ``optimizers``,
please make sure to wrap the ``_LRScheduler`` class with ``nni.trace`` before initializing it/them.
* ``dummy_input`` is used to trace the model, same as ``example_inputs``
in `torch.jit.trace <https://pytorch.org/docs/stable/generated/torch.jit.trace.html?highlight=torch%20jit%20trace#torch.jit.trace>`_.
* ``evaluating_func`` is a callable function to evaluate the compressed model's performance. Its input is a compressed model and its output is a metric.
The metric should be a float number or a dict with the key ``default``.
Please refer to :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` for more details.
Here is an example of how to initialize a :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>`.
.. code-block:: python
from __future__ import annotations
from typing import Callable, Any
import torch
from torch.optim.lr_scheduler import StepLR, _LRScheduler
from torch.utils.data import DataLoader
from torchvision import datasets, models
import nni
from nni.algorithms.compression.v2.pytorch import TorchEvaluator
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
criterion: Callable[[Any, Any], torch.Tensor],
lr_schedulers: _LRScheduler | None = None, max_steps: int | None = None,
max_epochs: int | None = None, *args, **kwargs):
model.train()
# prepare data
imagenet_train_data = datasets.ImageNet(root='data/imagenet', split='train', download=True)
train_dataloader = DataLoader(imagenet_train_data, batch_size=4, shuffle=True)
#############################################################################
# NNI may change the training duration by setting max_steps or max_epochs.
# To ensure that NNI has the ability to control the training duration,
# please add max_steps and max_epochs as constraints to the training loop.
#############################################################################
total_epochs = max_epochs if max_epochs else 20
total_steps = max_steps if max_steps else 1000000
current_steps = 0
# training loop
for _ in range(total_epochs):
for inputs, labels in train_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
optimizers.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizers.step()
######################################################################
# stop the training loop when total_steps is reached
######################################################################
current_steps += 1
if total_steps and current_steps == total_steps:
return
lr_schedulers.step()
def evaluating_func(model: torch.nn.Module):
model.eval()
# prepare data
imagenet_val_data = datasets.ImageNet(root='./data/imagenet', split='val', download=True)
val_dataloader = DataLoader(imagenet_val_data, batch_size=4, shuffle=False)
# testing loop
correct = 0
with torch.no_grad():
for inputs, labels in val_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
logits = model(inputs)
preds = torch.argmax(logits, dim=1)
correct += preds.eq(labels.view_as(preds)).sum().item()
return correct / len(imagenet_val_data)
# initialize the optimizer, criterion, lr_scheduler, dummy_input
model = models.resnet18().to(device)
######################################################################
# please use nni.trace to wrap the optimizer class,
# NNI will use the trace information to re-initialize the optimizer
######################################################################
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
######################################################################
# please use nni.trace to wrap the lr_scheduler class,
# NNI will use the trace information to re-initialize the lr_scheduler
######################################################################
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=5, gamma=0.1)
dummy_input = torch.rand(4, 3, 224, 224).to(device)
# TorchEvaluator initialization
evaluator = TorchEvaluator(training_func=training_func, optimizers=optimizer, criterion=criterion,
lr_schedulers=lr_scheduler, dummy_input=dummy_input, evaluating_func=evaluating_func)
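The ``evaluating_func`` above returns a plain float. As mentioned in the parameter list, the metric can also be a dict containing the key ``default``.
Below is a minimal sketch of that form; it assumes a prepared ``val_dataloader`` (the one built inside ``evaluating_func`` above would do) and the same ``device``, and the extra keys are purely illustrative.

.. code-block:: python

    def evaluating_func_with_dict(model: torch.nn.Module) -> dict:
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for inputs, labels in val_dataloader:  # assumed to be prepared as in the example above
                inputs, labels = inputs.to(device), labels.to(device)
                preds = torch.argmax(model(inputs), dim=1)
                correct += preds.eq(labels).sum().item()
                total += labels.size(0)
        # NNI reads the value under 'default' as the model score; other keys are extra information.
        return {'default': correct / total, 'correct': correct, 'total': total}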
.. note::
It is also worth noting that not all the arguments of :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` must be provided.
Some compressors only require ``evaluating_func`` as they do not train the model, and some compressors only require ``training_func``.
Please refer to each compressor's doc to check the required arguments.
But it is fine to provide more arguments than the compressor needs.
A complete example of a pruner using :class:`TorchEvaluator <nni.compression.pytorch.TorchEvaluator>` to compress a model can be found :githublink:`here <examples/model_compress/pruning/taylorfo_torch_evaluator.py>`.
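In brief, that example hands the evaluator created above to the pruner like this (the sparsity config and ``training_steps`` value are copied from the example and are illustrative):

.. code-block:: python

    from nni.compression.pytorch.pruning import TaylorFOWeightPruner

    # the pruner drives its short data-collection training pass through the evaluator
    config_list = [{'total_sparsity': 0.5, 'op_types': ['Conv2d']}]
    pruner = TaylorFOWeightPruner(model, config_list, evaluator=evaluator, training_steps=100)
    _, masks = pruner.compress()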
LightningEvaluator
------------------
:class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` is for users who work with PyTorchLightning.
Compared with the original PyTorchLightning code, users only need to modify three parts:
1. Wrap the ``Optimizer`` and ``_LRScheduler`` class with ``nni.trace``.
2. Wrap the ``LightningModule`` class with ``nni.trace``.
3. Wrap the ``LightningDataModule`` class with ``nni.trace``.
Please refer to :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` for more details.
Here is an example of how to initialize a :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>`.
.. code-block:: python
from __future__ import annotations
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchmetrics.functional import accuracy
from torchvision import datasets, models
import nni
from nni.algorithms.compression.v2.pytorch import LightningEvaluator
class SimpleLightningModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.model = models.resnet18()
self.criterion = torch.nn.CrossEntropyLoss()
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
self.log("train_loss", loss)
return loss
def evaluate(self, batch, stage=None):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
preds = torch.argmax(logits, dim=1)
acc = accuracy(preds, y)
if stage:
self.log("default", loss, prog_bar=False)
self.log(f"{stage}_loss", loss, prog_bar=True)
self.log(f"{stage}_acc", acc, prog_bar=True)
def validation_step(self, batch, batch_idx):
self.evaluate(batch, "val")
def test_step(self, batch, batch_idx):
self.evaluate(batch, "test")
#####################################################################
# please pay attention to this function:
# use nni.trace to trace the optimizer and lr_scheduler classes.
#####################################################################
def configure_optimizers(self):
optimizer = nni.trace(torch.optim.SGD)(
self.parameters(),
lr=0.01,
momentum=0.9,
weight_decay=5e-4,
)
scheduler_dict = {
"scheduler": nni.trace(StepLR)(
optimizer,
step_size=5,
gamma=0.1
),
"interval": "epoch",
}
return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}
class ImageNetDataModule(pl.LightningDataModule):
def __init__(self, data_dir: str = "./data/imagenet"):
super().__init__()
self.data_dir = data_dir
def prepare_data(self):
# download
datasets.ImageNet(self.data_dir, split='train', download=True)
datasets.ImageNet(self.data_dir, split='val', download=True)
def setup(self, stage: str | None = None):
if stage == "fit" or stage is None:
self.imagenet_train_data = datasets.ImageNet(root='data/imagenet', split='train')
self.imagenet_val_data = datasets.ImageNet(root='./data/imagenet', split='val')
if stage == "test" or stage is None:
self.imagenet_test_data = datasets.ImageNet(root='./data/imagenet', split='val')
if stage == "predict" or stage is None:
self.imagenet_predict_data = datasets.ImageNet(root='./data/imagenet', split='val')
def train_dataloader(self):
return DataLoader(self.imagenet_train_data, batch_size=4)
def val_dataloader(self):
return DataLoader(self.imagenet_val_data, batch_size=4)
def test_dataloader(self):
return DataLoader(self.imagenet_test_data, batch_size=4)
def predict_dataloader(self):
return DataLoader(self.imagenet_predict_data, batch_size=4)
#####################################################################
# please use nni.trace to wrap the pl.Trainer class,
# NNI will use the trace information to re-initialize the trainer
#####################################################################
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',
devices=1,
max_epochs=1,
max_steps=50,
logger=TensorBoardLogger('./lightning_logs', name="resnet"),
)
#####################################################################
# please use nni.trace to wrap the pl.LightningDataModule class,
# NNI will use the trace information to re-initialize the datamodule
#####################################################################
pl_data = nni.trace(ImageNetDataModule)(data_dir='./data/imagenet')
evaluator = LightningEvaluator(pl_trainer, pl_data)
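If the compressor you use needs to trace the model graph, :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` can additionally be given a ``dummy_input``, in the same spirit as ``TorchEvaluator`` (the shape below matches the ResNet-18 input used in this example and is only illustrative):

.. code-block:: python

    # dummy_input is optional and only needed when the compressor has to trace the graph
    evaluator = LightningEvaluator(pl_trainer, pl_data, dummy_input=torch.rand(4, 3, 224, 224))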
.. note::
In ``LightningModule.configure_optimizers``, users should use a traced ``torch.optim.Optimizer`` and a traced ``torch.optim.lr_scheduler._LRScheduler``.
This is so that NNI can get the initialization parameters of the optimizers and lr_schedulers.
.. code-block:: python
class SimpleModel(pl.LightningModule):
...
def configure_optimizers(self):
optimizers = nni.trace(torch.optim.SGD)(self.parameters(), lr=0.001)
lr_schedulers = nni.trace(ExponentialLR)(optimizer=optimizers, gamma=0.1)
return optimizers, lr_schedulers
A complete example of a pruner using :class:`LightningEvaluator <nni.compression.pytorch.LightningEvaluator>` to compress a model can be found :githublink:`here <examples/model_compress/pruning/taylorfo_lightning_evaluator.py>`.
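After ``pruner.compress()`` returns the masks (as sketched in the ``TorchEvaluator`` section above), that example unwraps the model, builds a physically smaller model with ``ModelSpeedup``, and then reuses the Lightning trainer to re-evaluate it. A condensed sketch of that flow (the dummy input shape must match your model's input and is illustrative here):

.. code-block:: python

    from nni.compression.pytorch.speedup import ModelSpeedup

    pruner.show_pruned_weights()
    pruner._unwrap_model()
    # replace the masked layers with really smaller ones according to the masks
    ModelSpeedup(model, dummy_input=torch.rand(4, 3, 224, 224), masks_file=masks).speedup_model()
    metric = pl_trainer.test(model, pl_data)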
......@@ -9,4 +9,5 @@ Compression
Pruning <toctree_pruning>
Quantization <toctree_quantization>
Config Specification <compression_config_list>
Evaluator <compression_evaluator>
Advanced Usage <advanced_usage>
Evaluator
=========
.. _compression-torch-evaluator:
TorchEvaluator
--------------
.. autoclass:: nni.compression.pytorch.TorchEvaluator
.. _compression-lightning-evaluator:
LightningEvaluator
------------------
......
from __future__ import annotations
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
import torch
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from torchmetrics.functional import accuracy
from torchvision import datasets, transforms
import nni
from nni.algorithms.compression.v2.pytorch import LightningEvaluator
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
class SimpleLightningModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.model = VGG()
self.criterion = torch.nn.CrossEntropyLoss()
def forward(self, x):
return self.model(x)
def training_step(self, batch, batch_idx):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
self.log("train_loss", loss)
return loss
def evaluate(self, batch, stage=None):
x, y = batch
logits = self(x)
loss = self.criterion(logits, y)
preds = torch.argmax(logits, dim=1)
acc = accuracy(preds, y)
if stage:
self.log("default", loss, prog_bar=False)
self.log(f"{stage}_loss", loss, prog_bar=True)
self.log(f"{stage}_acc", acc, prog_bar=True)
def validation_step(self, batch, batch_idx):
self.evaluate(batch, "val")
def test_step(self, batch, batch_idx):
self.evaluate(batch, "test")
def configure_optimizers(self):
optimizer = nni.trace(torch.optim.Adam)(
self.parameters(),
lr=0.001
)
scheduler_dict = {
"scheduler": nni.trace(StepLR)(
optimizer,
step_size=1,
gamma=0.5
),
"interval": "epoch",
}
return {"optimizer": optimizer, "lr_scheduler": scheduler_dict}
class CIFAR10DataModule(pl.LightningDataModule):
def __init__(self, data_dir: str = "./data"):
super().__init__()
self.data_dir = data_dir
def prepare_data(self):
# download
datasets.CIFAR10(self.data_dir, train=True, download=True)
datasets.CIFAR10(self.data_dir, train=False, download=True)
def setup(self, stage: str | None = None):
if stage == "fit" or stage is None:
self.cifar10_train_data = datasets.CIFAR10(root='data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
self.cifar10_val_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "test" or stage is None:
self.cifar10_test_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
if stage == "predict" or stage is None:
self.cifar10_predict_data = datasets.CIFAR10(root='./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]))
def train_dataloader(self):
return DataLoader(self.cifar10_train_data, batch_size=128, shuffle=True)
def val_dataloader(self):
return DataLoader(self.cifar10_val_data, batch_size=128, shuffle=False)
def test_dataloader(self):
return DataLoader(self.cifar10_test_data, batch_size=128, shuffle=False)
def predict_dataloader(self):
return DataLoader(self.cifar10_predict_data, batch_size=128, shuffle=False)
# Train the model
pl_trainer = nni.trace(pl.Trainer)(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_data = nni.trace(CIFAR10DataModule)(data_dir='./data')
model = SimpleLightningModel()
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The trained model accuracy: {metric}')
# LightningEvaluator initialization: the traced trainer and datamodule are all it needs here,
# the optimizer and lr_scheduler are traced inside configure_optimizers
evaluator = LightningEvaluator(pl_trainer, pl_data)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
metric = pl_trainer.test(model, pl_data)
print(f'The masked model accuracy: {metric}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]), masks_file=masks).speedup_model()
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model accuracy: {metric}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
pl_trainer = pl.Trainer(
accelerator='auto',
devices=1,
max_epochs=3,
logger=TensorBoardLogger('./lightning_logs', name="vgg"),
)
pl_trainer.fit(model, pl_data)
metric = pl_trainer.test(model, pl_data)
print(f'The speedup model after finetuning accuracy: {metric}')
from __future__ import annotations
from typing import Callable, Any
import torch
from torch.optim.lr_scheduler import StepLR, _LRScheduler
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import nni
from nni.algorithms.compression.v2.pytorch import TorchEvaluator
import sys
from pathlib import Path
sys.path.append(str(Path(__file__).absolute().parents[1] / 'models'))
from cifar10.vgg import VGG
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model: torch.nn.Module = VGG().to(device)
def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
criterion: Callable[[Any, Any], torch.Tensor],
lr_schedulers: _LRScheduler | None = None, max_steps: int | None = None,
max_epochs: int | None = None, *args, **kwargs):
model.train()
# prepare data
cifar10_train_data = datasets.CIFAR10('./data', train=True, transform=transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32, 4),
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
train_dataloader = DataLoader(cifar10_train_data, batch_size=128, shuffle=True)
total_epochs = max_epochs if max_epochs else 3
total_steps = max_steps if max_steps else None
current_steps = 0
# training loop
for _ in range(total_epochs):
for inputs, labels in train_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
optimizers.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizers.step()
current_steps += 1
if total_steps and current_steps == total_steps:
return
lr_schedulers.step()
def evaluating_func(model: torch.nn.Module):
model.eval()
# prepare data
cifar10_val_data = datasets.CIFAR10('./data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
]), download=True)
val_dataloader = DataLoader(cifar10_val_data, batch_size=4, shuffle=False)
# testing loop
correct = 0
with torch.no_grad():
for inputs, labels in val_dataloader:
inputs, labels = inputs.to(device), labels.to(device)
logits = model(inputs)
preds = torch.argmax(logits, dim=1)
correct += preds.eq(labels.view_as(preds)).sum().item()
return correct / len(cifar10_val_data)
# Train the model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The trained model accuracy: {acc}')
# create traced optimizer / lr_scheduler
optimizer = nni.trace(torch.optim.Adam)(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = nni.trace(StepLR)(optimizer, step_size=1, gamma=0.5)
dummy_input = torch.rand(4, 3, 32, 32).to(device)
# TorchEvaluator initialization
evaluator = TorchEvaluator(training_func=training_func, optimizers=optimizer, criterion=criterion,
lr_schedulers=lr_scheduler, dummy_input=dummy_input, evaluating_func=evaluating_func)
# apply pruning
from nni.compression.pytorch.pruning import TaylorFOWeightPruner
from nni.compression.pytorch.speedup import ModelSpeedup
pruner = TaylorFOWeightPruner(model, config_list=[{'total_sparsity': 0.5, 'op_types': ['Conv2d']}], evaluator=evaluator, training_steps=100)
_, masks = pruner.compress()
acc = evaluating_func(model)
print(f'The masked model accuracy: {acc}')
pruner.show_pruned_weights()
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand([10, 3, 32, 32]).to(device), masks_file=masks).speedup_model()
acc = evaluating_func(model)
print(f'The speedup model accuracy: {acc}')
# finetune the speedup model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
lr_scheduler = StepLR(optimizer, step_size=1, gamma=0.5)
training_func(model, optimizer, criterion, lr_scheduler)
acc = evaluating_func(model)
print(f'The speedup model after finetuning accuracy: {acc}')
......@@ -48,7 +48,7 @@ class EvaluatorBasedPruningScheduler(BasePruningScheduler):
self._evaluator: _LEGACY_EVALUATOR = init_kwargs.pop('evaluator')
self.dummy_input = init_kwargs.pop('dummy_input')
self.using_evaluator = False
warn_msg = f'The old API ...{",".join(old_api)} will be deprecated after NNI v3.0,' + \
f'please using the new one ...{",".join(new_api)}'
_logger.warning(warn_msg)
return init_kwargs
......
......@@ -6,7 +6,7 @@ _EVALUATOR_DOCSTRING = r"""NNI will use the evaluator to intervene in the model
so as to perform training-aware model compression.
All training-aware model compression will use the evaluator as the entry for intervention training in the future.
Usually you just need to wrap some classes with ``nni.trace`` or package the training process as a function to initialize the evaluator.
Please refer to :doc:`/compression/compression_evaluator` for a full tutorial on how to initialize an ``evaluator``.
The following are two simple examples. If you use pytorch_lightning, please refer to :class:`nni.compression.pytorch.LightningEvaluator`;
if you use native pytorch, please refer to :class:`nni.compression.pytorch.TorchEvaluator`::
......
......@@ -120,7 +120,7 @@ class Evaluator:
Evaluator is a package for the training & evaluation process. In model compression,
NNI needs to intervene in the training process to collect intermediate information,
and even modify part of the training loop. Evaluator provides a series of member functions that make these modifications convenient,
and the pruner (or quantizer) can easily intervene in training by calling these functions.
Notes
-----
......@@ -266,14 +266,16 @@ class Evaluator:
class LightningEvaluator(Evaluator):
"""
LightningEvaluator is the Evaluator based on PyTorchLightning.
It is very friendly to users who are familiar with PyTorchLightning
or already have training/validation/testing code written in PyTorchLightning.
All that is needed is to use ``nni.trace`` to trace the Trainer & LightningDataModule.
Additionally, please make sure the ``Optimizer`` class and ``_LRScheduler`` class used in ``LightningModule.configure_optimizers()``
are also traced by ``nni.trace``.
Please refer to the :doc:`/compression/compression_evaluator` for the evaluator initialization example.
Parameters
----------
trainer
......@@ -536,88 +538,99 @@ _TRAINING_FUNC = Callable[[Module, _OPTIMIZERS, _CRITERION, _SCHEDULERS, Optiona
class TorchEvaluator(Evaluator):
"""
TorchEvaluator is the Evaluator for native PyTorch users.
Please refer to the :doc:`/compression/compression_evaluator` for the evaluator initialization example.
Parameters
----------
training_func
The training function is used to train the model; note that this is the entire optimization training loop.
It should have three required parameters [model, optimizers, criterion]
and three optional parameters [schedulers, max_steps, max_epochs].
``optimizers`` can be an instance of ``torch.optim.Optimizer`` or a list of ``torch.optim.Optimizer``;
it corresponds to the ``optimizers`` passed to ``TorchEvaluator``.
``criterion`` and ``schedulers`` likewise correspond to the ``criterion`` and ``schedulers`` passed to ``TorchEvaluator``.
``max_steps`` and ``max_epochs`` are used to control the training duration.
Example::
def training_func(model: Module, optimizer: Optimizer, criterion: Callable, scheduler: _LRScheduler,
max_steps: int | None = None, max_epochs: int | None = None, *args, **kwargs):
model.train()
# prepare data
data_dir = Path(__file__).parent / 'data'
MNIST(data_dir, train=True, download=True)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
mnist_train = MNIST(data_dir, train=True, transform=transform)
train_dataloader = DataLoader(mnist_train, batch_size=32)
max_epochs = max_epochs if max_epochs else 3
max_steps = max_steps if max_steps else 6000
Training function has three required parameters, ``model``, ``optimizers`` and ``criterion``,
and three optional parameters, ``lr_schedulers``, ``max_steps``, ``max_epochs``.
Let's explain these six parameters that NNI passes in, although in most cases users don't need to care about them.
Users only need to treat these six parameters as the original parameters during the training process.
* The ``model`` is a wrapped model from the original model; it has a similar structure to the model to be pruned,
so it can share the training function with the original model.
* ``optimizers`` are re-initialized from the ``optimizers`` passed to the evaluator and the wrapped model's parameters.
* ``criterion`` is also based on the ``criterion`` passed to the evaluator;
it might be modified by the pruner during model pruning.
* If users use ``lr_schedulers`` in the ``training_func``, NNI will re-initialize the ``lr_schedulers`` with the re-initialized
optimizers.
* ``max_steps`` is the NNI training duration limitation. It is for the pruner (or quantizer) to control the number of training steps.
The user-implemented ``training_func`` should respect ``max_steps`` by stopping the training loop after ``max_steps`` is reached.
The pruner may pass ``None`` to ``max_steps`` when it only controls ``max_epochs``.
* ``max_epochs`` is similar to ``max_steps``; the only difference is that it controls the number of training epochs.
The user-implemented ``training_func`` should respect ``max_epochs`` by stopping the training loop
after ``max_epochs`` is reached. The pruner may pass ``None`` to ``max_epochs`` when it only controls ``max_steps``.
Note that when the pruner passes ``None`` to both ``max_steps`` and ``max_epochs``,
it treats ``training_func`` as a function of model fine-tuning.
Users should assign proper values to ``max_steps`` and ``max_epochs``.
.. code-block:: python
def training_func(model: torch.nn.Module, optimizers: torch.optim.Optimizer,
criterion: Callable[[Any, Any], torch.Tensor],
lr_schedulers: _LRScheduler | None = None, max_steps: int | None = None,
max_epochs: int | None = None, *args, **kwargs):
...
total_epochs = max_epochs if max_epochs else 20
total_steps = max_steps if max_steps else 1000000
current_steps = 0
# training
for _ in range(max_epochs):
for x, y in train_dataloader:
optimizer.zero_grad()
x, y = x.to(device), y.to(device)
logits = model(x)
loss: torch.Tensor = criterion(logits, y)
loss.backward()
optimizer.step()
current_steps += 1
if max_steps and current_steps == max_steps:
return
scheduler.step()
...
for epoch in range(total_epochs):
...
if current_steps >= total_steps:
return
Note that the ``optimizers`` and ``lr_schedulers`` passed to the ``training_func`` have the same type as the ``optimizers``
and ``lr_schedulers`` passed to the evaluator: a single ``torch.optim.Optimizer`` / ``torch.optim.lr_scheduler._LRScheduler`` instance or
a list of them.
optimizers
A single traced optimizer instance or a list of optimizers traced by ``nni.trace``.
NNI may modify the ``torch.optim.Optimizer`` member function ``step`` and/or optimize compressed models,
so NNI needs to have the ability to re-initialize the optimizer. ``nni.trace`` can record the initialization parameters
of a function/class, which can then be used by NNI to re-initialize the optimizer for a new but structurally similar model.
E.g. ``traced_optimizer = nni.trace(torch.optim.Adam)(model.parameters())``.
criterion
The criterion function used in the trainer. It takes the model output and the target as input, and returns the loss.
E.g. ``criterion = torch.nn.functional.nll_loss``.
lr_schedulers
Optional. A single traced lr_scheduler instance or a list of lr_schedulers traced by ``nni.trace``.
For the same reason as with ``optimizers``, NNI needs the traced lr_scheduler to re-initialize it.
E.g. ``traced_lr_scheduler = nni.trace(ExponentialLR)(optimizer, 0.1)``.
dummy_input
Optional. The dummy_input is used to trace the graph; it is the same as ``example_inputs`` in
`torch.jit.trace <https://pytorch.org/docs/stable/generated/torch.jit.trace.html?highlight=torch%20jit%20trace#torch.jit.trace>`_.
evaluating_func
Optional. A function whose input is the model and whose return value is the evaluation metric.
The return value can be a single float or a tuple (float, Any).
Example::
def evaluating_func(model: Module):
model.eval()
# prepare data
data_dir = Path(__file__).parent / 'data'
MNIST(data_dir, train=False, download=True)
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
mnist_test = MNIST(data_dir, train=False, transform=transform)
test_dataloader = DataLoader(mnist_test, batch_size=32)
# testing
correct = 0
with torch.no_grad():
for x, y in test_dataloader:
x, y = x.to(device), y.to(device)
logits = model(x)
preds = torch.argmax(logits, dim=1)
correct += preds.eq(y.view_as(preds)).sum().item()
return correct / len(mnist_test)
This is the function used to evaluate the compressed model's performance.
The input is a model and the output is a ``float`` metric or a ``dict``
(the ``dict`` should contain the key ``default`` with a ``float`` value).
NNI will take the float number as the model score, and assumes that a higher score means better performance.
If you want to provide additional information, please put it into the dict
and NNI will take the value of the key ``default`` as the evaluation metric.
Notes
-----
It is also worth noting that not all the arguments of ``TorchEvaluator`` must be provided.
Some pruners (or quantizers) only require ``evaluating_func`` as they do not train the model,
and some pruners (or quantizers) only require ``training_func``.
Please refer to each pruner's (or quantizer's) doc to check the required arguments.
But it is fine to provide more arguments than the pruner (or quantizer) needs.
"""
def __init__(self, training_func: _TRAINING_FUNC, optimizers: Optimizer | List[Optimizer], criterion: _CRITERION,
......
......@@ -28,7 +28,6 @@ def create_lighting_evaluator() -> LightningEvaluator:
max_steps=50,
logger=TensorBoardLogger(Path(__file__).parent.parent / 'lightning_logs', name="resnet"),
)
pl_trainer.num_sanity_val_steps = 0
pl_data = nni.trace(MNISTDataModule)(data_dir='data/mnist')
evaluator = LightningEvaluator(pl_trainer, pl_data, dummy_input=torch.rand(8, 1, 28, 28))
......
......@@ -71,7 +71,7 @@ class SimpleLightningModel(pl.LightningModule):
class MNISTDataModule(pl.LightningDataModule):
def __init__(self, data_dir: str = "./"):
super().__init__()
self.data_dir = data_dir
self.transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
def prepare_data(self):
......