Unverified Commit 9185eee8 authored by Zaida Zhou's avatar Zaida Zhou Committed by GitHub

Remove runner, parallel, engine and device (#2216)

* Remove runner, parallel, engine and device

* fix format

* remove outdated docs
parent 19a02415
include requirements/runtime.txt
include mmcv/model_zoo/open_mmlab.json mmcv/model_zoo/deprecated.json mmcv/model_zoo/mmcls.json mmcv/model_zoo/torchvision_0.12.json
include mmcv/ops/csrc/common/cuda/*.cuh mmcv/ops/csrc/common/cuda/*.hpp mmcv/ops/csrc/common/*.hpp
include mmcv/ops/csrc/pytorch/*.cpp mmcv/ops/csrc/pytorch/cuda/*.cu mmcv/ops/csrc/pytorch/cuda/*.cpp mmcv/ops/csrc/pytorch/cpu/*.cpp
include mmcv/ops/csrc/parrots/*.h mmcv/ops/csrc/parrots/*.cpp
......
......@@ -28,16 +28,6 @@ cnn
.. automodule:: mmcv.cnn
:members:
runner
------
.. automodule:: mmcv.runner
:members:
engine
------
.. automodule:: mmcv.engine
:members:
ops
------
.. automodule:: mmcv.ops
......
......@@ -17,7 +17,6 @@ You can switch between Chinese and English documents in the lower-left corner of
understand_mmcv/config.md
understand_mmcv/registry.md
understand_mmcv/runner.md
understand_mmcv/data_process.md
understand_mmcv/visualization.md
understand_mmcv/cnn.md
......
## Runner
The runner class is designed to manage the training process. It reduces the amount of code users need to write while staying flexible and configurable. Its main features are:
- Supports `EpochBasedRunner` and `IterBasedRunner` for different scenarios; customized runners can also be implemented to meet specific needs.
- Supports a customized workflow to allow switching between different modes during training. Currently, the supported modes are train and val.
- Provides extensibility through various hooks, including hooks defined in MMCV and customized ones.
### EpochBasedRunner
As its name indicates, the workflow in `EpochBasedRunner` is set in units of epochs. For example, \[('train', 2), ('val', 1)\] means running 2 epochs of training and 1 epoch of validation, iteratively. Each epoch may contain multiple iterations. Currently, MMDetection uses `EpochBasedRunner` by default.
Let's take a look at its core logic:
```python
# the condition to stop training
while curr_epoch < max_epochs:
    # traverse the workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode (e.g. train) determines which function to run
        mode, epochs = flow
        # epoch_runner will be either self.train() or self.val()
        epoch_runner = getattr(self, mode)
        # execute the corresponding function
        for _ in range(epochs):
            epoch_runner(data_loaders[i], **kwargs)
```
Currently, we support two modes: train and val. Let's take the train function as an example and look at its core logic:
```python
# Currently, epoch_runner could be either train or val
def train(self, data_loader, **kwargs):
    # traverse the dataset and get batch data for 1 epoch
    for i, data_batch in enumerate(data_loader):
        # execute all before_train_iter functions in the registered hooks;
        # you may want to watch out for their order
        self.call_hook('before_train_iter')
        # train_mode is set to False in the val function
        self.run_iter(data_batch, train_mode=True, **kwargs)
        self.call_hook('after_train_iter')
    self.call_hook('after_train_epoch')
```
### IterBasedRunner
Different from `EpochBasedRunner`, the workflow in `IterBasedRunner` is set in units of iterations. For example, \[('train', 2), ('val', 1)\] means running 2 iterations of training and 1 iteration of validation, iteratively. Currently, MMSegmentation uses `IterBasedRunner` by default.
Let's take a look at its core logic:
```python
# Although the workflow is set by iterations here, epoch information may
# still be needed in some use cases. That is provided by IterLoader.
iter_loaders = [IterLoader(x) for x in data_loaders]
# the condition to stop training
while curr_iter < max_iters:
    # traverse the workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode (e.g. train) determines which function to run
        mode, iters = flow
        # iter_runner will be either self.train() or self.val()
        iter_runner = getattr(self, mode)
        # execute the corresponding function
        for _ in range(iters):
            iter_runner(iter_loaders[i], **kwargs)
```
Currently, we support two modes: train and val. Let's take the val function as an example and look at its core logic:
```python
# Currently, iter_runner could be either train or val
def val(self, data_loader, **kwargs):
    # get batch data for 1 iteration
    data_batch = next(data_loader)
    # execute all before_val_iter functions in the registered hooks;
    # you may want to watch out for their order
    self.call_hook('before_val_iter')
    outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
    self.outputs = outputs
    self.call_hook('after_val_iter')
```
Other than the basic functionalities explained above, `EpochBasedRunner` and `IterBasedRunner` provide methods such as `resume`, `save_checkpoint` and `register_hook`. In case you are not familiar with the term hook mentioned earlier, we will also provide a tutorial about it (coming soon...). Essentially, a hook is a piece of functionality that alters or augments code behavior through a predefined API. It allows users to have their own code called under certain circumstances, which makes the code extensible in a non-intrusive manner.
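To make the idea concrete, here is a minimal, hypothetical sketch (not MMCV's actual implementation) of a runner firing hook callbacks; `MiniRunner`, `LoggingHook` and their methods are invented purely for illustration:

```python
class LoggingHook:
    """Hypothetical hook: counts iterations and reports at epoch end."""

    def __init__(self):
        self.num_iters = 0

    def before_train_iter(self, runner):
        self.num_iters += 1

    def after_train_epoch(self, runner):
        runner.logs.append(f'epoch finished after {self.num_iters} iters')


class MiniRunner:
    """Hypothetical runner core that only knows how to fire hooks."""

    def __init__(self, hooks):
        self.hooks = hooks
        self.logs = []

    def call_hook(self, fn_name):
        for hook in self.hooks:
            # only call the hook if it implements this callback
            getattr(hook, fn_name, lambda runner: None)(self)

    def train_epoch(self, num_iters):
        for _ in range(num_iters):
            self.call_hook('before_train_iter')
        self.call_hook('after_train_epoch')


runner = MiniRunner([LoggingHook()])
runner.train_epoch(3)
print(runner.logs)  # ['epoch finished after 3 iters']
```

Because the runner looks up callbacks by name, a hook only has to implement the callbacks it cares about, and removing a hook never requires touching the training loop.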
### A Simple Example
We will walk you through the usage of the runner with a classification task. The following code only contains the essential steps for demonstration purposes; these steps are necessary for any training task.
**(1) Initialize dataloader, model, optimizer, etc.**
```python
# initialize the model
model = ...
# initialize the optimizer; typically, we set:
# cfg.optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer = build_optimizer(model, cfg.optimizer)
# initialize the dataloaders corresponding to the workflow (train/val)
data_loaders = [
    build_dataloader(
        ds,
        cfg.data.samples_per_gpu,
        cfg.data.workers_per_gpu,
        ...) for ds in dataset
]
```
**(2) Initialize runner**
```python
runner = build_runner(
    # cfg.runner is typically set as:
    # runner = dict(type='EpochBasedRunner', max_epochs=200)
    cfg.runner,
    default_args=dict(
        model=model,
        batch_processor=None,
        optimizer=optimizer,
        logger=logger))
```
**(3) Register training hooks and customized hooks.**
```python
# register default hooks necessary for training
runner.register_training_hooks(
    # configs of learning rate; typically set as:
    # lr_config = dict(policy='step', step=[100, 150])
    cfg.lr_config,
    # configuration of the optimizer, e.g. grad_clip
    optimizer_config,
    # configuration of saving checkpoints; typically set as:
    # checkpoint_config = dict(interval=1), saving checkpoints every epoch
    cfg.checkpoint_config,
    # configuration of logs
    cfg.log_config,
    ...)

# register customized hooks
# say we want to enable EMA, then we could set custom_hooks=[dict(type='EMAHook')]
if cfg.get('custom_hooks', None):
    custom_hooks = cfg.custom_hooks
    for hook_cfg in cfg.custom_hooks:
        hook_cfg = hook_cfg.copy()
        priority = hook_cfg.pop('priority', 'NORMAL')
        hook = build_from_cfg(hook_cfg, HOOKS)
        runner.register_hook(hook, priority=priority)
```
Then, we can use `resume` or `load_checkpoint` to load existing weights.
**(4) Start training**
```python
# workflow is typically set as: workflow = [('train', 1)]
# here the training begins.
runner.run(data_loaders, cfg.workflow)
```
Let's take `EpochBasedRunner` as an example and go into a little more detail about setting the workflow:
- Say we only want to put train in the workflow; then we can set workflow = \[('train', 1)\]. The runner will only execute train iteratively in this case.
- Say we want to put both train and val in the workflow; then we can set workflow = \[('train', 3), ('val', 1)\]. The runner will first execute train for 3 epochs and then switch to val mode and execute val for 1 epoch. This workflow repeats until the current epoch hits max_epochs.
- The workflow is highly flexible. For example, you can set workflow = \[('val', 1), ('train', 1)\] if you would like the runner to validate first and train after.
The code we demonstrated above is already included in `train.py` in the MM repositories. Simply modify the corresponding keys in the configuration files and the script will execute the expected workflow automatically.
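As a quick sanity check of the workflow semantics described above, the scheduling loop can be replayed in a few lines of plain Python (a simplified sketch, not MMCV code; here we assume only training epochs count towards `max_epochs`):

```python
def simulate_workflow(workflow, max_epochs):
    """Replay the EpochBasedRunner scheduling loop and record the mode
    executed at each step (a simplified sketch, not MMCV code)."""
    schedule = []
    curr_epoch = 0
    while curr_epoch < max_epochs:
        for mode, epochs in workflow:
            for _ in range(epochs):
                schedule.append(mode)
                if mode == 'train':
                    # only training epochs count towards max_epochs
                    curr_epoch += 1
    return schedule


print(simulate_workflow([('train', 3), ('val', 1)], max_epochs=6))
# ['train', 'train', 'train', 'val', 'train', 'train', 'train', 'val']
```

With workflow = \[('train', 3), ('val', 1)\] and max_epochs = 6, the runner executes two full passes over the workflow, validating once after every 3 training epochs.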
......@@ -28,16 +28,6 @@ cnn
.. automodule:: mmcv.cnn
:members:
runner
------
.. automodule:: mmcv.runner
:members:
engine
------
.. automodule:: mmcv.engine
:members:
ops
------
.. automodule:: mmcv.ops
......
......@@ -17,7 +17,6 @@
understand_mmcv/config.md
understand_mmcv/registry.md
understand_mmcv/runner.md
understand_mmcv/data_process.md
understand_mmcv/data_transform.md
understand_mmcv/visualization.md
......
## Runner
The runner module is responsible for scheduling the model training process. Its main purpose is to let users start training with less code, in a flexible and configurable way. Its core features are:
- Supports `EpochBasedRunner` and `IterBasedRunner` iteration modes to cover different scenarios
- Supports customized workflows to switch freely between states during training; currently train and val workflows are supported. A workflow can be roughly understood as one complete training and validation iteration process.
- Together with various default and custom hooks, provides flexible extensibility
### EpochBasedRunner
As its name indicates, `EpochBasedRunner` runs the workflow in units of epochs. For example, setting workflow = \[('train', 2), ('val', 1)\] means iteratively training for 2 epochs and then validating for 1 epoch. The MMDetection object detection framework uses `EpochBasedRunner` by default.
Its abstract logic is as follows:
```python
# the condition to stop training
while curr_epoch < max_epochs:
    # traverse the user-defined workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode is the workflow function, e.g. train; epochs is the number of iterations
        mode, epochs = flow
        # either self.train() or self.val() is called
        epoch_runner = getattr(self, mode)
        # run the corresponding workflow function
        for _ in range(epochs):
            epoch_runner(data_loaders[i], **kwargs)
```
Currently, train and val workflows are supported. Taking the training function as an example, its abstract logic is:
```python
# epoch_runner could currently be train or val
def train(self, data_loader, **kwargs):
    # traverse the dataset, yielding one epoch of batch data
    for i, data_batch in enumerate(data_loader):
        self.call_hook('before_train_iter')
        # train_mode=False during validation
        self.run_iter(data_batch, train_mode=True, **kwargs)
        self.call_hook('after_train_iter')
    self.call_hook('after_train_epoch')
```
### IterBasedRunner
Different from `EpochBasedRunner`, `IterBasedRunner` runs the workflow in units of iterations. For example, setting workflow = \[('train', 2), ('val', 1)\] means iteratively training for 2 iterations and then validating for 1 iteration. The MMSegmentation semantic segmentation framework uses `IterBasedRunner` by default.
Its abstract logic is as follows:
```python
# although the unit is iterations, some scenarios still need epoch
# information, which is provided by IterLoader
iter_loaders = [IterLoader(x) for x in data_loaders]
# the condition to stop training
while curr_iter < max_iters:
    # traverse the user-defined workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode is the workflow function, e.g. train; iters is the number of iterations
        mode, iters = flow
        # either self.train() or self.val() is called
        iter_runner = getattr(self, mode)
        # run the corresponding workflow function
        for _ in range(iters):
            iter_runner(iter_loaders[i], **kwargs)
```
Currently, train and val workflows are supported. Taking the validation function as an example, its abstract logic is:
```python
# iter_runner could currently be train or val
def val(self, data_loader, **kwargs):
    # get one batch of data for a single iteration
    data_batch = next(data_loader)
    self.call_hook('before_val_iter')
    outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
    self.outputs = outputs
    self.call_hook('after_val_iter')
```
Besides the basic functionalities above, `EpochBasedRunner` and `IterBasedRunner` also provide `resume`, `save_checkpoint` and hook registration.
### A Simple Example
Let's take the most common classification task as an example to describe how to use `runner` in detail. Starting any training task requires the following steps:
**(1) Initialize the dataloader, model, optimizer, etc.**
```python
# initialize the model
model = ...
# initialize the optimizer; a typical value is
# cfg.optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer = build_optimizer(model, cfg.optimizer)
# initialize the dataloaders corresponding to the workflow
data_loaders = [
    build_dataloader(
        ds,
        cfg.data.samples_per_gpu,
        cfg.data.workers_per_gpu,
        ...) for ds in dataset
]
```
**(2) Initialize the runner**
```python
runner = build_runner(
    # a typical configuration of cfg.runner is
    # runner = dict(type='EpochBasedRunner', max_epochs=200)
    cfg.runner,
    default_args=dict(
        model=model,
        batch_processor=None,
        optimizer=optimizer,
        logger=logger))
```
**(3) Register the default hooks required for training as well as custom hooks**
```python
# register the hooks required for training
runner.register_training_hooks(
    # learning rate configuration; typically
    # lr_config = dict(policy='step', step=[100, 150])
    cfg.lr_config,
    # optimization configuration, e.g. grad_clip
    optimizer_config,
    # checkpoint configuration; typically
    # checkpoint_config = dict(interval=1), saving a checkpoint every epoch
    cfg.checkpoint_config,
    # logging configuration
    cfg.log_config,
    ...)

# register custom hooks
# e.g. to enable EMA, set custom_hooks=[dict(type='EMAHook')]
if cfg.get('custom_hooks', None):
    custom_hooks = cfg.custom_hooks
    for hook_cfg in cfg.custom_hooks:
        hook_cfg = hook_cfg.copy()
        priority = hook_cfg.pop('priority', 'NORMAL')
        hook = build_from_cfg(hook_cfg, HOOKS)
        runner.register_hook(hook, priority=priority)
```
Then `resume` or `load_checkpoint` can be used to load existing weights.
**(4) Start the training flow**
```python
# workflow is typically workflow = [('train', 1)]
# the training actually starts here
runner.run(data_loaders, cfg.workflow)
```
Regarding the workflow setting, taking `EpochBasedRunner` as an example:
- If you only want to run the training workflow, set workflow = \[('train', 1)\], which means only iterative training is performed
- If you want to run both training and validation workflows, set workflow = \[('train', 3), ('val', 1)\], which means training for 3 epochs, then switching to the val workflow for 1 epoch, and repeating until the number of training epochs reaches the specified value
- The workflow is freely customizable; for example, you can validate first and then train with workflow = \[('val', 1), ('train', 1)\]
The code above has been encapsulated in train.py of each codebase; users only need to set the corresponding configuration and the above flow runs automatically.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

from mmcv.parallel import MMDataParallel
from mmcv.runner import EpochBasedRunner
from mmcv.utils import get_logger


class Model(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def train_step(self, data, optimizer):
        images, labels = data
        predicts = self(images)  # -> self.__call__() -> self.forward()
        loss = self.loss_fn(predicts, labels)
        return {'loss': loss}


if __name__ == '__main__':
    model = Model()
    if torch.cuda.is_available():
        # only use gpu:0 to train
        # Solved issue https://github.com/open-mmlab/mmcv/issues/1470
        model = MMDataParallel(model.cuda(), device_ids=[0])

    # dataset and dataloader
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    trainset = CIFAR10(
        root='data', train=True, download=True, transform=transform)
    trainloader = DataLoader(
        trainset, batch_size=128, shuffle=True, num_workers=2)

    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    logger = get_logger('mmcv')
    # the runner is a scheduler to manage the training
    runner = EpochBasedRunner(
        model,
        optimizer=optimizer,
        work_dir='./work_dir',
        logger=logger,
        max_epochs=4)

    # learning rate scheduler config
    lr_config = dict(policy='step', step=[2, 3])
    # configuration of the optimizer
    optimizer_config = dict(grad_clip=None)
    # configuration of saving checkpoints periodically
    checkpoint_config = dict(interval=1)
    # save logs periodically; multiple hooks can be used simultaneously
    log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
    # register hooks with the runner; they will be invoked automatically
    runner.register_training_hooks(
        lr_config=lr_config,
        optimizer_config=optimizer_config,
        checkpoint_config=checkpoint_config,
        log_config=log_config)

    runner.run([trainloader], [('train', 1)])
......@@ -10,7 +10,4 @@ from .visualization import *
# The following modules are not imported to this level, so mmcv may be used
# without PyTorch.
# - runner
# - parallel
# - op
# - device
......@@ -4,6 +4,7 @@ from typing import Optional
import torch
import torch.nn as nn
from mmengine.runner import load_checkpoint
class AlexNet(nn.Module):
......@@ -45,7 +46,6 @@ class AlexNet(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
# use default initializer
......
......@@ -5,6 +5,7 @@ from typing import Optional, Sequence, Tuple, Union
import torch.nn as nn
import torch.utils.checkpoint as cp
from mmengine.model.utils import constant_init, kaiming_init
from mmengine.runner import load_checkpoint
from torch import Tensor
......@@ -270,7 +271,6 @@ class ResNet(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
for m in self.modules():
......
......@@ -4,6 +4,7 @@ from typing import List, Optional, Sequence, Tuple, Union
import torch.nn as nn
from mmengine.model.utils import constant_init, kaiming_init, normal_init
from mmengine.runner import load_checkpoint
from torch import Tensor
......@@ -126,7 +127,6 @@ class VGG(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
for m in self.modules():
......
# Copyright (c) OpenMMLab. All rights reserved.
from . import ipu, mlu, mps
from .scatter_gather import scatter, scatter_kwargs
from .utils import get_device
__all__ = ['mlu', 'ipu', 'mps', 'get_device', 'scatter', 'scatter_kwargs']
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Union

import torch

from mmcv.utils import deprecated_api_warning

from .utils import get_device


def scatter(input: Union[List, torch.Tensor], devices: List) -> List:
    """scatter copies tensor to devices directly."""
    current_device = get_device()
    if isinstance(input, list):
        outputs = [scatter(_input, devices) for _input in input]
        return outputs
    elif isinstance(input, torch.Tensor):
        output = input.contiguous()
        return output.to(current_device) if devices != [-1] else output
    else:
        raise Exception(f'Unknown type {type(input)}.')


class Scatter:

    @staticmethod
    @deprecated_api_warning({'target_mlus': 'target_devices'},
                            cls_name='Scatter')
    def forward(target_devices, input):
        outputs = scatter(input, target_devices)
        return tuple(outputs) if isinstance(outputs, list) else (outputs, )
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.utils import IS_IPU_AVAILABLE

if IS_IPU_AVAILABLE:
    from .dataloader import IPUDataLoader
    from .hook_wrapper import IPUFp16OptimizerHook
    from .model_wrapper import ipu_model_wrapper
    from .runner import IPUBaseRunner, IPUEpochBasedRunner, IPUIterBasedRunner
    from .utils import cfg2options
    __all__ = [
        'cfg2options', 'ipu_model_wrapper', 'IPUFp16OptimizerHook',
        'IPUDataLoader', 'IPUBaseRunner', 'IPUEpochBasedRunner',
        'IPUIterBasedRunner'
    ]
# Copyright (c) OpenMMLab. All rights reserved.
from collections.abc import Mapping, Sequence
from functools import partial

import poptorch
from torch.utils.data.dataloader import default_collate

from mmcv.parallel import DataContainer


def collate(batch, samples_per_gpu=1):
    """Put each data field into a tensor/DataContainer with outer dimension
    batch size.

    TODO support for
    :type:`~mmcv.parallel.DataContainer`. Currently, it will be ignored.

    There are 3 cases.

    1. cpu_only = True, e.g., meta data.
    2. cpu_only = False, stack = True, e.g., images tensors.
    3. cpu_only = False, stack = False, e.g., gt bboxes.
    """
    if not isinstance(batch, Sequence):
        raise TypeError(
            f'`batch` should be a sequence, but got {type(batch)}.')

    if isinstance(batch[0], DataContainer):
        # TODO `DataContainer` will be supported in the future.
        raise TypeError('DataContainer is not supported in ipu data loader.')
    elif isinstance(batch[0], Sequence):
        transposed = zip(*batch)
        collated_batch = []
        for samples in transposed:
            if not isinstance(samples[0], DataContainer):
                # At present, we will skip the processing of DataContainer,
                # which will reduce the performance of the IPU DataLoader
                collated_batch.append(collate(samples, samples_per_gpu))
        return collated_batch
    elif isinstance(batch[0], Mapping):
        collated_batch = {}
        for key in batch[0]:
            if not isinstance(batch[0][key], DataContainer):
                # At present, we will skip the processing of DataContainer,
                # which will reduce the performance of the IPU DataLoader
                collated_batch[key] = collate([d[key] for d in batch])
        return collated_batch
    else:
        return default_collate(batch)
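The recursion above, in which sequences are transposed, mappings are collated per key, and everything else falls through to `default_collate`, can be illustrated with a small dependency-free sketch; `collate_sketch` and `leaf_collate` below are hypothetical stand-ins, not the function above:

```python
def leaf_collate(samples):
    # stand-in for torch's default_collate: just group leaves into a list
    return list(samples)


def collate_sketch(batch):
    """Recursively collate a batch of identically structured samples
    (a simplified sketch of the collate logic, not MMCV code)."""
    first = batch[0]
    if isinstance(first, (list, tuple)):
        # transpose: the i-th output collates the i-th field of each sample
        return [collate_sketch(list(samples)) for samples in zip(*batch)]
    if isinstance(first, dict):
        return {key: collate_sketch([d[key] for d in batch]) for key in first}
    return leaf_collate(batch)


batch = [{'img': 1, 'meta': ('a', 'x')}, {'img': 2, 'meta': ('b', 'y')}]
print(collate_sketch(batch))
# {'img': [1, 2], 'meta': [['a', 'b'], ['x', 'y']]}
```

The key property is that every sample in the batch must share the same nested structure, so the recursion always terminates at leaves of the same type.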
class IPUDataLoader(poptorch.DataLoader):
    """Thin wrapper of `torch.utils.data.DataLoader`.

    Compared with the pytorch DataLoader, this DataLoader changes the way of
    calculation of batch size and adds the AsynchronousDataAccessor to
    load and release data faster in cpu mode.

    If this data loader is used in a distributed execution environment, it
    will ensure that each process uses a different subset of the dataset,
    provided you first call ``options.randomSeed(N)`` with an integer N
    which is the same across all hosts.

    Args:
        dataset (torch.utils.data.Dataset): The dataset to get the data from.
        options (poptorch.Options): Options that will be used to compile
            and run the model.
        batch_size (int, optional): This is the batch size in the
            conventional sense of being the size that runs through an
            operation in the model at any given time.
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        num_workers (int, optional): how many subprocesses to use for data
            loading. ``0`` means that the data will be loaded in the main
            process. (default: ``0``)
        drop_last (bool, optional): If True and the number of elements in the
            dataset is not a multiple of the combined batch size then the
            incomplete batch at the end will be dropped.
        persistent_workers (bool, optional): Re-use workers between
            iterations if True.
        auto_distributed_partitioning (bool, optional): If True, partitions
            the dataset for distributed execution automatically. Otherwise,
            it is assumed that partitioning has been handled manually.
        mode (poptorch.DataLoaderMode, optional): If `DataLoaderMode.Async`,
            uses an :py:class:`~poptorch.AsynchronousDataAccessor` to access
            the dataset. If `DataLoaderMode.Sync`, accesses the dataset
            synchronously.
        async_options (Dict[str, Any], optional): Options to pass to
            :py:class:`~poptorch.AsynchronousDataAccessor`.
        rebatched_worker_size (int, optional): When using AsyncRebatched:
            batch size of the tensors loaded by the workers.
            Default to the combined batch size.
            If specified, the ``rebatched_worker_size`` must be less than
            or equal to the combined batch size.
        kwargs (Dict[str, Any], optional): Other options to pass to PyTorch's
            ``DataLoader`` constructor.
    """

    def __init__(self,
                 dataset,
                 options,
                 batch_size=1,
                 shuffle=False,
                 num_workers=0,
                 drop_last=True,
                 persistent_workers=True,
                 auto_distributed_partitioning=True,
                 mode='sync',
                 async_options=None,
                 rebatched_worker_size=None,
                 **kwargs):
        """Lazy init:

        In many frameworks, the dataloader will be constructed before the
        initialization of the ipu options, so the lazy init method is used
        here, and the real initialization will not be done until the
        dataloader needs to be used and the options are input.
        """
        # lazy init: sometimes, we cannot get IPU options when building the
        # data loader
        self.kwargs = {
            'dataset': dataset,
            'batch_size': batch_size,
            'shuffle': shuffle,
            'num_workers': num_workers,
            'drop_last': drop_last,
            'persistent_workers': persistent_workers,
            'auto_distributed_partitioning': auto_distributed_partitioning,
            'mode': mode,
            'collate_fn': partial(collate, samples_per_gpu=batch_size),
            'async_options': async_options,
            'rebatched_worker_size': rebatched_worker_size,
            **kwargs
        }
        self.dataset = dataset
        self.initialized = False
        if options:
            self.init(options=options)

    def init(self, options, **kwargs):
        if not self.initialized:
            kwargs = {**self.kwargs, **kwargs, 'options': options}
            if kwargs['mode'] == 'sync':
                kwargs['mode'] = poptorch.DataLoaderMode.Sync
            elif kwargs['mode'] == 'async':
                kwargs['mode'] = poptorch.DataLoaderMode.AsyncRebatched
                if kwargs['async_options'] is None:
                    kwargs['async_options'] = {
                        'load_indefinitely': True,
                        'buffer_size': 8
                    }
                if kwargs['rebatched_worker_size'] is None:
                    kwargs['rebatched_worker_size'] = 128
            super().__init__(**kwargs)
            self.initialized = True
        return self
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import numpy as np
import torch

from mmcv.parallel import DataContainer

# A customized None type for HierarchicalDataManager
HierarchicalDataNone = object()


class HierarchicalDataManager:
    """A class that manages all the tensors in hierarchical data.

    At present, the input data structure accepted by IPU is limited,
    while the input data structure of mmcv varies.
    Here, an intermediate class is needed to get and update tensors
    from the original data.

    HierarchicalDataManager will record hierarchical input/output data in
    self._hierarchical_data. For example, we have an input data:
    {'img': tensorA, 'label': tensorB, 'img_metas': [tensorC, tensorD]}
    To enable IPU to use the input, HierarchicalDataManager will collect
    the torch tensors from self._hierarchical_data into a tuple like:
    (tensorA, tensorB, tensorC, tensorD).
    Meanwhile, the return of IPU is a tuple of tensors, so
    HierarchicalDataManager also has a method named update_all_tensors to
    update the tensors in self._hierarchical_data, which is the output for
    upper calls.

    Args:
        logger (:obj:`logging.Logger`): Logger used during running.
            Defaults to None.
    """

    def __init__(self, logger=None):
        self.atomic_types = (int, str, float, np.ndarray, type(None))
        self.warning = warnings.warn if logger is None else logger.warning
        # enable or disable input data's shape and value check
        self.quick_mode = False
        self._hierarchical_data = None

    def quick(self):
        self.quick_mode = True

    def compare_atomic_type(self, a, b):
        """Compare data; supported datatypes are numpy arrays and python
        basic types."""
        if isinstance(a, np.ndarray):
            return np.all(a == b)
        else:
            return a == b

    def record_hierarchical_data(self, data):
        """Record hierarchical data."""
        if self._hierarchical_data is not None:
            if isinstance(data, torch.Tensor):
                assert isinstance(self._hierarchical_data, torch.Tensor), \
                    'original hierarchical data is not torch.tensor'
                self._hierarchical_data = data
            else:
                self.update_hierarchical_data(data)
        else:
            self._hierarchical_data = data

    @property
    def hierarchical_data(self):
        return self._hierarchical_data

    def update_hierarchical_data(self,
                                 dataA,
                                 dataB=HierarchicalDataNone,
                                 strict=True,
                                 address='data'):
        """Update dataB with dataA in-place.

        Args:
            dataA (list or dict or tuple): New hierarchical data.
            dataB (list or dict or tuple): hierarchical data to update.
                If not specified, self.hierarchical_data will be updated.
            strict (bool, optional): If true, an error will be reported
                when the following conditions occur:

                1. Non-torch.Tensor data changed.
                2. Torch.Tensor data shape changed.
            address (str): Record the address of the current data to be
                updated. Default: 'data'.
        """
        if dataB is HierarchicalDataNone:
            dataB = self.hierarchical_data

        # Update with data that has the same structure
        # but different values (tensors and basic python data types)
        if isinstance(dataA, (tuple, list)):
            for idx, node in enumerate(dataA):
                new_address = ''
                if not self.quick_mode:
                    new_address = address + f'[{str(idx)}]'
                    assert isinstance(node, type(dataB[idx])),\
                        f'data structure changed: {new_address}'
                if isinstance(node, torch.Tensor):
                    dataB[idx] = node
                else:
                    self.update_hierarchical_data(
                        node, dataB[idx], strict, address=new_address)
        elif isinstance(dataA, dict):
            for k, v in dataA.items():
                new_address = ''
                if not self.quick_mode:
                    new_address = address + f'[{str(k)}]'
                    assert isinstance(v, type(dataB[k])),\
                        f'data structure changed: {new_address}'
                if isinstance(v, torch.Tensor):
                    dataB[k] = v
                else:
                    self.update_hierarchical_data(
                        v, dataB[k], strict, address=new_address)
        elif isinstance(dataA, self.atomic_types):
            if not self.quick_mode:
                is_equal = self.compare_atomic_type(dataA, dataB)
                if not is_equal:
                    if strict:
                        raise ValueError(
                            'all data except torch.Tensor should be same, '
                            f'but data({address}) is changed.')
                    else:
                        self.warning(
                            f'find a non-torch.Tensor data({type(dataA)}) '
                            f'changed, and the address is {address}')
        elif isinstance(dataA, DataContainer):
            if not self.quick_mode:
                assert isinstance(dataB, DataContainer)
            new_address = address + '.data'
            self.update_hierarchical_data(
                dataA.data, dataB.data, False, address=new_address)
        else:
            raise NotImplementedError(
                f'not supported datatype:{type(dataA)}, address is {address}')

    def collect_all_tensors(self, hierarchical_data=None):
        """Collect torch.Tensor data from self.hierarchical_data into a list
        and return it."""
        # get a list of tensors from self._hierarchical_data
        if hierarchical_data is None:
            hierarchical_data = self._hierarchical_data
        tensors = []
        if isinstance(hierarchical_data, torch.Tensor):
            tensors = [hierarchical_data]
        else:
            self._collect_tensors(hierarchical_data, tensors)
        return tensors

    def _collect_tensors(self, data, tensors):
        if isinstance(data, (tuple, list)):
            for node in data:
                if isinstance(node, torch.Tensor):
                    tensors.append(node)
                else:
                    self._collect_tensors(node, tensors)
        elif isinstance(data, dict):
            for v in data.values():
                if isinstance(v, torch.Tensor):
                    tensors.append(v)
                else:
                    self._collect_tensors(v, tensors)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._collect_tensors(data.data, tensors)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')

    def update_all_tensors(self, tensors):
        """Put tensors from the tuple back into self.hierarchical_data."""
        if isinstance(self._hierarchical_data, torch.Tensor):
            assert len(tensors) == 1
            assert isinstance(tensors[0], torch.Tensor)
            self._hierarchical_data = tensors[0]
        else:
            # convert to list if tensors is a tuple
            tensors = list(tensors)
            self._set_tensors(self._hierarchical_data, tensors)
        return self.hierarchical_data

    def _set_tensors(self, data, tensors):
        if isinstance(data, tuple):
            data = list(data)
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = tensors.pop(0)
                else:
                    self._set_tensors(data[idx], tensors)
            data = tuple(data)
        elif isinstance(data, list):
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = tensors.pop(0)
                else:
                    self._set_tensors(data[idx], tensors)
        elif isinstance(data, dict):
            for k, v in data.items():
                if isinstance(v, torch.Tensor):
                    data[k] = tensors.pop(0)
                else:
                    self._set_tensors(v, tensors)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._set_tensors(data.data, tensors)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')

    def clean_all_tensors(self):
        """Delete tensors from self.hierarchical_data."""
        self._clean_tensors(self._hierarchical_data)

    def _clean_tensors(self, data):
        if isinstance(data, tuple):
            data = list(data)
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = None
                else:
                    self._clean_tensors(data[idx])
            data = tuple(data)
        elif isinstance(data, list):
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = None
                else:
                    self._clean_tensors(data[idx])
        elif isinstance(data, dict):
            for k, v in data.items():
                if isinstance(v, torch.Tensor):
                    data[k] = None
                else:
                    self._clean_tensors(v)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._clean_tensors(data.data)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')
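The collect-then-update round trip this class implements can be demonstrated with a small dependency-free analog; the helpers below treat plain floats as the "tensor" leaves, and all names are hypothetical:

```python
def collect_leaves(data, leaves):
    """Depth-first collection of float leaves from nested lists/dicts
    (a simplified analog of _collect_tensors above, not MMCV code)."""
    if isinstance(data, list):
        for node in data:
            collect_leaves(node, leaves)
    elif isinstance(data, dict):
        for v in data.values():
            collect_leaves(v, leaves)
    elif isinstance(data, float):
        leaves.append(data)


def set_leaves(data, leaves):
    """Write leaves back in the same depth-first order
    (a simplified analog of _set_tensors)."""
    if isinstance(data, list):
        for idx, node in enumerate(data):
            if isinstance(node, float):
                data[idx] = leaves.pop(0)
            else:
                set_leaves(node, leaves)
    elif isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, float):
                data[k] = leaves.pop(0)
            else:
                set_leaves(v, leaves)


data = {'img': 1.0, 'img_metas': ['id-0', [2.0, 3.0]]}
flat = []
collect_leaves(data, flat)  # flat == [1.0, 2.0, 3.0]
set_leaves(data, [x * 10 for x in flat])  # write processed leaves back
print(data)  # {'img': 10.0, 'img_metas': ['id-0', [20.0, 30.0]]}
```

The crucial invariant, in both the sketch and the real class, is that collection and update traverse the structure in the same deterministic order, so the i-th collected leaf is the one replaced by the i-th returned value.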
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.runner import HOOKS, LrUpdaterHook, OptimizerHook
from mmcv.utils import TORCH_VERSION, digit_version


def wrap_lr_updater_hook(lr_hook_class):
    """A wrapper function to wrap any subclass of LrUpdaterHook.

    IPU needs extra operations to upload optimizer settings. This wrapper
    overrides the _set_lr function of a subclass of LrUpdaterHook.
    """
    assert issubclass(lr_hook_class, LrUpdaterHook)

    class ipu_lr_hook_class(lr_hook_class):

        def _set_lr(self, runner, *args, **kwargs):
            super()._set_lr(runner, *args, **kwargs)
            # convert the torch optimizer to a poptorch optimizer
            runner.model.setOptimizer(runner.optimizer)

    return ipu_lr_hook_class


def wrap_optimizer_hook(optimizer_hook_class):
    """A wrapper function to wrap OptimizerHook.

    This is a non-intrusive implementation of wrapping the optimizer hook
    (otherwise every config file would need to be changed to use the IPU
    optimizer hook). IPU's clip-norm implementation is different from
    pytorch's, so an error is raised when clip-norm is used.
    """

    class ipu_optimizer_hook_class(OptimizerHook):

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            if self.grad_clip is not None:
                raise NotImplementedError('IPU does not support gradient clip')

    return ipu_optimizer_hook_class
if (TORCH_VERSION != 'parrots'
and digit_version(TORCH_VERSION) >= digit_version('1.6.0')):
@HOOKS.register_module()
class IPUFp16OptimizerHook(OptimizerHook):
"""FP16 optimizer hook (using PyTorch's implementation).
        If you are using PyTorch >= 1.6, torch.cuda.amp is used as the
        backend to take care of the optimization procedure.
Args:
loss_scale (float | str | dict): Scale factor configuration.
If loss_scale is a float, static loss scaling will be used with
the specified scale. If loss_scale is a string, it must be
'dynamic', then dynamic loss scaling will be used.
                It can also be a dict containing arguments of GradScaler.
                Defaults to 512. For PyTorch >= 1.6, mmcv uses the official
                implementation of GradScaler. If you use a dict version of
loss_scale to create GradScaler, please refer to:
https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler
for the parameters.
Examples:
>>> loss_scale = dict(
... init_scale=65536.0,
... growth_factor=2.0,
... backoff_factor=0.5,
... growth_interval=2000
... )
            >>> optimizer_hook = IPUFp16OptimizerHook(loss_scale=loss_scale)
"""
def __init__(self,
grad_clip=None,
coalesce=True,
bucket_size_mb=-1,
loss_scale=512.,
distributed=True):
assert grad_clip is None,\
'IPU mode does not support `grad_clip` currently'
            assert coalesce,\
                'only coalesced all-reduce is implemented in distributed ' \
                'training currently'
assert bucket_size_mb == -1,\
'`bucket_size_mb` should not be set in IPU mode'
self.distributed = distributed
self._scale_update_param = None
if loss_scale == 'dynamic':
raise NotImplementedError(
'IPU mode does not support dynamic loss scale currently')
elif isinstance(loss_scale, float):
self.loss_scale = loss_scale
elif isinstance(loss_scale, dict):
raise NotImplementedError(
'IPU mode supports single scale currently')
else:
raise ValueError(
                    f'loss_scale should be a float, but got {loss_scale}')
def after_train_iter(self, runner):
pass
else:
raise RuntimeError('The IPU mode only supports torch 1.6 and above')
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import inspect
from collections import OrderedDict
from typing import Optional, Union
import poptorch
import torch
import torch.nn as nn
from poptorch import PoplarExecutor, __version__, identity_loss
from poptorch._args_parser import ArgsParser
from mmcv.runner import auto_fp16
from .hierarchical_data_manager import HierarchicalDataManager
from .utils import compare_ndarray, model_sharding, recomputation_checkpoint
class DictArgsParser(ArgsParser):
"""A helper class for handling model input.
Args:
        inputs (dict): Inputs of the model.
"""
def __init__(self, inputs):
# Combine args and kwargs:
self._has_variadic_arguments = True
self._varnames = list(inputs.keys())
self._defaults = [inspect.Parameter.empty for _ in self._varnames]
self._warned_not_contiguous_input = False
class WrappedNet(nn.Module):
"""A net wrapper for model conversion.
    This wrapper makes some changes to the model and adds some extra
    functionality for training and inference.
Args:
model (:obj:`nn.Module`): The model to run.
inputs_manager (:obj:`HierarchicalDataManager`): A parser
converting inputs from tuple to dictionary.
outputs_manager (:obj:`HierarchicalDataManager`): A parser
converting outputs from dictionary to tuple.
inter_outputs_in_cpu (dict): Specify the features to be
recorded.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
"""
def __init__(self,
model,
inputs_manager,
outputs_manager,
inter_outputs_in_cpu,
modules_to_record=None):
super().__init__()
self.model = model
self.inputs_manager = inputs_manager
self.outputs_manager = outputs_manager
self.training = model.training
# Register a hook function to capture the intermediate features
# generated by the network to align the outputs between ipu and cpu
# Used to confirm whether the implementation of CPU is consistent
# with the implementation of IPU
self.inter_outputs_in_cpu = inter_outputs_in_cpu
if modules_to_record is None:
modules_to_record = []
for idx, (name, module) in enumerate(model.named_modules()):
if name in modules_to_record or idx in modules_to_record:
features_hook = self.get_input_output_hook(
name, idx, self.inter_outputs_in_cpu)
module.register_forward_hook(hook=features_hook)
def get_input_output_hook(self, name, idx, save_dict):
def input_output_hook(module, fea_in, fea_out):
if isinstance(fea_in, tuple):
fea_in = list(fea_in)
if isinstance(fea_out, tuple):
fea_out = list(fea_out)
save_dict[name] = {
'fea_in': fea_in,
'fea_out': fea_out,
'idx': idx
}
return None
return input_output_hook
def forward(self, inputs_tuple):
"""This function is used to be compiled to ipu, the inputs and outputs
need to be tuples, so here we need to restore the input back to a
dictionary and convert the output to a tuple."""
self.inputs_manager.update_all_tensors(inputs_tuple)
kwargs = {**(self.inputs_manager.hierarchical_data)}
if self.training:
outputs = self.forward_train(kwargs)
# tell poptorch which loss will be used finally
identity_loss(outputs['loss'], reduction='none')
else:
outputs = self.forward_eval(kwargs)
if isinstance(outputs, torch.Tensor):
# currently not support single tensor output,
# need to wrap it with a dictionary,
# use a keyword to identify this case
outputs = {'output of WrappedNet: single tensor': outputs}
        # if some features need to be recorded, add them as extra outputs
for name in self.inter_outputs_in_cpu:
outputs[name] = self.inter_outputs_in_cpu[name]
        # Record the positions of the returned tensors in the conversion
        # stage; in the real run stage, all tensors are changed in-place,
        # which means the output can be obtained directly outside this
        # function.
self.outputs_manager.record_hierarchical_data(outputs)
plain_outputs = self.outputs_manager.collect_all_tensors()
return plain_outputs
def forward_train(self, kwargs):
optimizer = kwargs.pop('optimizer')
outputs = self.train_step(kwargs, optimizer)
return outputs
def train_step(self, data, optimizer=None, **kwargs):
"""The iteration step during training.
This method defines an iteration step during training, except for the
back propagation and optimizer updating, which are done in an optimizer
hook. Note that in some complicated cases or models, the whole process
        including back propagation and optimizer updating is also defined in
this method, such as GAN.
Args:
data (dict): The output of dataloader.
optimizer (:obj:`torch.optim.Optimizer`, optional): The
optimizer of runner is passed to ``train_step()``. This
argument is unused and reserved.
Returns:
dict: Dict of outputs. The following fields are contained.
- loss (torch.Tensor): A tensor for back propagation, which \
can be a weighted sum of multiple losses.
- log_vars (dict): Dict contains all the variables to be sent \
to the logger.
- num_samples (int): Indicates the batch size (when the model \
is DDP, it means the batch size on each GPU), which is \
used for averaging the logs.
"""
losses = self.model(**data)
loss, log_vars = self._parse_losses(losses)
outputs = dict(
loss=loss, log_vars=log_vars, num_samples=len(data['img'].data))
return outputs
def _parse_losses(self, losses):
log_vars = OrderedDict()
for loss_name, loss_value in losses.items():
if isinstance(loss_value, torch.Tensor):
log_vars[loss_name] = loss_value.mean()
elif isinstance(loss_value, list):
log_vars[loss_name] = sum(loss.mean() for loss in loss_value)
elif isinstance(loss_value, dict):
for name, value in loss_value.items():
log_vars[name] = value
else:
raise TypeError(
f'{loss_name} is not a tensor or list of tensors')
loss = sum(value for key, value in log_vars.items() if 'loss' in key)
log_vars['loss'] = loss
return loss, log_vars
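The aggregation performed by `_parse_losses` can be sketched without torch. Below, plain floats stand in for tensors (so `.mean()` is dropped): tensor values are kept, lists of per-level losses are summed, nested dicts of metrics are flattened, and every key containing `'loss'` contributes to the final total. The input keys are illustrative, not fixed API names.

```python
from collections import OrderedDict


def parse_losses(losses):
    """Flatten a loss dict and sum every entry whose key contains 'loss'."""
    log_vars = OrderedDict()
    for name, value in losses.items():
        if isinstance(value, (int, float)):
            log_vars[name] = value
        elif isinstance(value, list):
            log_vars[name] = sum(value)       # multi-level losses are summed
        elif isinstance(value, dict):
            log_vars.update(value)            # metrics are flattened in
        else:
            raise TypeError(f'{name} is not a number or list of numbers')
    total = sum(v for k, v in log_vars.items() if 'loss' in k)
    log_vars['loss'] = total
    return total, log_vars


total, log_vars = parse_losses({
    'loss_cls': 1.0,
    'loss_bbox': [0.5, 0.25],   # e.g. one loss per feature level
    'acc': {'top1': 0.9},       # logged, but excluded from the total
})
```

Note that `'acc'` contributes to `log_vars` but not to the back-propagated total, since its flattened key `top1` does not contain `'loss'`.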
def forward_eval(self, kwargs):
img = kwargs.pop('img')
img_metas = kwargs.pop('img_metas', None)
return_loss = kwargs.pop('return_loss')
assert not return_loss
# TODO Temporarily hard-code to close post_process,
# otherwise, in the third trace(_check_trace),
# post_process will convert output tensor to numpy array automatically,
# resulting in _check_trace failure
outputs = self.model(
img,
img_metas=img_metas,
return_loss=return_loss,
post_process=False)
return outputs
class MMPoplarExecutor(PoplarExecutor):
"""An executor for inputs/outputs parsing, model compilation, data
alignment and IPU upload/download.
Args:
model (:obj:`nn.Module`): The model to be compiled.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
training (bool): Model in training mode or eval mode.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
args (argument list): Arguments passed to the `__init__`
method of PoplarExecutor.
kwargs (keyword arguments): Keyword arguments passed to the `__init__`
method of PoplarExecutor.
"""
def __init__(self,
model,
logger=None,
training=True,
modules_to_record=None,
*args,
**kwargs):
# self.model == self._user_model: input pytorch model
# self._model: wrapped model which is used to compile
# and update weights, these two models use same weights
        # The wrapped model only accepts and returns tuples, so
        # HierarchicalDataManager converts dictionaries to tuples
        # and back.
self.inputs_manager = HierarchicalDataManager(logger=logger)
self.outputs_manager = HierarchicalDataManager(logger=logger)
self.logger = logger
# the features calculated by CPU
self.inter_outputs_in_cpu = {}
# the features calculated by IPU
self.inter_outputs_in_ipu = {}
if modules_to_record is None:
            # It is possible that the IPU implementation of some operators
            # is inconsistent with the expected (CPU) one; this mechanism
            # can be used to confirm whether there is a problem.
self.compare_with_cpu = False
else:
self.compare_with_cpu = True
        # move model.fp16_enabled to self.fp16_enabled to change where
        # the input is automatically cast to half precision
if getattr(model, 'fp16_enabled', False):
model.fp16_enabled = False
self.fp16_enabled = True
# make torch.jit.trace convert self._model
model = WrappedNet(
model,
self.inputs_manager,
self.outputs_manager,
self.inter_outputs_in_cpu,
modules_to_record=modules_to_record)
super().__init__(model, training=training, *args, **kwargs)
# overwrite self._args_parser in train_step or val_step
self._args_parser = None
if training:
assert self.training
else:
assert not self.training
@property
def training(self):
        # Getting the `training` attribute of self would normally fall
        # through to self.model.training, since this class defines no
        # such attribute itself. However, the attribute we actually want
        # is self._training; self.model.training and self._training are
        # often inconsistent. It is not clear whether this is a PopTorch
        # bug or a deliberate design, so this property is used to work
        # around the problem for now.
        return self._training  # comes from self.model._training
@auto_fp16(supported_types=(PoplarExecutor, ))
def run_model(self, data_dict):
# this function is used to parse input_dict
# and convert to output_dict
        # get tensors out of data and put them in a tuple
        self.inputs_manager.record_hierarchical_data(data_dict)
        inputs_tuple = tuple(self.inputs_manager.collect_all_tensors())
        if not self.isCompiled():
            # turn the loggers in the data managers off after compilation
            self.inputs_manager.quick()
            self.outputs_manager.quick()
# parser args in the first iter
if self._args_parser is None:
self._args_parser = DictArgsParser({'args': inputs_tuple})
# run or convert model
# the plain_outputs will be used in converting stage
plain_outputs = self(inputs_tuple)
self.inputs_manager.clean_all_tensors()
# put list of tensors back to the output dict
# according to the same order
self.outputs_manager.update_all_tensors(plain_outputs)
# get the real output dictionary from self.outputs_manager
output_dict = self.outputs_manager.hierarchical_data
# split output_dict into inter_outputs_in_ipu
# and output of the torch model
torch_model_output = {}
for name in output_dict:
if name in self.inter_outputs_in_cpu:
self.inter_outputs_in_ipu[name] = output_dict[name]
else:
torch_model_output[name] = output_dict[name]
if 'output of WrappedNet: single tensor' in output_dict:
assert len(torch_model_output) == 1
assert isinstance(
torch_model_output['output of WrappedNet: single tensor'],
torch.Tensor)
torch_model_output = \
torch_model_output['output of WrappedNet: single tensor']
return torch_model_output
def train_step(self, data, optimizer=None, **kwargs):
# arguments from mmcls/models/classifiers/base.py:
# BaseClassifier.train_step
assert self.training
assert len(kwargs) == 0 # TODO, support later if necessary
# TODO support datacontainer as input
# currently, auto_fp16 and HierarchicalDataManager take too much
# time on traversing datacontainer
data['img_metas'] = None
num_samples = len(data['img'].data)
# TODO we will ignore optimizer because it will not be used in model,
# support later if necessary
data['optimizer'] = None
output_dict = self.run_model(data)
# outputs contained loss, log_vars, num_samples,
# only loss(torch.tensor) has been updated
# remove all unchanged vars, left torch.tensor
neat_output_dict = {'loss': output_dict['loss']}
# re-parse outputs, get back log_vars and num_samples
loss, log_vars = self.model._parse_losses(neat_output_dict)
final_output_dict = dict(
loss=loss, log_vars=log_vars, num_samples=num_samples)
return final_output_dict
def eval_call(self, img, img_metas=None, return_loss=True, **kwargs):
# arguments from mmdet/models/detectors/base.py:BaseDetector.forward
        # temporary usage for eval mode
assert not self.training
assert len(kwargs) == 0 # TODO, support later if necessary
assert not return_loss
data = {'img': img, 'img_metas': img_metas, 'return_loss': return_loss}
output_dict = self.run_model(data)
return output_dict
def detachFromDevice(self):
if self.isCompiled() and self._is_attached:
super().detachFromDevice()
def attachToDevice(self):
if self.isCompiled() and not self._is_attached:
super().attachToDevice()
class TrainEvalModel:
"""A class maintaining training MMPoplarExecutor and inference
MMPoplarExecutor.
Args:
train_model (:obj:`nn.Module`): The training model to be compiled.
``train_model`` can be None if only executing validation.
eval_model (:obj:`nn.Module`): The inference model to be compiled.
options (mmcv.Config, dict): Options that will be used to compile
and run the model.
optimizer (:obj:`torch.optim.Optimizer`, optional): torch
optimizer, necessary if in training mode
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
"""
def __init__(self,
train_model,
eval_model,
options,
optimizer,
modules_to_record=None,
logger=None):
if train_model is None:
self._train_executor = None
self.training = False
else:
self._train_executor = get_training_model(
train_model,
options=options['training'],
optimizer=optimizer,
logger=logger,
modules_to_record=modules_to_record)
self.training = True
self._eval_executor = get_inference_model(
eval_model, options=options['inference'], logger=logger)
@property
def executor(self):
if self.training:
return self._train_executor
else:
return self._eval_executor
def train(self, mode: bool = True):
"""Sets the module in training mode.
This has any effect only on certain modules. See documentations of
particular modules for details of their behaviors in
training/evaluation mode, if they are affected,
e.g. :class:`Dropout`, :class:`BatchNorm`, etc.
Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
Returns:
Module: self
"""
if not isinstance(mode, bool):
raise ValueError('training mode is expected to be boolean, '
f'but got {type(mode)}')
if self._train_executor is None and mode:
raise RuntimeError(
                'The train_executor is not initialized. '
                'If you want to initialize train_executor, '
                'you need to pass an optimizer when converting the '
                'PyTorch model')
if mode == self.training:
self.model.train(mode)
return self
else:
if self.isCompiled():
# copy weights from IPU to cpu before off-load current session
self.copyWeightsToHost()
                # detach the current session before changing the mode;
                # in training mode with updated weights, poptorch copies
                # the weights from IPU to host
                self.detachFromDevice()
            self.training = mode  # the session changes with the mode
self.model.train(mode)
            # after changing the mode, attach the new session; this
            # also copies the model weights to the device
self.attachToDevice()
return self
def eval(self):
"""Sets the module in evaluation mode.
This has any effect only on certain modules.
See documentations of particular modules
for details of their behaviors in training/evaluation mode,
if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`, etc.
This is equivalent with :meth:`self.train(False)
<nn.Module.train>`.
See :ref:`locally-disable-grad-doc` for a comparison between
`.eval()` and several similar mechanisms that may be confused with it.
Returns:
Module: self
"""
return self.train(False)
def compare_data_between_ipu_and_cpu(self, inter_outputs_in_cpu,
inter_outputs_in_ipu):
for key, val in inter_outputs_in_cpu.items():
is_tensor = isinstance(val['fea_in'], torch.Tensor)
fea_in_cpu = val['fea_in']
fea_in_cpu_list = [fea_in_cpu] if is_tensor else fea_in_cpu
fea_in_ipu = inter_outputs_in_ipu[key]['fea_in']
fea_in_ipu_list = [fea_in_ipu] if is_tensor else fea_in_ipu
is_tensor = isinstance(val['fea_out'], torch.Tensor)
fea_out_cpu = val['fea_out']
fea_out_cpu_list = [fea_out_cpu] if is_tensor else fea_out_cpu
fea_out_ipu = inter_outputs_in_ipu[key]['fea_out']
fea_out_ipu_list = [fea_out_ipu] if is_tensor else fea_out_ipu
print('comparing layer:', key)
for idx, (featA, featB) in \
enumerate(zip(fea_in_cpu_list, fea_in_ipu_list)):
print('fea_in, tensor ', idx)
compare_ndarray(featA.detach().numpy(), featB.detach().numpy())
for idx, (featA, featB) in \
enumerate(zip(fea_out_cpu_list, fea_out_ipu_list)):
print('fea_out, tensor', idx)
compare_ndarray(featA.detach().numpy(), featB.detach().numpy())
# TODO Unified training and eval interface,
# merge train_step(train) and __call__(eval) together
def train_step(self, data, optimizer=None, **kwargs):
        assert self.training, 'train_step is not supported in eval mode'
inter_outputs_in_cpu = {}
if (self._train_executor.isCompiled()
and self._train_executor.compare_with_cpu):
self.copyWeightsToHost()
# run in CPU mode
self._train_executor.model.train_step(data, optimizer, **kwargs)
inter_outputs_in_cpu = {
**(self._train_executor.inter_outputs_in_cpu)
}
# run in IPU mode
result = self._train_executor.train_step(data, optimizer, **kwargs)
if (self._train_executor.isCompiled()
and self._train_executor.compare_with_cpu
and len(inter_outputs_in_cpu) > 0):
self.compare_data_between_ipu_and_cpu(
inter_outputs_in_cpu,
self._train_executor.inter_outputs_in_ipu)
return result
# TODO Unified training and eval interface,
# merge train_step(train) and __call__(eval) together
def __call__(self, *args, **kwargs):
if self.training:
raise NotImplementedError('use train_step rather than __call__')
else:
return self._eval_executor.eval_call(*args, **kwargs)
def __getattr__(self, attr):
return getattr(self.executor, attr)
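The `executor` property plus `__getattr__` above form a mode-switching facade: any attribute missing on `TrainEvalModel` is forwarded to whichever executor is active. A minimal standalone sketch of that delegation pattern, with illustrative names (`Executor`, `Facade`, `tag` are not mmcv APIs):

```python
class Executor:
    """Stand-in for an MMPoplarExecutor."""

    def __init__(self, tag):
        self.tag = tag


class Facade:
    """Forward unknown attribute lookups to the mode-selected executor."""

    def __init__(self):
        self.training = True
        self._train = Executor('train')
        self._eval = Executor('eval')

    @property
    def executor(self):
        return self._train if self.training else self._eval

    def __getattr__(self, attr):
        # only invoked when normal attribute lookup fails on self
        return getattr(self.executor, attr)


f = Facade()
```

Because `__getattr__` is only consulted after normal lookup fails, attributes defined on the facade itself (like `training`) are never forwarded.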
def get_training_model(model: nn.Module,
options: Optional[poptorch.Options] = None,
optimizer: Optional[torch.optim.Optimizer] = None,
logger=None,
modules_to_record=None) -> poptorch.PoplarExecutor:
"""Create a PopTorch training model from a PyTorch model, running on IPU
hardware in training mode.
Note:
PopTorch makes a shallow copy of the model. Changes to the
parameters in the returned training model affect the original model
and vice versa. However, primitive variable types are not synced: for
example calling ``model.train()`` on the original model, which
changes the ``training`` bool of the model instance, will not alter the
model returned by this function. You may need to call ``model.train()``
on your model before you call this function for correct behavior.
Args:
model (:obj:`nn.Module`): The model to run.
options (poptorch.Options): Options that will be used to compile
and run the model.
optimizer (:obj:`torch.optim.Optimizer`, optional): The optimizers
to apply during training.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
Returns:
The :class:`poptorch.PoplarExecutor` wrapper to use in place
of ``model``.
"""
# Create a copy of the original model in case it needs to be wrapped
maybe_wrapped_model = copy.copy(model)
return MMPoplarExecutor(
model=maybe_wrapped_model,
logger=logger,
options=options,
training=True,
optimizer=optimizer,
user_model=model,
modules_to_record=modules_to_record,
poptorch_version=__version__)
def get_inference_model(model: Union[nn.Module, poptorch.PoplarExecutor],
options: Optional[poptorch.Options] = None,
logger=None) -> poptorch.PoplarExecutor:
"""Create a PopTorch inference model from a PyTorch model, running on IPU
hardware in inference mode.
Note:
PopTorch makes a shallow copy of the model. Changes to the
parameters in the returned inference model affect the original model
and vice versa. However, primitive variable types are not synced: for
example calling ``model.eval()`` on the original model will not alter
the model returned by this function. You may need to call
``model.eval()`` on your model before you call this function for
correct behavior.
Args:
model (:obj:`nn.Module`): The model to run.
options (poptorch.Options): Options that will be used to compile
and run the model.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
Returns:
The :class:`poptorch.PoplarExecutor` wrapper to use in place of
``model``.
"""
return MMPoplarExecutor(
model=copy.copy(model),
logger=logger,
options=options,
training=False,
poptorch_version=__version__)
def ipu_model_wrapper(model,
options,
optimizer=None,
logger=None,
modules_to_record=None,
ipu_model_cfg=None,
fp16_cfg=None):
"""Convert torch model to IPU model.
Args:
model (nn.Module): The target model to be converted.
options (dict[str, poptorch.Options]): IPU options, generated
by :func:`cfg2options`.
optimizer (:obj:`torch.optim.Optimizer`, optional): torch
optimizer, necessary if in training mode
logger (:obj:`logging.Logger`): Logger used during training.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
ipu_model_cfg (dict): A dictionary contains train_split_edges and
train_ckpt_nodes, See details in :func:`model_sharding` and
:func:`recomputation_checkpoint` functions.
fp16_cfg (dict): Config for IPU fp16 training. Currently supports
configs: `loss_scale`, `velocity_accum_type` and `accum_type`.
See details in
https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/index.html
Returns:
TrainEvalModel: IPU wrapped model.
"""
if ipu_model_cfg is None:
ipu_model_cfg = {}
training = model.training if optimizer is not None else False
# set mixed-precision
if fp16_cfg is not None:
from mmcv.runner import wrap_fp16_model
loss_scale = fp16_cfg['loss_scale']
wrap_fp16_model(model)
model.half()
        # TODO: temporary usage to set loss scaling for the original torch
        # optimizer
if optimizer is not None:
optimizer.loss_scaling = loss_scale
if fp16_cfg.get('velocity_accum_type', False):
if fp16_cfg['velocity_accum_type'] == 'half':
optimizer.velocity_accum_type = torch.half
else:
optimizer.velocity_accum_type = torch.float32
if fp16_cfg.get('accum_type', False):
if fp16_cfg['accum_type'] == 'half':
optimizer.accum_type = torch.half
else:
optimizer.accum_type = torch.float32
# TODO support feature alignment for fp16
if modules_to_record is not None:
raise NotImplementedError(
'Feature alignment for fp16 is not implemented')
# set model partition
if optimizer is None:
train_model = None
else:
# split model into multi-IPUs if specified
train_model = model_sharding(
copy.copy(model).train(),
ipu_model_cfg.get('train_split_edges', []))
recomputation_checkpoint(train_model,
ipu_model_cfg.get('train_ckpt_nodes', []))
# TODO support feature alignment for gradient accumulation mode
gradient_accumulation = \
getattr(options['training'].Training, 'gradient_accumulation', 1)
if gradient_accumulation > 1:
assert modules_to_record is None, \
'Feature alignment for grad-accumulation mode not implemented'
# TODO support feature alignment for multi-replica mode
replication_factor = \
getattr(options['training'], 'replication_factor', 1)
if replication_factor > 1:
assert modules_to_record is None, \
'Feature alignment for multi-replica mode not implemented'
# TODO supports different model partitions between train and eval mode
assert len(ipu_model_cfg.get('eval_split_edges', [])) == 0,\
'Currently, BeginBlock can only be used once on the same model'
eval_model = copy.copy(model).eval()
# wrap model for compilation
model = TrainEvalModel(
train_model,
eval_model,
options=options,
optimizer=optimizer,
logger=logger,
modules_to_record=modules_to_record)
model.train(training)
return model
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.runner import (HOOKS, RUNNERS, BaseRunner, EpochBasedRunner,
IterBasedRunner)
from mmcv.utils import IS_IPU_AVAILABLE
if IS_IPU_AVAILABLE:
from .dataloader import IPUDataLoader
from .hook_wrapper import (IPUFp16OptimizerHook, wrap_lr_updater_hook,
wrap_optimizer_hook)
from .model_wrapper import ipu_model_wrapper
from .utils import build_from_cfg_with_wrapper, cfg2options
class IPUBaseRunner(BaseRunner):
"""A base runner for IPU.
    This runner has some extra processes for IPU which are shown below:
    1. Parse options for IPU.
    2. Wrap the PyTorch model for IPU.
    3. Raise errors when encountering illegal usage.
    4. Pass IPU options to, and initialize, any dataloader found to be an
       instance of IPUDataLoader.
Args:
model (:obj:`nn.Module`): The model to run.
options_cfg (mmcv.Config, dict): Options that will be used to compile
and run the model.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
ipu_model_cfg (mmcv.Config, dict): Config of model partition and
recomputing checkpoint
fp16_cfg (mmcv.Config): Config for fp16 training.
        batch_processor (callable): A callable method that processes a data
            batch. Should be None for the IPU runner.
kwargs (Dict[str, Any], optional): Keyword arguments will be passed to
``base_runner.BaseRunner``.
"""
def __init__(self,
model,
options_cfg=None,
modules_to_record=None,
ipu_model_cfg=None,
fp16_cfg=None,
batch_processor=None,
**kwargs):
        assert hasattr(model, 'train_step') and batch_processor is None,\
            'only models with a `train_step` method are supported'
if options_cfg is None:
options_cfg = {}
# call BaseRunner.__init__() here
super().__init__(model, **kwargs)
# process options of ipu
if IS_IPU_AVAILABLE:
self.options = cfg2options(options_cfg)
self.model = ipu_model_wrapper(
self.model,
self.options,
self.optimizer,
self.logger,
modules_to_record=modules_to_record,
ipu_model_cfg=ipu_model_cfg,
fp16_cfg=fp16_cfg)
else:
raise NotImplementedError('cpu mode on IPURunner is not supported')
def register_lr_hook(self, lr_config):
if lr_config is None:
return
assert isinstance(lr_config, dict)
assert 'policy' in lr_config
policy_type = lr_config.pop('policy')
        # If the policy type is all lowercase, e.g., 'cyclic', its first
        # letter is capitalized, e.g., 'Cyclic', for convenient usage of
        # the LR updater. Since this is not applicable to names such as
        # 'CosineAnnealingLrUpdater', strings containing capital letters
        # are left unchanged.
if policy_type == policy_type.lower():
policy_type = policy_type.title()
hook_type = policy_type + 'LrUpdaterHook'
lr_config['type'] = hook_type
hook = build_from_cfg_with_wrapper(lr_config, HOOKS,
wrap_lr_updater_hook)
self.register_hook(hook, priority='VERY_HIGH')
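The policy-to-hook-type rule used in `register_lr_hook` above can be sketched as a small standalone function (the helper name `policy_to_hook_type` is illustrative):

```python
def policy_to_hook_type(policy_type: str) -> str:
    """Map an LR policy name to its LrUpdaterHook class name.

    An all-lowercase policy is title-cased before the suffix is appended;
    names that already contain capital letters pass through unchanged.
    """
    if policy_type == policy_type.lower():
        policy_type = policy_type.title()
    return policy_type + 'LrUpdaterHook'
```

So `'cyclic'` resolves to `'CyclicLrUpdaterHook'`, while `'CosineAnnealing'` keeps its internal capitalization and resolves to `'CosineAnnealingLrUpdaterHook'`.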
def register_optimizer_hook(self, optimizer_config):
if optimizer_config is None:
return
assert isinstance(optimizer_config, (dict, IPUFp16OptimizerHook))
if isinstance(optimizer_config, dict):
optimizer_config.setdefault('type', 'OptimizerHook')
hook = build_from_cfg_with_wrapper(optimizer_config, HOOKS,
wrap_optimizer_hook)
else:
hook = optimizer_config
self.register_hook(hook, priority='ABOVE_NORMAL')
def run(self, data_loaders, workflow, *args, **kwargs):
for i, flow in enumerate(workflow):
mode, _ = flow
# initialize IPU dataloader if not initialized
assert isinstance(data_loaders[i], IPUDataLoader),\
'IPU runner can only work with `IPUDataLoader`'
data_loaders[i].init(options=self.get_options(mode))
super().run(data_loaders, workflow, *args, **kwargs)
def get_options(self, mode):
if mode == 'train':
return self.options['training']
elif mode == 'val':
return self.options['inference']
else:
raise ValueError(f'mode should be train or val but got {mode}')
@RUNNERS.register_module()
class IPUEpochBasedRunner(IPUBaseRunner, EpochBasedRunner):
"""Epoch-based Runner for IPU.
    The inheritance order (MRO) is: IPUEpochBasedRunner -> IPUBaseRunner ->
    EpochBasedRunner -> BaseRunner. This runner trains models epoch by
    epoch.
"""
pass
@RUNNERS.register_module()
class IPUIterBasedRunner(IPUBaseRunner, IterBasedRunner):
"""Iteration-based Runner for IPU.
    The inheritance order (MRO) is: IPUIterBasedRunner -> IPUBaseRunner ->
    IterBasedRunner -> BaseRunner. This runner trains models iteration by
    iteration.
"""
pass