Unverified Commit 9185eee8 authored by Zaida Zhou's avatar Zaida Zhou Committed by GitHub

Remove runner, parallel, engine and device (#2216)

* Remove runner, parallel, engine and device

* fix format

* remove outdated docs
parent 19a02415
include requirements/runtime.txt
include mmcv/model_zoo/open_mmlab.json mmcv/model_zoo/deprecated.json mmcv/model_zoo/mmcls.json mmcv/model_zoo/torchvision_0.12.json
include mmcv/ops/csrc/common/cuda/*.cuh mmcv/ops/csrc/common/cuda/*.hpp mmcv/ops/csrc/common/*.hpp
include mmcv/ops/csrc/pytorch/*.cpp mmcv/ops/csrc/pytorch/cuda/*.cu mmcv/ops/csrc/pytorch/cuda/*.cpp mmcv/ops/csrc/pytorch/cpu/*.cpp
include mmcv/ops/csrc/parrots/*.h mmcv/ops/csrc/parrots/*.cpp
......
......@@ -28,16 +28,6 @@ cnn
.. automodule:: mmcv.cnn
:members:
runner
------
.. automodule:: mmcv.runner
:members:
engine
------
.. automodule:: mmcv.engine
:members:
ops
------
.. automodule:: mmcv.ops
......
......@@ -17,7 +17,6 @@ You can switch between Chinese and English documents in the lower-left corner of
understand_mmcv/config.md
understand_mmcv/registry.md
understand_mmcv/runner.md
understand_mmcv/data_process.md
understand_mmcv/visualization.md
understand_mmcv/cnn.md
......
## Runner
The runner class is designed to manage the training process. It reduces the amount of code users need to write while staying flexible and configurable. Its main features are:
- Supports `EpochBasedRunner` and `IterBasedRunner` for different scenarios; customized runners can also be implemented to meet specific needs.
- Supports a customized workflow to allow switching between different modes during training. Currently, the supported modes are train and val.
- Provides extensibility through various hooks, including hooks defined in MMCV and customized ones.
### EpochBasedRunner
As its name indicates, the workflow in `EpochBasedRunner` is set in units of epochs. For example, \[('train', 2), ('val', 1)\] means running 2 epochs of training and 1 epoch of validation, iteratively. Each epoch may contain multiple iterations. Currently, MMDetection uses `EpochBasedRunner` by default.
Let's take a look at its core logic:
```python
# the condition to stop training
while curr_epoch < max_epochs:
    # traverse the workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode (e.g. train) determines which function to run
        mode, epochs = flow
        # epoch_runner will be either self.train() or self.val()
        epoch_runner = getattr(self, mode)
        # execute the corresponding function
        for _ in range(epochs):
            epoch_runner(data_loaders[i], **kwargs)
```
Currently, we support two modes: train and val. Let's take the train function as an example and look at its core logic:
```python
# Currently, epoch_runner could be either train or val
def train(self, data_loader, **kwargs):
    # traverse the dataset and get batch data for 1 epoch
    for i, data_batch in enumerate(data_loader):
        # execute all before_train_iter functions in the registered hooks;
        # you may want to watch out for their order
        self.call_hook('before_train_iter')
        # train_mode is set to False in the val function
        self.run_iter(data_batch, train_mode=True, **kwargs)
        self.call_hook('after_train_iter')
    self.call_hook('after_train_epoch')
```
### IterBasedRunner
Different from `EpochBasedRunner`, the workflow in `IterBasedRunner` is set in units of iterations. For example, \[('train', 2), ('val', 1)\] means running 2 iterations of training and 1 iteration of validation, iteratively. Currently, MMSegmentation uses `IterBasedRunner` by default.
Let's take a look at its core logic:
```python
# Although the workflow is set by iterations here, epoch information may
# still be needed in some use cases. That is provided by IterLoader.
iter_loaders = [IterLoader(x) for x in data_loaders]
# the condition to stop training
while curr_iter < max_iters:
    # traverse the workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode (e.g. train) determines which function to run
        mode, iters = flow
        # iter_runner will be either self.train() or self.val()
        iter_runner = getattr(self, mode)
        # execute the corresponding function
        for _ in range(iters):
            iter_runner(iter_loaders[i], **kwargs)
```
Currently, we support two modes: train and val. Let's take the val function as an example and look at its core logic:
```python
# Currently, iter_runner could be either train or val
def val(self, data_loader, **kwargs):
    # get batch data for 1 iteration
    data_batch = next(data_loader)
    # execute all before_val_iter functions in the registered hooks;
    # you may want to watch out for their order
    self.call_hook('before_val_iter')
    outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
    self.outputs = outputs
    self.call_hook('after_val_iter')
```
Other than the basic functionalities explained above, `EpochBasedRunner` and `IterBasedRunner` provide methods such as `resume`, `save_checkpoint` and `register_hook`. In case you are not familiar with the term hook mentioned earlier, we will also provide a tutorial about it (coming soon...). Essentially, a hook is a piece of functionality that alters or augments code behavior through a predefined API. It allows users to have their own code called under certain circumstances, which makes the code extensible in a non-intrusive manner.
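To make the idea concrete, here is a minimal, hypothetical sketch (not MMCV's actual implementation) of a runner firing hook callbacks; `MiniRunner`, `LoggingHook` and their methods are invented purely for illustration:

```python
class LoggingHook:
    """Hypothetical hook: counts iterations and reports at epoch end."""

    def __init__(self):
        self.num_iters = 0

    def before_train_iter(self, runner):
        self.num_iters += 1

    def after_train_epoch(self, runner):
        runner.logs.append(f'epoch finished after {self.num_iters} iters')


class MiniRunner:
    """Hypothetical runner core that only knows how to fire hooks."""

    def __init__(self, hooks):
        self.hooks = hooks
        self.logs = []

    def call_hook(self, fn_name):
        for hook in self.hooks:
            # only call the hook if it implements this callback
            getattr(hook, fn_name, lambda runner: None)(self)

    def train_epoch(self, num_iters):
        for _ in range(num_iters):
            self.call_hook('before_train_iter')
        self.call_hook('after_train_epoch')


runner = MiniRunner([LoggingHook()])
runner.train_epoch(3)
print(runner.logs)  # ['epoch finished after 3 iters']
```

Because the runner looks up callbacks by name, a hook only has to implement the callbacks it cares about, and removing a hook never requires touching the training loop.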
### A Simple Example
We will walk you through the usage of the runner with a classification task. The following code only contains the essential steps for demonstration purposes; these steps are necessary for any training task.
**(1) Initialize dataloader, model, optimizer, etc.**
```python
# initialize the model
model = ...
# initialize the optimizer; typically, we set:
# cfg.optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer = build_optimizer(model, cfg.optimizer)
# initialize the dataloaders corresponding to the workflow (train/val)
data_loaders = [
    build_dataloader(
        ds,
        cfg.data.samples_per_gpu,
        cfg.data.workers_per_gpu,
        ...) for ds in dataset
]
```
**(2) Initialize runner**
```python
runner = build_runner(
    # cfg.runner is typically set as:
    # runner = dict(type='EpochBasedRunner', max_epochs=200)
    cfg.runner,
    default_args=dict(
        model=model,
        batch_processor=None,
        optimizer=optimizer,
        logger=logger))
```
**(3) Register training hooks and customized hooks.**
```python
# register default hooks necessary for training
runner.register_training_hooks(
    # configs of learning rate; typically set as:
    # lr_config = dict(policy='step', step=[100, 150])
    cfg.lr_config,
    # configuration of the optimizer, e.g. grad_clip
    optimizer_config,
    # configuration of saving checkpoints; typically set as:
    # checkpoint_config = dict(interval=1), saving checkpoints every epoch
    cfg.checkpoint_config,
    # configuration of logs
    cfg.log_config,
    ...)

# register customized hooks
# say we want to enable EMA, then we could set custom_hooks=[dict(type='EMAHook')]
if cfg.get('custom_hooks', None):
    custom_hooks = cfg.custom_hooks
    for hook_cfg in cfg.custom_hooks:
        hook_cfg = hook_cfg.copy()
        priority = hook_cfg.pop('priority', 'NORMAL')
        hook = build_from_cfg(hook_cfg, HOOKS)
        runner.register_hook(hook, priority=priority)
```
Then, we can use `resume` or `load_checkpoint` to load existing weights.
**(4) Start training**
```python
# workflow is typically set as: workflow = [('train', 1)]
# here the training begins.
runner.run(data_loaders, cfg.workflow)
```
Let's take `EpochBasedRunner` as an example and go into a little more detail about setting the workflow:
- Say we only want to put train in the workflow; then we can set workflow = \[('train', 1)\]. The runner will only execute train iteratively in this case.
- Say we want to put both train and val in the workflow; then we can set workflow = \[('train', 3), ('val', 1)\]. The runner will first execute train for 3 epochs and then switch to val mode and execute val for 1 epoch. This workflow repeats until the current epoch hits max_epochs.
- The workflow is highly flexible. For example, you can set workflow = \[('val', 1), ('train', 1)\] if you would like the runner to validate first and train after.
The code we demonstrated above is already included in `train.py` in the MM repositories. Simply modify the corresponding keys in the configuration files and the script will execute the expected workflow automatically.
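As a quick sanity check of the workflow semantics described above, the scheduling loop can be replayed in a few lines of plain Python (a simplified sketch, not MMCV code; here we assume only training epochs count towards `max_epochs`):

```python
def simulate_workflow(workflow, max_epochs):
    """Replay the EpochBasedRunner scheduling loop and record the mode
    executed at each step (a simplified sketch, not MMCV code)."""
    schedule = []
    curr_epoch = 0
    while curr_epoch < max_epochs:
        for mode, epochs in workflow:
            for _ in range(epochs):
                schedule.append(mode)
                if mode == 'train':
                    # only training epochs count towards max_epochs
                    curr_epoch += 1
    return schedule


print(simulate_workflow([('train', 3), ('val', 1)], max_epochs=6))
# ['train', 'train', 'train', 'val', 'train', 'train', 'train', 'val']
```

With workflow = \[('train', 3), ('val', 1)\] and max_epochs = 6, the runner executes two full passes over the workflow, validating once after every 3 training epochs.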
......@@ -28,16 +28,6 @@ cnn
.. automodule:: mmcv.cnn
:members:
runner
------
.. automodule:: mmcv.runner
:members:
engine
------
.. automodule:: mmcv.engine
:members:
ops
------
.. automodule:: mmcv.ops
......
......@@ -17,7 +17,6 @@
understand_mmcv/config.md
understand_mmcv/registry.md
understand_mmcv/runner.md
understand_mmcv/data_process.md
understand_mmcv/data_transform.md
understand_mmcv/visualization.md
......
## Runner
The runner module is responsible for scheduling the model training process. Its main purpose is to let users start training with less code, in a flexible and configurable way. Its core features are:
- Supports `EpochBasedRunner` and `IterBasedRunner` iteration modes to cover different scenarios
- Supports customized workflows to switch freely between states during training; currently train and val workflows are supported. A workflow can be roughly understood as one complete training and validation iteration process.
- Together with various default and custom hooks, provides flexible extensibility
### EpochBasedRunner
As its name indicates, `EpochBasedRunner` runs the workflow in units of epochs. For example, setting workflow = \[('train', 2), ('val', 1)\] means iteratively training for 2 epochs and then validating for 1 epoch. The MMDetection object detection framework uses `EpochBasedRunner` by default.
Its abstract logic is as follows:
```python
# the condition to stop training
while curr_epoch < max_epochs:
    # traverse the user-defined workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode is the workflow function, e.g. train; epochs is the number of iterations
        mode, epochs = flow
        # either self.train() or self.val() is called
        epoch_runner = getattr(self, mode)
        # run the corresponding workflow function
        for _ in range(epochs):
            epoch_runner(data_loaders[i], **kwargs)
```
Currently, train and val workflows are supported. Taking the training function as an example, its abstract logic is:
```python
# epoch_runner could currently be train or val
def train(self, data_loader, **kwargs):
    # traverse the dataset, yielding one epoch of batch data
    for i, data_batch in enumerate(data_loader):
        self.call_hook('before_train_iter')
        # train_mode=False during validation
        self.run_iter(data_batch, train_mode=True, **kwargs)
        self.call_hook('after_train_iter')
    self.call_hook('after_train_epoch')
```
### IterBasedRunner
Different from `EpochBasedRunner`, `IterBasedRunner` runs the workflow in units of iterations. For example, setting workflow = \[('train', 2), ('val', 1)\] means iteratively training for 2 iterations and then validating for 1 iteration. The MMSegmentation semantic segmentation framework uses `IterBasedRunner` by default.
Its abstract logic is as follows:
```python
# although the unit is iterations, some scenarios still need epoch
# information, which is provided by IterLoader
iter_loaders = [IterLoader(x) for x in data_loaders]
# the condition to stop training
while curr_iter < max_iters:
    # traverse the user-defined workflow,
    # e.g. workflow = [('train', 2), ('val', 1)]
    for i, flow in enumerate(workflow):
        # mode is the workflow function, e.g. train; iters is the number of iterations
        mode, iters = flow
        # either self.train() or self.val() is called
        iter_runner = getattr(self, mode)
        # run the corresponding workflow function
        for _ in range(iters):
            iter_runner(iter_loaders[i], **kwargs)
```
Currently, train and val workflows are supported. Taking the validation function as an example, its abstract logic is:
```python
# iter_runner could currently be train or val
def val(self, data_loader, **kwargs):
    # get one batch of data for a single iteration
    data_batch = next(data_loader)
    self.call_hook('before_val_iter')
    outputs = self.model.val_step(data_batch, self.optimizer, **kwargs)
    self.outputs = outputs
    self.call_hook('after_val_iter')
```
Besides the basic functionalities above, `EpochBasedRunner` and `IterBasedRunner` also provide `resume`, `save_checkpoint` and hook registration.
### A Simple Example
Let's take the most common classification task as an example to describe how to use `runner` in detail. Starting any training task requires the following steps:
**(1) Initialize the dataloader, model, optimizer, etc.**
```python
# initialize the model
model = ...
# initialize the optimizer; a typical value is
# cfg.optimizer = dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=0.0001)
optimizer = build_optimizer(model, cfg.optimizer)
# initialize the dataloaders corresponding to the workflow
data_loaders = [
    build_dataloader(
        ds,
        cfg.data.samples_per_gpu,
        cfg.data.workers_per_gpu,
        ...) for ds in dataset
]
```
**(2) Initialize the runner**
```python
runner = build_runner(
    # a typical configuration of cfg.runner is
    # runner = dict(type='EpochBasedRunner', max_epochs=200)
    cfg.runner,
    default_args=dict(
        model=model,
        batch_processor=None,
        optimizer=optimizer,
        logger=logger))
```
**(3) Register the default hooks required for training as well as custom hooks**
```python
# register the hooks required for training
runner.register_training_hooks(
    # learning rate configuration; typically
    # lr_config = dict(policy='step', step=[100, 150])
    cfg.lr_config,
    # optimization configuration, e.g. grad_clip
    optimizer_config,
    # checkpoint configuration; typically
    # checkpoint_config = dict(interval=1), saving a checkpoint every epoch
    cfg.checkpoint_config,
    # logging configuration
    cfg.log_config,
    ...)

# register custom hooks
# e.g. to enable EMA, set custom_hooks=[dict(type='EMAHook')]
if cfg.get('custom_hooks', None):
    custom_hooks = cfg.custom_hooks
    for hook_cfg in cfg.custom_hooks:
        hook_cfg = hook_cfg.copy()
        priority = hook_cfg.pop('priority', 'NORMAL')
        hook = build_from_cfg(hook_cfg, HOOKS)
        runner.register_hook(hook, priority=priority)
```
Then `resume` or `load_checkpoint` can be used to load existing weights.
**(4) Start the training flow**
```python
# workflow is typically workflow = [('train', 1)]
# the training actually starts here
runner.run(data_loaders, cfg.workflow)
```
Regarding the workflow setting, taking `EpochBasedRunner` as an example:
- If you only want to run the training workflow, set workflow = \[('train', 1)\], which means only iterative training is performed
- If you want to run both training and validation workflows, set workflow = \[('train', 3), ('val', 1)\], which means training for 3 epochs, then switching to the val workflow for 1 epoch, and repeating until the number of training epochs reaches the specified value
- The workflow is freely customizable; for example, you can validate first and then train with workflow = \[('val', 1), ('train', 1)\]
The code above has been encapsulated in train.py of each codebase; users only need to set the corresponding configuration and the above flow runs automatically.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

from mmcv.parallel import MMDataParallel
from mmcv.runner import EpochBasedRunner
from mmcv.utils import get_logger


class Model(nn.Module):

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def train_step(self, data, optimizer):
        images, labels = data
        predicts = self(images)  # -> self.__call__() -> self.forward()
        loss = self.loss_fn(predicts, labels)
        return {'loss': loss}


if __name__ == '__main__':
    model = Model()
    if torch.cuda.is_available():
        # only use gpu:0 to train
        # Solved issue https://github.com/open-mmlab/mmcv/issues/1470
        model = MMDataParallel(model.cuda(), device_ids=[0])

    # dataset and dataloader
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    trainset = CIFAR10(
        root='data', train=True, download=True, transform=transform)
    trainloader = DataLoader(
        trainset, batch_size=128, shuffle=True, num_workers=2)

    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    logger = get_logger('mmcv')
    # the runner is a scheduler to manage the training
    runner = EpochBasedRunner(
        model,
        optimizer=optimizer,
        work_dir='./work_dir',
        logger=logger,
        max_epochs=4)

    # learning rate scheduler config
    lr_config = dict(policy='step', step=[2, 3])
    # configuration of the optimizer
    optimizer_config = dict(grad_clip=None)
    # configuration of saving checkpoints periodically
    checkpoint_config = dict(interval=1)
    # save logs periodically; multiple hooks can be used simultaneously
    log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
    # register hooks with the runner; they will be invoked automatically
    runner.register_training_hooks(
        lr_config=lr_config,
        optimizer_config=optimizer_config,
        checkpoint_config=checkpoint_config,
        log_config=log_config)

    runner.run([trainloader], [('train', 1)])
......@@ -10,7 +10,4 @@ from .visualization import *
# The following modules are not imported to this level, so mmcv may be used
# without PyTorch.
# - runner
# - parallel
# - op
# - device
......@@ -4,6 +4,7 @@ from typing import Optional
import torch
import torch.nn as nn
from mmengine.runner import load_checkpoint
class AlexNet(nn.Module):
......@@ -45,7 +46,6 @@ class AlexNet(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
# use default initializer
......
......@@ -5,6 +5,7 @@ from typing import Optional, Sequence, Tuple, Union
import torch.nn as nn
import torch.utils.checkpoint as cp
from mmengine.model.utils import constant_init, kaiming_init
from mmengine.runner import load_checkpoint
from torch import Tensor
......@@ -270,7 +271,6 @@ class ResNet(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
for m in self.modules():
......
......@@ -4,6 +4,7 @@ from typing import List, Optional, Sequence, Tuple, Union
import torch.nn as nn
from mmengine.model.utils import constant_init, kaiming_init, normal_init
from mmengine.runner import load_checkpoint
from torch import Tensor
......@@ -126,7 +127,6 @@ class VGG(nn.Module):
def init_weights(self, pretrained: Optional[str] = None) -> None:
if isinstance(pretrained, str):
logger = logging.getLogger()
from ..runner import load_checkpoint
load_checkpoint(self, pretrained, strict=False, logger=logger)
elif pretrained is None:
for m in self.modules():
......
# Copyright (c) OpenMMLab. All rights reserved.
from . import ipu, mlu, mps
from .scatter_gather import scatter, scatter_kwargs
from .utils import get_device
__all__ = ['mlu', 'ipu', 'mps', 'get_device', 'scatter', 'scatter_kwargs']
# Copyright (c) OpenMMLab. All rights reserved.
from typing import List, Union

import torch

from mmcv.utils import deprecated_api_warning

from .utils import get_device


def scatter(input: Union[List, torch.Tensor], devices: List) -> List:
    """scatter copies tensor to devices directly."""
    current_device = get_device()
    if isinstance(input, list):
        outputs = [scatter(_input, devices) for _input in input]
        return outputs
    elif isinstance(input, torch.Tensor):
        output = input.contiguous()
        return output.to(current_device) if devices != [-1] else output
    else:
        raise Exception(f'Unknown type {type(input)}.')


class Scatter:

    @staticmethod
    @deprecated_api_warning({'target_mlus': 'target_devices'},
                            cls_name='Scatter')
    def forward(target_devices, input):
        outputs = scatter(input, target_devices)
        return tuple(outputs) if isinstance(outputs, list) else (outputs, )
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.utils import IS_IPU_AVAILABLE

if IS_IPU_AVAILABLE:
    from .dataloader import IPUDataLoader
    from .hook_wrapper import IPUFp16OptimizerHook
    from .model_wrapper import ipu_model_wrapper
    from .runner import IPUBaseRunner, IPUEpochBasedRunner, IPUIterBasedRunner
    from .utils import cfg2options
    __all__ = [
        'cfg2options', 'ipu_model_wrapper', 'IPUFp16OptimizerHook',
        'IPUDataLoader', 'IPUBaseRunner', 'IPUEpochBasedRunner',
        'IPUIterBasedRunner'
    ]
# Copyright (c) OpenMMLab. All rights reserved.
from collections.abc import Mapping, Sequence
from functools import partial

import poptorch
from torch.utils.data.dataloader import default_collate

from mmcv.parallel import DataContainer


def collate(batch, samples_per_gpu=1):
    """Put each data field into a tensor/DataContainer with outer dimension
    batch size.

    TODO support for
    :type:`~mmcv.parallel.DataContainer`. Currently, it will be ignored.

    There are 3 cases.

    1. cpu_only = True, e.g., meta data.
    2. cpu_only = False, stack = True, e.g., images tensors.
    3. cpu_only = False, stack = False, e.g., gt bboxes.
    """
    if not isinstance(batch, Sequence):
        raise TypeError(
            f'`batch` should be a sequence, but got {type(batch)}.')

    if isinstance(batch[0], DataContainer):
        # TODO `DataContainer` will be supported in the future.
        raise TypeError('DataContainer is not supported in ipu data loader.')
    elif isinstance(batch[0], Sequence):
        transposed = zip(*batch)
        collated_batch = []
        for samples in transposed:
            if not isinstance(samples[0], DataContainer):
                # At present, we will skip the processing of DataContainer,
                # which will reduce the performance of the IPU DataLoader
                collated_batch.append(collate(samples, samples_per_gpu))
        return collated_batch
    elif isinstance(batch[0], Mapping):
        collated_batch = {}
        for key in batch[0]:
            if not isinstance(batch[0][key], DataContainer):
                # At present, we will skip the processing of DataContainer,
                # which will reduce the performance of the IPU DataLoader
                collated_batch[key] = collate([d[key] for d in batch])
        return collated_batch
    else:
        return default_collate(batch)
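The recursion above, in which sequences are transposed, mappings are collated per key, and everything else falls through to `default_collate`, can be illustrated with a small dependency-free sketch; `collate_sketch` and `leaf_collate` below are hypothetical stand-ins, not the function above:

```python
def leaf_collate(samples):
    # stand-in for torch's default_collate: just group leaves into a list
    return list(samples)


def collate_sketch(batch):
    """Recursively collate a batch of identically structured samples
    (a simplified sketch of the collate logic, not MMCV code)."""
    first = batch[0]
    if isinstance(first, (list, tuple)):
        # transpose: the i-th output collates the i-th field of each sample
        return [collate_sketch(list(samples)) for samples in zip(*batch)]
    if isinstance(first, dict):
        return {key: collate_sketch([d[key] for d in batch]) for key in first}
    return leaf_collate(batch)


batch = [{'img': 1, 'meta': ('a', 'x')}, {'img': 2, 'meta': ('b', 'y')}]
print(collate_sketch(batch))
# {'img': [1, 2], 'meta': [['a', 'b'], ['x', 'y']]}
```

The key property is that every sample in the batch must share the same nested structure, so the recursion always terminates at leaves of the same type.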
class IPUDataLoader(poptorch.DataLoader):
    """Thin wrapper of `torch.utils.data.DataLoader`.

    Compared with the pytorch DataLoader, this DataLoader changes the way of
    calculation of batch size and adds the AsynchronousDataAccessor to
    load and release data faster in cpu mode.

    If this data loader is used in a distributed execution environment, it
    will ensure that each process uses a different subset of the dataset,
    provided you first call ``options.randomSeed(N)`` with an integer N
    which is the same across all hosts.

    Args:
        dataset (torch.utils.data.Dataset): The dataset to get the data from.
        options (poptorch.Options): Options that will be used to compile
            and run the model.
        batch_size (int, optional): This is the batch size in the
            conventional sense of being the size that runs through an
            operation in the model at any given time.
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        num_workers (int, optional): how many subprocesses to use for data
            loading. ``0`` means that the data will be loaded in the main
            process. (default: ``0``)
        drop_last (bool, optional): If True and the number of elements in the
            dataset is not a multiple of the combined batch size then the
            incomplete batch at the end will be dropped.
        persistent_workers (bool, optional): Re-use workers between
            iterations if True.
        auto_distributed_partitioning (bool, optional): If True, partitions
            the dataset for distributed execution automatically. Otherwise,
            it is assumed that partitioning has been handled manually.
        mode (poptorch.DataLoaderMode, optional): If `DataLoaderMode.Async`,
            uses an :py:class:`~poptorch.AsynchronousDataAccessor` to access
            the dataset. If `DataLoaderMode.Sync`, accesses the dataset
            synchronously.
        async_options (Dict[str, Any], optional): Options to pass to
            :py:class:`~poptorch.AsynchronousDataAccessor`.
        rebatched_worker_size (int, optional): When using AsyncRebatched:
            batch size of the tensors loaded by the workers.
            Default to the combined batch size.
            If specified, the ``rebatched_worker_size`` must be less than
            or equal to the combined batch size.
        kwargs (Dict[str, Any], optional): Other options to pass to PyTorch's
            ``DataLoader`` constructor.
    """

    def __init__(self,
                 dataset,
                 options,
                 batch_size=1,
                 shuffle=False,
                 num_workers=0,
                 drop_last=True,
                 persistent_workers=True,
                 auto_distributed_partitioning=True,
                 mode='sync',
                 async_options=None,
                 rebatched_worker_size=None,
                 **kwargs):
        """Lazy init:

        In many frameworks, the dataloader will be constructed before the
        initialization of the ipu options, so the lazy init method is used
        here, and the real initialization will not be done until the
        dataloader needs to be used and the options are input.
        """
        # lazy init: sometimes, we cannot get IPU options when building the
        # data loader
        self.kwargs = {
            'dataset': dataset,
            'batch_size': batch_size,
            'shuffle': shuffle,
            'num_workers': num_workers,
            'drop_last': drop_last,
            'persistent_workers': persistent_workers,
            'auto_distributed_partitioning': auto_distributed_partitioning,
            'mode': mode,
            'collate_fn': partial(collate, samples_per_gpu=batch_size),
            'async_options': async_options,
            'rebatched_worker_size': rebatched_worker_size,
            **kwargs
        }
        self.dataset = dataset
        self.initialized = False
        if options:
            self.init(options=options)

    def init(self, options, **kwargs):
        if not self.initialized:
            kwargs = {**self.kwargs, **kwargs, 'options': options}
            if kwargs['mode'] == 'sync':
                kwargs['mode'] = poptorch.DataLoaderMode.Sync
            elif kwargs['mode'] == 'async':
                kwargs['mode'] = poptorch.DataLoaderMode.AsyncRebatched
                if kwargs['async_options'] is None:
                    kwargs['async_options'] = {
                        'load_indefinitely': True,
                        'buffer_size': 8
                    }
                if kwargs['rebatched_worker_size'] is None:
                    kwargs['rebatched_worker_size'] = 128
            super().__init__(**kwargs)
            self.initialized = True
        return self
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import numpy as np
import torch

from mmcv.parallel import DataContainer

# A customized None type for HierarchicalDataManager
HierarchicalDataNone = object()


class HierarchicalDataManager:
    """A class that manages all the tensors in hierarchical data.

    At present, the input data structure accepted by IPU is limited,
    while the input data structure of mmcv varies.
    Here, an intermediate class is needed to get and update tensors
    from the original data.

    HierarchicalDataManager will record hierarchical input/output data in
    self._hierarchical_data. For example, we have an input data:
    {'img': tensorA, 'label': tensorB, 'img_metas': [tensorC, tensorD]}
    To enable IPU to use the input, HierarchicalDataManager will collect
    the torch tensors from self._hierarchical_data into a tuple like:
    (tensorA, tensorB, tensorC, tensorD).
    Meanwhile, the return of IPU is a tuple of tensors, so
    HierarchicalDataManager also has a method named update_all_tensors to
    update the tensors in self._hierarchical_data, which is the output for
    upper calls.

    Args:
        logger (:obj:`logging.Logger`): Logger used during running.
            Defaults to None.
    """

    def __init__(self, logger=None):
        self.atomic_types = (int, str, float, np.ndarray, type(None))
        self.warning = warnings.warn if logger is None else logger.warning
        # enable or disable input data's shape and value check
        self.quick_mode = False
        self._hierarchical_data = None

    def quick(self):
        self.quick_mode = True

    def compare_atomic_type(self, a, b):
        """Compare data; supported datatypes are numpy arrays and python
        basic types."""
        if isinstance(a, np.ndarray):
            return np.all(a == b)
        else:
            return a == b

    def record_hierarchical_data(self, data):
        """Record hierarchical data."""
        if self._hierarchical_data is not None:
            if isinstance(data, torch.Tensor):
                assert isinstance(self._hierarchical_data, torch.Tensor), \
                    'original hierarchical data is not torch.tensor'
                self._hierarchical_data = data
            else:
                self.update_hierarchical_data(data)
        else:
            self._hierarchical_data = data

    @property
    def hierarchical_data(self):
        return self._hierarchical_data

    def update_hierarchical_data(self,
                                 dataA,
                                 dataB=HierarchicalDataNone,
                                 strict=True,
                                 address='data'):
        """Update dataB with dataA in-place.

        Args:
            dataA (list or dict or tuple): New hierarchical data.
            dataB (list or dict or tuple): hierarchical data to update.
                If not specified, self.hierarchical_data will be updated.
            strict (bool, optional): If true, an error will be reported
                when the following conditions occur:

                1. Non-torch.Tensor data changed.
                2. Torch.Tensor data shape changed.
            address (str): Record the address of the current data to be
                updated. Default: 'data'.
        """
        if dataB is HierarchicalDataNone:
            dataB = self.hierarchical_data

        # Update with data that has the same structure
        # but different values (tensors and basic python data types)
        if isinstance(dataA, (tuple, list)):
            for idx, node in enumerate(dataA):
                new_address = ''
                if not self.quick_mode:
                    new_address = address + f'[{str(idx)}]'
                    assert isinstance(node, type(dataB[idx])),\
                        f'data structure changed: {new_address}'
                if isinstance(node, torch.Tensor):
                    dataB[idx] = node
                else:
                    self.update_hierarchical_data(
                        node, dataB[idx], strict, address=new_address)
        elif isinstance(dataA, dict):
            for k, v in dataA.items():
                new_address = ''
                if not self.quick_mode:
                    new_address = address + f'[{str(k)}]'
                    assert isinstance(v, type(dataB[k])),\
                        f'data structure changed: {new_address}'
                if isinstance(v, torch.Tensor):
                    dataB[k] = v
                else:
                    self.update_hierarchical_data(
                        v, dataB[k], strict, address=new_address)
        elif isinstance(dataA, self.atomic_types):
            if not self.quick_mode:
                is_equal = self.compare_atomic_type(dataA, dataB)
                if not is_equal:
                    if strict:
                        raise ValueError(
                            'all data except torch.Tensor should be same, '
                            f'but data({address}) is changed.')
                    else:
                        self.warning(
                            f'find a non-torch.Tensor data({type(dataA)}) '
                            f'changed, and the address is {address}')
        elif isinstance(dataA, DataContainer):
            if not self.quick_mode:
                assert isinstance(dataB, DataContainer)
            new_address = address + '.data'
            self.update_hierarchical_data(
                dataA.data, dataB.data, False, address=new_address)
        else:
            raise NotImplementedError(
                f'not supported datatype:{type(dataA)}, address is {address}')

    def collect_all_tensors(self, hierarchical_data=None):
        """Collect torch.Tensor data from self.hierarchical_data into a list
        and return it."""
        # get a list of tensors from self._hierarchical_data
        if hierarchical_data is None:
            hierarchical_data = self._hierarchical_data
        tensors = []
        if isinstance(hierarchical_data, torch.Tensor):
            tensors = [hierarchical_data]
        else:
            self._collect_tensors(hierarchical_data, tensors)
        return tensors

    def _collect_tensors(self, data, tensors):
        if isinstance(data, (tuple, list)):
            for node in data:
                if isinstance(node, torch.Tensor):
                    tensors.append(node)
                else:
                    self._collect_tensors(node, tensors)
        elif isinstance(data, dict):
            for v in data.values():
                if isinstance(v, torch.Tensor):
                    tensors.append(v)
                else:
                    self._collect_tensors(v, tensors)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._collect_tensors(data.data, tensors)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')

    def update_all_tensors(self, tensors):
        """Put tensors from the tuple back into self.hierarchical_data."""
        if isinstance(self._hierarchical_data, torch.Tensor):
            assert len(tensors) == 1
            assert isinstance(tensors[0], torch.Tensor)
            self._hierarchical_data = tensors[0]
        else:
            # convert to list if tensors is a tuple
            tensors = list(tensors)
            self._set_tensors(self._hierarchical_data, tensors)
        return self.hierarchical_data

    def _set_tensors(self, data, tensors):
        if isinstance(data, tuple):
            data = list(data)
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = tensors.pop(0)
                else:
                    self._set_tensors(data[idx], tensors)
            data = tuple(data)
        elif isinstance(data, list):
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = tensors.pop(0)
                else:
                    self._set_tensors(data[idx], tensors)
        elif isinstance(data, dict):
            for k, v in data.items():
                if isinstance(v, torch.Tensor):
                    data[k] = tensors.pop(0)
                else:
                    self._set_tensors(v, tensors)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._set_tensors(data.data, tensors)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')

    def clean_all_tensors(self):
        """Delete tensors from self.hierarchical_data."""
        self._clean_tensors(self._hierarchical_data)

    def _clean_tensors(self, data):
        if isinstance(data, tuple):
            data = list(data)
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = None
                else:
                    self._clean_tensors(data[idx])
            data = tuple(data)
        elif isinstance(data, list):
            for idx in range(len(data)):
                if isinstance(data[idx], torch.Tensor):
                    data[idx] = None
                else:
                    self._clean_tensors(data[idx])
        elif isinstance(data, dict):
            for k, v in data.items():
                if isinstance(v, torch.Tensor):
                    data[k] = None
                else:
                    self._clean_tensors(v)
        elif isinstance(data, self.atomic_types):
            pass
        elif isinstance(data, DataContainer):
            self._clean_tensors(data.data)
        else:
            raise NotImplementedError(f'not supported datatype:{type(data)}')
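The collect-then-update round trip this class implements can be demonstrated with a small dependency-free analog; the helpers below treat plain floats as the "tensor" leaves, and all names are hypothetical:

```python
def collect_leaves(data, leaves):
    """Depth-first collection of float leaves from nested lists/dicts
    (a simplified analog of _collect_tensors above, not MMCV code)."""
    if isinstance(data, list):
        for node in data:
            collect_leaves(node, leaves)
    elif isinstance(data, dict):
        for v in data.values():
            collect_leaves(v, leaves)
    elif isinstance(data, float):
        leaves.append(data)


def set_leaves(data, leaves):
    """Write leaves back in the same depth-first order
    (a simplified analog of _set_tensors)."""
    if isinstance(data, list):
        for idx, node in enumerate(data):
            if isinstance(node, float):
                data[idx] = leaves.pop(0)
            else:
                set_leaves(node, leaves)
    elif isinstance(data, dict):
        for k, v in data.items():
            if isinstance(v, float):
                data[k] = leaves.pop(0)
            else:
                set_leaves(v, leaves)


data = {'img': 1.0, 'img_metas': ['id-0', [2.0, 3.0]]}
flat = []
collect_leaves(data, flat)  # flat == [1.0, 2.0, 3.0]
set_leaves(data, [x * 10 for x in flat])  # write processed leaves back
print(data)  # {'img': 10.0, 'img_metas': ['id-0', [20.0, 30.0]]}
```

The crucial invariant, in both the sketch and the real class, is that collection and update traverse the structure in the same deterministic order, so the i-th collected leaf is the one replaced by the i-th returned value.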
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.runner import HOOKS, LrUpdaterHook, OptimizerHook
from mmcv.utils import TORCH_VERSION, digit_version


def wrap_lr_updater_hook(lr_hook_class):
    """A wrapper function to wrap any subclass of LrUpdaterHook.

    IPU needs extra operations to upload optimizer settings. This wrapper
    overrides the _set_lr function of a subclass of LrUpdaterHook.
    """
    assert issubclass(lr_hook_class, LrUpdaterHook)

    class ipu_lr_hook_class(lr_hook_class):

        def _set_lr(self, runner, *args, **kwargs):
            super()._set_lr(runner, *args, **kwargs)
            # convert the torch optimizer to a poptorch optimizer
            runner.model.setOptimizer(runner.optimizer)

    return ipu_lr_hook_class


def wrap_optimizer_hook(optimizer_hook_class):
    """A wrapper function to wrap OptimizerHook.

    This is a non-intrusive implementation of wrapping the optimizer hook
    (otherwise every config file would need to be changed to use the IPU
    optimizer hook). IPU's clip-norm implementation is different from
    pytorch's, so an error is raised when clip-norm is used.
    """

    class ipu_optimizer_hook_class(OptimizerHook):

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            if self.grad_clip is not None:
                raise NotImplementedError('IPU does not support gradient clip')

    return ipu_optimizer_hook_class
if (TORCH_VERSION != 'parrots'
and digit_version(TORCH_VERSION) >= digit_version('1.6.0')):
@HOOKS.register_module()
class IPUFp16OptimizerHook(OptimizerHook):
"""FP16 optimizer hook (using PyTorch's implementation).
        If you are using PyTorch >= 1.6, torch.cuda.amp is used as the
        backend to take care of the optimization procedure.
Args:
loss_scale (float | str | dict): Scale factor configuration.
If loss_scale is a float, static loss scaling will be used with
the specified scale. If loss_scale is a string, it must be
'dynamic', then dynamic loss scaling will be used.
                It can also be a dict containing arguments of GradScaler.
                Defaults to 512. For PyTorch >= 1.6, mmcv uses the official
                implementation of GradScaler. If you use a dict version of
loss_scale to create GradScaler, please refer to:
https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler
for the parameters.
Examples:
>>> loss_scale = dict(
... init_scale=65536.0,
... growth_factor=2.0,
... backoff_factor=0.5,
... growth_interval=2000
... )
            >>> optimizer_hook = IPUFp16OptimizerHook(loss_scale=loss_scale)
"""
def __init__(self,
grad_clip=None,
coalesce=True,
bucket_size_mb=-1,
loss_scale=512.,
distributed=True):
assert grad_clip is None,\
'IPU mode does not support `grad_clip` currently'
            assert coalesce,\
                'only coalesced all-reduce is implemented in distributed ' \
                'training currently'
assert bucket_size_mb == -1,\
'`bucket_size_mb` should not be set in IPU mode'
self.distributed = distributed
self._scale_update_param = None
if loss_scale == 'dynamic':
raise NotImplementedError(
'IPU mode does not support dynamic loss scale currently')
elif isinstance(loss_scale, float):
self.loss_scale = loss_scale
elif isinstance(loss_scale, dict):
raise NotImplementedError(
'IPU mode supports single scale currently')
else:
raise ValueError(
                    f'loss_scale should be a float, but got {loss_scale}')
def after_train_iter(self, runner):
pass
else:
raise RuntimeError('The IPU mode only supports torch 1.6 and above')
# Copyright (c) OpenMMLab. All rights reserved.
import copy
import inspect
from collections import OrderedDict
from typing import Optional, Union
import poptorch
import torch
import torch.nn as nn
from poptorch import PoplarExecutor, __version__, identity_loss
from poptorch._args_parser import ArgsParser
from mmcv.runner import auto_fp16
from .hierarchical_data_manager import HierarchicalDataManager
from .utils import compare_ndarray, model_sharding, recomputation_checkpoint
class DictArgsParser(ArgsParser):
"""A helper class for handling model input.
Args:
        inputs (dict): Inputs of the model.
"""
def __init__(self, inputs):
# Combine args and kwargs:
self._has_variadic_arguments = True
self._varnames = list(inputs.keys())
self._defaults = [inspect.Parameter.empty for _ in self._varnames]
self._warned_not_contiguous_input = False
class WrappedNet(nn.Module):
"""A net wrapper for model conversion.
    This wrapper makes some changes to the model and adds some extra
    functionality for training and inference.
Args:
model (:obj:`nn.Module`): The model to run.
inputs_manager (:obj:`HierarchicalDataManager`): A parser
converting inputs from tuple to dictionary.
outputs_manager (:obj:`HierarchicalDataManager`): A parser
converting outputs from dictionary to tuple.
inter_outputs_in_cpu (dict): Specify the features to be
recorded.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
"""
def __init__(self,
model,
inputs_manager,
outputs_manager,
inter_outputs_in_cpu,
modules_to_record=None):
super().__init__()
self.model = model
self.inputs_manager = inputs_manager
self.outputs_manager = outputs_manager
self.training = model.training
# Register a hook function to capture the intermediate features
# generated by the network to align the outputs between ipu and cpu
# Used to confirm whether the implementation of CPU is consistent
# with the implementation of IPU
self.inter_outputs_in_cpu = inter_outputs_in_cpu
if modules_to_record is None:
modules_to_record = []
for idx, (name, module) in enumerate(model.named_modules()):
if name in modules_to_record or idx in modules_to_record:
features_hook = self.get_input_output_hook(
name, idx, self.inter_outputs_in_cpu)
module.register_forward_hook(hook=features_hook)
def get_input_output_hook(self, name, idx, save_dict):
def input_output_hook(module, fea_in, fea_out):
if isinstance(fea_in, tuple):
fea_in = list(fea_in)
if isinstance(fea_out, tuple):
fea_out = list(fea_out)
save_dict[name] = {
'fea_in': fea_in,
'fea_out': fea_out,
'idx': idx
}
return None
return input_output_hook
def forward(self, inputs_tuple):
"""This function is used to be compiled to ipu, the inputs and outputs
need to be tuples, so here we need to restore the input back to a
dictionary and convert the output to a tuple."""
self.inputs_manager.update_all_tensors(inputs_tuple)
kwargs = {**(self.inputs_manager.hierarchical_data)}
if self.training:
outputs = self.forward_train(kwargs)
# tell poptorch which loss will be used finally
identity_loss(outputs['loss'], reduction='none')
else:
outputs = self.forward_eval(kwargs)
if isinstance(outputs, torch.Tensor):
# currently not support single tensor output,
# need to wrap it with a dictionary,
# use a keyword to identify this case
outputs = {'output of WrappedNet: single tensor': outputs}
        # if some features need to be recorded, add them as extra outputs
for name in self.inter_outputs_in_cpu:
outputs[name] = self.inter_outputs_in_cpu[name]
        # Record the positions of the returned tensors in the conversion
        # stage; in the real run stage, all tensors are changed in-place,
        # which means the output can be obtained directly outside this
        # function.
self.outputs_manager.record_hierarchical_data(outputs)
plain_outputs = self.outputs_manager.collect_all_tensors()
return plain_outputs
def forward_train(self, kwargs):
optimizer = kwargs.pop('optimizer')
outputs = self.train_step(kwargs, optimizer)
return outputs
def train_step(self, data, optimizer=None, **kwargs):
"""The iteration step during training.
This method defines an iteration step during training, except for the
back propagation and optimizer updating, which are done in an optimizer
hook. Note that in some complicated cases or models, the whole process
        including back propagation and optimizer updating is also defined in
this method, such as GAN.
Args:
data (dict): The output of dataloader.
optimizer (:obj:`torch.optim.Optimizer`, optional): The
optimizer of runner is passed to ``train_step()``. This
argument is unused and reserved.
Returns:
dict: Dict of outputs. The following fields are contained.
- loss (torch.Tensor): A tensor for back propagation, which \
can be a weighted sum of multiple losses.
- log_vars (dict): Dict contains all the variables to be sent \
to the logger.
- num_samples (int): Indicates the batch size (when the model \
is DDP, it means the batch size on each GPU), which is \
used for averaging the logs.
"""
losses = self.model(**data)
loss, log_vars = self._parse_losses(losses)
outputs = dict(
loss=loss, log_vars=log_vars, num_samples=len(data['img'].data))
return outputs
def _parse_losses(self, losses):
log_vars = OrderedDict()
for loss_name, loss_value in losses.items():
if isinstance(loss_value, torch.Tensor):
log_vars[loss_name] = loss_value.mean()
elif isinstance(loss_value, list):
log_vars[loss_name] = sum(loss.mean() for loss in loss_value)
elif isinstance(loss_value, dict):
for name, value in loss_value.items():
log_vars[name] = value
else:
raise TypeError(
f'{loss_name} is not a tensor or list of tensors')
loss = sum(value for key, value in log_vars.items() if 'loss' in key)
log_vars['loss'] = loss
return loss, log_vars
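The aggregation performed by `_parse_losses` can be sketched without torch. Below, plain floats stand in for tensors (so `.mean()` is dropped): tensor values are kept, lists of per-level losses are summed, nested dicts of metrics are flattened, and every key containing `'loss'` contributes to the final total. The input keys are illustrative, not fixed API names.

```python
from collections import OrderedDict


def parse_losses(losses):
    """Flatten a loss dict and sum every entry whose key contains 'loss'."""
    log_vars = OrderedDict()
    for name, value in losses.items():
        if isinstance(value, (int, float)):
            log_vars[name] = value
        elif isinstance(value, list):
            log_vars[name] = sum(value)       # multi-level losses are summed
        elif isinstance(value, dict):
            log_vars.update(value)            # metrics are flattened in
        else:
            raise TypeError(f'{name} is not a number or list of numbers')
    total = sum(v for k, v in log_vars.items() if 'loss' in k)
    log_vars['loss'] = total
    return total, log_vars


total, log_vars = parse_losses({
    'loss_cls': 1.0,
    'loss_bbox': [0.5, 0.25],   # e.g. one loss per feature level
    'acc': {'top1': 0.9},       # logged, but excluded from the total
})
```

Note that `'acc'` contributes to `log_vars` but not to the back-propagated total, since its flattened key `top1` does not contain `'loss'`.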
def forward_eval(self, kwargs):
img = kwargs.pop('img')
img_metas = kwargs.pop('img_metas', None)
return_loss = kwargs.pop('return_loss')
assert not return_loss
# TODO Temporarily hard-code to close post_process,
# otherwise, in the third trace(_check_trace),
# post_process will convert output tensor to numpy array automatically,
# resulting in _check_trace failure
outputs = self.model(
img,
img_metas=img_metas,
return_loss=return_loss,
post_process=False)
return outputs
class MMPoplarExecutor(PoplarExecutor):
"""An executor for inputs/outputs parsing, model compilation, data
alignment and IPU upload/download.
Args:
model (:obj:`nn.Module`): The model to be compiled.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
training (bool): Model in training mode or eval mode.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
args (argument list): Arguments passed to the `__init__`
method of PoplarExecutor.
kwargs (keyword arguments): Keyword arguments passed to the `__init__`
method of PoplarExecutor.
"""
def __init__(self,
model,
logger=None,
training=True,
modules_to_record=None,
*args,
**kwargs):
# self.model == self._user_model: input pytorch model
# self._model: wrapped model which is used to compile
# and update weights, these two models use same weights
        # The wrapped model only accepts and returns tuples, so
        # HierarchicalDataManager converts dictionaries to tuples
        # and back.
self.inputs_manager = HierarchicalDataManager(logger=logger)
self.outputs_manager = HierarchicalDataManager(logger=logger)
self.logger = logger
# the features calculated by CPU
self.inter_outputs_in_cpu = {}
# the features calculated by IPU
self.inter_outputs_in_ipu = {}
if modules_to_record is None:
            # It is possible that the IPU implementation of some operators
            # is inconsistent with the expected (CPU) one; this mechanism
            # can be used to confirm whether there is a problem.
self.compare_with_cpu = False
else:
self.compare_with_cpu = True
        # move model.fp16_enabled to self.fp16_enabled to change where
        # the input is automatically cast to half precision
if getattr(model, 'fp16_enabled', False):
model.fp16_enabled = False
self.fp16_enabled = True
# make torch.jit.trace convert self._model
model = WrappedNet(
model,
self.inputs_manager,
self.outputs_manager,
self.inter_outputs_in_cpu,
modules_to_record=modules_to_record)
super().__init__(model, training=training, *args, **kwargs)
# overwrite self._args_parser in train_step or val_step
self._args_parser = None
if training:
assert self.training
else:
assert not self.training
@property
def training(self):
        # Getting the `training` attribute of self would normally fall
        # through to self.model.training, since this class defines no
        # such attribute itself. However, the attribute we actually want
        # is self._training; self.model.training and self._training are
        # often inconsistent. It is not clear whether this is a PopTorch
        # bug or a deliberate design, so this property is used to work
        # around the problem for now.
        return self._training  # comes from self.model._training
@auto_fp16(supported_types=(PoplarExecutor, ))
def run_model(self, data_dict):
# this function is used to parse input_dict
# and convert to output_dict
        # get tensors out of data and put them in a tuple
        self.inputs_manager.record_hierarchical_data(data_dict)
        inputs_tuple = tuple(self.inputs_manager.collect_all_tensors())
        if not self.isCompiled():
            # turn the loggers in the data managers off after compilation
            self.inputs_manager.quick()
            self.outputs_manager.quick()
# parser args in the first iter
if self._args_parser is None:
self._args_parser = DictArgsParser({'args': inputs_tuple})
# run or convert model
# the plain_outputs will be used in converting stage
plain_outputs = self(inputs_tuple)
self.inputs_manager.clean_all_tensors()
# put list of tensors back to the output dict
# according to the same order
self.outputs_manager.update_all_tensors(plain_outputs)
# get the real output dictionary from self.outputs_manager
output_dict = self.outputs_manager.hierarchical_data
# split output_dict into inter_outputs_in_ipu
# and output of the torch model
torch_model_output = {}
for name in output_dict:
if name in self.inter_outputs_in_cpu:
self.inter_outputs_in_ipu[name] = output_dict[name]
else:
torch_model_output[name] = output_dict[name]
if 'output of WrappedNet: single tensor' in output_dict:
assert len(torch_model_output) == 1
assert isinstance(
torch_model_output['output of WrappedNet: single tensor'],
torch.Tensor)
torch_model_output = \
torch_model_output['output of WrappedNet: single tensor']
return torch_model_output
def train_step(self, data, optimizer=None, **kwargs):
# arguments from mmcls/models/classifiers/base.py:
# BaseClassifier.train_step
assert self.training
assert len(kwargs) == 0 # TODO, support later if necessary
# TODO support datacontainer as input
# currently, auto_fp16 and HierarchicalDataManager take too much
# time on traversing datacontainer
data['img_metas'] = None
num_samples = len(data['img'].data)
# TODO we will ignore optimizer because it will not be used in model,
# support later if necessary
data['optimizer'] = None
output_dict = self.run_model(data)
# outputs contained loss, log_vars, num_samples,
# only loss(torch.tensor) has been updated
# remove all unchanged vars, left torch.tensor
neat_output_dict = {'loss': output_dict['loss']}
# re-parse outputs, get back log_vars and num_samples
loss, log_vars = self.model._parse_losses(neat_output_dict)
final_output_dict = dict(
loss=loss, log_vars=log_vars, num_samples=num_samples)
return final_output_dict
def eval_call(self, img, img_metas=None, return_loss=True, **kwargs):
# arguments from mmdet/models/detectors/base.py:BaseDetector.forward
        # temporary usage for eval mode
assert not self.training
assert len(kwargs) == 0 # TODO, support later if necessary
assert not return_loss
data = {'img': img, 'img_metas': img_metas, 'return_loss': return_loss}
output_dict = self.run_model(data)
return output_dict
def detachFromDevice(self):
if self.isCompiled() and self._is_attached:
super().detachFromDevice()
def attachToDevice(self):
if self.isCompiled() and not self._is_attached:
super().attachToDevice()
class TrainEvalModel:
"""A class maintaining training MMPoplarExecutor and inference
MMPoplarExecutor.
Args:
train_model (:obj:`nn.Module`): The training model to be compiled.
``train_model`` can be None if only executing validation.
eval_model (:obj:`nn.Module`): The inference model to be compiled.
options (mmcv.Config, dict): Options that will be used to compile
and run the model.
optimizer (:obj:`torch.optim.Optimizer`, optional): torch
optimizer, necessary if in training mode
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
"""
def __init__(self,
train_model,
eval_model,
options,
optimizer,
modules_to_record=None,
logger=None):
if train_model is None:
self._train_executor = None
self.training = False
else:
self._train_executor = get_training_model(
train_model,
options=options['training'],
optimizer=optimizer,
logger=logger,
modules_to_record=modules_to_record)
self.training = True
self._eval_executor = get_inference_model(
eval_model, options=options['inference'], logger=logger)
@property
def executor(self):
if self.training:
return self._train_executor
else:
return self._eval_executor
def train(self, mode: bool = True):
"""Sets the module in training mode.
This has any effect only on certain modules. See documentations of
particular modules for details of their behaviors in
training/evaluation mode, if they are affected,
e.g. :class:`Dropout`, :class:`BatchNorm`, etc.
Args:
mode (bool): whether to set training mode (``True``) or evaluation
mode (``False``). Default: ``True``.
Returns:
Module: self
"""
if not isinstance(mode, bool):
raise ValueError('training mode is expected to be boolean, '
f'but got {type(mode)}')
if self._train_executor is None and mode:
raise RuntimeError(
                'The train_executor is not initialized. '
                'If you want to initialize train_executor, '
                'you need to pass an optimizer when converting the '
                'PyTorch model')
if mode == self.training:
self.model.train(mode)
return self
else:
if self.isCompiled():
# copy weights from IPU to cpu before off-load current session
self.copyWeightsToHost()
                # detach the current session before changing the mode;
                # in training mode with updated weights, poptorch copies
                # the weights from IPU to host
                self.detachFromDevice()
            self.training = mode  # the session changes with the mode
self.model.train(mode)
            # after changing the mode, attach the new session; this
            # also copies the model weights to the device
self.attachToDevice()
return self
def eval(self):
"""Sets the module in evaluation mode.
This has any effect only on certain modules.
See documentations of particular modules
for details of their behaviors in training/evaluation mode,
if they are affected, e.g. :class:`Dropout`, :class:`BatchNorm`, etc.
This is equivalent with :meth:`self.train(False)
<nn.Module.train>`.
See :ref:`locally-disable-grad-doc` for a comparison between
`.eval()` and several similar mechanisms that may be confused with it.
Returns:
Module: self
"""
return self.train(False)
def compare_data_between_ipu_and_cpu(self, inter_outputs_in_cpu,
inter_outputs_in_ipu):
for key, val in inter_outputs_in_cpu.items():
is_tensor = isinstance(val['fea_in'], torch.Tensor)
fea_in_cpu = val['fea_in']
fea_in_cpu_list = [fea_in_cpu] if is_tensor else fea_in_cpu
fea_in_ipu = inter_outputs_in_ipu[key]['fea_in']
fea_in_ipu_list = [fea_in_ipu] if is_tensor else fea_in_ipu
is_tensor = isinstance(val['fea_out'], torch.Tensor)
fea_out_cpu = val['fea_out']
fea_out_cpu_list = [fea_out_cpu] if is_tensor else fea_out_cpu
fea_out_ipu = inter_outputs_in_ipu[key]['fea_out']
fea_out_ipu_list = [fea_out_ipu] if is_tensor else fea_out_ipu
print('comparing layer:', key)
for idx, (featA, featB) in \
enumerate(zip(fea_in_cpu_list, fea_in_ipu_list)):
print('fea_in, tensor ', idx)
compare_ndarray(featA.detach().numpy(), featB.detach().numpy())
for idx, (featA, featB) in \
enumerate(zip(fea_out_cpu_list, fea_out_ipu_list)):
print('fea_out, tensor', idx)
compare_ndarray(featA.detach().numpy(), featB.detach().numpy())
# TODO Unified training and eval interface,
# merge train_step(train) and __call__(eval) together
def train_step(self, data, optimizer=None, **kwargs):
        assert self.training, 'train_step is not supported in eval mode'
inter_outputs_in_cpu = {}
if (self._train_executor.isCompiled()
and self._train_executor.compare_with_cpu):
self.copyWeightsToHost()
# run in CPU mode
self._train_executor.model.train_step(data, optimizer, **kwargs)
inter_outputs_in_cpu = {
**(self._train_executor.inter_outputs_in_cpu)
}
# run in IPU mode
result = self._train_executor.train_step(data, optimizer, **kwargs)
if (self._train_executor.isCompiled()
and self._train_executor.compare_with_cpu
and len(inter_outputs_in_cpu) > 0):
self.compare_data_between_ipu_and_cpu(
inter_outputs_in_cpu,
self._train_executor.inter_outputs_in_ipu)
return result
# TODO Unified training and eval interface,
# merge train_step(train) and __call__(eval) together
def __call__(self, *args, **kwargs):
if self.training:
raise NotImplementedError('use train_step rather than __call__')
else:
return self._eval_executor.eval_call(*args, **kwargs)
def __getattr__(self, attr):
return getattr(self.executor, attr)
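The `executor` property plus `__getattr__` above form a mode-switching facade: any attribute missing on `TrainEvalModel` is forwarded to whichever executor is active. A minimal standalone sketch of that delegation pattern, with illustrative names (`Executor`, `Facade`, `tag` are not mmcv APIs):

```python
class Executor:
    """Stand-in for an MMPoplarExecutor."""

    def __init__(self, tag):
        self.tag = tag


class Facade:
    """Forward unknown attribute lookups to the mode-selected executor."""

    def __init__(self):
        self.training = True
        self._train = Executor('train')
        self._eval = Executor('eval')

    @property
    def executor(self):
        return self._train if self.training else self._eval

    def __getattr__(self, attr):
        # only invoked when normal attribute lookup fails on self
        return getattr(self.executor, attr)


f = Facade()
```

Because `__getattr__` is only consulted after normal lookup fails, attributes defined on the facade itself (like `training`) are never forwarded.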
def get_training_model(model: nn.Module,
options: Optional[poptorch.Options] = None,
optimizer: Optional[torch.optim.Optimizer] = None,
logger=None,
modules_to_record=None) -> poptorch.PoplarExecutor:
"""Create a PopTorch training model from a PyTorch model, running on IPU
hardware in training mode.
Note:
PopTorch makes a shallow copy of the model. Changes to the
parameters in the returned training model affect the original model
and vice versa. However, primitive variable types are not synced: for
example calling ``model.train()`` on the original model, which
changes the ``training`` bool of the model instance, will not alter the
model returned by this function. You may need to call ``model.train()``
on your model before you call this function for correct behavior.
Args:
model (:obj:`nn.Module`): The model to run.
options (poptorch.Options): Options that will be used to compile
and run the model.
optimizer (:obj:`torch.optim.Optimizer`, optional): The optimizers
to apply during training.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
Returns:
The :class:`poptorch.PoplarExecutor` wrapper to use in place
of ``model``.
"""
# Create a copy of the original model in case it needs to be wrapped
maybe_wrapped_model = copy.copy(model)
return MMPoplarExecutor(
model=maybe_wrapped_model,
logger=logger,
options=options,
training=True,
optimizer=optimizer,
user_model=model,
modules_to_record=modules_to_record,
poptorch_version=__version__)
def get_inference_model(model: Union[nn.Module, poptorch.PoplarExecutor],
options: Optional[poptorch.Options] = None,
logger=None) -> poptorch.PoplarExecutor:
"""Create a PopTorch inference model from a PyTorch model, running on IPU
hardware in inference mode.
Note:
PopTorch makes a shallow copy of the model. Changes to the
parameters in the returned inference model affect the original model
and vice versa. However, primitive variable types are not synced: for
example calling ``model.eval()`` on the original model will not alter
the model returned by this function. You may need to call
``model.eval()`` on your model before you call this function for
correct behavior.
Args:
model (:obj:`nn.Module`): The model to run.
options (poptorch.Options): Options that will be used to compile
and run the model.
logger (:obj:`logging.Logger`): Logger used during running.
Defaults to None.
Returns:
The :class:`poptorch.PoplarExecutor` wrapper to use in place of
``model``.
"""
return MMPoplarExecutor(
model=copy.copy(model),
logger=logger,
options=options,
training=False,
poptorch_version=__version__)
def ipu_model_wrapper(model,
options,
optimizer=None,
logger=None,
modules_to_record=None,
ipu_model_cfg=None,
fp16_cfg=None):
"""Convert torch model to IPU model.
Args:
model (nn.Module): The target model to be converted.
options (dict[str, poptorch.Options]): IPU options, generated
by :func:`cfg2options`.
optimizer (:obj:`torch.optim.Optimizer`, optional): torch
optimizer, necessary if in training mode
logger (:obj:`logging.Logger`): Logger used during training.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
ipu_model_cfg (dict): A dictionary contains train_split_edges and
train_ckpt_nodes, See details in :func:`model_sharding` and
:func:`recomputation_checkpoint` functions.
fp16_cfg (dict): Config for IPU fp16 training. Currently supports
configs: `loss_scale`, `velocity_accum_type` and `accum_type`.
See details in
https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/index.html
Returns:
TrainEvalModel: IPU wrapped model.
"""
if ipu_model_cfg is None:
ipu_model_cfg = {}
training = model.training if optimizer is not None else False
# set mixed-precision
if fp16_cfg is not None:
from mmcv.runner import wrap_fp16_model
loss_scale = fp16_cfg['loss_scale']
wrap_fp16_model(model)
model.half()
        # TODO: temporary usage to set loss scaling for the original torch
        # optimizer
if optimizer is not None:
optimizer.loss_scaling = loss_scale
if fp16_cfg.get('velocity_accum_type', False):
if fp16_cfg['velocity_accum_type'] == 'half':
optimizer.velocity_accum_type = torch.half
else:
optimizer.velocity_accum_type = torch.float32
if fp16_cfg.get('accum_type', False):
if fp16_cfg['accum_type'] == 'half':
optimizer.accum_type = torch.half
else:
optimizer.accum_type = torch.float32
# TODO support feature alignment for fp16
if modules_to_record is not None:
raise NotImplementedError(
'Feature alignment for fp16 is not implemented')
# set model partition
if optimizer is None:
train_model = None
else:
# split model into multi-IPUs if specified
train_model = model_sharding(
copy.copy(model).train(),
ipu_model_cfg.get('train_split_edges', []))
recomputation_checkpoint(train_model,
ipu_model_cfg.get('train_ckpt_nodes', []))
# TODO support feature alignment for gradient accumulation mode
gradient_accumulation = \
getattr(options['training'].Training, 'gradient_accumulation', 1)
if gradient_accumulation > 1:
assert modules_to_record is None, \
'Feature alignment for grad-accumulation mode not implemented'
# TODO support feature alignment for multi-replica mode
replication_factor = \
getattr(options['training'], 'replication_factor', 1)
if replication_factor > 1:
assert modules_to_record is None, \
'Feature alignment for multi-replica mode not implemented'
# TODO supports different model partitions between train and eval mode
assert len(ipu_model_cfg.get('eval_split_edges', [])) == 0,\
'Currently, BeginBlock can only be used once on the same model'
eval_model = copy.copy(model).eval()
# wrap model for compilation
model = TrainEvalModel(
train_model,
eval_model,
options=options,
optimizer=optimizer,
logger=logger,
modules_to_record=modules_to_record)
model.train(training)
return model
# Copyright (c) OpenMMLab. All rights reserved.
from mmcv.runner import (HOOKS, RUNNERS, BaseRunner, EpochBasedRunner,
IterBasedRunner)
from mmcv.utils import IS_IPU_AVAILABLE
if IS_IPU_AVAILABLE:
from .dataloader import IPUDataLoader
from .hook_wrapper import (IPUFp16OptimizerHook, wrap_lr_updater_hook,
wrap_optimizer_hook)
from .model_wrapper import ipu_model_wrapper
from .utils import build_from_cfg_with_wrapper, cfg2options
class IPUBaseRunner(BaseRunner):
"""A base runner for IPU.
    This runner has some extra processes for IPU which are shown below:
    1. Parse options for IPU.
    2. Wrap the PyTorch model for IPU.
    3. Raise errors when encountering illegal usage.
    4. Pass IPU options to, and initialize, any dataloader found to be an
       instance of IPUDataLoader.
Args:
model (:obj:`nn.Module`): The model to run.
options_cfg (mmcv.Config, dict): Options that will be used to compile
and run the model.
modules_to_record (mmcv.Config, list): Index or name of modules which
will be recorded for output. It is necessary to specify output for
static graph of model training or inference.
ipu_model_cfg (mmcv.Config, dict): Config of model partition and
recomputing checkpoint
fp16_cfg (mmcv.Config): Config for fp16 training.
        batch_processor (callable): A callable method that processes a data
            batch. Should be None for the IPU runner.
kwargs (Dict[str, Any], optional): Keyword arguments will be passed to
``base_runner.BaseRunner``.
"""
def __init__(self,
model,
options_cfg=None,
modules_to_record=None,
ipu_model_cfg=None,
fp16_cfg=None,
batch_processor=None,
**kwargs):
        assert hasattr(model, 'train_step') and batch_processor is None,\
            'only models with a `train_step` method are supported'
if options_cfg is None:
options_cfg = {}
# call BaseRunner.__init__() here
super().__init__(model, **kwargs)
# process options of ipu
if IS_IPU_AVAILABLE:
self.options = cfg2options(options_cfg)
self.model = ipu_model_wrapper(
self.model,
self.options,
self.optimizer,
self.logger,
modules_to_record=modules_to_record,
ipu_model_cfg=ipu_model_cfg,
fp16_cfg=fp16_cfg)
else:
raise NotImplementedError('cpu mode on IPURunner is not supported')
def register_lr_hook(self, lr_config):
if lr_config is None:
return
assert isinstance(lr_config, dict)
assert 'policy' in lr_config
policy_type = lr_config.pop('policy')
        # If the policy type is all lowercase, e.g., 'cyclic', its first
        # letter is capitalized, e.g., 'Cyclic', for convenient usage of
        # the LR updater. Since this is not applicable to names such as
        # 'CosineAnnealingLrUpdater', strings containing capital letters
        # are left unchanged.
if policy_type == policy_type.lower():
policy_type = policy_type.title()
hook_type = policy_type + 'LrUpdaterHook'
lr_config['type'] = hook_type
hook = build_from_cfg_with_wrapper(lr_config, HOOKS,
wrap_lr_updater_hook)
self.register_hook(hook, priority='VERY_HIGH')
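The policy-to-hook-type rule used in `register_lr_hook` above can be sketched as a small standalone function (the helper name `policy_to_hook_type` is illustrative):

```python
def policy_to_hook_type(policy_type: str) -> str:
    """Map an LR policy name to its LrUpdaterHook class name.

    An all-lowercase policy is title-cased before the suffix is appended;
    names that already contain capital letters pass through unchanged.
    """
    if policy_type == policy_type.lower():
        policy_type = policy_type.title()
    return policy_type + 'LrUpdaterHook'
```

So `'cyclic'` resolves to `'CyclicLrUpdaterHook'`, while `'CosineAnnealing'` keeps its internal capitalization and resolves to `'CosineAnnealingLrUpdaterHook'`.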
def register_optimizer_hook(self, optimizer_config):
if optimizer_config is None:
return
assert isinstance(optimizer_config, (dict, IPUFp16OptimizerHook))
if isinstance(optimizer_config, dict):
optimizer_config.setdefault('type', 'OptimizerHook')
hook = build_from_cfg_with_wrapper(optimizer_config, HOOKS,
wrap_optimizer_hook)
else:
hook = optimizer_config
self.register_hook(hook, priority='ABOVE_NORMAL')
def run(self, data_loaders, workflow, *args, **kwargs):
for i, flow in enumerate(workflow):
mode, _ = flow
# initialize IPU dataloader if not initialized
assert isinstance(data_loaders[i], IPUDataLoader),\
'IPU runner can only work with `IPUDataLoader`'
data_loaders[i].init(options=self.get_options(mode))
super().run(data_loaders, workflow, *args, **kwargs)
def get_options(self, mode):
if mode == 'train':
return self.options['training']
elif mode == 'val':
return self.options['inference']
else:
raise ValueError(f'mode should be train or val but got {mode}')
@RUNNERS.register_module()
class IPUEpochBasedRunner(IPUBaseRunner, EpochBasedRunner):
"""Epoch-based Runner for IPU.
    The inheritance order (MRO) is: IPUEpochBasedRunner -> IPUBaseRunner ->
    EpochBasedRunner -> BaseRunner. This runner trains models epoch by
    epoch.
"""
pass
@RUNNERS.register_module()
class IPUIterBasedRunner(IPUBaseRunner, IterBasedRunner):
"""Iteration-based Runner for IPU.
    The inheritance order (MRO) is: IPUIterBasedRunner -> IPUBaseRunner ->
    IterBasedRunner -> BaseRunner. This runner trains models iteration by
    iteration.
"""
pass