dcuai / dlexamples — Commits

Commit 0fd8347d, authored Jan 08, 2023 by unknown
Parent: cc567e9e
Changes: 838

Add the mmclassification-0.24.1 code and remove mmclassification-speed-benchmark

Showing 20 changed files with 2729 additions and 11 deletions (+2729, −11)
| Changed file | +/− |
| --- | ---: |
| openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/finetune.md | +222 −0 |
| openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/new_dataset.md | +230 −0 |
| openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/new_modules.md | +280 −0 |
| openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/runtime.md | +260 −0 |
| openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/schedule.md | +333 −0 |
| openmmlab_test/mmclassification-0.24.1/hostfile | +2 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/__init__.py | +60 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/__init__.py | +10 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/inference.py | +22 −8 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/test.py | +230 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/test_old.py | +228 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/test_time.py | +257 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/apis/train.py | +232 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/__init__.py | +5 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/__init__.py | +12 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/eval_hooks.py | +78 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/eval_metrics.py | +259 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/mean_ap.py | +4 −3 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/multilabel_eval_metrics.py | +1 −0 |
| openmmlab_test/mmclassification-0.24.1/mmcls/core/export/__init__.py | +4 −0 |
---

**File: openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/finetune.md** (new file, mode 100644)
# Tutorial 2: Fine-tune Models

Classification models pre-trained on the ImageNet dataset have been shown to work well for other datasets and other downstream tasks. This tutorial shows how to use the pre-trained models provided in the [Model Zoo](https://github.com/open-mmlab/mmclassification/blob/master/docs/model_zoo.md) on other datasets to obtain better performance.

Fine-tuning a model on a new dataset takes two steps:

- Add support for the new dataset following [Tutorial 3: Customize Datasets](new_dataset.md).
- Modify the config files as discussed in this tutorial.

Assume we have a ResNet-50 model trained on the ImageNet-2012 dataset and want to fine-tune it on the CIFAR-10 dataset. We need to modify five parts of the config file.

## Inherit base configs

First, create a new config file `configs/tutorial/resnet50_finetune_cifar.py` to store our config; the file name can be chosen freely.

To reuse the common parts of different configs, we support inheriting from multiple existing configs. To fine-tune a ResNet-50 model, the new config needs to inherit `_base_/models/resnet50.py` to build the basic structure of the model. To use the CIFAR-10 dataset, the new config can simply inherit `_base_/datasets/cifar10.py`. And to keep the runtime settings such as the training schedule, the new config needs to inherit `_base_/default_runtime.py`.

To inherit all these config files, just put the following code at the beginning of our config file.

```python
_base_ = [
    '../_base_/models/resnet50.py',
    '../_base_/datasets/cifar10.py',
    '../_base_/default_runtime.py',
]
```

Besides, you can also write the whole config file without inheritance, like [`configs/lenet/lenet5_mnist.py`](https://github.com/open-mmlab/mmclassification/blob/master/configs/lenet/lenet5_mnist.py).
## Modify the model

When fine-tuning a model, we usually want to load the pre-trained weights into the backbone and then train a new classification head on our dataset.

To load the pre-trained backbone, we need to change the initialization config of the backbone and use the `Pretrained` initialization type. Besides, in the initialization settings, we use `prefix='backbone'` to tell the initialization function to strip this prefix from the key names in the checkpoint, e.g. to turn `backbone.conv1` into `conv1`. For convenience, we use an online checkpoint link here; the corresponding file will be downloaded automatically before training. You can also download the model in advance and use a local path instead.

Then the new config needs to modify the head according to the number of classes of the new dataset. We only need to change the `num_classes` setting in the head.

```python
model = dict(
    backbone=dict(
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)
```

```{tip}
Here we only need to set the parts of the config we want to modify; the other settings are inherited automatically from our parent configs.
```

Besides, sometimes we want to freeze the parameters of the first few layers of the backbone during fine-tuning. This helps the network keep the ability to extract low-level features learned from the pre-trained weights in the subsequent training. In MMClassification, this can be done with a single `frozen_stages` argument. For example, to freeze the parameters of the first two stages, just add one line to the config above:

```python
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)
```

```{note}
Not all backbones support the `frozen_stages` argument yet. Please check the
[documentation](https://mmclassification.readthedocs.io/zh_CN/latest/api/models.html#backbones)
to confirm that your backbone supports it before use.
```
## Modify the dataset

When fine-tuning on a new dataset, we usually need to modify some dataset-related settings. For example, here we need to resize the images in CIFAR-10 from 32 to 224 to match the input of the model pre-trained on ImageNet. This can be done by modifying the data pre-processing pipelines.

```python
img_norm_cfg = dict(
    mean=[125.307, 122.961, 113.8575],
    std=[51.5865, 50.847, 51.255],
    to_rgb=False,
)
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Resize', size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label']),
]
test_pipeline = [
    dict(type='Resize', size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline),
)
```
## Modify the training schedule

The hyper-parameters used for fine-tuning differ from the default schedule; fine-tuning usually requires a smaller learning rate and fewer training epochs.

```python
# optimizer learning rate for batch size 128
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning rate decay policy
lr_config = dict(policy='step', step=[15])
runner = dict(type='EpochBasedRunner', max_epochs=200)
log_config = dict(interval=100)
```
## Start training

Now we have finished the fine-tuning config; the complete file is as follows:

```python
_base_ = [
    '../_base_/models/resnet50.py',
    '../_base_/datasets/cifar10_bs16.py',
    '../_base_/default_runtime.py',
]

# model settings
model = dict(
    backbone=dict(
        frozen_stages=2,
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/resnet/resnet50_8xb32_in1k_20210831-ea4938fc.pth',
            prefix='backbone',
        )),
    head=dict(num_classes=10),
)

# dataset settings
img_norm_cfg = dict(
    mean=[125.307, 122.961, 113.8575],
    std=[51.5865, 50.847, 51.255],
    to_rgb=False,
)
train_pipeline = [
    dict(type='RandomCrop', size=32, padding=4),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(type='Resize', size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='Collect', keys=['img', 'gt_label']),
]
test_pipeline = [
    dict(type='Resize', size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline),
)

# training schedule settings
# optimizer learning rate for batch size 128
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning rate decay policy
lr_config = dict(policy='step', step=[15])
runner = dict(type='EpochBasedRunner', max_epochs=200)
log_config = dict(interval=100)
```

Then we train our model on a machine with 8 GPUs with the following command:

```shell
bash tools/dist_train.sh configs/tutorial/resnet50_finetune_cifar.py 8
```

Of course, we can also train the model on a single GPU:

```shell
python tools/train.py configs/tutorial/resnet50_finetune_cifar.py
```

But when training on a single GPU, we need to modify the dataset settings as follows:

```python
data = dict(
    samples_per_gpu=128,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline),
)
```

This is because our training schedule is designed for a batch size of 128. The parent config sets `samples_per_gpu=16`; with 8 GPUs, the total batch size is 128. When using a single GPU, we must change it to `samples_per_gpu=128` manually to match the training schedule.
---

**File: openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/new_dataset.md** (new file, mode 100644)
# Tutorial 3: Customize Datasets

We support many common public datasets for image classification; you can find them on [this page](https://mmclassification.readthedocs.io/zh_CN/latest/api/datasets.html).

This section introduces how to [use your own dataset](#use-your-own-dataset) and how to [use dataset wrappers](#use-dataset-wrappers).

## Use your own dataset

### Reorganize the dataset into an existing format

The simplest way to use your own dataset is to convert it to an existing dataset format.

For multi-class classification, we recommend the [`CustomDataset`](https://mmclassification.readthedocs.io/zh_CN/latest/api/datasets.html#mmcls.datasets.CustomDataset) format. `CustomDataset` supports two kinds of data formats:

1. Provide an annotation file, in which each line represents one sample image.
   The sample images can be organized in any structure, for example:

   ```
   train/
   ├── folder_1
   │   ├── xxx.png
   │   ├── xxy.png
   │   └── ...
   ├── 123.png
   ├── nsdf3.png
   └── ...
   ```

   The annotation file records the file path and the category index of every sample image. The first column is the image path relative to the root directory (the `train` directory in this example), and the second column is the category index:

   ```
   folder_1/xxx.png 0
   folder_1/xxy.png 1
   123.png 1
   nsdf3.png 2
   ...
   ```

   ```{note}
   The value of the category index should fall in the range `[0, num_classes - 1]`.
   ```

2. Organize all sample files in the following structure:

   ```
   train/
   ├── cat
   │   ├── xxx.png
   │   ├── xxy.png
   │   └── ...
   │   └── xxz.png
   ├── bird
   │   ├── bird1.png
   │   ├── bird2.png
   │   └── ...
   └── dog
       ├── 123.png
       ├── nsdf3.png
       ├── ...
       └── asd932_.png
   ```

   In this case, you do not need to provide an annotation file; all image files under the `cat` directory are treated as samples of the `cat` category.

Usually, we split the whole dataset into three subsets: `train`, `val` and `test`, used for training, validation and testing respectively. **Every** subset needs to be organized in one of the structures above.

For example, the complete dataset structure looks like this (using the first organization structure):

```
mmclassification
└── data
    └── my_dataset
        ├── meta
        │   ├── train.txt
        │   ├── val.txt
        │   └── test.txt
        ├── train
        ├── val
        └── test
```
Then in your config file, you can modify the `data` field as follows:

```python
...
dataset_type = 'CustomDataset'
classes = ['cat', 'bird', 'dog']  # category names of the dataset

data = dict(
    train=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/train',
        ann_file='data/my_dataset/meta/train.txt',
        classes=classes,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/val',
        ann_file='data/my_dataset/meta/val.txt',
        classes=classes,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        data_prefix='data/my_dataset/test',
        ann_file='data/my_dataset/meta/test.txt',
        classes=classes,
        pipeline=test_pipeline)
)
...
```

### Create a new dataset class

You can write a new dataset class inheriting from `BaseDataset` and overwrite the `load_annotations(self)` method, like [CIFAR10](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/cifar.py) and [ImageNet](https://github.com/open-mmlab/mmclassification/blob/master/mmcls/datasets/imagenet.py). Typically, this method returns a list of all samples, where each sample is a dict containing the necessary data information, e.g. `img` and `gt_label`.

Assume we are going to implement a `Filelist` dataset, which takes file lists for both training and testing. The format of the annotation list is as follows:

```
000001.jpg 0
000002.jpg 1
```

We can create a new dataset class in `mmcls/datasets/filelist.py` to load the data.
```python
import mmcv
import numpy as np

from .builder import DATASETS
from .base_dataset import BaseDataset


@DATASETS.register_module()
class Filelist(BaseDataset):

    def load_annotations(self):
        assert isinstance(self.ann_file, str)

        data_infos = []
        with open(self.ann_file) as f:
            samples = [x.strip().split(' ') for x in f.readlines()]
            for filename, gt_label in samples:
                info = {'img_prefix': self.data_prefix}
                info['img_info'] = {'filename': filename}
                info['gt_label'] = np.array(gt_label, dtype=np.int64)
                data_infos.append(info)
            return data_infos
```
Add the new dataset class to `mmcls/datasets/__init__.py`:

```python
from .base_dataset import BaseDataset
...
from .filelist import Filelist

__all__ = [
    'BaseDataset', ..., 'Filelist'
]
```

Then in the config file, to use `Filelist`, you can modify the config as follows:

```python
train = dict(
    type='Filelist',
    ann_file='image_list.txt',
    pipeline=train_pipeline
)
```

## Use dataset wrappers

A dataset wrapper is a class that changes the behavior of a dataset class, such as repeating the samples in a dataset or re-balancing the samples across categories.

### Repeat dataset

We use `RepeatDataset` as a wrapper to repeat a dataset. For example, suppose the original dataset is `Dataset_A`; to repeat it, the config looks like the following:

```python
data = dict(
    train=dict(
        type='RepeatDataset',
        times=N,
        dataset=dict(  # this is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
    ...
)
```

### Class balanced dataset

We use `ClassBalancedDataset` as a wrapper to repeat a dataset according to category frequency. The dataset to repeat needs to implement the method `self.get_cat_ids(idx)` to support `ClassBalancedDataset` (a minimal sketch follows below).

For example, to repeat `Dataset_A` with `oversample_thr=1e-3`, the config looks like the following:

```python
data = dict(
    train=dict(
        type='ClassBalancedDataset',
        oversample_thr=1e-3,
        dataset=dict(  # this is the original config of Dataset_A
            type='Dataset_A',
            ...
            pipeline=train_pipeline
        )
    )
    ...
)
```

For more details, please refer to the [API documentation](https://mmclassification.readthedocs.io/zh_CN/latest/api/datasets.html#mmcls.datasets.ClassBalancedDataset).
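
For reference, `get_cat_ids(idx)` can be very small for a single-label dataset. The following is a minimal sketch, not part of this commit, assuming the `data_infos` layout used in the `Filelist` example above:

```python
def get_cat_ids(self, idx):
    """Return the category ids of the sample at index ``idx``.

    A single-label dataset has exactly one category per sample, so the
    stored ``gt_label`` is wrapped in a list, which is the interface
    ``ClassBalancedDataset`` expects.
    """
    return [int(self.data_infos[idx]['gt_label'])]
```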
---

**File: openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/new_modules.md** (new file, mode 100644)
# Tutorial 5: Adding New Modules

## Develop new components

We basically categorize model components into three types:

- backbone: usually a feature extraction network, e.g. ResNet, MobileNet.
- neck: the component connecting the backbone and the head, e.g. GlobalAveragePooling.
- head: the component for specific tasks, e.g. classification and regression.

### Add a new backbone

Here we show how to develop a new backbone component, taking ResNet_CIFAR as an example.

For the 32x32 image inputs of CIFAR, ResNet_CIFAR replaces the `kernel_size=7, stride=2` setting of ResNet with `kernel_size=3, stride=1`, and removes the `MaxPooling` after the stem layer, to avoid passing feature maps that are too small to the residual blocks. It inherits from `ResNet` and only modifies the stem layer.

1. Create a new file `mmcls/models/backbones/resnet_cifar.py`.
```python
import torch.nn as nn
# build_conv_layer / build_norm_layer are used in _make_stem_layer below
from mmcv.cnn import build_conv_layer, build_norm_layer

from ..builder import BACKBONES
from .resnet import ResNet


@BACKBONES.register_module()
class ResNet_CIFAR(ResNet):
    """ResNet backbone for CIFAR.

    (a short description of this backbone)

    Args:
        depth(int): Network depth, from {18, 34, 50, 101, 152}.
        ...
        (argument documentation)
    """

    def __init__(self, depth, deep_stem=False, **kwargs):
        # call the initialization function of the base class ResNet
        super(ResNet_CIFAR, self).__init__(depth, deep_stem=deep_stem, **kwargs)
        # other special initialization steps
        assert not self.deep_stem, 'ResNet_CIFAR do not support deep_stem'

    def _make_stem_layer(self, in_channels, base_channels):
        # overwrite the method of the base class to modify the network structure
        self.conv1 = build_conv_layer(
            self.conv_cfg,
            in_channels,
            base_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            bias=False)
        self.norm1_name, norm1 = build_norm_layer(
            self.norm_cfg, base_channels, postfix=1)
        self.add_module(self.norm1_name, norm1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # should return a tuple
        pass  # the forward implementation is omitted here

    def init_weights(self, pretrained=None):
        pass  # overwrite the weight initialization of ResNet if necessary

    def train(self, mode=True):
        pass  # overwrite the train-mode function of ResNet if necessary
```
2. Import the new module in `mmcls/models/backbones/__init__.py`

   ```python
   ...
   from .resnet_cifar import ResNet_CIFAR

   __all__ = [
       ..., 'ResNet_CIFAR'
   ]
   ```

3. Use the new backbone in your config file

   ```python
   model = dict(
       ...
       backbone=dict(
           type='ResNet_CIFAR',
           depth=18,
           other_arg=xxx),
       ...
   ```

### Add a new neck

Here we take `GlobalAveragePooling` as an example. It is a very simple neck component without any arguments. To add a new neck, we mainly need to implement the `forward` function, which applies some operations on the output of the backbone and forwards the result to the head.

1. Create a new file `mmcls/models/necks/gap.py`
```python
import torch.nn as nn
from mmcv.cnn import normal_init  # used by init_weights below

from ..builder import HEADS
from .cls_head import ClsHead


@HEADS.register_module()
class LinearClsHead(ClsHead):

    def __init__(self,
                 num_classes,
                 in_channels,
                 loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
                 topk=(1, )):
        super(LinearClsHead, self).__init__(loss=loss, topk=topk)
        self.in_channels = in_channels
        self.num_classes = num_classes

        if self.num_classes <= 0:
            raise ValueError(
                f'num_classes={num_classes} must be a positive integer')

        self._init_layers()

    def _init_layers(self):
        self.fc = nn.Linear(self.in_channels, self.num_classes)

    def init_weights(self):
        normal_init(self.fc, mean=0, std=0.01, bias=0)

    def forward_train(self, x, gt_label):
        cls_score = self.fc(x)
        losses = self.loss(cls_score, gt_label)
        return losses
```
2. Import the module in `mmcls/models/heads/__init__.py`

   ```python
   ...
   from .linear_head import LinearClsHead

   __all__ = [
       ..., 'LinearClsHead'
   ]
   ```

3. Modify the config file to use the new head.

   Together with the `GlobalAveragePooling` neck, the complete model config is as follows:

   ```python
   model = dict(
       type='ImageClassifier',
       backbone=dict(
           type='ResNet',
           depth=50,
           num_stages=4,
           out_indices=(3, ),
           style='pytorch'),
       neck=dict(type='GlobalAveragePooling'),
       head=dict(
           type='LinearClsHead',
           num_classes=1000,
           in_channels=2048,
           loss=dict(type='CrossEntropyLoss', loss_weight=1.0),
           topk=(1, 5),
       ))
   ```

### Add a new loss

To add a new loss function, we mainly need to implement the `forward` function in the loss module. Besides, the `weighted_loss` decorator is handy for computing the weighted average of the loss over each element. Assume we want to mimic a probability distribution generated by another classification model; we add an L1 loss to achieve this purpose.

1. Create a new file `mmcls/models/losses/l1_loss.py`
```python
import torch
import torch.nn as nn

from ..builder import LOSSES
from .utils import weighted_loss


@weighted_loss
def l1_loss(pred, target):
    assert pred.size() == target.size() and target.numel() > 0
    loss = torch.abs(pred - target)
    return loss


@LOSSES.register_module()
class L1Loss(nn.Module):

    def __init__(self, reduction='mean', loss_weight=1.0):
        super(L1Loss, self).__init__()
        self.reduction = reduction
        self.loss_weight = loss_weight

    def forward(self,
                pred,
                target,
                weight=None,
                avg_factor=None,
                reduction_override=None):
        assert reduction_override in (None, 'none', 'mean', 'sum')
        reduction = (
            reduction_override if reduction_override else self.reduction)
        loss = self.loss_weight * l1_loss(
            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
        return loss
```
2. Import the module in `mmcls/models/losses/__init__.py`

   ```python
   ...
   from .l1_loss import L1Loss, l1_loss

   __all__ = [
       ..., 'L1Loss', 'l1_loss'
   ]
   ```

3. Modify the `loss` field in the config to use the new loss

   ```python
   loss = dict(type='L1Loss', loss_weight=1.0)
   ```
---

**File: openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/runtime.md** (new file, mode 100644)
# Tutorial 7: Customize Runtime Settings

In this tutorial, we introduce how to customize workflows and hooks when running your own models.

<!-- TOC -->

- [Customize workflow](#customize-workflow)
- [Hooks](#hooks)
  - [Default training hooks](#default-training-hooks)
  - [Use built-in hooks](#use-built-in-hooks)
  - [Customize hooks](#customize-hooks)
- [FAQ](#faq)

<!-- TOC -->

## Customize workflow

A workflow is a list of (phase, duration) pairs that specifies the running order and duration, where the unit of "duration" depends on the type of the runner. For example, MMClassification uses the **epoch**-based runner (`EpochBasedRunner`) by default, so the "duration" means how many epochs the phase runs in one cycle. Usually, we only want to run the training phase, which needs just the following setting:

```python
workflow = [('train', 1)]
```

Sometimes we may want to check some metrics (e.g. loss, accuracy) of the model on the validation set during training. In this case, the workflow can be set to:

```python
workflow = [('train', 1), ('val', 1)]
```

This way, the program runs one epoch of training and one epoch of validation, alternately.

Note that by default, we do not recommend using this way to validate the model; instead, we recommend using **`EvalHook`** for validation during training. Using the workflow above for validation is only an alternative.

```{note}
1. Model parameters are not updated during the validation epochs.
2. The keyword `max_epochs` in the config controls the number of training epochs and does not affect the validation workflow.
3. The workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` do not change the behavior of `EvalHook`,
   because `EvalHook` is called by `after_train_epoch`, while the validation workflow only affects hooks called by `after_val_epoch`.
   Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner also computes the loss on the validation set after each training epoch.
```
## Hooks

The hook mechanism is widely used in OpenMMLab codebases. Combined with the runner, it can manage the whole life cycle of the training process. You can learn more about hooks through this [article (in Chinese)](https://zhuanlan.zhihu.com/p/355272220).

Hooks only work after being registered in the runner. Currently, hooks fall into two categories:

- Default training hooks

  Default training hooks are registered by the runner by default. They provide basic functionality and have fixed priorities; there is usually no need to modify their priority.

- Custom hooks

  Custom hooks are registered through `custom_hooks`. They usually provide enhanced functionality. Their priority needs to be specified in the config file; if unspecified, the priority defaults to 'NORMAL'.

**Priority list**

| Level           | Value |
| :-------------: | :---: |
| HIGHEST         | 0     |
| VERY_HIGH       | 10    |
| HIGH            | 30    |
| ABOVE_NORMAL    | 40    |
| NORMAL(default) | 50    |
| BELOW_NORMAL    | 60    |
| LOW             | 70    |
| VERY_LOW        | 90    |
| LOWEST          | 100   |

The priority determines the execution order of the hooks. Before training, the log prints out the execution order of the hooks at each stage, which is convenient for debugging.

### Default training hooks

Some common hooks are not registered through `custom_hooks` but are registered by default in the runner (`Runner`). They are:

| Hooks                 | Priority          |
| :-------------------: | :---------------: |
| `LrUpdaterHook`       | VERY_HIGH (10)    |
| `MomentumUpdaterHook` | HIGH (30)         |
| `OptimizerHook`       | ABOVE_NORMAL (40) |
| `CheckpointHook`      | NORMAL (50)       |
| `IterTimerHook`       | LOW (70)          |
| `EvalHook`            | LOW (70)          |
| `LoggerHook(s)`       | VERY_LOW (90)     |

`OptimizerHook`, `MomentumUpdaterHook` and `LrUpdaterHook` are introduced in the [schedule tutorial](./schedule.md). `IterTimerHook` is used to record the elapsed time and does not support modification at present.

Below we introduce how to customize `CheckpointHook`, `LoggerHooks` and `EvalHook`.
#### Checkpoint hook (CheckpointHook)

The MMCV runner uses `checkpoint_config` to initialize the [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).

```python
checkpoint_config = dict(interval=1)
```

You can set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer with `save_optimizer`. See [here](https://mmcv.readthedocs.io/zh_CN/latest/api.html#mmcv.runner.CheckpointHook) for more details.

#### Log hooks (LoggerHooks)

`log_config` wraps multiple logger hooks and allows setting the interval. Currently, MMCV supports `TextLoggerHook`, `WandbLoggerHook`, `MlflowLoggerHook` and `TensorboardLoggerHook`. See [here](https://mmcv.readthedocs.io/zh_CN/latest/api.html#mmcv.runner.LoggerHook) for more details.

```python
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
```

#### Evaluation hook (EvalHook)

The `evaluation` field in the config is used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/evaluation.py).

`EvalHook` has some reserved arguments such as `interval`, `save_best` and `start`; other arguments, such as `metrics`, are passed to `dataset.evaluate()`.

```python
evaluation = dict(interval=1, metric='accuracy', metric_options={'topk': (1, )})
```

We can save the model weights that achieve the best validation result using the `save_best` argument:

```python
# "auto" means automatically selecting the metric to compare models.
# You can also specify a particular key, e.g. "accuracy_top-1".
evaluation = dict(
    interval=1,
    save_best=True,
    metric='accuracy',
    metric_options={'topk': (1, )})
```

When running some large experiments, you can skip the validation steps in early epochs with the `start` argument to save time, as follows:

```python
evaluation = dict(
    interval=1,
    start=200,
    metric='accuracy',
    metric_options={'topk': (1, )})
```

This means that before epoch 200, only the training workflow is executed without evaluation; starting from epoch 200, evaluation is performed after every training epoch.

```{note}
In the default config files of MMClassification, the evaluation field is usually placed in the dataset base configs.
```
### Use built-in hooks

Some hooks are already implemented in MMCV and MMClassification:

- [EMAHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/ema.py)
- [SyncBuffersHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/sync_buffer.py)
- [EmptyCacheHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/memory.py)
- [ProfilerHook](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/profiler.py)
- ......

These hooks can be used directly by modifying the config in the following format:

```python
custom_hooks = [
    dict(type='MMCVHook', a=a_value, b=b_value, priority='NORMAL')
]
```

For example, to use `EMAHook` and perform one EMA update every 100 iterations:

```python
custom_hooks = [
    dict(type='EMAHook', interval=100, priority='HIGH')
]
```
## Customize hooks

### Create a new hook

Here is an example of creating a new hook in MMClassification and using it in training:

```python
from mmcv.runner import HOOKS, Hook


@HOOKS.register_module()
class MyHook(Hook):

    def __init__(self, a, b):
        pass

    def before_run(self, runner):
        pass

    def after_run(self, runner):
        pass

    def before_epoch(self, runner):
        pass

    def after_epoch(self, runner):
        pass

    def before_iter(self, runner):
        pass

    def after_iter(self, runner):
        pass
```

Depending on the functionality of the hook, you need to specify what the hook does at each stage of training, i.e. `before_run`, `after_run`, `before_epoch`, `after_epoch`, `before_iter` and `after_iter`.

### Register the new hook

Then we need to import `MyHook`. Assuming the file is `mmcls/core/utils/my_hook.py`, there are two ways to do that:

- Modify `mmcls/core/utils/__init__.py` to import it

  The newly defined module should be imported in `mmcls/core/utils/__init__.py` so that the registry can find and add the new module:

  ```python
  from .my_hook import MyHook

  __all__ = ['MyHook']
  ```

- Use `custom_imports` in the config to import it manually

  ```python
  custom_imports = dict(imports=['mmcls.core.utils.my_hook'], allow_failed_imports=False)
  ```

### Modify the config

```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value)
]
```

You can also set the priority of the hook with the `priority` argument, as below:

```python
custom_hooks = [
    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
]
```

By default, the hook's priority is set to 'NORMAL' during registration.

## FAQ

### 1. What is the difference between resume_from, load_from and init_cfg.Pretrained?

- `load_from`: only loads the model weights. It is mainly used to load a pre-trained or trained model.
- `resume_from`: loads not only the model weights but also the optimizer state and the current epoch. It is mainly used to resume training from an interruption.
- `init_cfg.Pretrained`: loads the weights during weight initialization, and you can specify which module to load. This is usually used when fine-tuning a model; see [Tutorial 2: Fine-tune Models](./finetune.md)
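
As a quick illustration of the three options above, a config could set them as follows. This is a sketch only; the checkpoint paths are placeholders, not files from this commit:

```python
# Resume an interrupted run: restores weights, optimizer state and epoch counter.
resume_from = 'work_dirs/my_experiment/latest.pth'

# Or load only trained weights as a starting point (training restarts from epoch 0).
# load_from = 'work_dirs/my_experiment/epoch_100.pth'

# Or load pre-trained weights into a single module during weight initialization,
# as in the fine-tuning tutorial.
# model = dict(
#     backbone=dict(
#         init_cfg=dict(type='Pretrained', checkpoint='...', prefix='backbone')))
```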
---

**File: openmmlab_test/mmclassification-0.24.1/docs/zh_CN/tutorials/schedule.md** (new file, mode 100644)
# Tutorial 6: Customize Schedule

In this tutorial, we introduce how to construct optimizers, customize learning rate and momentum schedules, apply gradient clipping and gradient accumulation, and add user-defined optimization methods when running your own models.

<!-- TOC -->

- [Construct PyTorch built-in optimizers](#construct-pytorch-built-in-optimizers)
- [Customize learning rate schedules](#customize-learning-rate-schedules)
  - [Learning rate decay curves](#customize-learning-rate-decay-curves)
  - [Learning rate warmup strategies](#customize-learning-rate-warmup-strategies)
- [Customize momentum schedules](#customize-momentum-schedules)
- [Parameter-wise configuration](#parameter-wise-configuration)
- [Gradient clipping and gradient accumulation](#gradient-clipping-and-gradient-accumulation)
  - [Gradient clipping](#gradient-clipping)
  - [Gradient accumulation](#gradient-accumulation)
- [Customize self-implemented optimization methods](#customize-self-implemented-optimization-methods)
  - [Customize optimizer](#customize-optimizer)
  - [Customize optimizer constructor](#customize-optimizer-constructor)

<!-- TOC -->

## Construct PyTorch built-in optimizers

MMClassification supports all optimizers implemented by PyTorch; to use one, just specify the `optimizer` field in the config file. For example, to use `SGD`, the modification is as follows.

```python
optimizer = dict(type='SGD', lr=0.0003, weight_decay=0.0001)
```

To modify the learning rate of the model, just modify `lr` in the optimizer config. Other arguments can be configured directly according to the [PyTorch API documentation](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim).

```{note}
The 'type' in the config is not a constructor argument but the class name of a PyTorch built-in optimizer.
```

For example, to use `Adam` with the settings `torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)`, the config should be modified as follows:

```python
optimizer = dict(
    type='Adam',
    lr=0.001,
    betas=(0.9, 0.999),
    eps=1e-08,
    weight_decay=0,
    amsgrad=False)
```
## Customize learning rate schedules

### Customize learning rate decay curves

Learning rate decay is widely used in deep learning research to improve network performance. To use learning rate decay, set the `lr_config` field in the config.

For example, we use the step learning rate decay policy in the default ResNet training config:

```python
lr_config = dict(policy='step', step=[100, 150])
```

During training, the program periodically calls the [`StepLRHook`](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L153) in MMCV to update the learning rate.

We also support other learning rate schedules, such as `CosineAnnealing` and `Poly`. See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py) for details.

- CosineAnnealing:

  ```python
  lr_config = dict(policy='CosineAnnealing', min_lr_ratio=1e-5)
  ```

- Poly:

  ```python
  lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)
  ```

### Customize learning rate warmup strategies

In the early stage of training, the network tends to be unstable, and learning rate warmup is used to reduce this instability. With warmup, the learning rate increases gradually from a very small value to the intended value.

In MMClassification, we also use `lr_config` to configure the warmup strategy. The main arguments are as follows:

- `warmup`: the warmup curve type. Must be one of 'constant', 'linear', 'exp' or `None`; if `None`, warmup is disabled.
- `warmup_by_epoch`: whether to warm up by epoch.
- `warmup_iters`: the number of warmup steps. When `warmup_by_epoch=True`, the unit is epochs; when `warmup_by_epoch=False`, the unit is iterations.
- `warmup_ratio`: the initial warmup learning rate is `lr = lr * warmup_ratio`.

For example:
1. **Linear** warmup by **iteration**

   ```python
   lr_config = dict(
       policy='CosineAnnealing',
       by_epoch=False,
       min_lr_ratio=1e-2,
       warmup='linear',
       warmup_ratio=1e-3,
       warmup_iters=20 * 1252,
       warmup_by_epoch=False)
   ```

2. **Exponential** warmup by **epoch**

   ```python
   lr_config = dict(
       policy='CosineAnnealing',
       min_lr=0,
       warmup='exp',
       warmup_iters=5,
       warmup_ratio=0.1,
       warmup_by_epoch=True)
   ```

```{tip}
After configuration, you can use the [learning rate visualization tool](https://mmclassification.readthedocs.io/zh_CN/latest/tools/visualization.html#id3) provided by MMClassification to plot the corresponding learning rate curve.
```
## Customize momentum schedules

MMClassification supports momentum schedulers, which modify the model's momentum according to the learning rate so that the model converges faster. Momentum schedulers are usually used together with learning rate schedulers; for example, the following config is used to accelerate convergence. See [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130) for more details.

Here is an example:

```python
lr_config = dict(
    policy='cyclic',
    target_ratio=(10, 1e-4),
    cyclic_times=1,
    step_ratio_up=0.4,
)
momentum_config = dict(
    policy='cyclic',
    target_ratio=(0.85 / 0.95, 1),
    cyclic_times=1,
    step_ratio_up=0.4,
)
```

## Parameter-wise configuration

Some models may have parameter-specific settings for optimization, e.g. no weight decay for BatchNorm layers or different learning rates for different layers. In MMClassification, we use the `paramwise_cfg` argument of `optimizer` to configure this; see [MMCV](https://mmcv.readthedocs.io/en/latest/_modules/mmcv/runner/optimizer/default_constructor.html#DefaultOptimizerConstructor) for reference.

- Use the predefined options

  MMClassification provides options including `bias_lr_mult`, `bias_decay_mult`, `norm_decay_mult`, `dwconv_decay_mult`, `dcn_offset_lr_mult` and `bypass_duplicate`, which apply to all related `bias`, `norm`, `dwconv`, `dcn` and `bypass` parameters. For example, to disable weight decay for all BN layers in the model:

  ```python
  optimizer = dict(
      type='SGD',
      lr=0.8,
      weight_decay=1e-4,
      paramwise_cfg=dict(norm_decay_mult=0.))
  ```
- Use `custom_keys` to specify parameters

  MMClassification can use `custom_keys` to specify different learning rates or weight decay for different parameters. For example, to disable weight decay for specific parameters:

  ```python
  paramwise_cfg = dict(
      custom_keys={
          'backbone.cls_token': dict(decay_mult=0.0),
          'backbone.pos_embed': dict(decay_mult=0.0)
      })

  optimizer = dict(
      type='SGD',
      lr=0.8,
      weight_decay=1e-4,
      paramwise_cfg=paramwise_cfg)
  ```

  To use a smaller learning rate and decay factor for the backbone:

  ```python
  optimizer = dict(
      type='SGD',
      lr=0.8,
      weight_decay=1e-4,
      # 'lr' and 'weight_decay' of the backbone become 0.1 * lr and 0.9 * weight_decay
      paramwise_cfg=dict(
          custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=0.9)}))
  ```
## Gradient clipping and gradient accumulation

Besides the basic functionality of PyTorch optimizers, we also provide some enhancements, such as gradient clipping and gradient accumulation; see [MMCV](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py) for reference.

### Gradient clipping

During training, the loss function may get close to some abnormally steep regions, leading to exploding gradients. Gradient clipping helps stabilize the training process; see [this page](https://paperswithcode.com/method/gradient-clipping) for more introduction.

Currently, we support the `grad_clip` argument in the `optimizer_config` field for gradient clipping. See the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html) for the detailed arguments.

Here is an example:

```python
# norm_type: type of the used p-norm; here norm 2 is used.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

When inheriting from a base config and modifying it, if `grad_clip=None` in the base config, `_delete_=True` is needed. See [Tutorial 1: Learn about Configs](https://mmclassification.readthedocs.io/zh_CN/latest/tutorials/config.html#id16) about `_delete_`. For example:

```python
_base_ = ['./_base_/schedules/imagenet_bs256_coslr.py']

optimizer_config = dict(
    grad_clip=dict(max_norm=35, norm_type=2), _delete_=True, type='OptimizerHook')
# When type is 'OptimizerHook', it can be omitted; otherwise, type='xxxOptimizerHook' must be specified here.
```
### Gradient accumulation

When computing resources are lacking, the batch size can only be set to a small value, which may degrade the performance of the model. Gradient accumulation can be used to work around this problem. Here is an example:

```python
data = dict(samples_per_gpu=64)
optimizer_config = dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4)
```

This means that backpropagation is performed once every 4 iterations during training. Since the per-GPU batch size is 64, this is equivalent to a per-GPU batch size of 256 for a single iteration, i.e.:

```python
data = dict(samples_per_gpu=256)
optimizer_config = dict(type="OptimizerHook")
```

```{note}
When the optimizer hook type is not specified in `optimizer_config`, `OptimizerHook` is used by default.
```

## Customize self-implemented optimization methods

In academic research and industrial practice, it may be necessary to use optimization methods not implemented by MMClassification. You can add them through the following methods.

```{note}
This part modifies the MMClassification source code or adds code to the MMClassification framework; beginners can skip it.
```
### Customize optimizer

#### 1. Define a new optimizer

A customized optimizer can be defined according to the following rule. Assume we want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b` and `c`. You can create a folder named `mmcls/core/optimizer` and implement the customized optimizer in a file there, e.g. `mmcls/core/optimizer/my_optimizer.py`:

```python
from mmcv.runner import OPTIMIZERS
from torch.optim import Optimizer


@OPTIMIZERS.register_module()
class MyOptimizer(Optimizer):

    def __init__(self, a, b, c):
        ...  # implement the constructor and the update logic here
```

#### 2. Register the optimizer

To register the module defined above, it first needs to be imported into the main namespace. There are two ways to achieve it.

- Modify `mmcls/core/optimizer/__init__.py` to import it into the `optimizer` package, and then modify `mmcls/core/__init__.py` to import the `optimizer` package.

  Create the `mmcls/core/optimizer/__init__.py` file. The newly defined module should be imported in `mmcls/core/optimizer/__init__.py` so that the registry can find the new module and add it:

  ```python
  # In mmcls/core/optimizer/__init__.py
  from .my_optimizer import MyOptimizer  # MyOptimizer is the name of our customized optimizer

  __all__ = ['MyOptimizer']
  ```

  ```python
  # In mmcls/core/__init__.py
  ...
  from .optimizer import *  # noqa: F401, F403
  ```

- Use `custom_imports` in the config to manually import it

  ```python
  custom_imports = dict(imports=['mmcls.core.optimizer.my_optimizer'], allow_failed_imports=False)
  ```

  The module `mmcls.core.optimizer.my_optimizer` will be imported at the beginning of the program, and the class `MyOptimizer` is then automatically registered. Note that only the package containing the class `MyOptimizer` should be imported; `mmcls.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
#### 3. Specify the optimizer in the config file

Then you can use `MyOptimizer` in the `optimizer` field of the config file. In the config, the optimizer is defined by the `optimizer` field as follows:

```python
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
```

To use your own optimizer, the field can be changed to:

```python
optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
```

### Customize optimizer constructor

Some models may have parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers. Although our `DefaultOptimizerConstructor` already provides these powerful features, it may still not cover your needs. In that case, you can do other fine-grained parameter tuning through a customized optimizer constructor.

```python
from mmcv.runner.optimizer import OPTIMIZER_BUILDERS


@OPTIMIZER_BUILDERS.register_module()
class MyOptimizerConstructor:

    def __init__(self, optimizer_cfg, paramwise_cfg=None):
        pass

    def __call__(self, model):
        ...  # implement your own optimizer constructor here
        return my_optimizer
```

The implementation of our default optimizer constructor is [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11); it can also serve as a template for new optimizer constructors.
---

**File: openmmlab_test/mmclassification-0.24.1/hostfile** (new file, mode 100644)

```
a03r3n15 slots=4
a03r1n12 slots=4
```
---

**File: openmmlab_test/mmclassification-0.24.1/mmcls/__init__.py** (new file, mode 100644)
```python
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import mmcv
from packaging.version import parse

from .version import __version__


def digit_version(version_str: str, length: int = 4):
    """Convert a version string into a tuple of integers.

    This method is usually used for comparing two versions. For pre-release
    versions: alpha < beta < rc.

    Args:
        version_str (str): The version string.
        length (int): The maximum number of version levels. Default: 4.

    Returns:
        tuple[int]: The version info in digits (integers).
    """
    version = parse(version_str)
    assert version.release, f'failed to parse version {version_str}'
    release = list(version.release)
    release = release[:length]
    if len(release) < length:
        release = release + [0] * (length - len(release))
    if version.is_prerelease:
        mapping = {'a': -3, 'b': -2, 'rc': -1}
        val = -4
        # version.pre can be None
        if version.pre:
            if version.pre[0] not in mapping:
                warnings.warn(f'unknown prerelease version {version.pre[0]}, '
                              'version checking may go wrong')
            else:
                val = mapping[version.pre[0]]
            release.extend([val, version.pre[-1]])
        else:
            release.extend([val, 0])
    elif version.is_postrelease:
        release.extend([1, version.post])
    else:
        release.extend([0, 0])
    return tuple(release)


mmcv_minimum_version = '1.4.2'
mmcv_maximum_version = '1.9.0'
mmcv_version = digit_version(mmcv.__version__)


assert (mmcv_version >= digit_version(mmcv_minimum_version)
        and mmcv_version <= digit_version(mmcv_maximum_version)), \
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
    f'Please install mmcv>={mmcv_minimum_version}, <={mmcv_maximum_version}.'

__all__ = ['__version__', 'digit_version']
```
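
As a quick illustration of the version parsing above (this usage sketch is not part of the file), `digit_version` maps pre-release tags below final releases so that plain tuple comparison orders versions correctly:

```python
# Outputs follow from the mapping above ('rc' -> -1, final release -> 0).
assert digit_version('1.4.2') == (1, 4, 2, 0, 0, 0)
assert digit_version('1.5.0rc1') == (1, 5, 0, 0, -1, 1)
assert digit_version('1.5.0rc1') < digit_version('1.5.0')
```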
---

**File: openmmlab_test/mmclassification-0.24.1/mmcls/apis/__init__.py** (new file, mode 100644)
```python
# Copyright (c) OpenMMLab. All rights reserved.
from .inference import inference_model, init_model, show_result_pyplot
from .test import multi_gpu_test, single_gpu_test
from .train import init_random_seed, set_random_seed, train_model

__all__ = [
    'set_random_seed', 'train_model', 'init_model', 'inference_model',
    'multi_gpu_test', 'single_gpu_test', 'show_result_pyplot',
    'init_random_seed'
]
```
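
For context, a typical use of this API looks like the following sketch; the config and checkpoint paths are placeholders, not files from this commit:

```python
from mmcls.apis import inference_model, init_model, show_result_pyplot

# Hypothetical paths for illustration only.
model = init_model('configs/resnet/resnet50_8xb32_in1k.py',
                   'checkpoints/resnet50.pth', device='cuda:0')
result = inference_model(model, 'demo/demo.JPEG')
show_result_pyplot(model, 'demo/demo.JPEG', result)
```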
---

**File: openmmlab_test/mmclassification-speed-benchmark/mmcls/apis/inference.py → openmmlab_test/mmclassification-0.24.1/mmcls/apis/inference.py** (renamed and modified, +22 −8)
```diff
 # Copyright (c) OpenMMLab. All rights reserved.
 import warnings

 import matplotlib.pyplot as plt
 import mmcv
 import numpy as np
 import torch
 ...
@@ -34,8 +34,9 @@ def init_model(config, checkpoint=None, device='cuda:0', options=None):
         config.model.pretrained = None
     model = build_classifier(config.model)
     if checkpoint is not None:
-        map_loc = 'cpu' if device == 'cpu' else None
-        checkpoint = load_checkpoint(model, checkpoint, map_location=map_loc)
+        # Mapping the weights to GPU may cause unexpected video memory leak
+        # which refers to https://github.com/open-mmlab/mmdetection/pull/6405
+        checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
         if 'CLASSES' in checkpoint.get('meta', {}):
             model.CLASSES = checkpoint['meta']['CLASSES']
         else:
 ...
@@ -89,7 +90,12 @@ def inference_model(model, img):
     return result


-def show_result_pyplot(model, img, result, fig_size=(15, 10)):
+def show_result_pyplot(model,
+                       img,
+                       result,
+                       fig_size=(15, 10),
+                       title='result',
+                       wait_time=0):
     """Visualize the classification results on the image.

     Args:
 ...
@@ -97,10 +103,18 @@ def show_result_pyplot(model, img, result, fig_size=(15, 10)):
         img (str or np.ndarray): Image filename or loaded image.
         result (list): The classification result.
         fig_size (tuple): Figure size of the pyplot figure.
+            Defaults to (15, 10).
+        title (str): Title of the pyplot figure.
+            Defaults to 'result'.
+        wait_time (int): How many seconds to display the image.
+            Defaults to 0.
     """
     if hasattr(model, 'module'):
         model = model.module
-    img = model.show_result(img, result, show=False)
-    plt.figure(figsize=fig_size)
-    plt.imshow(mmcv.bgr2rgb(img))
-    plt.show()
+    model.show_result(
+        img,
+        result,
+        show=True,
+        fig_size=fig_size,
+        win_name=title,
+        wait_time=wait_time)
```
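
The diff above adds `title` and `wait_time` arguments to `show_result_pyplot`. A call using them might look like the following sketch; the image path is a placeholder:

```python
# Hypothetical call exercising the arguments added in this diff.
show_result_pyplot(model, 'demo/demo.JPEG', result,
                   title='prediction', wait_time=2)
```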
---

**File: openmmlab_test/mmclassification-0.24.1/mmcls/apis/test.py** (new file, mode 100644)
```python
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import pickle
import shutil
import tempfile
import time

import mmcv
import numpy as np
import torch
import torch.distributed as dist
from mmcv.image import tensor2imgs
from mmcv.runner import get_dist_info


def single_gpu_test(model,
                    data_loader,
                    show=False,
                    out_dir=None,
                    **show_kwargs):
    """Test model with local single gpu.

    This method tests model with a single gpu and supports showing results.

    Args:
        model (:obj:`torch.nn.Module`): Model to be tested.
        data_loader (:obj:`torch.utils.data.DataLoader`): Pytorch data loader.
        show (bool): Whether to show the test results. Defaults to False.
        out_dir (str): The output directory of result plots of all samples.
            Defaults to None, which means not to write output files.
        **show_kwargs: Any other keyword arguments for showing results.

    Returns:
        list: The prediction results.
    """
    model.eval()
    results = []
    dataset = data_loader.dataset
    # warm-up pass: run the first 100 batches without recording results
    for i, data in enumerate(data_loader):
        if i < 100:
            with torch.no_grad():
                result = model(return_loss=False, **data)
    prog_bar = mmcv.ProgressBar(len(dataset))
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, **data)

        batch_size = len(result)
        # print("batch size api_test-------:", batch_size)
        results.extend(result)

        if show or out_dir:
            scores = np.vstack(result)
            pred_score = np.max(scores, axis=1)
            pred_label = np.argmax(scores, axis=1)
            pred_class = [model.CLASSES[lb] for lb in pred_label]

            img_metas = data['img_metas'].data[0]
            imgs = tensor2imgs(data['img'], **img_metas[0]['img_norm_cfg'])
            assert len(imgs) == len(img_metas)

            for i, (img, img_meta) in enumerate(zip(imgs, img_metas)):
                h, w, _ = img_meta['img_shape']
                img_show = img[:h, :w, :]

                ori_h, ori_w = img_meta['ori_shape'][:-1]
                img_show = mmcv.imresize(img_show, (ori_w, ori_h))

                if out_dir:
                    out_file = osp.join(out_dir, img_meta['ori_filename'])
                else:
                    out_file = None

                result_show = {
                    'pred_score': pred_score[i],
                    'pred_label': pred_label[i],
                    'pred_class': pred_class[i]
                }
                model.module.show_result(
                    img_show,
                    result_show,
                    show=show,
                    out_file=out_file,
                    **show_kwargs)

        batch_size = data['img'].size(0)
        # print("batch size api_test:", batch_size)
        prog_bar.update(batch_size)
    return results


def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    """Test model with multiple gpus.

    This method tests model with multiple gpus and collects the results
    under two different modes: gpu and cpu modes. By setting
    'gpu_collect=True' it encodes results to gpu tensors and use gpu
    communication for results collection. On cpu mode it saves the results
    on different gpus to 'tmpdir' and collects them by the rank 0 worker.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.
        tmpdir (str): Path of directory to save the temporary results from
            different gpus under cpu mode.
        gpu_collect (bool): Option to use either gpu or cpu to collect
            results.

    Returns:
        list: The prediction results.
    """
    model.eval()
    results = []
    dataset = data_loader.dataset
    rank, world_size = get_dist_info()
    # warm-up pass: run the first 100 batches without recording results
    for i, data in enumerate(data_loader):
        if i < 100:
            # print("warm up................")
            with torch.no_grad():
                result = model(return_loss=False, **data)
            # print("warm up end ................")
    if rank == 0:
        # Check if tmpdir is valid for cpu_collect
        if (not gpu_collect) and (tmpdir is not None and osp.exists(tmpdir)):
            raise OSError((f'The tmpdir {tmpdir} already exists.',
                           ' Since tmpdir will be deleted after testing,',
                           ' please make sure you specify an empty one.'))
        prog_bar = mmcv.ProgressBar(len(dataset))
    time.sleep(2)
    dist.barrier()
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, **data)
        if isinstance(result, list):
            results.extend(result)
        else:
            results.append(result)

        if rank == 0:
            batch_size = data['img'].size(0)
            # print("batch size api_test-----:", batch_size * world_size)
            for _ in range(batch_size * world_size):
                prog_bar.update()

    # collect results from all ranks
    if gpu_collect:
        results = collect_results_gpu(results, len(dataset))
    else:
        results = collect_results_cpu(results, len(dataset), tmpdir)
    return results


def collect_results_cpu(result_part, size, tmpdir=None):
    rank, world_size = get_dist_info()
    # create a tmp dir if it is not specified
    if tmpdir is None:
        MAX_LEN = 512
        # 32 is whitespace
        dir_tensor = torch.full((MAX_LEN, ),
                                32,
                                dtype=torch.uint8,
                                device='cuda')
        if rank == 0:
            mmcv.mkdir_or_exist('.dist_test')
            tmpdir = tempfile.mkdtemp(dir='.dist_test')
            tmpdir = torch.tensor(
                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
            dir_tensor[:len(tmpdir)] = tmpdir
        dist.broadcast(dir_tensor, 0)
        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
    else:
        mmcv.mkdir_or_exist(tmpdir)
    # dump the part result to the dir
    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
    dist.barrier()
    # collect all parts
    if rank != 0:
        return None
    else:
        # load results of all parts from tmp dir
        part_list = []
        for i in range(world_size):
            part_file = osp.join(tmpdir, f'part_{i}.pkl')
            part_result = mmcv.load(part_file)
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        # remove tmp dir
        shutil.rmtree(tmpdir)
        return ordered_results


def collect_results_gpu(result_part, size):
    rank, world_size = get_dist_info()
    # dump result part to tensor with pickle
    part_tensor = torch.tensor(
        bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device='cuda')
    # gather all result part tensor shape
    shape_tensor = torch.tensor(part_tensor.shape, device='cuda')
    shape_list = [shape_tensor.clone() for _ in range(world_size)]
    dist.all_gather(shape_list, shape_tensor)
    # padding result part tensor to max length
    shape_max = torch.tensor(shape_list).max()
    part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda')
    part_send[:shape_tensor[0]] = part_tensor
    part_recv_list = [
        part_tensor.new_zeros(shape_max) for _ in range(world_size)
    ]
    # gather all result part
    dist.all_gather(part_recv_list, part_send)

    if rank == 0:
        part_list = []
        for recv, shape in zip(part_recv_list, shape_list):
            part_result = pickle.loads(recv[:shape[0]].cpu().numpy().tobytes())
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        return ordered_results
```
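
For orientation, a typical call site for `single_gpu_test` looks roughly like the following sketch, based on the usual `tools/test.py` pattern; the model/data-loader plumbing is elided and not part of this commit:

```python
# Hypothetical single-GPU evaluation sketch.
from mmcls.apis import single_gpu_test
from mmcv.parallel import MMDataParallel

model = MMDataParallel(model, device_ids=[0])  # wrap so .module is available
outputs = single_gpu_test(model, data_loader)  # list of per-sample scores
```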
---

**File: openmmlab_test/mmclassification-0.24.1/mmcls/apis/test_old.py** (new file, mode 100644)
```python
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp
import pickle
import shutil
import tempfile
import time

import mmcv
import numpy as np
import torch
import torch.distributed as dist
from mmcv.image import tensor2imgs
from mmcv.runner import get_dist_info


def single_gpu_test(model,
                    data_loader,
                    show=False,
                    out_dir=None,
                    **show_kwargs):
    """Test model with local single gpu.

    This method tests model with a single gpu and supports showing results.

    Args:
        model (:obj:`torch.nn.Module`): Model to be tested.
        data_loader (:obj:`torch.utils.data.DataLoader`): Pytorch data loader.
        show (bool): Whether to show the test results. Defaults to False.
        out_dir (str): The output directory of result plots of all samples.
            Defaults to None, which means not to write output files.
        **show_kwargs: Any other keyword arguments for showing results.

    Returns:
        list: The prediction results.
    """
    # dummy = torch.rand(1, 3, 608, 608).cuda()
    # model = torch.jit.script(model).eval()
    # model = torch_blade.optimize(model, allow_tracing=True, model_inputs=(dummy,))
    model.eval()
    results = []
    start = 0
    end = 0
    dataset = data_loader.dataset
    prog_bar = mmcv.ProgressBar(len(dataset))
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            # dummy = torch.rand(32, 3, 224, 224).cuda()
            # print("-------------------:", data['img'].shape)
            # model = torch.jit.script(model).eval()
            # model = torch_blade.optimize(model, allow_tracing=True, model_inputs=data['img'])
            # print("------------------------:", data['img'].shape)
            start = time.time()
            result = model(return_loss=False, **data)
            end = time.time()
            # print("====================:", end - start)

        batch_size = len(result)
        # print("=============:", batch_size)
        results.extend(result)

        if show or out_dir:
            scores = np.vstack(result)
            pred_score = np.max(scores, axis=1)
            pred_label = np.argmax(scores, axis=1)
            pred_class = [model.CLASSES[lb] for lb in pred_label]

            img_metas = data['img_metas'].data[0]
            imgs = tensor2imgs(data['img'], **img_metas[0]['img_norm_cfg'])
            assert len(imgs) == len(img_metas)

            for i, (img, img_meta) in enumerate(zip(imgs, img_metas)):
                h, w, _ = img_meta['img_shape']
                img_show = img[:h, :w, :]

                ori_h, ori_w = img_meta['ori_shape'][:-1]
                img_show = mmcv.imresize(img_show, (ori_w, ori_h))

                if out_dir:
                    out_file = osp.join(out_dir, img_meta['ori_filename'])
                else:
                    out_file = None

                result_show = {
                    'pred_score': pred_score[i],
                    'pred_label': pred_label[i],
                    'pred_class': pred_class[i]
                }
                model.module.show_result(
                    img_show,
                    result_show,
                    show=show,
                    out_file=out_file,
                    **show_kwargs)

        batch_size = data['img'].size(0)
        prog_bar.update(batch_size)
    return results


def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    """Test model with multiple gpus.

    This method tests model with multiple gpus and collects the results
    under two different modes: gpu and cpu modes. By setting
    'gpu_collect=True' it encodes results to gpu tensors and use gpu
    communication for results collection. On cpu mode it saves the results
    on different gpus to 'tmpdir' and collects them by the rank 0 worker.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.
        tmpdir (str): Path of directory to save the temporary results from
            different gpus under cpu mode.
        gpu_collect (bool): Option to use either gpu or cpu to collect
            results.

    Returns:
        list: The prediction results.
    """
    model.eval()
    results = []
    dataset = data_loader.dataset
    rank, world_size = get_dist_info()
    if rank == 0:
        # Check if tmpdir is valid for cpu_collect
        if (not gpu_collect) and (tmpdir is not None and osp.exists(tmpdir)):
            raise OSError((f'The tmpdir {tmpdir} already exists.',
                           ' Since tmpdir will be deleted after testing,',
                           ' please make sure you specify an empty one.'))
        prog_bar = mmcv.ProgressBar(len(dataset))
    time.sleep(2)
    dist.barrier()
    for i, data in enumerate(data_loader):
        with torch.no_grad():
            result = model(return_loss=False, **data)
        if isinstance(result, list):
            results.extend(result)
        else:
            results.append(result)

        if rank == 0:
            batch_size = data['img'].size(0)
            for _ in range(batch_size * world_size):
                prog_bar.update()

    # collect results from all ranks
    if gpu_collect:
        results = collect_results_gpu(results, len(dataset))
    else:
        results = collect_results_cpu(results, len(dataset), tmpdir)
    return results


def collect_results_cpu(result_part, size, tmpdir=None):
    rank, world_size = get_dist_info()
    # create a tmp dir if it is not specified
    if tmpdir is None:
        MAX_LEN = 512
        # 32 is whitespace
        dir_tensor = torch.full((MAX_LEN, ),
                                32,
                                dtype=torch.uint8,
                                device='cuda')
        if rank == 0:
            mmcv.mkdir_or_exist('.dist_test')
            tmpdir = tempfile.mkdtemp(dir='.dist_test')
            tmpdir = torch.tensor(
                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
            dir_tensor[:len(tmpdir)] = tmpdir
        dist.broadcast(dir_tensor, 0)
        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
    else:
        mmcv.mkdir_or_exist(tmpdir)
    # dump the part result to the dir
    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
    dist.barrier()
    # collect all parts
    if rank != 0:
        return None
    else:
        # load results of all parts from tmp dir
        part_list = []
        for i in range(world_size):
            part_file = osp.join(tmpdir, f'part_{i}.pkl')
            part_result = mmcv.load(part_file)
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        # remove tmp dir
        shutil.rmtree(tmpdir)
        return ordered_results


def collect_results_gpu(result_part, size):
    rank, world_size = get_dist_info()
    # dump result part to tensor with pickle
    part_tensor = torch.tensor(
        bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device='cuda')
    # gather all result part tensor shape
    shape_tensor = torch.tensor(part_tensor.shape, device='cuda')
    shape_list = [shape_tensor.clone() for _ in range(world_size)]
    dist.all_gather(shape_list, shape_tensor)
    # padding result part tensor to max length
    shape_max = torch.tensor(shape_list).max()
    part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda')
    part_send[:shape_tensor[0]] = part_tensor
    part_recv_list = [
        part_tensor.new_zeros(shape_max) for _ in range(world_size)
    ]
    # gather all result part
    dist.all_gather(part_recv_list, part_send)

    if rank == 0:
        part_list = []
        for recv, shape in zip(part_recv_list, shape_list):
            part_result = pickle.loads(recv[:shape[0]].cpu().numpy().tobytes())
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        return ordered_results
```
---

**File: openmmlab_test/mmclassification-0.24.1/mmcls/apis/test_time.py** (new file, mode 100644)
# Copyright (c) OpenMMLab. All rights reserved.
import
os.path
as
osp
import
pickle
import
shutil
import
tempfile
import
time
import
mmcv
import
numpy
as
np
import
torch
import
torch.distributed
as
dist
from
mmcv.image
import
tensor2imgs
from
mmcv.runner
import
get_dist_info
def
single_gpu_test
(
model
,
data_loader
,
show
=
False
,
out_dir
=
None
,
**
show_kwargs
):
"""Test model with local single gpu.
This method tests model with a single gpu and supports showing results.
Args:
model (:obj:`torch.nn.Module`): Model to be tested.
data_loader (:obj:`torch.utils.data.DataLoader`): Pytorch data loader.
show (bool): Whether to show the test results. Defaults to False.
out_dir (str): The output directory of result plots of all samples.
Defaults to None, which means not to write output files.
**show_kwargs: Any other keyword arguments for showing results.
Returns:
list: The prediction results.
"""
#dummy = torch.rand(1, 3, 608, 608).cuda()
#model = torch.jit.script(model).eval()
#model = torch_blade.optimize(model, allow_tracing=True,model_inputs=(dummy,))
model
.
eval
()
results
=
[]
start
=
0
end
=
0
dataset
=
data_loader
.
dataset
#prog_bar = mmcv.ProgressBar(len(dataset))
ips_num
=
0
j
=
0
start
=
time
.
time
()
for
i
,
data
in
enumerate
(
data_loader
):
end_time1
=
time
.
time
()
time1
=
end_time1
-
start
with
torch
.
no_grad
():
result
=
model
(
return_loss
=
False
,
**
data
)
end_time2
=
time
.
time
()
time2
=
end_time2
-
end_time1
batch_size
=
len
(
result
)
ips
=
batch_size
/
time2
if
i
>
0
:
ips_num
=
ips_num
+
ips
j
=
j
+
1
print
(
"=============batch_size1:"
,
batch_size
)
results
.
extend
(
result
)
if
show
or
out_dir
:
scores
=
np
.
vstack
(
result
)
pred_score
=
np
.
max
(
scores
,
axis
=
1
)
pred_label
=
np
.
argmax
(
scores
,
axis
=
1
)
pred_class
=
[
model
.
CLASSES
[
lb
]
for
lb
in
pred_label
]
img_metas
=
data
[
'img_metas'
].
data
[
0
]
imgs
=
tensor2imgs
(
data
[
'img'
],
**
img_metas
[
0
][
'img_norm_cfg'
])
assert
len
(
imgs
)
==
len
(
img_metas
)
for
i
,
(
img
,
img_meta
)
in
enumerate
(
zip
(
imgs
,
img_metas
)):
h
,
w
,
_
=
img_meta
[
'img_shape'
]
img_show
=
img
[:
h
,
:
w
,
:]
ori_h
,
ori_w
=
img_meta
[
'ori_shape'
][:
-
1
]
img_show
=
mmcv
.
imresize
(
img_show
,
(
ori_w
,
ori_h
))
if
out_dir
:
out_file
=
osp
.
join
(
out_dir
,
img_meta
[
'ori_filename'
])
else
:
out_file
=
None
result_show
=
{
'pred_score'
:
pred_score
[
i
],
'pred_label'
:
pred_label
[
i
],
'pred_class'
:
pred_class
[
i
]
}
model
.
module
.
show_result
(
img_show
,
result_show
,
show
=
show
,
out_file
=
out_file
,
**
show_kwargs
)
batch_size
=
data
[
'img'
].
size
(
0
)
print
(
"=============batch_size2:"
,
batch_size
)
#prog_bar.update(batch_size)
print
(
"batch size is %d ,data load cost time: %f s model cost time: %f s,ips: %f"
%
(
batch_size
,
time1
,
time2
,(
batch_size
/
time2
)))
start
=
time
.
time
()
ips_avg
=
ips_num
/
j
print
(
"Avg ips is %f"
%
ips_avg
)
return
results
def multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False):
    """Test model with multiple gpus.

    This method tests model with multiple gpus and collects the results
    under two different modes: gpu and cpu modes. By setting
    'gpu_collect=True' it encodes results to gpu tensors and uses gpu
    communication for results collection. On cpu mode it saves the results
    on different gpus to 'tmpdir' and collects them by the rank 0 worker.

    Args:
        model (nn.Module): Model to be tested.
        data_loader (nn.Dataloader): Pytorch data loader.
        tmpdir (str): Path of directory to save the temporary results from
            different gpus under cpu mode.
        gpu_collect (bool): Option to use either gpu or cpu to collect
            results.

    Returns:
        list: The prediction results.
    """
    model.eval()
    results = []
    dataset = data_loader.dataset
    rank, world_size = get_dist_info()
    if rank == 0:
        # Check if tmpdir is valid for cpu_collect
        if (not gpu_collect) and (tmpdir is not None and osp.exists(tmpdir)):
            raise OSError((f'The tmpdir {tmpdir} already exists.'
                           ' Since tmpdir will be deleted after testing,'
                           ' please make sure you specify an empty one.'))
        prog_bar = mmcv.ProgressBar(len(dataset))
    time.sleep(2)
    # dist.barrier()
    ips_num = 0
    j_num = 0
    start = time.time()  # fixed typo: was `satrt`
    for i, data in enumerate(data_loader):
        end_time1 = time.time()
        time1 = end_time1 - start  # data loading time (used in debug print)
        with torch.no_grad():
            result = model(return_loss=False, **data)
        end_time2 = time.time()
        time2 = end_time2 - end_time1  # model forward time for this batch
        if isinstance(result, list):
            results.extend(result)
        else:
            results.append(result)

        if rank == 0:
            batch_size = data['img'].size(0)
            for _ in range(batch_size * world_size):
                prog_bar.update()
            batch_size_global = batch_size * world_size
            ips = batch_size_global / time2
            # print("samples_per_gpu is %d ,data load cost time %f s,ips:%f" % (batch_size, time1, ips))
            # Skip the first iteration (warm-up) when accumulating throughput.
            if i > 0:
                ips_num = ips_num + ips
                j_num = j_num + 1

    # collect results from all ranks
    if gpu_collect:
        results = collect_results_gpu(results, len(dataset))
    else:
        results = collect_results_cpu(results, len(dataset), tmpdir)
    if rank == 0:
        ips_avg = ips_num / j_num
        print("Avg IPS is %f " % ips_avg)
    return results
def collect_results_cpu(result_part, size, tmpdir=None):
    rank, world_size = get_dist_info()
    # create a tmp dir if it is not specified
    if tmpdir is None:
        MAX_LEN = 512
        # 32 is whitespace
        dir_tensor = torch.full((MAX_LEN, ),
                                32,
                                dtype=torch.uint8,
                                device='cuda')
        if rank == 0:
            mmcv.mkdir_or_exist('.dist_test')
            tmpdir = tempfile.mkdtemp(dir='.dist_test')
            tmpdir = torch.tensor(
                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')
            dir_tensor[:len(tmpdir)] = tmpdir
        dist.broadcast(dir_tensor, 0)
        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()
    else:
        mmcv.mkdir_or_exist(tmpdir)
    # dump the part result to the dir
    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))
    dist.barrier()
    # collect all parts
    if rank != 0:
        return None
    else:
        # load results of all parts from tmp dir
        part_list = []
        for i in range(world_size):
            part_file = osp.join(tmpdir, f'part_{i}.pkl')
            part_result = mmcv.load(part_file)
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        # remove tmp dir
        shutil.rmtree(tmpdir)
        return ordered_results
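# The function above shares the rank-0 temp directory name with all other
# ranks by packing it into a fixed-length uint8 CUDA tensor (padded with
# byte 32, i.e. spaces) and broadcasting it. A minimal standalone sketch of
# that trick, assuming an initialized process group; `demo_broadcast_dirname`
# is a hypothetical helper, not part of this module:
#
#   def demo_broadcast_dirname(name=None, max_len=512):
#       buf = torch.full((max_len, ), 32, dtype=torch.uint8, device='cuda')
#       if dist.get_rank() == 0:
#           encoded = torch.tensor(
#               bytearray(name.encode()), dtype=torch.uint8, device='cuda')
#           buf[:len(encoded)] = encoded
#       dist.broadcast(buf, 0)  # every rank now holds rank 0's bytes
#       # strip the space padding to recover the original string
#       return buf.cpu().numpy().tobytes().decode().rstrip()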
def collect_results_gpu(result_part, size):
    rank, world_size = get_dist_info()
    # dump result part to tensor with pickle
    part_tensor = torch.tensor(
        bytearray(pickle.dumps(result_part)), dtype=torch.uint8, device='cuda')
    # gather all result part tensor shape
    shape_tensor = torch.tensor(part_tensor.shape, device='cuda')
    shape_list = [shape_tensor.clone() for _ in range(world_size)]
    dist.all_gather(shape_list, shape_tensor)
    # padding result part tensor to max length
    shape_max = torch.tensor(shape_list).max()
    part_send = torch.zeros(shape_max, dtype=torch.uint8, device='cuda')
    part_send[:shape_tensor[0]] = part_tensor
    part_recv_list = [
        part_tensor.new_zeros(shape_max) for _ in range(world_size)
    ]
    # gather all result part
    dist.all_gather(part_recv_list, part_send)

    if rank == 0:
        part_list = []
        for recv, shape in zip(part_recv_list, shape_list):
            part_result = pickle.loads(recv[:shape[0]].cpu().numpy().tobytes())
            part_list.append(part_result)
        # sort the results
        ordered_results = []
        for res in zip(*part_list):
            ordered_results.extend(list(res))
        # the dataloader may pad some samples
        ordered_results = ordered_results[:size]
        return ordered_results
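A minimal sketch of how these entry points are typically driven, single-GPU for brevity; the config and checkpoint paths below are placeholders, not files shipped with this commit:

# Hypothetical benchmark driver for single_gpu_test above.
from mmcv import Config
from mmcv.parallel import MMDataParallel
from mmcv.runner import load_checkpoint
from mmcls.datasets import build_dataloader, build_dataset
from mmcls.models import build_classifier

cfg = Config.fromfile('configs/resnet/resnet50_8xb32_in1k.py')  # placeholder
dataset = build_dataset(cfg.data.test)
data_loader = build_dataloader(
    dataset, samples_per_gpu=32, workers_per_gpu=2, dist=False, shuffle=False)
model = build_classifier(cfg.model)
load_checkpoint(model, 'checkpoint.pth', map_location='cpu')  # placeholder
model = MMDataParallel(model, device_ids=[0])
results = single_gpu_test(model, data_loader)  # prints per-batch and avg IPS

For multi-GPU runs, the model is instead wrapped in MMDistributedDataParallel and multi_gpu_test is called under a launcher that initializes the process group.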
openmmlab_test/mmclassification-0.24.1/mmcls/apis/train.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
import random
import warnings

import numpy as np
import torch
import torch.distributed as dist
from mmcv.runner import (DistSamplerSeedHook, Fp16OptimizerHook,
                         build_optimizer, build_runner, get_dist_info)

from mmcls.core import DistEvalHook, DistOptimizerHook, EvalHook
from mmcls.datasets import build_dataloader, build_dataset
from mmcls.utils import (get_root_logger, wrap_distributed_model,
                         wrap_non_distributed_model)


def init_random_seed(seed=None, device='cuda'):
    """Initialize random seed.

    If the seed is not set, the seed will be automatically randomized,
    and then broadcast to all processes to prevent some potential bugs.

    Args:
        seed (int, Optional): The seed. Default to None.
        device (str): The device where the seed will be put on.
            Default to 'cuda'.

    Returns:
        int: Seed to be used.
    """
    if seed is not None:
        return seed

    # Make sure all ranks share the same random seed to prevent
    # some potential bugs. Please refer to
    # https://github.com/open-mmlab/mmdetection/issues/6339
    rank, world_size = get_dist_info()
    seed = np.random.randint(2**31)
    if world_size == 1:
        return seed

    if rank == 0:
        random_num = torch.tensor(seed, dtype=torch.int32, device=device)
    else:
        random_num = torch.tensor(0, dtype=torch.int32, device=device)
    dist.broadcast(random_num, src=0)
    return random_num.item()
def set_random_seed(seed, deterministic=False):
    """Set random seed.

    Args:
        seed (int): Seed to be used.
        deterministic (bool): Whether to set the deterministic option for
            CUDNN backend, i.e., set `torch.backends.cudnn.deterministic`
            to True and `torch.backends.cudnn.benchmark` to False.
            Default: False.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
def train_model(model,
                dataset,
                cfg,
                distributed=False,
                validate=False,
                timestamp=None,
                device=None,
                meta=None):
    """Train a model.

    This method will build dataloaders, wrap the model and build a runner
    according to the provided config.

    Args:
        model (:obj:`torch.nn.Module`): The model to be run.
        dataset (:obj:`mmcls.datasets.BaseDataset` | List[BaseDataset]):
            The dataset used to train the model. It can be a single dataset,
            or a list of dataset with the same length as workflow.
        cfg (:obj:`mmcv.utils.Config`): The configs of the experiment.
        distributed (bool): Whether to train the model in a distributed
            environment. Defaults to False.
        validate (bool): Whether to do validation with
            :obj:`mmcv.runner.EvalHook`. Defaults to False.
        timestamp (str, optional): The timestamp string to auto generate the
            name of log files. Defaults to None.
        device (str, optional): TODO
        meta (dict, optional): A dict records some important information such
            as environment info and seed, which will be logged in logger
            hook. Defaults to None.
    """
    logger = get_root_logger()

    # prepare data loaders
    dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]

    # The default loader config
    loader_cfg = dict(
        # cfg.gpus will be ignored if distributed
        num_gpus=cfg.ipu_replicas if device == 'ipu' else len(cfg.gpu_ids),
        dist=distributed,
        round_up=True,
        seed=cfg.get('seed'),
        sampler_cfg=cfg.get('sampler', None),
    )
    # The overall dataloader settings
    loader_cfg.update({
        k: v
        for k, v in cfg.data.items() if k not in [
            'train', 'val', 'test', 'train_dataloader', 'val_dataloader',
            'test_dataloader'
        ]
    })
    # The specific dataloader settings
    train_loader_cfg = {**loader_cfg, **cfg.data.get('train_dataloader', {})}
    data_loaders = [build_dataloader(ds, **train_loader_cfg) for ds in dataset]

    # put model on gpus
    if distributed:
        find_unused_parameters = cfg.get('find_unused_parameters', False)
        # Sets the `find_unused_parameters` parameter in
        # torch.nn.parallel.DistributedDataParallel
        model = wrap_distributed_model(
            model,
            cfg.device,
            broadcast_buffers=False,
            find_unused_parameters=find_unused_parameters)
    else:
        model = wrap_non_distributed_model(
            model, cfg.device, device_ids=cfg.gpu_ids)

    # build runner
    optimizer = build_optimizer(model, cfg.optimizer)

    if cfg.get('runner') is None:
        cfg.runner = {
            'type': 'EpochBasedRunner',
            'max_epochs': cfg.total_epochs
        }
        warnings.warn(
            'config is now expected to have a `runner` section, '
            'please set `runner` in your config.', UserWarning)

    if device == 'ipu':
        if not cfg.runner['type'].startswith('IPU'):
            cfg.runner['type'] = 'IPU' + cfg.runner['type']
        if 'options_cfg' not in cfg.runner:
            cfg.runner['options_cfg'] = {}
        cfg.runner['options_cfg']['replicationFactor'] = cfg.ipu_replicas
        cfg.runner['fp16_cfg'] = cfg.get('fp16', None)

    runner = build_runner(
        cfg.runner,
        default_args=dict(
            model=model,
            batch_processor=None,
            optimizer=optimizer,
            work_dir=cfg.work_dir,
            logger=logger,
            meta=meta))

    # an ugly workaround to make the .log and .log.json filenames the same
    runner.timestamp = timestamp

    # fp16 setting
    fp16_cfg = cfg.get('fp16', None)

    if fp16_cfg is None and device == 'npu':
        fp16_cfg = {'loss_scale': 'dynamic'}

    if fp16_cfg is not None:
        if device == 'ipu':
            from mmcv.device.ipu import IPUFp16OptimizerHook
            optimizer_config = IPUFp16OptimizerHook(
                **cfg.optimizer_config,
                loss_scale=fp16_cfg['loss_scale'],
                distributed=distributed)
        else:
            optimizer_config = Fp16OptimizerHook(
                **cfg.optimizer_config,
                loss_scale=fp16_cfg['loss_scale'],
                distributed=distributed)
    elif distributed and 'type' not in cfg.optimizer_config:
        optimizer_config = DistOptimizerHook(**cfg.optimizer_config)
    else:
        optimizer_config = cfg.optimizer_config

    # register hooks
    runner.register_training_hooks(
        cfg.lr_config,
        optimizer_config,
        cfg.checkpoint_config,
        cfg.log_config,
        cfg.get('momentum_config', None),
        custom_hooks_config=cfg.get('custom_hooks', None))
    if distributed and cfg.runner['type'] == 'EpochBasedRunner':
        runner.register_hook(DistSamplerSeedHook())

    # register eval hooks
    if validate:
        val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
        # The specific dataloader settings
        val_loader_cfg = {
            **loader_cfg,
            'shuffle': False,  # Not shuffle by default
            'sampler_cfg': None,  # Not use sampler by default
            'drop_last': False,  # Not drop last by default
            **cfg.data.get('val_dataloader', {}),
        }
        val_dataloader = build_dataloader(val_dataset, **val_loader_cfg)
        eval_cfg = cfg.get('evaluation', {})
        eval_cfg['by_epoch'] = cfg.runner['type'] != 'IterBasedRunner'
        eval_hook = DistEvalHook if distributed else EvalHook
        # `EvalHook` needs to be executed after `IterTimerHook`.
        # Otherwise, it will cause a bug if use `IterBasedRunner`.
        # Refers to https://github.com/open-mmlab/mmcv/issues/1261
        runner.register_hook(
            eval_hook(val_dataloader, **eval_cfg), priority='LOW')

    if cfg.resume_from:
        runner.resume(cfg.resume_from)
    elif cfg.load_from:
        runner.load_checkpoint(cfg.load_from)
    runner.run(data_loaders, cfg.workflow)
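For orientation, a minimal sketch of the config fields `train_model` reads; this is an illustrative skeleton with placeholder values, not a config shipped with this commit:

# Illustrative-only config skeleton; field names mirror the attribute
# accesses in train_model above, values are placeholders.
cfg = dict(
    device='cuda',
    gpu_ids=[0],
    work_dir='./work_dirs/demo',  # where logs and checkpoints go
    seed=None,
    optimizer=dict(type='SGD', lr=0.1, momentum=0.9, weight_decay=1e-4),
    optimizer_config=dict(grad_clip=None),
    lr_config=dict(policy='step', step=[30, 60, 90]),
    checkpoint_config=dict(interval=1),
    log_config=dict(interval=100, hooks=[dict(type='TextLoggerHook')]),
    runner=dict(type='EpochBasedRunner', max_epochs=100),
    data=dict(samples_per_gpu=32, workers_per_gpu=2,
              train=..., val=..., test=...),  # dataset configs elided
    evaluation=dict(interval=1, metric='accuracy'),
    resume_from=None,
    load_from=None,
    workflow=[('train', 1)],
)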
openmmlab_test/mmclassification-0.24.1/mmcls/core/__init__.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
from .evaluation import *  # noqa: F401, F403
from .hook import *  # noqa: F401, F403
from .optimizers import *  # noqa: F401, F403
from .utils import *  # noqa: F401, F403
openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/__init__.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
from .eval_hooks import DistEvalHook, EvalHook
from .eval_metrics import (calculate_confusion_matrix, f1_score, precision,
                           precision_recall_f1, recall, support)
from .mean_ap import average_precision, mAP
from .multilabel_eval_metrics import average_performance

__all__ = [
    'precision', 'recall', 'f1_score', 'support', 'average_precision', 'mAP',
    'average_performance', 'calculate_confusion_matrix',
    'precision_recall_f1', 'EvalHook', 'DistEvalHook'
]
openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/eval_hooks.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
import os.path as osp

import torch.distributed as dist
from mmcv.runner import DistEvalHook as BaseDistEvalHook
from mmcv.runner import EvalHook as BaseEvalHook
from torch.nn.modules.batchnorm import _BatchNorm


class EvalHook(BaseEvalHook):
    """Non-Distributed evaluation hook.

    Comparing with the ``EvalHook`` in MMCV, this hook will save the latest
    evaluation results as an attribute for other hooks to use (like
    `MMClsWandbHook`).
    """

    def __init__(self, dataloader, **kwargs):
        super(EvalHook, self).__init__(dataloader, **kwargs)
        self.latest_results = None

    def _do_evaluate(self, runner):
        """perform evaluation and save ckpt."""
        results = self.test_fn(runner.model, self.dataloader)
        self.latest_results = results
        runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)
        key_score = self.evaluate(runner, results)
        # the key_score may be `None` so it needs to skip the action to save
        # the best checkpoint
        if self.save_best and key_score:
            self._save_ckpt(runner, key_score)


class DistEvalHook(BaseDistEvalHook):
    """Distributed evaluation hook.

    Comparing with the ``DistEvalHook`` in MMCV, this hook will save the
    latest evaluation results as an attribute for other hooks to use (like
    `MMClsWandbHook`).
    """

    def __init__(self, dataloader, **kwargs):
        super(DistEvalHook, self).__init__(dataloader, **kwargs)
        self.latest_results = None

    def _do_evaluate(self, runner):
        """perform evaluation and save ckpt."""
        # Synchronization of BatchNorm's buffer (running_mean
        # and running_var) is not supported in the DDP of pytorch,
        # which may cause the inconsistent performance of models in
        # different ranks, so we broadcast BatchNorm's buffers
        # of rank 0 to other ranks to avoid this.
        if self.broadcast_bn_buffer:
            model = runner.model
            for name, module in model.named_modules():
                if isinstance(module,
                              _BatchNorm) and module.track_running_stats:
                    dist.broadcast(module.running_var, 0)
                    dist.broadcast(module.running_mean, 0)

        tmpdir = self.tmpdir
        if tmpdir is None:
            tmpdir = osp.join(runner.work_dir, '.eval_hook')

        results = self.test_fn(
            runner.model,
            self.dataloader,
            tmpdir=tmpdir,
            gpu_collect=self.gpu_collect)
        self.latest_results = results
        if runner.rank == 0:
            print('\n')
            runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)
            key_score = self.evaluate(runner, results)
            # the key_score may be `None` so it needs to skip the action to
            # save the best checkpoint
            if self.save_best and key_score:
                self._save_ckpt(runner, key_score)
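A small sketch of how these hooks are wired up outside of train_model; the runner and dataloader construction is elided, so this is illustrative only:

# Hypothetical wiring, assuming `runner` and `val_dataloader` already exist
# (see train_model above for the real registration path).
eval_hook = EvalHook(val_dataloader, interval=1, metric='accuracy',
                     save_best='auto')
runner.register_hook(eval_hook, priority='LOW')
# After each evaluation, other hooks can read the raw predictions:
#   preds = eval_hook.latest_results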
openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/eval_metrics.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
from numbers import Number

import numpy as np
import torch
from torch.nn.functional import one_hot


def calculate_confusion_matrix(pred, target):
    """Calculate confusion matrix according to the prediction and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).

    Returns:
        torch.Tensor: Confusion matrix
            The shape is (C, C), where C is the number of classes.
    """
    if isinstance(pred, np.ndarray):
        pred = torch.from_numpy(pred)
    if isinstance(target, np.ndarray):
        target = torch.from_numpy(target)
    assert (
        isinstance(pred, torch.Tensor) and isinstance(target, torch.Tensor)), \
        (f'pred and target should be torch.Tensor or np.ndarray, '
         f'but got {type(pred)} and {type(target)}.')

    # Modified from PyTorch-Ignite
    num_classes = pred.size(1)
    pred_label = torch.argmax(pred, dim=1).flatten()
    target_label = target.flatten()
    assert len(pred_label) == len(target_label)

    with torch.no_grad():
        # Encode each (target, prediction) pair as one flat index, then count.
        indices = num_classes * target_label + pred_label
        matrix = torch.bincount(indices, minlength=num_classes**2)
        matrix = matrix.reshape(num_classes, num_classes)
    return matrix
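# Worked example of the bincount encoding above (illustrative only): with
# num_classes = 3, target_label = [0, 2, 1] and pred_label = [0, 1, 1],
# indices = 3 * target + pred = [0, 7, 4]. bincount over 9 slots counts one
# sample each at flat positions 0, 4 and 7, and reshaping to (3, 3) places
# them at matrix[0][0], matrix[1][1] and matrix[2][1]: rows are ground
# truth, columns are predictions.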
def precision_recall_f1(pred, target, average_mode='macro', thrs=0.):
    """Calculate precision, recall and f1 score according to the prediction
    and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).
        average_mode (str): The type of averaging performed on the result.
            Options are 'macro' and 'none'. If 'none', the scores for each
            class are returned. If 'macro', calculate metrics for each class,
            and find their unweighted mean.
            Defaults to 'macro'.
        thrs (Number | tuple[Number], optional): Predictions with scores under
            the thresholds are considered negative. Default to 0.

    Returns:
        tuple: tuple containing precision, recall, f1 score.

            The type of precision, recall, f1 score is one of the following:

        +----------------------------+--------------------+-------------------+
        | Args                       | ``thrs`` is number | ``thrs`` is tuple |
        +============================+====================+===================+
        | ``average_mode`` = "macro" | float              | list[float]       |
        +----------------------------+--------------------+-------------------+
        | ``average_mode`` = "none"  | np.array           | list[np.array]    |
        +----------------------------+--------------------+-------------------+
    """

    allowed_average_mode = ['macro', 'none']
    if average_mode not in allowed_average_mode:
        raise ValueError(f'Unsupported type of averaging {average_mode}.')

    if isinstance(pred, np.ndarray):
        pred = torch.from_numpy(pred)
    assert isinstance(pred, torch.Tensor), \
        (f'pred should be torch.Tensor or np.ndarray, but got {type(pred)}.')
    if isinstance(target, np.ndarray):
        target = torch.from_numpy(target).long()
    assert isinstance(target, torch.Tensor), \
        f'target should be torch.Tensor or np.ndarray, ' \
        f'but got {type(target)}.'

    if isinstance(thrs, Number):
        thrs = (thrs, )
        return_single = True
    elif isinstance(thrs, tuple):
        return_single = False
    else:
        raise TypeError(
            f'thrs should be a number or tuple, but got {type(thrs)}.')

    num_classes = pred.size(1)
    pred_score, pred_label = torch.topk(pred, k=1)
    pred_score = pred_score.flatten()
    pred_label = pred_label.flatten()

    gt_positive = one_hot(target.flatten(), num_classes)

    precisions = []
    recalls = []
    f1_scores = []
    for thr in thrs:
        # Only prediction values larger than thr are counted as positive
        pred_positive = one_hot(pred_label, num_classes)
        if thr is not None:
            pred_positive[pred_score <= thr] = 0
        class_correct = (pred_positive & gt_positive).sum(0)
        precision = class_correct / np.maximum(pred_positive.sum(0), 1.) * 100
        recall = class_correct / np.maximum(gt_positive.sum(0), 1.) * 100
        f1_score = 2 * precision * recall / np.maximum(
            precision + recall,
            torch.finfo(torch.float32).eps)
        if average_mode == 'macro':
            precision = float(precision.mean())
            recall = float(recall.mean())
            f1_score = float(f1_score.mean())
        elif average_mode == 'none':
            precision = precision.detach().cpu().numpy()
            recall = recall.detach().cpu().numpy()
            f1_score = f1_score.detach().cpu().numpy()
        else:
            raise ValueError(f'Unsupported type of averaging {average_mode}.')
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1_score)

    if return_single:
        return precisions[0], recalls[0], f1_scores[0]
    else:
        return precisions, recalls, f1_scores
def precision(pred, target, average_mode='macro', thrs=0.):
    """Calculate precision according to the prediction and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).
        average_mode (str): The type of averaging performed on the result.
            Options are 'macro' and 'none'. If 'none', the scores for each
            class are returned. If 'macro', calculate metrics for each class,
            and find their unweighted mean.
            Defaults to 'macro'.
        thrs (Number | tuple[Number], optional): Predictions with scores under
            the thresholds are considered negative. Default to 0.

    Returns:
        float | np.array | list[float | np.array]: Precision.

        +----------------------------+--------------------+-------------------+
        | Args                       | ``thrs`` is number | ``thrs`` is tuple |
        +============================+====================+===================+
        | ``average_mode`` = "macro" | float              | list[float]       |
        +----------------------------+--------------------+-------------------+
        | ``average_mode`` = "none"  | np.array           | list[np.array]    |
        +----------------------------+--------------------+-------------------+
    """
    precisions, _, _ = precision_recall_f1(pred, target, average_mode, thrs)
    return precisions
def recall(pred, target, average_mode='macro', thrs=0.):
    """Calculate recall according to the prediction and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).
        average_mode (str): The type of averaging performed on the result.
            Options are 'macro' and 'none'. If 'none', the scores for each
            class are returned. If 'macro', calculate metrics for each class,
            and find their unweighted mean.
            Defaults to 'macro'.
        thrs (Number | tuple[Number], optional): Predictions with scores under
            the thresholds are considered negative. Default to 0.

    Returns:
        float | np.array | list[float | np.array]: Recall.

        +----------------------------+--------------------+-------------------+
        | Args                       | ``thrs`` is number | ``thrs`` is tuple |
        +============================+====================+===================+
        | ``average_mode`` = "macro" | float              | list[float]       |
        +----------------------------+--------------------+-------------------+
        | ``average_mode`` = "none"  | np.array           | list[np.array]    |
        +----------------------------+--------------------+-------------------+
    """
    _, recalls, _ = precision_recall_f1(pred, target, average_mode, thrs)
    return recalls
def f1_score(pred, target, average_mode='macro', thrs=0.):
    """Calculate F1 score according to the prediction and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).
        average_mode (str): The type of averaging performed on the result.
            Options are 'macro' and 'none'. If 'none', the scores for each
            class are returned. If 'macro', calculate metrics for each class,
            and find their unweighted mean.
            Defaults to 'macro'.
        thrs (Number | tuple[Number], optional): Predictions with scores under
            the thresholds are considered negative. Default to 0.

    Returns:
        float | np.array | list[float | np.array]: F1 score.

        +----------------------------+--------------------+-------------------+
        | Args                       | ``thrs`` is number | ``thrs`` is tuple |
        +============================+====================+===================+
        | ``average_mode`` = "macro" | float              | list[float]       |
        +----------------------------+--------------------+-------------------+
        | ``average_mode`` = "none"  | np.array           | list[np.array]    |
        +----------------------------+--------------------+-------------------+
    """
    _, _, f1_scores = precision_recall_f1(pred, target, average_mode, thrs)
    return f1_scores
def support(pred, target, average_mode='macro'):
    """Calculate the total number of occurrences of each label according to
    the prediction and target.

    Args:
        pred (torch.Tensor | np.array): The model prediction with shape (N, C).
        target (torch.Tensor | np.array): The target of each prediction with
            shape (N, 1) or (N,).
        average_mode (str): The type of averaging performed on the result.
            Options are 'macro' and 'none'. If 'none', the scores for each
            class are returned. If 'macro', calculate metrics for each class,
            and find their unweighted sum.
            Defaults to 'macro'.

    Returns:
        float | np.array: Support.

            - If the ``average_mode`` is set to macro, the function returns
              a single float.
            - If the ``average_mode`` is set to none, the function returns
              a np.array with shape C.
    """
    confusion_matrix = calculate_confusion_matrix(pred, target)
    with torch.no_grad():
        # Row sums of the confusion matrix are per-class sample counts.
        res = confusion_matrix.sum(1)
        if average_mode == 'macro':
            res = float(res.sum().numpy())
        elif average_mode == 'none':
            res = res.numpy()
        else:
            raise ValueError(f'Unsupported type of averaging {average_mode}.')
    return res
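A small worked example for the metric helpers above; the scores are made up for illustration:

# Illustrative usage of precision_recall_f1 and support.
import numpy as np

pred = np.array([[0.7, 0.2, 0.1],   # predicted class 0
                 [0.1, 0.8, 0.1],   # predicted class 1
                 [0.3, 0.4, 0.3],   # predicted class 1
                 [0.6, 0.2, 0.2]])  # predicted class 0
target = np.array([0, 1, 2, 1])

p, r, f = precision_recall_f1(pred, target, average_mode='none')
# class 0: precision 50% (1 of 2 predictions correct), recall 100%
# class 1: precision 50%, recall 50%; class 2: precision 0%, recall 0%
print(p, r, f)
print(support(pred, target, average_mode='none'))  # [1 2 1]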
openmmlab_test/mmclassification-speed-benchmark/mmcls/core/evaluation/mean_ap.py → openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/mean_ap.py
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
import numpy as np
import torch


def average_precision(pred, target):
    r"""Calculate the average precision for a single class.

    AP summarizes a precision-recall curve as the weighted mean of maximum
    precisions obtained for any r'>r, where r is the recall:

    .. math::
        \text{AP} = \sum_n (R_n - R_{n-1}) P_n

    Note that no approximation is involved since the curve is piecewise
    constant.
    ...
...
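For intuition, a tiny numeric illustration of the AP formula above; this hand-evaluates the sum on made-up points of a piecewise-constant precision-recall curve and is not this module's implementation, which is truncated in the diff:

# Hand-evaluating AP = sum_n (R_n - R_{n-1}) * P_n on made-up values.
import numpy as np

recall = np.array([0.0, 0.5, 1.0])     # recall levels R_0, R_1, R_2
precision = np.array([1.0, 1.0, 0.5])  # precision P_n at each level
ap = np.sum((recall[1:] - recall[:-1]) * precision[1:])
print(ap)  # 0.5 * 1.0 + 0.5 * 0.5 = 0.75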
openmmlab_test/mmclassification-speed-benchmark/mmcls/core/evaluation/multilabel_eval_metrics.py → openmmlab_test/mmclassification-0.24.1/mmcls/core/evaluation/multilabel_eval_metrics.py
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
import warnings

import numpy as np
...
...
openmmlab_test/mmclassification-0.24.1/mmcls/core/export/__init__.py
0 → 100644
View file @ 0fd8347d
# Copyright (c) OpenMMLab. All rights reserved.
from .test import ONNXRuntimeClassifier, TensorRTClassifier

__all__ = ['ONNXRuntimeClassifier', 'TensorRTClassifier']