# VGG16

## Paper

`VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION`

- https://arxiv.org/abs/1409.1556

## Model Architecture

The VGG model took second place in the 2014 ILSVRC competition; first place went to GoogLeNet. However, VGG outperforms GoogLeNet on many transfer-learning tasks, and it is a preferred choice for extracting CNN features from images.

![20231124132639](./images/20231124132639.png)

## Algorithm Principles

VGG16 has 16 weight layers in total, making it a fairly deep convolutional neural network. Every VGG variant uses five convolutional stages, each containing one or more convolutional layers.

![20231124132925](./images/20231124132925.png)
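
To make the structure above concrete, the following optional sketch (not part of this repository's pipeline, which builds the model from the configs under configs/vgg/) instantiates VGG16 with torchvision, which is already pinned in the environment list below, and counts its weight layers:

```python
# Illustrative only: build VGG16 with torchvision and count its weight layers.
# Training in this repository uses the mmpretrain configs instead.
import torch
from torchvision.models import vgg16

model = vgg16(weights=None, num_classes=1000)

convs = [m for m in model.modules() if isinstance(m, torch.nn.Conv2d)]
fcs = [m for m in model.modules() if isinstance(m, torch.nn.Linear)]
print(f"conv layers: {len(convs)}, fc layers: {len(fcs)}")  # 13 + 3 = 16 weight layers

x = torch.randn(1, 3, 224, 224)  # standard 224x224 RGB input
print(model(x).shape)            # torch.Size([1, 1000]) class scores
```

The 13 convolutional layers plus the 3 fully connected layers account for the 16 weight layers that give VGG16 its name.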

## Environment Setup

### Docker (Method 1)

Running with Docker is recommended. Pull the provided Docker image:

```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
```

Create a container from the pulled image:

```shell
# Replace <your IMAGE ID or NAME> with the ID or name of the image pulled above
docker run -it --name=vgg16-mmcv --network=host --ipc=host --shm-size=16g  --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro <your IMAGE ID> bash
```

Clone the repository and install it along with its dependencies:

```shell
git clone --recursive http://developer.hpccube.com/codes/modelzoo/vgg16_mmcv.git
cd vgg16_mmcv/mmpretrain-mmcv
pip install -e .
pip install -r requirements.txt
```
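
Optionally, a quick sanity check (my own sketch, not a script shipped with the repository) confirms inside the container that the editable install succeeded and that the DCUs are visible; on the DTK stack the devices are exposed through torch's CUDA/HIP interface:

```python
# Optional sanity check (not shipped with the repository): run inside the container.
import torch
import mmcv
import mmpretrain

print("mmpretrain:", mmpretrain.__version__)   # editable install from this repo
print("mmcv      :", mmcv.__version__)
# On the DTK/ROCm stack the DCU devices are reported through torch's CUDA interface.
print("DCU visible:", torch.cuda.is_available(), "count:", torch.cuda.device_count())
```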

### Dockerfile (Method 2)

```bash
cd vgg16_mmcv/docker
docker build --no-cache -t vgg16_mmcv:latest .
docker run -it --name=vgg16 --network=host --ipc=host --shm-size=16g  --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro <your IMAGE ID> bash
pip install -e .
# If setting up the environment through the Dockerfile takes a long time, comment out the pip install step inside the Dockerfile and install the Python packages after starting the container:
# pip install -r requirements.txt
```

### Anaconda (Method 3)

1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/

```plaintext
DTK driver: DTK-24.04.1
python==3.10
torch==2.1.0
torchvision==0.16.0+das1.1.git7d45932.abi1.dtk2404.torch2.1
mmcv==2.0.1+das1.1.gite58da25.abi1.dtk2404.torch2.1.0
Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must correspond to one another exactly
```

2. Install the remaining, non-DCU-specific libraries according to requirements.txt:

```shell
pip install -r requirements.txt
```
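
Since the versions are required to match exactly, the following optional snippet (not part of the repository) prints the installed version strings so they can be compared against the pinned list above; the DCU builds carry the +das…dtk… suffixes:

```python
# Optional: print installed versions to compare against the pinned list above.
import sys
import torch
import torchvision
import mmcv

print("python     :", sys.version.split()[0])   # expected 3.10.x
print("torch      :", torch.__version__)        # expected 2.1.0 (DTK build)
print("torchvision:", torchvision.__version__)  # expected 0.16.0+das1.1...
print("mmcv       :", mmcv.__version__)         # expected 2.0.1+das1.1...
```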

## Datasets

### ImageNet

The ImageNet dataset can be used with this project. Official download page: https://image-net.org.

The dataset can also be downloaded quickly from SCNet at [imagenet-2012](http://113.200.138.88:18080/aidatasets/project-dependency/imagenet-2012). Download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar from it and unpack them as follows:

```bash
cd mmpretrain-mmcv/data/imagenet
mkdir train && cd train
tar -xvf ILSVRC2012_img_train.tar
```

Unpacking yields 1000 tar files, one per class. Extract each into its own folder, for example with the following shell script:

```bash
for tarfile in *.tar; do
    dirname="${tarfile%.tar}"
    mkdir "$dirname"
    tar -xvf "$tarfile" -C "$dirname"
done
```

The directory structure after extraction is as follows:

```
data
└── imagenet
    ├── train
    │   ├── n01440764
    │   │   ├── n01440764_10026.JPEG
    │   │   ├── n01440764_10027.JPEG
    └── val
        ├── n01440764
        │   ├── ILSVRC2012_val_00000293.JPEG
```
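
Before training on the full ImageNet data, it can help to verify the layout against the tree above. The snippet below is only an illustrative check (not part of the repository) and assumes it is run from mmpretrain-mmcv/ with the paths shown:

```python
# Illustrative layout check: count class folders and images per split.
from pathlib import Path

root = Path("data/imagenet")
for split in ("train", "val"):
    split_dir = root / split
    class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    n_images = sum(1 for _ in split_dir.rglob("*.JPEG"))
    print(f"{split}: {len(class_dirs)} class folders, {n_images} images")
    # Leftover *.tar files in train/ mean the per-class extraction step was skipped.
    leftover = list(split_dir.glob("*.tar"))
    if leftover:
        print(f"  warning: {len(leftover)} tar files still need extraction in {split_dir}")
```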

### Tiny-ImageNet-200

Because the full ImageNet dataset is large, [tiny-imagenet-200](http://cs231n.stanford.edu/tiny-imagenet-200.zip) can be used for testing instead; it can also be downloaded quickly from SCNet at [tiny-imagenet-200-scnet](http://113.200.138.88:18080/aidatasets/project-dependency/tiny-imagenet-200). In that case the configuration scripts need a few changes:

- In the dataset config file (configs/\_\_base\_\_/datasets/{DATASET_CONFIG}.py), modify the following fields:
  
  ```python
  # dataset settings
  dataset_type = 'CustomDataset'  # change to CustomDataset
  data_preprocessor = dict(
      num_classes=200,  # change the number of classes to 200
      ...
  )
  ...
  train_dataloader = dict(
      batch_size=32,
      num_workers=5,
      dataset=dict(
          type=dataset_type,
          data_root='data/imagenet',
          data_prefix='train',  # change to data_prefix='train'; do the same in val_dataloader
          pipeline=train_pipeline),
      sampler=dict(type='DefaultSampler', shuffle=True),
  )
  ```

- In the model config file (configs/\_\_base\_\_/models/{MODEL_CONFIG}.py), set the class-related values to 200 as well.
  
  ```python
  # model settings
  model = dict(
      type='ImageClassifier',
      ...
      head=dict(
          type='LinearClsHead',
          num_classes=200,  # change the number of classes to 200
          ...
      ))
  ```
  
  The mmpretrain-mmcv directory in this repository already provides several config scripts for training with tiny-imagenet-200, which can be used as references; a quick way to sanity-check a modified config is sketched below.
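
To confirm that a modified config actually picks up the new dataset type and class count before launching a job, it can be loaded with mmengine's Config, which mmpretrain uses for its config files. The path vgg16-test.py below is the tiny-imagenet config referenced in the training commands in the next section; adjust it to whichever config you edited:

```python
# Optional sketch: load a modified config and check the fields changed above.
from mmengine.config import Config

cfg = Config.fromfile("vgg16-test.py")  # adjust to the config you actually edited
print(cfg.model.head.num_classes)            # expect 200 for tiny-imagenet-200
print(cfg.train_dataloader.dataset.type)     # expect 'CustomDataset'
print(cfg.train_dataloader.dataset.data_prefix)
```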

## Training

Extract the training dataset and place it under mmpretrain-mmcv/data/. For tiny-imagenet, the directory structure is as follows:

```
data
└── imagenet
    ├── test/
    ├── train/
    ├── val/
    ├── wnids.txt
    └── words.txt
```

### Single-node 8-card training

- tiny-imagenet-200

```shell
bash tools/dist_train.sh vgg16-test.py 8
```

- imagenet

```shell
bash tools/dist_train.sh configs/vgg/vgg16_8xb32_in1k.py 8
```

To train with a different number of cards, replace the 8 in the command with the desired number.

If the port is already in use, change the port in tools/dist_train.sh.
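
After training, checkpoints are normally written under work_dirs/<config name>/ (mmengine's default). As an optional follow-up, a trained checkpoint can be tried on a single image with mmpretrain's inference API, assuming this fork keeps the upstream interface; the checkpoint and image paths below are placeholders:

```python
# Optional sketch: run a trained checkpoint on one image with mmpretrain's
# inference API (assuming this fork keeps the upstream interface).
# The checkpoint and image paths are placeholders.
from mmpretrain import ImageClassificationInferencer

inferencer = ImageClassificationInferencer(
    model="configs/vgg/vgg16_8xb32_in1k.py",
    pretrained="work_dirs/vgg16_8xb32_in1k/epoch_100.pth",
)
result = inferencer("path/to/some_image.JPEG")[0]
print(result["pred_label"], result["pred_score"])
```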

## Application Scenarios

### Algorithm Category

Image classification

### Key Industries

Manufacturing, energy, transportation, cybersecurity, public security

## Source Repository and Issue Feedback

https://developer.hpccube.com/codes/modelzoo/vgg16_mmcv

## References

https://github.com/open-mmlab/mmpretrain