# MMPretrain-MMCV

## Environment Setup

### Docker (Method 1)

Running in Docker is the recommended approach. Pull the provided Docker image:

```shell
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
```

Create a container from the pulled image:

```shell
# Replace <your IMAGE ID> with the ID or name of the image pulled above
docker run -it --name=mobilenetv2 --network=host --ipc=host --shm-size=16g --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/mobilenetv2_mmcv:/home/mobilenetv2_mmcv <your IMAGE ID> bash
```
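The `docker run` command above passes several host paths into the container (`/dev/kfd`, `/dev/dri`, `/dev/mkfd`, and `/opt/hyhal`). A minimal sketch of a pre-flight check is shown below; `missing_paths` is a hypothetical helper, not part of this repository, and it only reports which of those paths are absent on the host.

```shell
# Hypothetical helper (not part of the repository): print every path
# argument that does not exist on this host, one per line.
missing_paths() {
    for p in "$@"; do
        [ -e "$p" ] || printf '%s\n' "$p"
    done
}

# Paths taken from the docker run command above.
missing=$(missing_paths /dev/kfd /dev/dri /dev/mkfd /opt/hyhal)
if [ -n "$missing" ]; then
    echo "Missing on this host (the container may not start or may not see the DCU):"
    echo "$missing"
else
    echo "All device nodes and /opt/hyhal are present."
fi
```

The check only prints a report and never aborts, so it can be run on any machine before deciding whether to create the container.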

Clone the git repository and install the dependencies:

```shell
git clone http://developer.hpccube.com/codes/OpenDAS/mmpretrain-mmcv.git
cd mmpretrain-mmcv 
pip install -e .
pip install -r requirements.txt
```

### Dockerfile (Method 2)

    cd mmpretrain/docker
    docker build --no-cache -t mmpretrain:latest .
    docker run -it --name=mmpretrain --network=host --ipc=host --shm-size=16g  --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --privileged --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /opt/hyhal:/opt/hyhal:ro -v $PWD/mmpretrain:/home/mmpretrain <your IMAGE ID> bash
    pip install -e .
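    # If building the environment from the Dockerfile takes too long, comment out the pip installs in it and install the Python libraries after starting the container: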
    pip install -r requirements.txt

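### Anaconda (Method 3)

1. The special deep learning libraries this project requires for DCU cards can be downloaded and installed from the Guanghe developer community (光合开发者社区): https://developer.hpccube.com/tool/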

```shell
DTK driver: DTK-24.04.1
python==3.10
torch==2.1.0
torchvision==0.16.0+das1.1.git7d45932.abi1.dtk2404.torch2.1
mmcv==2.0.1+das1.1.gite58da25.abi1.dtk2404.torch2.1.0
```

Tips: the versions of the DTK driver, Python, torch, and the other DCU-related tools listed above must correspond to each other exactly.
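Since the listed versions have to line up, it can help to print what is actually installed. The following is a rough sketch, assuming `python` on the `PATH` is the interpreter of the target environment; `version_matches` is a hypothetical helper that does a simple prefix comparison, so the DCU build suffix (for example `+das1.1...`) is ignored.

```shell
# Hypothetical helper: succeed when the installed version string ($2)
# starts with the expected prefix ($1).
version_matches() {
    case "$2" in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Expected prefixes are taken from the version list above.
for spec in torch:2.1.0 torchvision:0.16.0 mmcv:2.0.1; do
    pkg=${spec%%:*}
    want=${spec#*:}
    # Empty result means the package could not be imported.
    have=$(python -c "import $pkg; print($pkg.__version__)" 2>/dev/null) || have=""
    if [ -z "$have" ]; then
        echo "$pkg: not importable"
    elif version_matches "$want" "$have"; then
        echo "$pkg: $have (ok)"
    else
        echo "$pkg: $have (expected $want)"
    fi
done
```

The prefix comparison is deliberately loose; it does not verify the DTK driver itself or the build suffix, which still need to be checked against the download page.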

2. Install the mmpretrain repository from source

```shell
cd mmpretrain-mmcv
pip install -e .
```

3. Install the remaining, non-special libraries from requirements.txt

```shell
pip install -r requirements.txt
```




## Examples

This repository provides several scripts for testing on Tiny ImageNet.

For example, to train resnet50 from scratch on 8 cards, run:

```shell
bash tools/dist_train.sh resnet50-test.py 8
```

More configurations are available in the configs directory; all of them can be run in the following way:

```shell
bash tools/dist_train.sh <config file> <number of cards for training>
```
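Because the launcher takes a config file and a card count as positional arguments, a small guard can catch typos before a distributed job starts. The sketch below is an assumption on top of the commands above: `check_train_args` is a hypothetical helper, not something shipped in `tools/`.

```shell
# Hypothetical helper (not part of the repository): validate the two
# positional arguments before handing them to tools/dist_train.sh.
check_train_args() {
    config=$1
    gpus=$2
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi
    # Reject empty strings and anything containing a non-digit.
    case "$gpus" in
        ''|*[!0-9]*)
            echo "card count must be a positive integer: $gpus" >&2
            return 1
            ;;
    esac
    if [ "$gpus" -le 0 ]; then
        echo "card count must be a positive integer: $gpus" >&2
        return 1
    fi
    return 0
}
```

A possible use is `check_train_args resnet50-test.py 8 && bash tools/dist_train.sh resnet50-test.py 8`, which only launches training when both arguments pass the check.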