## Introduction

Runs the Conformer network using the PyTorch framework.

For further details, see README_ORIGIN.md.

[Conformer Git repository](https://github.com/pengzhiliang/Conformer)

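## Before running

Patch apex and timm so that `container_abcs` is imported from the standard-library `collections.abc` instead of `torch._six`: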

```
# Edit _amp_state.py (python3.6/site-packages/apex/amp/_amp_state.py)
# Change:
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    from torch._six import container_abcs
else:
    import collections.abc as container_abcs
# to:
if TORCH_MAJOR == 1 and TORCH_MINOR < 8:
    #from torch._six import container_abcs
    import collections.abc as container_abcs
else:
    import collections.abc as container_abcs
```
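If you would rather not hard-code either branch, a version-agnostic import is a common alternative. The snippet below is a sketch of that pattern, assuming that any failure to import from `torch._six` means `container_abcs` is unavailable there; it is not code shipped by apex or timm.

```python
# Hypothetical version-agnostic import (not what apex or timm ship).
# torch._six.container_abcs is missing in newer PyTorch builds;
# collections.abc provides the same abstract base classes.
try:
    from torch._six import container_abcs  # only present in older PyTorch
except ImportError:  # also covers ModuleNotFoundError when torch/_six is absent
    import collections.abc as container_abcs

# The ABCs behave the same either way, e.g. for isinstance checks:
print(isinstance({}, container_abcs.Mapping))       # True
print(isinstance([1, 2], container_abcs.Sequence))  # True
```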



```
# Edit helpers.py (python3.6/site-packages/timm/models/layers/helpers.py)

# Change:
from torch._six import container_abcs
# to:
import collections.abc as container_abcs
```
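Both edits swap the same import line, so they can also be scripted. The sketch below uses hypothetical helpers (`patch_source` and `patch_file` are not part of apex or timm): it replaces every uncommented `from torch._six import container_abcs` line with `import collections.abc as container_abcs`, keeps the original indentation, and writes a `.bak` backup before changing a file. The python3.6 site-packages paths above depend on your environment.

```python
import re

OLD_IMPORT = "from torch._six import container_abcs"
NEW_IMPORT = "import collections.abc as container_abcs"

# Matches the old import alone on a line, capturing its indentation;
# commented-out lines (starting with '#') do not match.
_PATTERN = re.compile(r"^([ \t]*)" + re.escape(OLD_IMPORT) + r"[ \t]*$", re.MULTILINE)


def patch_source(text):
    """Return (patched_text, n_replacements) with every live old import swapped."""
    return _PATTERN.subn(lambda m: m.group(1) + NEW_IMPORT, text)


def patch_file(path):
    """Patch a file in place, saving a .bak copy first; returns the replacement count."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    patched, n = patch_source(original)
    if n:
        with open(path + ".bak", "w", encoding="utf-8") as f:
            f.write(original)
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
    return n
```

For example, `patch_file("<site-packages>/timm/models/layers/helpers.py")` returns the number of lines it changed. On apex's `_amp_state.py` it yields the same imports as the manual edit, without keeping the commented-out line.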

## Dataset location

The dataset is stored on the Kunshan server at:

`/public/software/apps/DeepLearning/Data/ImageNet-pytorch`
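To sanity-check a copy of the dataset, the sketch below counts class folders per split. It assumes the DeiT-style `ImageFolder` layout that the upstream Conformer code loads (`train/` and `val/` directories with one sub-folder per class); `count_class_dirs` is a hypothetical helper.

```python
import os

# Dataset path from this README; override it if your copy lives elsewhere.
DATA_PATH = "/public/software/apps/DeepLearning/Data/ImageNet-pytorch"


def count_class_dirs(root, splits=("train", "val")):
    """Map each split to its number of class sub-folders (0 if the split is missing)."""
    counts = {}
    for split in splits:
        split_dir = os.path.join(root, split)
        if os.path.isdir(split_dir):
            with os.scandir(split_dir) as entries:
                # Only directories count as classes; stray files are ignored.
                counts[split] = sum(1 for e in entries if e.is_dir())
        else:
            counts[split] = 0
    return counts


if __name__ == "__main__":
    # ImageNet-1k should report 1000 class folders for each split.
    print(count_class_dirs(DATA_PATH))
```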

## Single GPU

```
# Launch
./run1.sh
```

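In the shell script, `--nnodes` is the number of machines and `--nproc_per_node` is the number of GPUs per machine.

For the Python arguments, `--num_workers` sets the number of data-loading worker processes, `--data-path` is the dataset path, and `--output_dir` is the output directory.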
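As an illustration of how these arguments fit together, the sketch below assembles a launch command. It assumes the scripts call `torch.distributed.launch` (which accepts `--nnodes` and `--nproc_per_node`) and that the entry point is `main.py`; check `run1.sh` for the actual script name and any extra arguments. `build_launch_cmd` and `world_size` are hypothetical helpers, and the total number of training processes is `nnodes × nproc_per_node`.

```python
import shlex


def build_launch_cmd(nnodes, nproc_per_node, data_path, output_dir,
                     num_workers=10, script="main.py"):
    """Assemble a torch.distributed.launch command as an argument list.

    script="main.py" and num_workers=10 are hypothetical defaults;
    check run1.sh for the real values and any extra arguments.
    """
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nnodes={nnodes}", f"--nproc_per_node={nproc_per_node}",
        script,
        "--data-path", data_path,
        "--output_dir", output_dir,
        "--num_workers", str(num_workers),
    ]


def world_size(nnodes, nproc_per_node):
    """Total number of training processes (one per GPU)."""
    return nnodes * nproc_per_node


cmd = build_launch_cmd(1, 1, "/public/software/apps/DeepLearning/Data/ImageNet-pytorch", "./output")
print(" ".join(shlex.quote(c) for c in cmd))  # shell-safe string for run scripts
print(world_size(1, 1))  # 1
```

With more than one machine, `torch.distributed.launch` also needs `--node_rank`, `--master_addr` and `--master_port` on each node.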

## Multi-GPU (single node)

```
# Run
./run4.sh
```

## Multi-node, multi-GPU

```
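cd 2node-run-comformer
# Adjust #SBATCH -p (partition) and #SBATCH -J (job name) in the script to suit your cluster;
# the run output is saved in the corresponding Slurm output file.
sbatch run_conformer_4dcus.sh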
```
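If the batch script starts one task per device with `srun` (an assumption; check `run_conformer_4dcus.sh`), each process can derive its rank from standard Slurm task variables. The sketch below is hypothetical (`slurm_dist_info` and `to_torch_env` are not part of this repository) and only reads variables that Slurm sets for every task.

```python
import os


def slurm_dist_info(env=None):
    """Derive rank information from standard Slurm task variables.

    Returns None when not running inside a Slurm task (e.g. on a login shell).
    """
    env = os.environ if env is None else env
    if "SLURM_PROCID" not in env:
        return None
    return {
        "rank": int(env["SLURM_PROCID"]),                # global task index
        "world_size": int(env.get("SLURM_NTASKS", 1)),   # total tasks in the job step
        "local_rank": int(env.get("SLURM_LOCALID", 0)),  # task index on this node
        "node_id": int(env.get("SLURM_NODEID", 0)),      # node index within the job
    }


def to_torch_env(info):
    """Map rank info onto PyTorch's variable names.

    RANK and WORLD_SIZE are read by env:// initialisation; LOCAL_RANK is the
    launcher convention for choosing the device on each node.
    """
    return {
        "RANK": str(info["rank"]),
        "WORLD_SIZE": str(info["world_size"]),
        "LOCAL_RANK": str(info["local_rank"]),
    }
```

`MASTER_ADDR` and `MASTER_PORT` still have to be set separately, for example from the first node of the allocation.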