# Conformer_Wenet_PyTorch

## Paper

`Conformer: Convolution-augmented Transformer for Speech Recognition`

- [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100)

## Model Architecture

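The Conformer model combines the self-attention mechanism of the Transformer with a convolutional neural network. It is used for speech recognition and natural language processing tasks and can model both time-domain and frequency-domain features.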

![model](./img/model.png)

## Algorithm

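Conformer stacks multiple encoder layers that combine Transformer-style self-attention with deep convolution. Self-attention captures global dependencies across the input sequence, and convolution captures local features. Modeling both the time-domain and frequency-domain features of the input in this way improves performance on speech recognition and natural language processing tasks.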
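
For reference, Gulati et al. (2020) define each Conformer block as a macaron-style stack with four sub-modules, each wrapped in a residual connection and followed at the end by layer normalization:

- a half-step feed-forward module (FFN)
- multi-head self-attention (MHSA)
- a convolution module (Conv)
- a second half-step FFN

For the input $x_i$ and output $y_i$ of block $i$:

```math
\begin{aligned}
\tilde{x}_i &= x_i + \tfrac{1}{2}\,\mathrm{FFN}(x_i) \\
x'_i &= \tilde{x}_i + \mathrm{MHSA}(\tilde{x}_i) \\
x''_i &= x'_i + \mathrm{Conv}(x'_i) \\
y_i &= \mathrm{LayerNorm}\left(x''_i + \tfrac{1}{2}\,\mathrm{FFN}(x''_i)\right)
\end{aligned}
```

The two half-weighted feed-forward residuals that sandwich attention and convolution are what make the design "macaron-like". The exact options used by this WeNet-based model come from the recipe's training configuration, so treat these equations as the reference formulation rather than a description of this repository's settings.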

![conformer encoder](./img/conformer_encoder.png)

## Environment Setup

### Docker (Method 1)

The address for pulling the Docker image from [光源](https://www.sourcefind.cn/#/service-details) and the steps for using it are given below:

```
# Pull the base image (PyTorch 1.10, CentOS 7.6, DTK 22.10, Python 3.8)
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py38-latest

# Start a container: replace /path/your_code_data/ with your code/data directory,
# docker_name with a container name, and imageID with the image ID shown by `docker images`
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

# Inside the container: enter the workspace and install the pinned typeguard version
cd /path/workspace/
pip3 install typeguard==2.13.3
```

### Dockerfile (Method 2)

How to use the provided Dockerfile:

```
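cd ./docker
# Build the image and tag it "conformer"
docker build --no-cache -t conformer .
# Start a container from the "conformer" image; adjust the mount path and container name
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name conformer bash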
```

### Anaconda (Method 3)

The steps for configuring and building a local environment are as follows:

The DCU-specific deep learning libraries required by this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.

```
DTK driver: dtk22.10
python: python3.8
torch: 1.10
torchvision: 0.10
```

`Tip: the versions of the DTK driver, python, torch and the other DCU-related tools listed above must match each other exactly.`
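
A quick way to confirm that the interpreter and torch inside the environment match the versions above is sketched below. This is a hedged helper and not part of this repository. `version_matches` is a hypothetical name. It only compares version prefixes (so `1.10.0a0+git…` counts as `1.10`), and it cannot check the DTK driver itself.

```shell
# Hypothetical helper: succeed if version $2 is $1 or starts with "$1." / "$1+".
version_matches() {
  case "$2" in
    "$1"|"$1".*|"$1"+*) return 0 ;;
    *) return 1 ;;
  esac
}

py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
version_matches 3.8 "$py_ver" || echo "warning: expected python 3.8, found ${py_ver:-none}"

# torch may be missing entirely, so check the import separately.
if torch_ver=$(python3 -c 'import torch; print(torch.__version__)' 2>/dev/null); then
  version_matches 1.10 "$torch_ver" || echo "warning: expected torch 1.10, found $torch_ver"
else
  echo "warning: torch is not importable"
fi
```

The prefix match is deliberately loose because vendor builds often append local version suffixes. A strict string comparison would reject otherwise compatible builds.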

Install the other, non-deep-learning libraries from requirements.txt:

```
pip3 install -r requirements.txt
pip3 install typeguard==2.13.3
```

## Dataset

`Aishell`

- [http://openslr.org/33/](http://openslr.org/33/)

Usage of the data preprocessing scripts:

```
# If you have already downloaded the aishell dataset yourself, just set the dataset path in run.sh and run the commands below
cd ./examples/aishell/s0

# Stage -1 downloads the dataset automatically; if it is already downloaded, set the data path in run.sh and skip this step
bash run.sh --stage -1 --stop_stage -1

bash run.sh --stage 0 --stop_stage 0

bash run.sh --stage 1 --stop_stage 1

bash run.sh --stage 2 --stop_stage 2

bash run.sh --stage 3 --stop_stage 3
```
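
To run stages 0 to 3 in one go while still seeing which stage fails, a small wrapper like the following can help. `run_stages` and the `RUN_SH` override are hypothetical names introduced for this sketch. Define the function, then call it as `run_stages 0 3` from `./examples/aishell/s0`.

```shell
# Hypothetical wrapper: run run.sh stages $1..$2 one at a time and stop at the first failure.
# RUN_SH (also hypothetical) overrides the script path; it defaults to run.sh in the current directory.
run_stages() {
  local first=$1 last=$2 stage
  for stage in $(seq "$first" "$last"); do
    echo "== stage $stage =="
    if ! bash "${RUN_SH:-run.sh}" --stage "$stage" --stop_stage "$stage"; then
      echo "stage $stage failed" >&2
      return 1
    fi
  done
}
```

If run.sh guards each stage with the usual WeNet-style `stage`/`stop_stage` checks, `bash run.sh --stage 0 --stop_stage 3` should run the same stages in a single call. The loop only makes the per-stage boundaries explicit.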

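The directory structure of the preprocessed training data is shown below; prepare the full dataset for training following this structure.
The dataset for this project consists of two parts: the raw data, and the index files plus the features extracted from the audio.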

1. Raw data

```
├── data_aishell
│   ├── transcript
│   │   └── aishell_transcript_v0.8.txt
│   └── wav
│       ├── dev
│       ├── test
│       └── train
├── data_aishell.tgz
├── resource_aishell
│   ├── lexicon.txt
│   └── speaker.info
└── resource_aishell.tgz
```
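
To sanity-check the raw corpus before preprocessing, you can count the audio files in each split. This sketch makes two assumptions. First, it assumes the per-speaker archives under `wav/` have already been unpacked into `.wav` files. Second, it assumes the corpus root is `data_aishell` as in the tree above; pass another path as the first argument if yours differs. `count_wavs` is a hypothetical name.

```shell
# Hypothetical helper: print "<split>\t<number of .wav files>" for train/dev/test.
# $1: corpus root (defaults to data_aishell). Prints "missing" for absent splits.
count_wavs() {
  local root=${1:-data_aishell} split n
  for split in train dev test; do
    if [ -d "$root/wav/$split" ]; then
      n=$(find "$root/wav/$split" -type f -name '*.wav' | wc -l | tr -d ' ')
    else
      n=missing
    fi
    printf '%s\t%s\n' "$split" "$n"
  done
}
```

A split that reports 0 or `missing` usually means the download or extraction step did not finish.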

2. Index files and features extracted from the audio

```
├── dev
│   ├── data.list
│   ├── text
│   └── wav.scp
├── dict
│   └── lang_char.txt
├── local
│   ├── dev
│   │   ├── text
│   │   ├── transcripts.txt
│   │   ├── utt.list
│   │   ├── wav.flist
│   │   ├── wav.scp
│   │   └── wav.scp_all
│   ├── test
│   │   ├── text
│   │   ├── transcripts.txt
│   │   ├── utt.list
│   │   ├── wav.flist
│   │   ├── wav.scp
│   │   └── wav.scp_all
│   └── train
│       ├── text
│       ├── transcripts.txt
│       ├── utt.list
│       ├── wav.flist
│       ├── wav.scp
│       └── wav.scp_all
├── test
│   ├── data.list
│   ├── text
│   └── wav.scp
└── train
    ├── data.list
    ├── global_cmvn
    ├── text
    └── wav.scp
```
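
A quick pre-training check that every file in the layout above is present and non-empty can save a failed run. The function names, and the default root `data` (the directory run.sh is assumed to write into), are assumptions for this sketch. `check_wav_scp` relies on the Kaldi convention that each `wav.scp` line is `<utterance-id> <audio-path>`, with a plain file path rather than a pipe command.

```shell
# Hypothetical helper: report files from the layout above that are missing or empty under $1.
# The default root "data" is an assumption about where run.sh writes its output.
check_data_dir() {
  local root=${1:-data} missing=0 f
  for f in dict/lang_char.txt train/global_cmvn \
           train/data.list train/text train/wav.scp \
           dev/data.list dev/text dev/wav.scp \
           test/data.list test/text test/wav.scp; do
    if [ ! -s "$root/$f" ]; then
      echo "missing or empty: $root/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: check that every audio path in a Kaldi-style wav.scp
# ("<utt_id> <path>" per line) exists; prints a summary and returns 1 if any are missing.
check_wav_scp() {
  local utt path n=0 bad=0
  # "|| [ -n "$utt" ]" also handles a last line without a trailing newline.
  while read -r utt path || [ -n "$utt" ]; do
    [ -n "$utt" ] || continue  # skip blank lines
    n=$((n + 1))
    if [ ! -f "$path" ]; then
      echo "missing audio for $utt: $path"
      bad=$((bad + 1))
    fi
  done < "$1"
  echo "$n entries, $bad missing"
  [ "$bad" -eq 0 ]
}
```

For example: `check_data_dir data && check_wav_scp data/train/wav.scp`.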



## Training

```
# Uses 4 cards by default; change the number of cards by editing run_train.sh
bash train.sh
```
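
How the card count is chosen depends on run_train.sh, which is not shown here. The sketch below assumes that script follows the WeNet recipe pattern of deriving the number of processes from a comma-separated list of visible devices. `HIP_VISIBLE_DEVICES` is the HIP/ROCm device-selection variable. Whether this repository reads it, rather than `CUDA_VISIBLE_DEVICES` or a hard-coded list, is an assumption to verify in the script. `count_devices` is a hypothetical name.

```shell
# Hypothetical helper: count the devices in a comma-separated list such as "0,1,2,3".
count_devices() {
  echo "$1" | awk -F ',' '{print NF}'
}

# Fall back to four cards (0-3) when no device list is set, matching the 4-card default above.
devices=${HIP_VISIBLE_DEVICES:-0,1,2,3}
echo "would train on $(count_devices "$devices") card(s): $devices"
```

To train on two cards under this assumption, you would run `HIP_VISIBLE_DEVICES=0,1 bash train.sh`. This only has an effect if the training script honors the variable.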

## Inference

```
# Uses exp/conformer/final.pt for inference by default; this can be changed manually
bash validate.sh
```
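
A helper along these lines can find a checkpoint when you want to evaluate a different one, or when `final.pt` has not been written yet (for example after an interrupted run). It assumes intermediate checkpoints are saved as `<epoch>.pt` in the same directory. That appears to be WeNet's usual naming but is not stated in this README. `pick_checkpoint` is a hypothetical name, and you would still need to point validate.sh at the path it prints.

```shell
# Hypothetical helper: print the checkpoint to evaluate from experiment directory $1
# (default exp/conformer): final.pt if present, otherwise the highest-numbered <N>.pt.
# Returns 1 if no checkpoint is found.
pick_checkpoint() {
  local dir=${1:-exp/conformer} f n best=-1
  if [ -f "$dir/final.pt" ]; then
    echo "$dir/final.pt"
    return 0
  fi
  for f in "$dir"/*.pt; do
    n=${f##*/}
    n=${n%.pt}
    case $n in
      ''|*[!0-9]*) continue ;;  # skip non-numeric names and the unexpanded glob
    esac
    if [ "$n" -gt "$best" ]; then
      best=$n
    fi
  done
  [ "$best" -ge 0 ] || return 1
  echo "$dir/$best.pt"
}
```

For example: `ckpt=$(pick_checkpoint exp/conformer) && echo "evaluating $ckpt"`.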

## Result

![result](./img/result.png)

### Accuracy

Test data: [aishell](http://openslr.org/33/); accelerator card: Z100L.

Test results:

| Cards | Data type | Accuracy |
| :--: | :------: | :-----: |
|  4   |   fp32   | 93.1294 |

## Application Scenarios

### Algorithm Category

`Speech recognition`

### Key Application Industries

`Finance, telecommunications, broadcasting and media`

## Source Repository and Issue Feedback

- [https://developer.hpccube.com/codes/modelzoo/conformer_pytorch](https://developer.hpccube.com/codes/modelzoo/conformer_pytorch)

## References

- https://github.com/wenet-e2e/wenet