# Conformer_Wenet_PyTorch

## Paper

`WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit`

- [https://arxiv.org/pdf/2102.01547.pdf](https://arxiv.org/pdf/2102.01547.pdf)

## Model Structure

The Conformer combines the Transformer's self-attention mechanism with convolutional neural networks. It is used for speech recognition and natural language processing tasks and can model both time-domain and frequency-domain features.

![model](./img/model.png)

## Algorithm Principle

The Conformer combines multi-layer Transformer encoders with deep convolutional neural networks to model the time-domain and frequency-domain features of the input sequence, improving performance on speech recognition and natural language processing tasks.

![conformer encoder](./img/conformer_encoder.png)
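For reference, the encoder layer shown above can be sketched in PyTorch. This is a minimal illustrative implementation, not WeNet's actual code: Macaron-style half-step feed-forward modules sandwich a self-attention module and a convolution module (pointwise conv → GLU → depthwise conv → BatchNorm → activation → pointwise conv); all names and hyper-parameters below are placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConformerBlock(nn.Module):
    """Minimal sketch of one Conformer encoder block (illustrative only)."""
    def __init__(self, d_model=256, n_heads=4, conv_kernel=15, ff_mult=4):
        super().__init__()
        # Two Macaron-style feed-forward modules, applied with 0.5 residual weight
        self.ff1 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, ff_mult * d_model), nn.SiLU(),
            nn.Linear(ff_mult * d_model, d_model),
        )
        self.ff2 = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, ff_mult * d_model), nn.SiLU(),
            nn.Linear(ff_mult * d_model, d_model),
        )
        # Multi-head self-attention module
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Convolution module: pointwise -> GLU -> depthwise -> BN -> SiLU -> pointwise
        self.conv_norm = nn.LayerNorm(d_model)
        self.pw1 = nn.Conv1d(d_model, 2 * d_model, 1)
        self.dw = nn.Conv1d(d_model, d_model, conv_kernel,
                            padding=conv_kernel // 2, groups=d_model)
        self.bn = nn.BatchNorm1d(d_model)
        self.pw2 = nn.Conv1d(d_model, d_model, 1)
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)                # half-step feed-forward
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        c = self.conv_norm(x).transpose(1, 2)    # (batch, d_model, time)
        c = F.glu(self.pw1(c), dim=1)
        c = self.pw2(F.silu(self.bn(self.dw(c))))
        x = x + c.transpose(1, 2)
        x = x + 0.5 * self.ff2(x)                # second half-step feed-forward
        return self.final_norm(x)

block = ConformerBlock()
out = block(torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 50, 256])
```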

## Environment Setup

### Docker (Option 1)

The address and usage steps for pulling the docker image from [光源](https://www.sourcefind.cn/#/service-details) are given below:

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py38-latest

docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

cd /path/workspace/
pip3 install typeguard==2.13.3
```

### Dockerfile (Option 2)

How to use the Dockerfile:

```
cd ./docker
docker build --no-cache -t conformer .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```

### Anaconda (Option 3)

Detailed steps for local configuration and compilation are given here, for example:

The special deep-learning libraries required for the DCU accelerators used by this project can be downloaded and installed from the [光合](https://developer.hpccube.com/tool/) developer community.

```
DTK driver: dtk22.10
python: 3.8
torch: 1.10
torchvision: 0.10
```

`Tips: the versions of the DTK driver, python, torch, and other DCU-related tools above must correspond to each other exactly`
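A quick sanity check that the installed torch matches the versions above (on ROCm/DTK builds of PyTorch, `torch.version.hip` reports the HIP runtime; on CUDA or CPU builds it is `None`):

```python
import torch

# Print the installed torch version and the HIP (ROCm/DTK) runtime string
# it was built against, if any.
print("torch:", torch.__version__)
print("hip:", torch.version.hip)
```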

Other, non-deep-learning libraries can be installed from requirements.txt:

```
pip3 install -r requirements.txt
pip3 install typeguard==2.13.3
```

## Dataset

`Aishell`

- [http://openslr.org/33/](http://openslr.org/33/)


Quick download links for `Aishell` on SCNet:

- [http://113.200.138.88:18080/aidatasets/project-dependency/aishell/-/raw/master/data_aishell.tgz](http://113.200.138.88:18080/aidatasets/project-dependency/aishell/-/raw/master/data_aishell.tgz)

- [http://113.200.138.88:18080/aidatasets/project-dependency/aishell/-/raw/master/resource_aishell.tgz](http://113.200.138.88:18080/aidatasets/project-dependency/aishell/-/raw/master/resource_aishell.tgz)

Usage of the data-preprocessing script:

```
# If you have already downloaded the aishell dataset yourself, just modify the dataset path in run.sh and run the commands below
cd ./examples/aishell/s0
# Setting stage to -1 downloads the dataset automatically; if you already have it, set the data path in run.sh to skip the download
bash run.sh --stage -1 --stop_stage -1

bash run.sh --stage 0 --stop_stage 0

bash run.sh --stage 1 --stop_stage 1

bash run.sh --stage 2 --stop_stage 2

bash run.sh --stage 3 --stop_stage 3

```

The directory structure of the preprocessed training data is shown below; to train on the full dataset, prepare it following this layout.
The dataset for this project consists of two parts: the raw data, and the index plus audio-feature files.

1. Raw data

```
├── data_aishell
│   ├── transcript
│   │   └── aishell_transcript_v0.8.txt
│   └── wav
│       ├── dev
│       ├── test
│       └── train
├── data_aishell.tgz
├── resource_aishell
│   ├── lexicon.txt
│   └── speaker.info
└── resource_aishell.tgz
```
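Each line of `aishell_transcript_v0.8.txt` pairs an utterance ID with its space-separated transcript. A minimal parsing sketch (the sample line below is hypothetical, and the exact file format should be verified against the downloaded data):

```python
# A hypothetical line from aishell_transcript_v0.8.txt:
#   "<utterance_id> <space-separated transcript>"
line = "BAC009S0002W0122 而 对 楼市 成交 抑制 作用 最大 的 限购"
utt_id, text = line.split(maxsplit=1)
transcript = text.replace(" ", "")  # collapse to a plain character string
print(utt_id, transcript)
```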

2. Index and audio-feature files extracted from the audio

```
├── dev
│   ├── data.list
│   ├── text
│   └── wav.scp
├── dict
│   └── lang_char.txt
├── local
│   ├── dev
│   │   ├── text
│   │   ├── transcripts.txt
│   │   ├── utt.list
│   │   ├── wav.flist
│   │   ├── wav.scp
│   │   └── wav.scp_all
│   ├── test
│   │   ├── text
│   │   ├── transcripts.txt
│   │   ├── utt.list
│   │   ├── wav.flist
│   │   ├── wav.scp
│   │   └── wav.scp_all
│   └── train
│       ├── text
│       ├── transcripts.txt
│       ├── utt.list
│       ├── wav.flist
│       ├── wav.scp
│       └── wav.scp_all
├── test
│   ├── data.list
│   ├── text
│   └── wav.scp
└── train
    ├── data.list
    ├── global_cmvn
    ├── text
    └── wav.scp
```
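In WeNet's `raw` data format, each line of `data.list` is, to the best of my knowledge, a JSON object holding the utterance key, wav path, and transcript (treat the exact field names as an assumption and check against your generated files):

```python
import json

# A hypothetical data.list line in WeNet's raw JSON-lines format.
line = ('{"key": "BAC009S0002W0122", '
        '"wav": "/data/wav/train/S0002/BAC009S0002W0122.wav", '
        '"txt": "而对楼市成交抑制作用最大的限购"}')
sample = json.loads(line)
print(sample["key"], sample["wav"], sample["txt"])
```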



## Training

```
# 4 cards by default; the number of cards can be changed in run_train.sh
# Note: by default, training prints the recognized text during evaluation, which increases training time.
# To suppress it, comment out line 355 of /wenet/bin/recognize.py: logging.info('{} {}'.format(key, args.connect_symbol.join(content)))
bash train.sh 
```

## Inference

```
# exp/conformer/final.pt is used for inference by default; this can be changed manually
# Note: if log output was disabled during training, re-enable it manually, otherwise the recognized text will not be printed
bash validate.sh
```

## Result

![result](./img/result.png)

### Accuracy

Test data: [aishell](http://openslr.org/33/); accelerator used: Z100L.

Test results:

| Cards | Precision | Accuracy |
| :---: | :-------: | :------: |
|   4   |   fp32    | 93.1294  |
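The accuracy above comes from the run.sh scoring stage; character error rate (CER) is the usual metric for aishell, with accuracy commonly reported as 100 × (1 − CER). A self-contained sketch of that computation (illustrative, not WeNet's scoring script):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences (rolling 1-D DP)."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution/match
            prev = cur
    return dp[-1]

def cer(refs, hyps):
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total

print(cer(["今天天气"], ["今天天器"]))  # 0.25
```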

## Application Scenarios

### Algorithm Category

`Speech Recognition`

### Key Application Industries

`Finance, Telecommunications, Broadcast Media`

## Source Repository & Issue Feedback

- [https://developer.hpccube.com/codes/modelzoo/conformer_pytorch](https://developer.hpccube.com/codes/modelzoo/conformer_pytorch)

## References

- https://github.com/wenet-e2e/wenet