# WeNet
## Paper
`WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit`
- https://arxiv.org/pdf/2102.01547.pdf

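## Model Architecture
WeNet uses a hybrid connectionist temporal classification (CTC)/attention architecture, with a Transformer or Conformer as the encoder and an attention decoder that rescores the CTC hypotheses. To support both streaming and non-streaming recognition in a single unified model, it uses a dynamic chunk-based attention strategy, which allows self-attention to attend to right context of random length.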

![img](./Doc/images/wenet1.PNG)
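
To make the two mechanisms in the paragraph above concrete, here is a minimal C++ sketch under stated assumptions. `ChunkAttentionMask` builds a chunk-based self-attention mask in the style of WeNet's Python `subsequent_chunk_mask`: each frame sees every frame up to the end of its own chunk, so its right context is bounded by the chunk size. In dynamic chunk training the chunk size is varied at random from batch to batch; that sampling is not shown. `RescoreNBest` picks the best entry of a CTC N-best list using the combined score `attention_score + ctc_weight * ctc_score`. This combination rule and all names here are assumptions for illustration, not this repository's C++ API (WeNet's rescoring can also mix in a right-to-left decoder score, which is omitted).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// mask[i][j] == true means frame i may attend to frame j.
// Frames are grouped into chunks of chunk_size; frame i sees every frame up
// to the end of its own chunk. num_left_chunks < 0 keeps all left context,
// otherwise only that many preceding chunks stay visible.
std::vector<std::vector<bool>> ChunkAttentionMask(std::size_t num_frames,
                                                  std::size_t chunk_size,
                                                  int num_left_chunks) {
  assert(chunk_size > 0);
  std::vector<std::vector<bool>> mask(num_frames,
                                      std::vector<bool>(num_frames, false));
  for (std::size_t i = 0; i < num_frames; ++i) {
    const std::size_t chunk = i / chunk_size;
    std::size_t start = 0;
    if (num_left_chunks >= 0) {
      const auto left = static_cast<std::size_t>(num_left_chunks);
      start = chunk > left ? (chunk - left) * chunk_size : 0;
    }
    const std::size_t end = std::min((chunk + 1) * chunk_size, num_frames);
    for (std::size_t j = start; j < end; ++j) mask[i][j] = true;
  }
  return mask;
}

// Hypothetical record for one CTC prefix-beam-search hypothesis.
struct Hypothesis {
  std::vector<int> tokens;  // decoded token IDs
  float ctc_score;          // log-probability from the CTC branch
  float attention_score;    // log-probability from running the attention decoder over tokens
};

// Attention rescoring (assumed rule): returns the index of the hypothesis
// with the highest attention_score + ctc_weight * ctc_score.
std::size_t RescoreNBest(const std::vector<Hypothesis>& nbest,
                         float ctc_weight) {
  assert(!nbest.empty());
  std::size_t best = 0;
  float best_score = nbest[0].attention_score + ctc_weight * nbest[0].ctc_score;
  for (std::size_t k = 1; k < nbest.size(); ++k) {
    const float score =
        nbest[k].attention_score + ctc_weight * nbest[k].ctc_score;
    if (score > best_score) {
      best_score = score;
      best = k;
    }
  }
  return best;
}
```

For example, with 6 frames and `chunk_size` 2, frame 3 (second chunk) may attend to frames 0 through 3 but not to frames 4 or 5; setting `num_left_chunks` to 1 additionally hides frames 0 and 1 from frame 5.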

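## Algorithm Principles
The bottom of the stack is built entirely on PyTorch and its ecosystem. The middle of the stack consists of two parts. When developing a research model, TorchScript is used for model development, Torchaudio for on-the-fly feature extraction, Distributed Data Parallel (DDP) for distributed training, torch just-in-time (JIT) compilation for model export, PyTorch quantization for model quantization, and LibTorch for the production runtime. The LibTorch runtime hosts the production models and is designed to support a variety of hardware and platforms, such as CPU, GPU (CUDA), Linux, Android, and iOS. The top of the stack shows a typical research-to-production pipeline in WeNet.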

![img](./Doc/images/wenet2.PNG)

## Environment Setup
### Docker (Option 1)
Pull the image:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/migraphx:4.3.0-ubuntu20.04-dtk24.04.1-py3.10
```
Create and start the container, then activate the DTK environment inside it:
```
# Replace <Your Image ID> with the ID of the image pulled above
docker run --shm-size 16g --network=host --name=wenet_onnxruntime -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/wenet_onnxruntime:/home/wenet_onnxruntime -it <Your Image ID> /bin/bash

# Inside the container, activate the DTK environment
source /opt/dtk/env.sh
```
### Dockerfile (Option 2)
Alternatively, build the image from the provided Dockerfile and start a container from it:
```
cd ./docker

docker build --no-cache -t wenet_onnxruntime:2.0 .

# Replace <Your Image ID> with the image built above (wenet_onnxruntime:2.0)
docker run --shm-size 16g --network=host --name=wenet_onnxruntime -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $PWD/wenet_onnxruntime:/home/wenet_onnxruntime -it <Your Image ID> /bin/bash
```

## Dataset
Download the Aishell dataset: [Aishell](http://113.200.138.88:18080/aidatasets/project-dependency/aishell/). The dataset is organized as follows:

```
AISHELL_data/
    |
    wav
        |
        speaker001
            |
            000001.wav
            000002.wav
            ...
        speaker002
            ...
        ...
    transcript
        |
        transcript001.txt
        transcript002.txt
        ...
```
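
As a minimal sketch for sanity-checking a local copy, the C++17 code below walks the layout shown above and loads one transcript file. It assumes the layout in the tree (one subdirectory per speaker under `wav/`) and that each transcript line has the form `<utterance_id> <text>`, which is how Aishell transcripts are commonly laid out; `ListWavFilesBySpeaker` and `LoadTranscript` are hypothetical helpers, not part of this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: maps each speaker directory under <root>/wav to its
// sorted list of .wav files (layout assumed from the tree above).
std::map<std::string, std::vector<fs::path>> ListWavFilesBySpeaker(
    const fs::path& root) {
  assert(fs::is_directory(root / "wav"));  // precondition: wav/ exists
  std::map<std::string, std::vector<fs::path>> by_speaker;
  for (const auto& speaker : fs::directory_iterator(root / "wav")) {
    if (!speaker.is_directory()) continue;
    auto& files = by_speaker[speaker.path().filename().string()];
    for (const auto& entry : fs::directory_iterator(speaker.path())) {
      if (entry.is_regular_file() && entry.path().extension() == ".wav") {
        files.push_back(entry.path());
      }
    }
    std::sort(files.begin(), files.end());
  }
  return by_speaker;
}

// Hypothetical helper: reads "<utterance_id> <text>" lines into a map;
// lines without a space separator are skipped.
std::map<std::string, std::string> LoadTranscript(const fs::path& file) {
  std::map<std::string, std::string> transcript;
  std::ifstream in(file);
  std::string line;
  while (std::getline(in, line)) {
    const auto space = line.find(' ');
    if (space == std::string::npos) continue;
    transcript[line.substr(0, space)] = line.substr(space + 1);
  }
  return transcript;
}
```

Sorting by path keeps the per-speaker order deterministic, since `directory_iterator` does not guarantee any particular order.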

## Inference
### C++ Inference
This example uses a WeNet model for speech recognition. First download the [model](http://113.200.138.88:18080/aimodels/findsource-dependency/wenet_onnxruntime) into Resource/models/; the download commands are:
```
sudo apt-get update
sudo apt-get install git-lfs
git lfs clone http://113.200.138.88:18080/aimodels/findsource-dependency/wenet_onnxruntime.git
```
The following describes how to run the C++ example. For a detailed explanation, see Tutorial_Cpp.md in the Doc directory.

#### Build the Project
```
cd /home/wenet_onnxruntime
export LD_LIBRARY_PATH=$PWD/openfst-1.7.6/src/lib:$LD_LIBRARY_PATH
mkdir build && cd build
cmake ..
make install
```

#### Set Inference Parameters
```
cd /home/wenet_onnxruntime
export GLOG_logtostderr=1
export GLOG_v=2
wav_path=./Resource/BAC009S0764W0344.wav
onnx_dir=./Resource/models
units=./Resource/units.txt
```
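
The `units` variable points at the model's output-symbol table. As a hedged sketch, assuming each line of units.txt has the form `<symbol> <id>` (the OpenFST-style symbol table WeNet commonly uses, for example `<blank> 0`), the C++ below loads it and maps decoded token IDs back to text. `LoadUnits` and `IdsToText` are hypothetical helpers for illustration, not functions of `decoder_main`.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: loads "<symbol> <id>" lines into an id -> symbol table.
// Lines that do not parse as a symbol followed by an integer are skipped.
std::unordered_map<int, std::string> LoadUnits(const std::string& path) {
  std::unordered_map<int, std::string> id_to_symbol;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string symbol;
    int id = 0;
    if (fields >> symbol >> id) id_to_symbol[id] = symbol;
  }
  return id_to_symbol;
}

// Hypothetical helper: concatenates the symbols for decoded token IDs,
// skipping IDs that are missing from the table.
std::string IdsToText(const std::vector<int>& ids,
                      const std::unordered_map<int, std::string>& units) {
  std::string text;
  for (int id : ids) {
    const auto it = units.find(id);
    if (it != units.end()) text += it->second;
  }
  return text;
}
```

Plain concatenation suits character-level Mandarin units such as Aishell's; a subword (BPE) model would also need word-boundary handling, which is not shown.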

#### Run the Example
```
# Enter the wenet_onnxruntime project root
cd /home/wenet_onnxruntime

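# Run the example program (run it in the same shell session so the variables set in the previous step are available)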
./build/Src/bin/decoder_main --onnx_dir $onnx_dir  --wav_path $wav_path  --unit_path $units 2>&1 | tee log.txt
```

## Result
```
test Final result: 脚踝和小腿的能力变弱
Decoded 3091ms audio taken 466ms.
Total: decoded 3091ms audio taken 466ms.
RTF: 0.1508
```
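
The RTF line is decoding time divided by audio duration: 466 ms / 3091 ms ≈ 0.1508, so this clip was decoded roughly 6.6 times faster than real time. A minimal C++ sketch of that arithmetic follows, with a `std::chrono` timer one might wrap around a decode call. The helper names are hypothetical and not taken from `decoder_main`.

```cpp
#include <cassert>
#include <chrono>

// Audio duration in milliseconds from a sample count and sample rate (Hz).
double AudioDurationMs(long long num_samples, int sample_rate_hz) {
  assert(sample_rate_hz > 0);
  return 1000.0 * static_cast<double>(num_samples) / sample_rate_hz;
}

// Real-time factor: decoding time divided by audio duration.
// RTF < 1 means the decoder runs faster than real time.
double RealTimeFactor(double decode_ms, double audio_ms) {
  assert(audio_ms > 0.0);
  return decode_ms / audio_ms;
}

// Wall-clock time of a callable in milliseconds, e.g. around one decode call.
template <typename Fn>
double ElapsedMs(Fn&& fn) {
  const auto start = std::chrono::steady_clock::now();
  fn();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, a hypothetical 16 kHz clip (Aishell's sampling rate) of 49,456 samples lasts 3091 ms; `steady_clock` is used because it is monotonic and unaffected by system clock adjustments.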

### Accuracy


## Application Scenarios
### Algorithm Category
Speech recognition

### Key Application Industries
Manufacturing, finance, transportation, education

## Source Repository and Issue Feedback
- https://developer.hpccube.com/codes/modelzoo/wenet_onnxruntime

## References
- https://github.com/wenet-e2e/wenet
- https://wenet.org.cn/wenet/