# MooER
## Paper
- https://arxiv.org/abs/2408.05101

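## Model Architecture
MooER is a speech recognition and speech translation system developed by Moore Threads and built on a large language model (LLM). The model architecture is shown below:<br>
![Model architecture](images/model_structure.png)

## Algorithm Principle
With the MooER framework, you can use a large language model (LLM) to transcribe input speech into text (speech recognition) and translate it into another language (speech translation) in an end-to-end manner.

## Environment Setup
### Docker (Method 1)
The image address on [SourceFind (光源)](https://sourcefind.cn/#/main-page) and the steps for using it are given below.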
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.2-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

# Install dependencies:
pip install -r requirements.txt
```

### Dockerfile (Method 2)
The Dockerfile can be used as follows:
```
cd ./docker
docker build --no-cache -t mooer:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
pip install -r requirements.txt
```

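### Anaconda (Method 3)
The DCU-specific deep learning libraries required by this project can be downloaded and installed from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/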
```
DTK software stack: dtk24.04.2
Python: 3.10
torch: 2.3.0
torchaudio: 2.1.2
```
Tips: the versions of the DTK driver, Python, PyTorch, and the other DCU-related tools listed above must correspond to one another exactly.
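
Because the versions above must match exactly, a quick check of the active environment can catch a mismatch before inference is attempted. The following is a minimal sketch, not part of the project: the helper names `installed_versions` and `find_mismatches` are hypothetical, and accepting a `+<suffix>` after the base version is an assumption about how vendor builds may label their wheels.

```python
from importlib import metadata

# Versions listed above for the Anaconda setup.
EXPECTED = {"torch": "2.3.0", "torchaudio": "2.1.2"}


def installed_versions(names):
    """Return {package: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """List (name, expected, installed) for packages that do not match.

    A package matches when its installed version equals the expected one,
    or equals it followed by a "+<suffix>" local version label
    (assumption: vendor builds may add such a label).
    """
    problems = []
    for name, want in expected.items():
        have = installed.get(name)
        matches = have is not None and (have == want or have.startswith(want + "+"))
        if not matches:
            problems.append((name, want, have))
    return problems


if __name__ == "__main__":
    for name, want, have in find_mismatches(EXPECTED, installed_versions(EXPECTED)):
        print(f"{name}: expected {want}, found {have}")
```

Running the script prints one line per mismatched or missing package and nothing when the environment agrees with the table. It does not check the DTK driver itself.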
Install the other, non-deep-learning libraries from requirements.txt:
```
pip install -r requirements.txt
```
## Dataset

## Training

## Inference
1: Download the pretrained model MooER-MTL-5K. The SCNet fast download link [MooER-MTL-5K]() is recommended; the official download sources are [ModelScope](https://modelscope.cn/models/MooreThreadsSpeech/MooER-MTL-5K) and [HF-Mirror](https://hf-mirror.com/mtspeech/MooER-MTL-5K).
```
# Use ModelScope
git lfs clone https://modelscope.cn/models/MooreThreadsSpeech/MooER-MTL-5K

# Use HF-Mirror
git lfs clone https://hf-mirror.com/mtspeech/MooER-MTL-5K
```
Place the downloaded files in the `pretrained_models` folder (`-r` is needed because the download contains subdirectories).
```shell
cp -r MooER-MTL-5K/* pretrained_models
```
2: Download Qwen2-7B-Instruct. The SCNet fast download link [`Qwen2-7B-Instruct`](http://113.200.138.88:18080/aimodels/Qwen2-7B-Instruct) is recommended; the official download sources are [ModelScope](https://modelscope.cn/models/qwen/qwen2-7b-instruct) and [HF-Mirror](https://hf-mirror.com/Qwen/Qwen2-7B-Instruct).

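Place the downloaded files in the `pretrained_models/Qwen2-7B-Instruct` folder.

Finally, make sure the downloaded files follow the directory structure below. Corrupted model files or files in the wrong location will cause runtime errors.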

```text
./pretrained_models/
|-- paraformer_encoder
|   |-- am.mvn                           
|   `-- paraformer-encoder.pth           
|-- asr
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- ast
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- asr_ast_mtl
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- Qwen2-7B-Instruct
|   |-- model-00001-of-00004.safetensors 
|   |-- model-00002-of-00004.safetensors 
|   |-- model-00003-of-00004.safetensors 
|   |-- model-00004-of-00004.safetensors 
|   |-- model.safetensors.index.json
|   |-- config.json                      
|   |-- configuration.json               
|   |-- generation_config.json           
|   |-- merges.txt                       
|   |-- tokenizer.json                   
|   |-- tokenizer_config.json            
|   |-- vocab.json                      
|   |-- LICENSE
|   `-- README.md
|-- README.md
`-- configuration.json
```
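
Since a misplaced file causes a runtime error, it can help to compare the folder against the tree above before running inference. The sketch below is an illustration only: `missing_files` is a hypothetical helper, the path list is transcribed from the tree, and leaving out `README.md`, `LICENSE`, and `configuration.json` is an assumption that they are not needed to load the models. It checks presence only and cannot detect a corrupted file.

```python
from pathlib import Path

# Relative paths transcribed from the directory tree above.
EXPECTED_FILES = [
    "paraformer_encoder/am.mvn",
    "paraformer_encoder/paraformer-encoder.pth",
]
# Each task folder holds one adapter and one set of LoRA weights.
for task in ("asr", "ast", "asr_ast_mtl"):
    EXPECTED_FILES += [
        f"{task}/adapter_project.pt",
        f"{task}/lora_weights/adapter_config.json",
        f"{task}/lora_weights/adapter_model.bin",
    ]
# The LLM is split into four safetensors shards plus config and tokenizer files.
EXPECTED_FILES += [
    f"Qwen2-7B-Instruct/model-{i:05d}-of-00004.safetensors" for i in range(1, 5)
]
EXPECTED_FILES += [
    f"Qwen2-7B-Instruct/{name}"
    for name in (
        "model.safetensors.index.json",
        "config.json",
        "generation_config.json",
        "merges.txt",
        "tokenizer.json",
        "tokenizer_config.json",
        "vocab.json",
    )
]


def missing_files(root="pretrained_models"):
    """Return the expected relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files():
        print(f"missing: {rel}")
```

Run it from the repository root; an empty output means every listed file was found.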
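3: Once everything above is in place, run the inference code:<br>

A sample audio file for testing is provided in the `demo` folder.<br>
First, set the environment variables: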
```
# Set environment variables
export PYTHONIOENCODING=UTF-8
export LC_ALL=C
export PYTHONPATH=$PWD/src:$PYTHONPATH
```
**Run ASR and AST together:**
```
# Use the specified audio file
python inference.py --wav_path /path/to/your_audio_file
```
<br>
**Specify the speech recognition model and output only the recognition result:**
```
python inference.py --task asr \
    --cmvn_path pretrained_models/paraformer_encoder/am.mvn \
    --encoder_path pretrained_models/paraformer_encoder/paraformer-encoder.pth \
    --llm_path pretrained_models/Qwen2-7B-Instruct \
    --adapter_path pretrained_models/asr/adapter_project.pt \
    --lora_dir pretrained_models/asr/lora_weights \
    --wav_path /path/to/your_audio_file
```
<br>
**Specify the speech translation model and output only the Chinese-to-English translation:**
```
python inference.py --task ast \
    --cmvn_path pretrained_models/paraformer_encoder/am.mvn \
    --encoder_path pretrained_models/paraformer_encoder/paraformer-encoder.pth \
    --llm_path pretrained_models/Qwen2-7B-Instruct \
    --adapter_path pretrained_models/ast/adapter_project.pt \
    --lora_dir pretrained_models/ast/lora_weights \
    --wav_path /path/to/your_audio_file
```
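
The two task-specific commands above differ only in the `--task` value and in the adapter and LoRA paths, so they can be generated from one helper when several audio files need processing. This is a rough sketch under stated assumptions: `build_command`, `build_env`, and `run_inference` are hypothetical names, only the flags shown above are used, and it is assumed (not confirmed here) that `inference.py` writes its results to standard output.

```python
import os
import subprocess
import sys

MODEL_ROOT = "pretrained_models"


def build_command(task, wav_path, model_root=MODEL_ROOT):
    """Build the inference.py argument list for task 'asr' or 'ast'."""
    if task not in ("asr", "ast"):
        raise ValueError(f"unsupported task: {task!r}")
    return [
        sys.executable, "inference.py",
        "--task", task,
        "--cmvn_path", f"{model_root}/paraformer_encoder/am.mvn",
        "--encoder_path", f"{model_root}/paraformer_encoder/paraformer-encoder.pth",
        "--llm_path", f"{model_root}/Qwen2-7B-Instruct",
        "--adapter_path", f"{model_root}/{task}/adapter_project.pt",
        "--lora_dir", f"{model_root}/{task}/lora_weights",
        "--wav_path", wav_path,
    ]


def build_env(repo_dir="."):
    """Copy the current environment and add the three variables exported above."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "UTF-8"
    env["LC_ALL"] = "C"
    src = os.path.join(os.path.abspath(repo_dir), "src")
    # Prepend <repo>/src, keeping any existing PYTHONPATH entries after it.
    env["PYTHONPATH"] = os.pathsep.join(p for p in (src, env.get("PYTHONPATH")) if p)
    return env


def run_inference(task, wav_path, repo_dir="."):
    """Run one inference call from repo_dir and return its standard output.

    Assumption: the recognition or translation text is printed to stdout.
    """
    result = subprocess.run(
        build_command(task, wav_path),
        cwd=repo_dir,
        env=build_env(repo_dir),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

For example, `run_inference("asr", "/path/to/your_audio_file")` would launch the same command as the ASR example above; a relative `wav_path` is resolved against `repo_dir`.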
## Result
**ASR and AST**
```
ASR: 欢迎使用由摩尔线程开发的基于大语言模型的语音识别及语音翻译系统
AST: Welcome to use the voice recognition and voice translation system based on the large language model developed by Moore Threads.
```
**ASR**
```
ASR: 欢迎使用由摩尔线程开发的基于大语言模型的语音识别及语音翻译系统
```
**AST**
```
AST: Welcome to use the voice recognition and voice translation system based on the large language model developed by Moore Threads.
```
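
If the results need to be consumed by another program, the `ASR:` and `AST:` prefixes shown above are enough to pick the relevant lines out of the output. The snippet below is a small sketch: `parse_results` is a hypothetical helper, and it assumes each result sits on its own line starting with exactly one of those two labels, as in the samples above.

```python
def parse_results(text):
    """Collect lines that start with 'ASR:' or 'AST:' into a dict.

    Lines with any other label, or without a colon, are ignored.
    If a label appears more than once, the last occurrence wins.
    """
    results = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() in ("ASR", "AST"):
            results[label.strip()] = value.strip()
    return results


if __name__ == "__main__":
    sample = (
        "ASR: 欢迎使用由摩尔线程开发的基于大语言模型的语音识别及语音翻译系统\n"
        "AST: Welcome to use the voice recognition and voice translation system "
        "based on the large language model developed by Moore Threads.\n"
    )
    for label, value in parse_results(sample).items():
        print(label, "->", value)
```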
### Accuracy


## Application Scenarios
### Algorithm Category
`Speech recognition, speech translation`
### Key Application Industries
`Education, healthcare, scientific research`

## Source Repository and Issue Feedback
https://developer.sourcefind.cn/codes/modelzoo/mooer_pytorch

## References
https://github.com/MooreThreads/MooER