# MooER
## Paper
- https://arxiv.org/abs/2408.05101

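## Model Structure
MooER is a speech recognition and speech translation system developed by Moore Threads and built on a large language model (LLM). The model structure is shown below:<br>
![Model structure](images/model_structure.png)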

## Algorithm
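With the MooER framework, input speech is transcribed into text (speech recognition) and translated into other languages (speech translation) end to end, using a large language model (LLM).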

## Environment Setup
### Docker (Method 1)
The image address on [SourceFind](https://sourcefind.cn/#/main-page) and the steps for using it are given below.
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.2-py3.10

docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash

# Install dependencies:
pip install -r requirements.txt
```

### Dockerfile (Method 2)
Build and run an image from the provided Dockerfile:
```
cd ./docker
docker build --no-cache -t mooer:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
pip install -r requirements.txt
```

### Anaconda (Method 3)
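The DCU-specific deep learning libraries required by this project can be downloaded and installed from the Guanghe (光合) developer community: https://developer.sourcefind.cn/tool/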
```
DTK software stack: dtk24.04.2
Python: 3.10
torch: 2.3.0
torchaudio: 2.1.2
```
Tips: The versions of the DTK driver, Python, PyTorch, and the other DCU-related tools listed above must match each other exactly.
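
The version pins above can be checked before running anything. The following is a minimal sketch and is not part of the MooER repository: the helper names `find_mismatches` and `installed_versions` are hypothetical, and the sketch assumes that a DCU build may append a suffix to the version string (for example `2.3.0+...`), so it compares by prefix instead of exact equality. The DTK stack is not a Python package and is not checked here.

```python
import sys
from importlib import metadata

# Versions listed in this README.
EXPECTED = {"python": "3.10", "torch": "2.3.0", "torchaudio": "2.1.2"}


def find_mismatches(installed, expected=EXPECTED):
    """Return {name: (installed, expected)} for every entry that does not match.

    A version matches when it equals the expected string or extends it with
    a '.' or '+' suffix (e.g. '3.10.12' or '2.3.0+local'; assumed formats).
    """
    bad = {}
    for name, want in expected.items():
        have = installed.get(name)
        ok = have is not None and (
            have == want
            or have.startswith(want + ".")
            or have.startswith(want + "+")
        )
        if not ok:
            bad[name] = (have, want)
    return bad


def installed_versions():
    """Collect the versions present in the current environment."""
    found = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for pkg in ("torch", "torchaudio"):
        try:
            found[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            found[pkg] = None  # package is not installed
    return found


if __name__ == "__main__":
    for name, (have, want) in find_mismatches(installed_versions()).items():
        print(f"{name}: installed {have}, expected {want}")
```

An empty result from `find_mismatches` means the Python-level pins are satisfied; it says nothing about the DTK driver itself.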
Install the remaining (non-deep-learning) libraries from requirements.txt:
```
pip install -r requirements.txt
```
## Dataset

## Training

## Inference
1: Download the pretrained model MooER-MTL-5K from [ModelScope](https://modelscope.cn/models/MooreThreadsSpeech/MooER-MTL-5K) or [HF-Mirror](https://hf-mirror.com/mtspeech/MooER-MTL-5K).
```
# Place the downloaded files in the `pretrained_models` folder.
# -r is needed because MooER-MTL-5K contains subdirectories.
cp -r MooER-MTL-5K/* pretrained_models
```
2: Download Qwen2-7B-Instruct from [ModelScope](https://modelscope.cn/models/qwen/qwen2-7b-instruct) or [HF-Mirror](https://hf-mirror.com/Qwen/Qwen2-7B-Instruct).

Place the downloaded files in the `pretrained_models/Qwen2-7B-Instruct` folder.

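Finally, make sure the downloaded files follow the layout below. Corrupted model files or files in the wrong location will cause runtime errors.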

```text
./pretrained_models/
|-- paraformer_encoder
|   |-- am.mvn                           
|   `-- paraformer-encoder.pth           
|-- asr
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- ast
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- asr_ast_mtl
|   |-- adapter_project.pt               
|   `-- lora_weights
|       |-- README.md
|       |-- adapter_config.json          
|       `-- adapter_model.bin            
|-- Qwen2-7B-Instruct
|   |-- model-00001-of-00004.safetensors 
|   |-- model-00002-of-00004.safetensors 
|   |-- model-00003-of-00004.safetensors 
|   |-- model-00004-of-00004.safetensors 
|   |-- model.safetensors.index.json
|   |-- config.json                      
|   |-- configuration.json               
|   |-- generation_config.json           
|   |-- merges.txt                       
|   |-- tokenizer.json                   
|   |-- tokenizer_config.json            
|   |-- vocab.json                      
|   |-- LICENSE
|   `-- README.md
|-- README.md
`-- configuration.json
```
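
The layout above can be verified with a short script before inference. This is a minimal sketch under stated assumptions: `missing_files` and `REQUIRED_FILES` are hypothetical names, the list covers only the weight, config, and tokenizer files from the tree (README and LICENSE files are left out), and the check only tests that each file exists and is non-empty. It does not detect corrupted weights.

```python
from pathlib import Path

# Relative paths taken from the directory tree in this README.
REQUIRED_FILES = [
    "paraformer_encoder/am.mvn",
    "paraformer_encoder/paraformer-encoder.pth",
]
for task in ("asr", "ast", "asr_ast_mtl"):
    REQUIRED_FILES += [
        f"{task}/adapter_project.pt",
        f"{task}/lora_weights/adapter_config.json",
        f"{task}/lora_weights/adapter_model.bin",
    ]
REQUIRED_FILES += [
    f"Qwen2-7B-Instruct/model-{i:05d}-of-00004.safetensors" for i in range(1, 5)
]
REQUIRED_FILES += [
    "Qwen2-7B-Instruct/" + name
    for name in (
        "model.safetensors.index.json",
        "config.json",
        "generation_config.json",
        "merges.txt",
        "tokenizer.json",
        "tokenizer_config.json",
        "vocab.json",
    )
]


def missing_files(root="pretrained_models"):
    """Return the required paths that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in REQUIRED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_files():
        print("missing or empty:", rel)
```

Run from the repository root, the script prints nothing when every listed file is present.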
3: With everything in place, run the inference code:<br>

The `demo` folder contains a sample audio file for testing.<br>
First, set the environment variables:
```
# Set environment variables
export PYTHONIOENCODING=UTF-8
export LC_ALL=C
export PYTHONPATH=$PWD/src:$PYTHONPATH
```
**Run ASR and AST together:**
```
# Use the specified audio file
python inference.py --wav_path /path/to/your_audio_file
```
**Use the speech recognition model and output only the recognition result:**
```
python inference.py --task asr \
    --cmvn_path pretrained_models/paraformer_encoder/am.mvn \
    --encoder_path pretrained_models/paraformer_encoder/paraformer-encoder.pth \
    --llm_path pretrained_models/Qwen2-7B-Instruct \
    --adapter_path pretrained_models/asr/adapter_project.pt \
    --lora_dir pretrained_models/asr/lora_weights \
    --wav_path /path/to/your_audio_file
```
**Use the speech translation model and output only the Chinese-to-English translation:**
```
python inference.py --task ast \
    --cmvn_path pretrained_models/paraformer_encoder/am.mvn \
    --encoder_path pretrained_models/paraformer_encoder/paraformer-encoder.pth \
    --llm_path pretrained_models/Qwen2-7B-Instruct \
    --adapter_path pretrained_models/ast/adapter_project.pt \
    --lora_dir pretrained_models/ast/lora_weights \
    --wav_path /path/to/your_audio_file
```
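
The two task-specific commands above differ only in the task name and the adapter and LoRA directories, so they can be assembled programmatically. This is a minimal sketch, assuming it is run from the repository root with the environment variables already exported; `build_command` and `run_inference` are hypothetical helpers, and only the flags shown in this README are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(task, wav_path, root="pretrained_models"):
    """Assemble the inference.py argument list for task 'asr' or 'ast'."""
    if task not in ("asr", "ast"):
        raise ValueError(f"unsupported task: {task!r}")
    root = Path(root)
    encoder_dir = root / "paraformer_encoder"
    return [
        sys.executable, "inference.py",
        "--task", task,
        "--cmvn_path", (encoder_dir / "am.mvn").as_posix(),
        "--encoder_path", (encoder_dir / "paraformer-encoder.pth").as_posix(),
        "--llm_path", (root / "Qwen2-7B-Instruct").as_posix(),
        "--adapter_path", (root / task / "adapter_project.pt").as_posix(),
        "--lora_dir", (root / task / "lora_weights").as_posix(),
        "--wav_path", str(wav_path),
    ]


def run_inference(task, wav_path, root="pretrained_models"):
    """Run inference.py and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(task, wav_path, root), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_command("asr", "/path/to/your_audio_file")))
```

Passing the arguments as a list avoids shell quoting problems when the audio path contains spaces.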
## Results
**ASR and AST**
```
ASR: 欢迎使用由摩尔线程开发的基于大语言模型的语音识别及语音翻译系统
AST: Welcome to use the voice recognition and voice translation system based on the large language model developed by Moore Threads.
```
**ASR**
```
ASR: 欢迎使用由摩尔线程开发的基于大语言模型的语音识别及语音翻译系统
```
**AST**
```
AST: Welcome to use the voice recognition and voice translation system based on the large language model developed by Moore Threads.
```
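
For batch processing, the labeled lines shown above can be collected into a dictionary. This is a minimal sketch that assumes each result sits on its own line and starts with `ASR:` or `AST:` exactly as in the samples; `parse_results` is a hypothetical helper, and any other lines (such as log messages) are ignored.

```python
def parse_results(text):
    """Collect 'ASR: ...' and 'AST: ...' lines into a dict keyed by task."""
    results = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the text survive.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in ("ASR", "AST"):
            results[label.lower()] = value.strip()
    return results


if __name__ == "__main__":
    sample = "ASR: 欢迎使用\nAST: Welcome to use the system."
    print(parse_results(sample))
```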

### Accuracy


## Application Scenarios
### Algorithm Category
`Speech recognition, speech translation`
### Key Application Industries
`Education, healthcare, scientific research`

## Source Repository and Issue Feedback
https://developer.sourcefind.cn/codes/modelzoo/mooer_pytorch

## References
https://github.com/MooreThreads/MooER