"vscode:/vscode.git/clone" did not exist on "f41c467cb989f9c077e545029787fb2ba5005bcb"
Commit d71093e8 authored by wangwei990215's avatar wangwei990215
Browse files

Update README.md

parent 944725f2
@@ -61,25 +61,27 @@ cd FunASR
pip3 install -e ./
```
### Inference
### Non-streaming speech recognition
```
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc")
res = model.generate(input="test_audio/asr_example_zh.wav")
print(res)
```
Parameter description:
- model: the model name, or the path to a model directory on local disk.
- vad_model: enables VAD, which splits long audio into short segments. With VAD enabled, the reported inference time is the end-to-end time of VAD plus the ASR model; to benchmark the ASR model on its own, disable VAD (see the timing sketch below).
- punc_model: restores punctuation in the output text.
Example output:
![non-streaming result](images/resault_no_streaming.png)
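As noted for vad_model above, a minimal timing sketch with VAD disabled; the timing code and the reuse of the example audio path are illustrative assumptions, not part of the original example:
```
from funasr import AutoModel
import time

# Load only the ASR model, without the VAD or punctuation stages,
# so the measured time reflects the ASR model alone.
asr_only = AutoModel(model="paraformer-zh")

start = time.time()
res = asr_only.generate(input="test_audio/asr_example_zh.wav")  # assumed local test clip
elapsed = time.time() - start

print(res)
print(f"ASR-only inference time: {elapsed:.2f}s")
```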
### Streaming speech recognition
```
from funasr import AutoModel
@@ -92,7 +94,7 @@ model = AutoModel(model="paraformer-zh-streaming")
import soundfile
import os
wav_file = os.path.join(model.model_path, "test_audio/asr_example_zh.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms
@@ -104,8 +106,16 @@ for i in range(total_chunk_num):
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
```
Note: chunk_size configures the streaming latency. [0, 10, 5] means text is emitted in real time at a granularity of 10*60 = 600 ms, with 5*60 = 300 ms of lookahead (future context). Each inference step takes 600 ms of audio as input (16000*0.6 = 9600 samples) and outputs the corresponding text; the last audio chunk must be sent with is_final=True to force out the final characters.
Example output:
![streaming result](images/resault_streaming.png)
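To make the chunk arithmetic in the note concrete, here is a small worked sketch; chunk_size and the 60 ms frame size come from the note above, while the clip length and variable names are illustrative assumptions:
```
chunk_size = [0, 10, 5]   # values in 60 ms frames, as described in the note above
sample_rate = 16000

chunk_ms = chunk_size[1] * 60        # 10 * 60 = 600 ms of new audio per inference step
chunk_stride = chunk_size[1] * 960   # 60 ms at 16 kHz is 960 samples -> 9600 samples per step
lookahead_ms = chunk_size[2] * 60    # 5 * 60 = 300 ms of future context

# For a clip of num_samples samples, the final (possibly shorter) chunk
# is sent with is_final=True so the model flushes the last characters.
num_samples = 9 * sample_rate        # e.g. a 9-second clip (assumption)
total_chunk_num = (num_samples - 1) // chunk_stride + 1
print(chunk_ms, chunk_stride, lookahead_ms, total_chunk_num)
```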
The models used in the streaming and non-streaming examples above can be downloaded from:
- paraformer-zh:https://hf-mirror.com/funasr/paraformer-zh
- paraformer-zh-streaming:https://hf-mirror.com/funasr/paraformer-zh-streaming
- fsmn-vad:https://hf-mirror.com/funasr/fsmn-vad
- ct-punc:https://hf-mirror.com/funasr/ct-punc
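Since model accepts either a model name or a local path (see the parameter notes above), here is a sketch of loading locally downloaded copies of the models listed above, assuming they were placed in a models/ folder; passing local paths to vad_model and punc_model is assumed to work the same way:
```
from funasr import AutoModel

# Assumed local layout after downloading the repositories listed above:
#   models/paraformer-zh, models/fsmn-vad, models/ct-punc
model = AutoModel(
    model="models/paraformer-zh",
    vad_model="models/fsmn-vad",
    punc_model="models/ct-punc")
res = model.generate(input="test_audio/asr_example_zh.wav")
print(res)
```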
## Application Scenarios
### Algorithm Categories
@@ -116,4 +126,4 @@ for i in range(total_chunk_num):
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/paraformer_funasr_pytorch
## References
https://github.com/modelscope/FunASR