"vscode:/vscode.git/clone" did not exist on "f41c467cb989f9c077e545029787fb2ba5005bcb"
Commit d71093e8 authored by wangwei990215's avatar wangwei990215
Browse files

Update README.md

parent 944725f2
@@ -61,25 +61,27 @@ cd FunASR
pip3 install -e ./
```
### Inference
### Non-streaming speech recognition
```
from funasr import AutoModel
# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc")
res = model.generate(input="test_audio/asr_example_zh.wav")
print(res)
```
Parameter description:
- model: the model name, or the path to a model directory on local disk.
- vad_model: enables VAD, which splits long audio into short segments. With VAD enabled, the reported inference time is the end-to-end time of VAD plus the ASR model; to benchmark the ASR model on its own, disable VAD (see the timing sketch below).
- punc_model: restores punctuation in the output text.
Example output:
![non-streaming result](images/resault_no_streaming.png)
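As noted for vad_model above, a minimal timing sketch with VAD disabled; the timing code and the reuse of the example audio path are illustrative assumptions, not part of the original example:
```
from funasr import AutoModel
import time

# Load only the ASR model, without the VAD or punctuation stages,
# so the measured time reflects the ASR model alone.
asr_only = AutoModel(model="paraformer-zh")

start = time.time()
res = asr_only.generate(input="test_audio/asr_example_zh.wav")  # assumed local test clip
elapsed = time.time() - start

print(res)
print(f"ASR-only inference time: {elapsed:.2f}s")
```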
### Streaming speech recognition
```
from funasr import AutoModel
@@ -92,7 +94,7 @@ model = AutoModel(model="paraformer-zh-streaming")
import soundfile
import os
wav_file = os.path.join(model.model_path, "test_audio/asr_example_zh.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960  # 600ms
@@ -104,8 +106,16 @@ for i in range(total_chunk_num):
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)
```
Note: chunk_size configures the streaming latency. [0, 10, 5] means text is emitted in real time at a granularity of 10*60 = 600 ms, with 5*60 = 300 ms of lookahead (future context). Each inference step takes 600 ms of audio as input (16000*0.6 = 9600 samples) and outputs the corresponding text; the last audio chunk must be sent with is_final=True to force out the final characters.
Example output:
![streaming result](images/resault_streaming.png)
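To make the chunk arithmetic in the note concrete, here is a small worked sketch; chunk_size and the 60 ms frame size come from the note above, while the clip length and variable names are illustrative assumptions:
```
chunk_size = [0, 10, 5]   # values in 60 ms frames, as described in the note above
sample_rate = 16000

chunk_ms = chunk_size[1] * 60        # 10 * 60 = 600 ms of new audio per inference step
chunk_stride = chunk_size[1] * 960   # 60 ms at 16 kHz is 960 samples -> 9600 samples per step
lookahead_ms = chunk_size[2] * 60    # 5 * 60 = 300 ms of future context

# For a clip of num_samples samples, the final (possibly shorter) chunk
# is sent with is_final=True so the model flushes the last characters.
num_samples = 9 * sample_rate        # e.g. a 9-second clip (assumption)
total_chunk_num = (num_samples - 1) // chunk_stride + 1
print(chunk_ms, chunk_stride, lookahead_ms, total_chunk_num)
```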
The models used in the streaming and non-streaming examples above can be downloaded from:
- paraformer-zh:https://hf-mirror.com/funasr/paraformer-zh
- paraformer-zh-streaming:https://hf-mirror.com/funasr/paraformer-zh-streaming
- fsmn-vad:https://hf-mirror.com/funasr/fsmn-vad
- ct-punc:https://hf-mirror.com/funasr/ct-punc
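Since model accepts either a model name or a local path (see the parameter notes above), here is a sketch of loading locally downloaded copies of the models listed above, assuming they were placed in a models/ folder; passing local paths to vad_model and punc_model is assumed to work the same way:
```
from funasr import AutoModel

# Assumed local layout after downloading the repositories listed above:
#   models/paraformer-zh, models/fsmn-vad, models/ct-punc
model = AutoModel(
    model="models/paraformer-zh",
    vad_model="models/fsmn-vad",
    punc_model="models/ct-punc")
res = model.generate(input="test_audio/asr_example_zh.wav")
print(res)
```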
## Application Scenarios
### Algorithm Categories
@@ -116,4 +126,4 @@ for i in range(total_chunk_num):
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/paraformer_funasr_pytorch
## References
https://github.com/modelscope/FunASR