Update readme.md

Commit ca5cf161 authored Aug 23, 2024 by changhl

Showing with 46 additions and 18 deletions

README.md README.md +36 -18

icon.png icon.png +0 -0

model.properties model.properties +10 -0

--- a/README.md
+++ b/README.md
@@ -74,6 +74,13 @@ pip3 install -r requirements.txt
 ## Dataset

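 **Test data for this example is already included, so no manual dataset download is needed. If you need the full datasets, download them from the links below.**
+- SCnet fast download links:
+  - [CMU_ARCTIC dataset download](http://113.200.138.88:18080/aidatasets/cmu-arctic-xvectors)
+  - [librispeech_asr dataset download](http://113.200.138.88:18080/aidatasets/librispeech_asr_dummy)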
+  
+- Official download links:
+  - [CMU_ARCTIC dataset download](https://hf-mirror.com/datasets/Matthijs/cmu-arctic-xvectors)
+  - [librispeech_asr dataset download](http://www.openslr.org/12)

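 `CMU_ARCTIC`: a speaker-identification dataset that represents each speaker's voice characteristics as a tensor of shape (1, 512). The feature files are stored in npy format.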

@@ -88,7 +95,8 @@ pip3 install -r requirements.txt
 ```
 - `cmu_us_awb_arctic-wav-arctic_a0001.npy`: the speaker's voice feature file.

-    [CMU_ARCTIC dataset download](https://hf-mirror.com/datasets/Matthijs/cmu-arctic-xvectors)
+
+


 `librispeech_asr`: a speech-recognition dataset containing audio files and their text transcripts. Audio files are stored in flac format and transcripts in txt format.
@@ -120,28 +128,33 @@ LibriSpeech
  - `1272-128104`: speaker ID (1272)-chapter ID (128104).
  - `1272-128104-0000.flac`: audio file for speaker ID (1272)-chapter ID (128104)-utterance ID (0).
  - `1272-128104.trans.txt`: transcript file for speaker ID (1272)-chapter ID (128104).
-    [librispeech_asr dataset download](http://www.openslr.org/12)

-
-## Inference
-
-**Download the required weight files before running inference**
- Use the official model weights provided by Microsoft on the HF mirror
+## Pretrained models
+**Download the pretrained weight files before running inference**
+- SCnet download links:
+  - [tts model weights download](http://113.200.138.88:18080/aimodels/speecht5_tts)
+  - [vc model weights download](http://113.200.138.88:18080/aimodels/speecht5_vc)
+  - [asr model weights download](http://113.200.138.88:18080/aimodels/speecht5_asr)
+  - [hifigan model weights download](http://113.200.138.88:18080/aimodels/speecht5_hifigan)
 - Official download links:
  - [tts model weights download](https://hf-mirror.com/microsoft/speecht5_tts)
  - [vc model weights download](https://hf-mirror.com/microsoft/speecht5_vc)
  - [asr model weights download](https://hf-mirror.com/microsoft/speecht5_asr)
  - [hifigan model weights download](https://hf-mirror.com/microsoft/speecht5_hifigan)

+
+## Inference
+
 ### TTS inference


 ```
-python speech_tts.py -hip 7 -m model/tts -v model/hifigan -t "hi, nice to meet you." -s data/CMU_ARCTIC/cmu_us_awb_arctic-wav-arctic_a0001.npy
+cd inference
+python speech_tts.py -hip 7 -m model/tts -v model/hifigan -t "hi, nice to meet you." -s ../data/CMU_ARCTIC/cmu_us_awb_arctic-wav-arctic_a0001.npy
 ```
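 - -hip: GPU index; defaults to 0.
-    - When using the default card '0': first export HIP_VISIBLE_DEVICES to set the visible cards;
-    - When not using the default card '0'
+    - If the visible cards have already been set by exporting HIP_VISIBLE_DEVICES, **this argument is not needed**; card '0' is used by default
+    - If no visible cards have been set, use this argument to specify the GPU to run on
  - -m: path to the tts model
  - -v: path to the hifigan vocoder model
  - -t: input text; because the text contains spaces, it must be wrapped in double quotes (" ").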
@@ -150,11 +163,12 @@ python speech_tts.py -hip 7 -m model/tts -v model/hifigan -t "hi, nice to meet y

 ### VC inference
 ```
-python speech_vc.py -hip 7 -m model/speecht5_vc -v model/speecht5_hifigan -is data/librispeech/dev-clean/1272/128104/1272-128104-0000.flac -s data/CMU_ARCTIC/cmu_us_awb_arctic-wav-arctic_a0001.npy
+cd inference
+python speech_vc.py -hip 7 -m model/speecht5_vc -v model/speecht5_hifigan -is ../data/librispeech/dev-clean/1272/128104/1272-128104-0000.flac -s data/CMU_ARCTIC/cmu_us_awb_arctic-wav-arctic_a0001.npy
 ```
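 - -hip: GPU index; defaults to 0.
-    - When using the default card '0': first export HIP_VISIBLE_DEVICES to set the visible cards;
-    - When not using the default card '0'
+    - If the visible cards have already been set by exporting HIP_VISIBLE_DEVICES, **this argument is not needed**; card '0' is used by default
+    - If no visible cards have been set, use this argument to specify the GPU to run on
  - -m: path to the vc model
  - -v: path to the hifigan vocoder model
  - -is: input speech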
@@ -163,11 +177,12 @@ python speech_vc.py -hip 7 -m model/speecht5_vc -v model/speecht5_hifigan -is da

 ### ASR inference
 ```
-python speech_asr.py -hip 7 -m model/speecht5_asr -is data/librispeech/dev-clean/1272/128104/1272-128104-0000.flac
+cd inference
+python speech_asr.py -hip 7 -m model/speecht5_asr -is ../data/librispeech/dev-clean/1272/128104/1272-128104-0000.flac
 ```
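 - -hip: GPU index; defaults to 0.
-    - When using the default card '0': first export HIP_VISIBLE_DEVICES to set the visible cards;
-    - When not using the default card '0'
+    - If the visible cards have already been set by exporting HIP_VISIBLE_DEVICES, **this argument is not needed**; card '0' is used by default
+    - If no visible cards have been set, use this argument to specify the GPU to run on
  - -m: path to the asr model
  - -is: input speech
  - -res: path where the output file tts.wav is saved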
@@ -190,11 +205,14 @@ python speech_asr.py -hip 7 -m model/speecht5_asr -is data/librispeech/dev-clean
 ## Application scenarios

 ### Algorithm category
-Large speech models
+```
+Speech recognition, voice conversion, speech synthesis
+```

 ### Key application industries
+```
 Finance, telecommunications, broadcast media
-
+```

 ## Source repository and issue feedback


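Note on the README hunks above: the LibriSpeech naming convention they describe (speaker ID, chapter ID, utterance ID joined by hyphens) can be illustrated with a short sketch. The helper names `parse_utterance_path` and `transcript_name` are hypothetical and are not part of this repository; the code only restates the convention given in the README.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class LibriSpeechId(NamedTuple):
    """The three numeric parts of a LibriSpeech utterance name."""
    speaker_id: int
    chapter_id: int
    utterance_id: int


def parse_utterance_path(path: str) -> LibriSpeechId:
    """Split a name like '1272-128104-0000.flac' into its three IDs."""
    stem = PurePosixPath(path).stem  # drops the directories and the '.flac' suffix
    speaker, chapter, utterance = stem.split("-")
    return LibriSpeechId(int(speaker), int(chapter), int(utterance))


def transcript_name(ids: LibriSpeechId) -> str:
    """Name of the per-chapter transcript file that covers this utterance."""
    return f"{ids.speaker_id}-{ids.chapter_id}.trans.txt"


ids = parse_utterance_path(
    "data/librispeech/dev-clean/1272/128104/1272-128104-0000.flac"
)
print(ids.speaker_id, ids.chapter_id, ids.utterance_id)
print(transcript_name(ids))
```

The transcript file is shared by every utterance in a chapter, which is why its name carries only the speaker and chapter IDs.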
--- a/icon.png
+++ b/icon.png
--- a/model.properties
+++ b/model.properties
+# Model code
+modelCode=870
+# Model name
+modelName=speecht5_pytorch
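+# Model description
+modelDescription=speecht5 is a speech model released by Microsoft that supports inference across multiple modalities: text-to-speech, speech-to-speech, and speech-to-text.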
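+# Application scenarios (separate multiple tags with ASCII commas)
+appScenario=inference,speech recognition,voice conversion,speech synthesis,finance,telecommunications,broadcast media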
+# Framework type (separate multiple tags with ASCII commas)
+frameType=PyTorch
\ No newline at end of file
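
Note on `model.properties`: the file added in this commit uses `key=value` lines, `#` comments, and comma-separated tag lists. A minimal sketch of reading such a file is shown below. The functions `parse_properties` and `split_tags` are hypothetical illustrations, and how the hosting platform actually parses the file is not shown in this diff.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def split_tags(value: str) -> list[str]:
    """Split a comma-separated tag list, dropping empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


sample = """\
# Model code
modelCode=870
# Model name
modelName=speecht5_pytorch
# Framework type
frameType=PyTorch
"""

props = parse_properties(sample)
print(props["modelCode"], props["modelName"], split_tags(props["frameType"]))
```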