"vscode:/vscode.git/clone" did not exist on "5a3467e623c4d74c1f9f2a1239e5a6e0d91042fc"
Commit 9e220a2e authored by ACzhangchao's avatar ACzhangchao
Browse files

Update README, add icon and update requirements

parent 59c204a1
...@@ -6,21 +6,66 @@ ...@@ -6,21 +6,66 @@
https://arxiv.org/abs/1712.05884
## Model Architecture
Compared with the first generation, Tacotron2 removes the CBHG module in favor of LSTM and convolutional layers, which simplifies the model while preserving synthesis quality and improves training and inference efficiency. For the vocoder, a trainable WaveNet replaces the Griffin-Lim algorithm used in the first generation, generating audio waveforms with high quality and fidelity.
*(Figure: Tacotron2 model architecture)*
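
As a rough sketch of that change, the encoder in the reference implementation stacks three 1-D convolutions and a single bidirectional LSTM. The PyTorch snippet below is illustrative only; layer sizes follow the paper's 512-channel configuration and are not taken from this repository's code:

```python
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Illustrative Tacotron2-style encoder: 3 conv layers + one BiLSTM."""
    def __init__(self, embed_dim=512):
        super().__init__()
        # Three 1-D convolutions over the character-embedding sequence
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(embed_dim, embed_dim, kernel_size=5, padding=2),
                nn.BatchNorm1d(embed_dim),
                nn.ReLU(),
            )
            for _ in range(3)
        ])
        # One bidirectional LSTM: 256 units per direction, 512 combined
        self.lstm = nn.LSTM(embed_dim, embed_dim // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, x):           # x: (batch, embed_dim, time)
        for conv in self.convs:
            x = conv(x)
        x = x.transpose(1, 2)       # -> (batch, time, embed_dim)
        outputs, _ = self.lstm(x)   # -> (batch, time, embed_dim)
        return outputs
```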
## Algorithm Overview
Tacotron2 uses an encoder-decoder architecture with an attention mechanism to convert a text sequence into a mel spectrogram, then uses a WaveNet vocoder to turn the spectrogram into a natural speech waveform. Its core strengths are end-to-end training and high-quality speech synthesis.
*(Figure: LSTM)*
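
A high-level sketch of that two-stage flow (the function names below are placeholders for illustration, not this repository's API):

```python
def synthesize(text, spectrogram_predictor, vocoder, text_to_ids):
    """Two-stage Tacotron2 pipeline: text -> mel spectrogram -> waveform."""
    ids = text_to_ids(text)          # characters -> integer ID sequence
    # Stage 1: the attention-based encoder-decoder predicts an 80-band mel
    # spectrogram frame by frame until its stop token fires.
    mel = spectrogram_predictor(ids)
    # Stage 2: the neural vocoder converts the spectrogram into raw audio.
    return vocoder(mel)
```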
## Environment Setup
### Docker (Option 1)
Pull the image, then start and enter the container:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-py3.10-dtk24.04.3-ubuntu20.04
# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: path mapped inside the container
docker run -it --shm-size 80g --network=host --name=tacotron2 -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
### Dockerfile (Option 2)
```
# <Host Path>: path on the host
# <Container Path>: path mapped inside the container
docker build -t tacotron2_image .
docker run --shm-size 80g \
--network=host \
--name=tacotron2 \
-v /opt/hyhal:/opt/hyhal:ro \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v /public/opendas/DL_DATA/LJSpeech-1.1:/LJSpeech-1.1:ro \
-it tacotron2_image
```
### Anaconda (Option 3)
```
conda create -n tacotron2 python=3.10
# Key component versions:
# DTK: 24.04.3
# Python: 3.10
# torch: 2.1.0
```
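
After activating the environment and installing PyTorch, a quick sanity check (this assumes the DTK build of PyTorch exposes DCU devices through the standard `torch.cuda` interface, as ROCm builds do):

```python
import torch

print(torch.__version__)          # expect 2.1.0
print(torch.cuda.is_available())  # True if a DCU device is visible
print(torch.cuda.device_count())  # number of visible devices
```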
### Clone the Repository
```
git clone http://developer.sourcefind.cn/codes/modelzoo/tacotron2.git
```
Update the filelists to point at the dataset and install the Python dependencies:
```
sed -i -- 's,DUMMY,/LJSpeech-1.1/wavs,g' filelists/*.txt
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
## Dataset
The dataset is LJSpeech-1.1, a speech-synthesis dataset containing both audio and text: the audio is in wav format and the transcripts are stored in csv format.
Official link: https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
SCNet: [AIDatasets / project-dependency / LJSpeech-1.1 · GitLab](http://113.200.138.88:18080/aidatasets/project-dependency/ljspeech-1.1)
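
LJSpeech's `metadata.csv` is pipe-separated with three fields per row: wav file ID, raw transcript, and normalized transcript. A small snippet for inspecting it (assuming the dataset is mounted at `/LJSpeech-1.1` as in the docker commands above):

```python
import csv

# Each row: wav file ID | raw transcript | normalized transcript
with open("/LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
    for wav_id, raw, normalized in reader:
        print(f"{wav_id}: {normalized}")
        break  # show only the first entry
```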
## Training
### Single-GPU Training
```
bash run_single.sh
```
### Multi-GPU Training
Run the script:
```
bash run_multi.sh
```
## Inference
Replace `checkpoint_path` and `waveglow_path` in inference.py with your own paths, then run inference.py:
```
export HIP_VISIBLE_DEVICES=0  # select the visible device(s); 0 is an example
python inference.py
```
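
For reference, the upstream NVIDIA repository's inference notebook follows roughly the pattern below, which `inference.py` presumably mirrors. This is a sketch that assumes this fork keeps the upstream helpers (`hparams.create_hparams`, `train.load_model`, `text.text_to_sequence`); the checkpoint paths are placeholders:

```python
import numpy as np
import torch
from scipy.io.wavfile import write

from hparams import create_hparams
from text import text_to_sequence
from train import load_model

hparams = create_hparams()

# Load the trained Tacotron2 checkpoint (placeholder path)
model = load_model(hparams)
model.load_state_dict(torch.load("checkpoint_path")["state_dict"])
model.cuda().eval()

# Load the WaveGlow vocoder checkpoint (placeholder path)
waveglow = torch.load("waveglow_path")["model"]
waveglow.cuda().eval()

# Text -> character IDs -> mel spectrogram -> waveform
sequence = np.array(text_to_sequence("Waveglow is really awesome!",
                                     ["english_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).long().cuda()
with torch.no_grad():
    _, mel_postnet, _, _ = model.inference(sequence)
    audio = waveglow.infer(mel_postnet, sigma=0.666)

write("output/output_audio.wav", hparams.sampling_rate,
      audio[0].data.cpu().numpy())
```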
## Result
```
Input: "Waveglow is really awesome!"
Output: "./output/output_audio.wav" and "./output/mel_spectrograms.png"
```
### Accuracy
## Application Scenarios
### Algorithm Category
```
Speech synthesis
```
### Key Application Industries
```
Finance, manufacturing, scientific research, government, education, meteorology
```
## Pretrained Weights
## Source Repository & Issue Feedback
http://developer.sourcefind.cn/codes/modelzoo/tacotron2.git
## References
[NVIDIA/tacotron2: Tacotron 2 - PyTorch implementation with faster-than-realtime inference](https://github.com/NVIDIA/tacotron2)
The updated requirements.txt pins the following versions:

matplotlib==3.9.2
tensorflow==2.18.0
numpy==1.24.3
inflect==7.5.0
librosa==0.10.2.post1
scipy==1.14.1
Unidecode==1.3.8
pillow==11.0.0
IPython==8.31.0