"vscode:/vscode.git/clone" did not exist on "5a3467e623c4d74c1f9f2a1239e5a6e0d91042fc"
Commit 9e220a2e authored by ACzhangchao's avatar ACzhangchao
Browse files

Update README, add icon and update requirements

parent 59c204a1
...@@ -6,21 +6,66 @@ ...@@ -6,21 +6,66 @@
https://arxiv.org/abs/1712.05884
## Model Architecture
Compared with the first generation, Tacotron2 removes the CBHG module in favor of LSTM and convolutional layers, which simplifies the model while preserving synthesis quality and improves training and inference efficiency. For the vocoder, a trainable WaveNet replaces the Griffin-Lim algorithm used in the first generation, generating audio waveforms with high quality and fidelity.
*(Figure: Tacotron2 model architecture)*
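
As a rough sketch of that change, the encoder in the reference implementation stacks three 1-D convolutions and a single bidirectional LSTM. The PyTorch snippet below is illustrative only; layer sizes follow the paper's 512-channel configuration and are not taken from this repository's code:

```python
import torch.nn as nn

class EncoderSketch(nn.Module):
    """Illustrative Tacotron2-style encoder: 3 conv layers + one BiLSTM."""
    def __init__(self, embed_dim=512):
        super().__init__()
        # Three 1-D convolutions over the character-embedding sequence
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(embed_dim, embed_dim, kernel_size=5, padding=2),
                nn.BatchNorm1d(embed_dim),
                nn.ReLU(),
            )
            for _ in range(3)
        ])
        # One bidirectional LSTM: 256 units per direction, 512 combined
        self.lstm = nn.LSTM(embed_dim, embed_dim // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, x):           # x: (batch, embed_dim, time)
        for conv in self.convs:
            x = conv(x)
        x = x.transpose(1, 2)       # -> (batch, time, embed_dim)
        outputs, _ = self.lstm(x)   # -> (batch, time, embed_dim)
        return outputs
```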
## Algorithm Overview
Tacotron2 uses an encoder-decoder architecture with an attention mechanism to convert a text sequence into a mel spectrogram, then uses a WaveNet vocoder to turn the spectrogram into a natural speech waveform. Its core strengths are end-to-end training and high-quality speech synthesis.
*(Figure: LSTM)*
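
A high-level sketch of that two-stage flow (the function names below are placeholders for illustration, not this repository's API):

```python
def synthesize(text, spectrogram_predictor, vocoder, text_to_ids):
    """Two-stage Tacotron2 pipeline: text -> mel spectrogram -> waveform."""
    ids = text_to_ids(text)          # characters -> integer ID sequence
    # Stage 1: the attention-based encoder-decoder predicts an 80-band mel
    # spectrogram frame by frame until its stop token fires.
    mel = spectrogram_predictor(ids)
    # Stage 2: the neural vocoder converts the spectrogram into raw audio.
    return vocoder(mel)
```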
## Environment Setup
### Docker (Option 1)
Pull the image, then start and enter the container:
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-py3.10-dtk24.04.3-ubuntu20.04
# <Image ID>: replace with the ID of the image pulled above
# <Host Path>: path on the host
# <Container Path>: path mapped inside the container
docker run -it --shm-size 80g --network=host --name=tacotron2 -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <Host Path>:<Container Path> <Image ID> /bin/bash
```
### Dockerfile (Option 2)
```
# <Host Path>: path on the host
# <Container Path>: path mapped inside the container
docker build -t tacotron2_image .
docker run --shm-size 80g \
--network=host \
--name=tacotron2 \
-v /opt/hyhal:/opt/hyhal:ro \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
-v /public/opendas/DL_DATA/LJSpeech-1.1:/LJSpeech-1.1:ro \
-it tacotron2_image
```
### Anaconda (Option 3)
```
conda create -n tacotron2 python=3.10
# Key component versions:
# DTK: 24.04.3
# Python: 3.10
# torch: 2.1.0
```
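
After activating the environment and installing PyTorch, a quick sanity check (this assumes the DTK build of PyTorch exposes DCU devices through the standard `torch.cuda` interface, as ROCm builds do):

```python
import torch

print(torch.__version__)          # expect 2.1.0
print(torch.cuda.is_available())  # True if a DCU device is visible
print(torch.cuda.device_count())  # number of visible devices
```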
### Clone the Repository
```
git clone http://developer.sourcefind.cn/codes/modelzoo/tacotron2.git
```
Update the filelists to point at the dataset and install the Python dependencies:
```
sed -i -- 's,DUMMY,/LJSpeech-1.1/wavs,g' filelists/*.txt
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```
## Dataset
The dataset is LJSpeech-1.1, a speech-synthesis dataset containing both audio and text: the audio is in wav format and the transcripts are stored in csv format.
Official link: https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
SCNet: [AIDatasets / project-dependency / LJSpeech-1.1 · GitLab](http://113.200.138.88:18080/aidatasets/project-dependency/ljspeech-1.1)
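
LJSpeech's `metadata.csv` is pipe-separated with three fields per row: wav file ID, raw transcript, and normalized transcript. A small snippet for inspecting it (assuming the dataset is mounted at `/LJSpeech-1.1` as in the docker commands above):

```python
import csv

# Each row: wav file ID | raw transcript | normalized transcript
with open("/LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
    for wav_id, raw, normalized in reader:
        print(f"{wav_id}: {normalized}")
        break  # show only the first entry
```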
## Training
### Single-GPU Training
```
bash run_single.sh
```
### Multi-GPU Training
Run the script:
```
bash run_multi.sh
```
## Inference
Replace `checkpoint_path` and `waveglow_path` in inference.py with your own paths, then run inference.py:
```
export HIP_VISIBLE_DEVICES=0  # select the visible device(s); 0 is an example
python inference.py
```
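
For reference, the upstream NVIDIA repository's inference notebook follows roughly the pattern below, which `inference.py` presumably mirrors. This is a sketch that assumes this fork keeps the upstream helpers (`hparams.create_hparams`, `train.load_model`, `text.text_to_sequence`); the checkpoint paths are placeholders:

```python
import numpy as np
import torch
from scipy.io.wavfile import write

from hparams import create_hparams
from text import text_to_sequence
from train import load_model

hparams = create_hparams()

# Load the trained Tacotron2 checkpoint (placeholder path)
model = load_model(hparams)
model.load_state_dict(torch.load("checkpoint_path")["state_dict"])
model.cuda().eval()

# Load the WaveGlow vocoder checkpoint (placeholder path)
waveglow = torch.load("waveglow_path")["model"]
waveglow.cuda().eval()

# Text -> character IDs -> mel spectrogram -> waveform
sequence = np.array(text_to_sequence("Waveglow is really awesome!",
                                     ["english_cleaners"]))[None, :]
sequence = torch.from_numpy(sequence).long().cuda()
with torch.no_grad():
    _, mel_postnet, _, _ = model.inference(sequence)
    audio = waveglow.infer(mel_postnet, sigma=0.666)

write("output/output_audio.wav", hparams.sampling_rate,
      audio[0].data.cpu().numpy())
```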
## Result
```
Input: "Waveglow is really awesome!"
Output: "./output/output_audio.wav" and "./output/mel_spectrograms.png"
```
### Accuracy
## Application Scenarios
### Algorithm Category
```
Speech synthesis
```
### Key Application Industries
```
Finance, manufacturing, scientific research, government, education, meteorology
```
## Pretrained Weights
## Source Repository & Issue Feedback
http://developer.sourcefind.cn/codes/modelzoo/tacotron2.git
## References
[NVIDIA/tacotron2: Tacotron 2 - PyTorch implementation with faster-than-realtime inference](https://github.com/NVIDIA/tacotron2)
The updated requirements.txt pins the following versions:

matplotlib==3.9.2
tensorflow==2.18.0
numpy==1.24.3
inflect==7.5.0
librosa==0.10.2.post1
scipy==1.14.1
Unidecode==1.3.8
pillow==11.0.0
IPython==8.31.0