
# Tacotron 2 (without WaveNet)

## Paper

https://arxiv.org/abs/1712.05884

## Model Architecture

Compared with the first-generation Tacotron, Tacotron 2 removes the CBHG module in favor of LSTM and convolutional layers, simplifying the model while preserving synthesis quality and improving training and inference efficiency. In the vocoder stage, it replaces the first generation's Griffin-Lim algorithm with a trainable WaveNet, which generates audio waveforms with high quality and fidelity.

![Tacotron 2 model structure](https://developer.sourcefind.cn/codes/modelzoo/tacotron2/-/raw/main/tacotron2%E6%A8%A1%E5%9E%8B%E7%BB%93%E6%9E%84.png?inline=false)

## Algorithm

Tacotron 2 uses an encoder-decoder architecture with an attention mechanism to convert a text sequence into a mel spectrogram, and a WaveNet vocoder then converts the spectrogram into a natural speech waveform. Its key strengths are end-to-end training and high-quality speech synthesis.

![Encoder-decoder structure with LSTM](https://developer.sourcefind.cn/codes/modelzoo/tacotron2/-/raw/main/LSTM.png?inline=false)
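
As a rough illustration of this pipeline (a toy sketch, not the repository's actual model; all layer sizes are made-up values), the PyTorch snippet below runs character IDs through an embedding plus bidirectional-LSTM encoder, then an attention-driven autoregressive decoder that emits one mel frame per step:

```python
import torch
import torch.nn as nn

class ToyTacotron2(nn.Module):
    def __init__(self, n_symbols=100, emb=64, enc=64, n_mels=80):
        super().__init__()
        self.embedding = nn.Embedding(n_symbols, emb)
        self.encoder = nn.LSTM(emb, enc // 2, batch_first=True, bidirectional=True)
        self.attention = nn.MultiheadAttention(enc, num_heads=1, batch_first=True)
        self.decoder = nn.LSTMCell(enc + n_mels, enc)
        self.mel_proj = nn.Linear(enc, n_mels)
        self.n_mels = n_mels

    def forward(self, text_ids, n_frames=50):
        # Encoder: character embeddings -> bidirectional LSTM "memory"
        memory, _ = self.encoder(self.embedding(text_ids))      # (B, T, enc)
        h = memory.new_zeros(text_ids.size(0), memory.size(-1))
        c = torch.zeros_like(h)
        frame = memory.new_zeros(text_ids.size(0), self.n_mels) # initial "go" frame
        mels = []
        for _ in range(n_frames):                               # autoregressive decoding
            # Attention: the current decoder state queries the encoder memory
            ctx, _ = self.attention(h.unsqueeze(1), memory, memory)
            h, c = self.decoder(torch.cat([ctx.squeeze(1), frame], dim=-1), (h, c))
            frame = self.mel_proj(h)                            # predict the next mel frame
            mels.append(frame)
        return torch.stack(mels, dim=1)                         # (B, n_frames, n_mels)

mel = ToyTacotron2()(torch.randint(0, 100, (1, 20)))
print(mel.shape)  # torch.Size([1, 50, 80])
```

In the real model, a stop-token predictor terminates decoding and a neural vocoder (WaveNet or WaveGlow) converts the mel frames into audio.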

## Environment Setup

### Docker (Option 1)

Pull the image, then start and enter the container:

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-py3.10-dtk24.04.3-ubuntu20.04
# Replace <Image ID> with the ID of the image pulled above
# <Host Path>: path on the host machine
# <Container Path>: mount path inside the container
docker run --shm-size 80g --network=host --name=tacotron2 -v /opt/hyhal:/opt/hyhal:ro --privileged --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v <Host Path>:<Container Path> -it <Image ID> /bin/bash
```

### Dockerfile (Option 2)

```
# <Host Path>: path on the host machine
# <Container Path>: mount path inside the container
docker build -t tacotron2_image .
docker run --shm-size 80g \
    --network=host \
    --name=tacotron2 \
    -v /opt/hyhal:/opt/hyhal:ro \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v <Host Path>:<Container Path> \
    -it tacotron2_image
```

### Anaconda (Option 3)

```
conda create -n tacotron2 python=3.10
conda activate tacotron2
# Key component versions:
# DTK: 24.04.3
# python: 3.10
# torch: 2.1.0
```

### Clone the Repository

```
git clone http://developer.sourcefind.cn/codes/modelzoo/tacotron2.git
cd tacotron2
```

### Initialize Submodules

```
git submodule init; git submodule update
```

### Update the .wav Paths

Replace the `DUMMY` placeholder in the filelists with the actual path of the LJSpeech wav directory:

```
sed -i -- 's,DUMMY,/LJSpeech-1.1/wavs,g' filelists/*.txt
```
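
To verify the substitution, you can print the first filelist entry. A minimal sketch, assuming the upstream NVIDIA filelist naming and the `wav path|transcript` line format:

```python
# Print the first training entry; the wav path should now start with
# /LJSpeech-1.1/wavs/ instead of the DUMMY placeholder.
with open("filelists/ljs_audio_text_train_filelist.txt", encoding="utf-8") as f:
    wav_path, transcript = f.readline().strip().split("|", 1)
    print(wav_path, "->", transcript)
```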

### Install Python Dependencies

```
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## Dataset

The dataset is LJSpeech-1.1, a speech-synthesis dataset containing paired audio and text: the audio clips are stored as .wav files and the transcripts are saved in .csv format.

Official link: https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2

SCNet:[AIDatasets / project-dependency / LJSpeech-1.1 · GitLab](http://113.200.138.88:18080/aidatasets/project-dependency/ljspeech-1.1)
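
Each line of the dataset's metadata.csv is pipe-delimited: wav file ID, raw transcript, normalized transcript. A minimal sketch for inspecting it, assuming the archive has been extracted to `./LJSpeech-1.1`:

```python
import csv

# metadata.csv columns: wav file ID | raw transcript | normalized transcript
with open("LJSpeech-1.1/metadata.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
        wav_id, normalized = row[0], row[-1]
        print(f"wavs/{wav_id}.wav -> {normalized}")
        break  # first entry only
```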

## Training

### Single-Card Training

```
bash run_single.sh
```

### Multi-Card Training

```
bash run_multi.sh
```

## Inference

Replace `checkpoint_path` and `waveglow_path` in inference.py with your own paths, then run inference.py:

```
export HIP_VISIBLE_DEVICES=0  # set the visible card(s); adjust the index as needed
python inference.py
```
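
For reference, the core of inference.py in the upstream NVIDIA codebase follows roughly the flow sketched below. The helper names (`create_hparams`, `load_model`, `text_to_sequence`) come from that repository, and the two `<...>` paths are the placeholders you need to fill in:

```python
import torch
from hparams import create_hparams
from train import load_model
from text import text_to_sequence

# Load the trained Tacotron 2 checkpoint
hparams = create_hparams()
model = load_model(hparams)
model.load_state_dict(torch.load("<checkpoint_path>")["state_dict"])
model.eval()

# Load the WaveGlow vocoder checkpoint
waveglow = torch.load("<waveglow_path>")["model"]
waveglow.eval()

# Text -> symbol IDs -> mel spectrogram -> waveform
sequence = torch.LongTensor(
    text_to_sequence("Waveglow is really awesome!", ["english_cleaners"]))[None, :]
with torch.no_grad():
    _, mel_postnet, _, _ = model.inference(sequence)
    audio = waveglow.infer(mel_postnet, sigma=0.666)
```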

## Result

```
Input: "Waveglow is really awesome!"
Output: ./output/output_audio.wav and ./output/mel_spectrograms.png
```
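
A quick way to sanity-check the generated audio (a sketch assuming the output path above; LJSpeech models produce 22050 Hz audio, and `scipy` is assumed to be available from the requirements):

```python
from scipy.io import wavfile

# Read the synthesized waveform and report its sample rate and length
rate, data = wavfile.read("./output/output_audio.wav")
print(rate, data.shape)  # expect a 22050 Hz waveform
```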

### Accuracy



## Application Scenarios

### Algorithm Category

```
Speech synthesis
```

### Key Application Industries

```
Finance, manufacturing, scientific research, government, education, meteorology
```

## Pretrained Weights

## Source Repository and Issue Feedback

http://developer.sourcefind.cn/codes/modelzoo/tacotron2_pytorch

## Reference

[NVIDIA/tacotron2: Tacotron 2 - PyTorch implementation with faster-than-realtime inference](https://github.com/NVIDIA/tacotron2)