# StableTTS
StableTTS is a fast, lightweight TTS model for Chinese and English speech generation, with only 10M parameters.
## Paper
`Unpublished`

## Model Structure
Inspired by Stable Diffusion 3, StableTTS combines flow matching and DiT (Diffusion Transformer) into an open-source TTS model.
<div align=center>
    <img src="./doc/structure.png"/>
</div>
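
Flow matching trains a network to predict a velocity field that carries Gaussian noise toward data along a path. The pure-Python sketch below shows the widely used optimal-transport conditional flow matching (OT-CFM) objective on a single vector. It assumes a straight noise-to-data path with a small `sigma_min`; the helper names and the `sigma_min` value are illustrative assumptions, not taken from the StableTTS code.

```python
# Illustrative OT-CFM sketch; names and SIGMA_MIN are assumptions, not StableTTS code.
SIGMA_MIN = 1e-4


def cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return the point x_t on the noise-to-data path and its target velocity u_t.

    x0: noise sample, x1: data sample (e.g. one mel frame), both lists of floats.
    t: time in [0, 1]; t = 0 gives (almost pure) noise, t = 1 gives data.
    """
    # Straight-line path from x0 to x1, shrinking the noise to sigma_min at t = 1
    xt = [(1 - (1 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    # d(x_t)/dt: the velocity the decoder is trained to predict
    ut = [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]
    return xt, ut


def cfm_loss(pred, ut):
    """Mean squared error between predicted and target velocity."""
    return sum((p - u) ** 2 for p, u in zip(pred, ut)) / len(ut)
```

During training, `x1` would be a mel-spectrogram frame, `x0` fresh noise and `t` drawn uniformly from [0, 1]; the decoder's output at `(xt, t)` is scored against `ut` with `cfm_loss`.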

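## Algorithm Principle
The Diffusion Convolution Transformer block from Hierspeech++ combines the original DiT with FFT (feed-forward Transformer) to obtain better prosody. In the flow-matching decoder, a FiLM layer is added before the DiT block to condition the model on the timestep embedding.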
<div align=center>
    <img src="./doc/algorithm.png"/>
</div>
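
FiLM (feature-wise linear modulation) conditions a layer by scaling and shifting each channel with values predicted from a conditioning signal, here the timestep embedding. The dependency-free sketch below assumes a sinusoidal timestep embedding and plain linear projections for the scale (`gamma`) and shift (`beta`). All function names are hypothetical, and the actual StableTTS layer may differ in detail (some implementations apply `1 + gamma`, for example).

```python
import math


def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar timestep t (assumed scheme; dim must be even)."""
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def linear(vec, weight, bias):
    """y = W @ vec + b, with weight given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def film(hidden, t_emb, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift every channel of every frame using projections of t_emb.

    hidden: list of frames, each a list of C channel values.
    w_gamma / w_beta: C rows, each as long as t_emb; b_gamma / b_beta: length C.
    """
    gamma = linear(t_emb, w_gamma, b_gamma)  # per-channel scale
    beta = linear(t_emb, w_beta, b_beta)     # per-channel shift
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)] for frame in hidden]
```

Because `gamma` and `beta` depend only on the timestep, the same modulation is applied to every frame, which is what lets one network behave differently at each point along the flow.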

## Environment Setup
```
mv stabletts_pytorch StableTTS # drop the framework-name suffix
```

### Docker (Method 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
# replace <your IMAGE ID> with the ID of the image pulled above
docker run -it --shm-size=32G -v $PWD/StableTTS:/home/StableTTS -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name stabletts <your IMAGE ID> bash
cd /home/StableTTS
pip install -r requirements.txt
```
### Dockerfile (Method 2)
```
cd StableTTS/docker
docker build --no-cache -t stabletts:latest .
docker run --shm-size=32G --name stabletts -v /opt/hyhal:/opt/hyhal:ro --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video -v $PWD/../../StableTTS:/home/StableTTS -it stabletts bash
# If building the environment from the Dockerfile takes too long, comment out the pip install step in it, start the container, and then install the Python libraries: pip install -r requirements.txt
```
### Anaconda (Method 3)
1. The special deep learning libraries this project needs for DCU accelerators can be downloaded and installed from the 光合 (Guanghe) developer community:
- https://developer.sourcefind.cn/tool/
```
DTK driver:dtk24.04.1
python:python3.10
torch:2.1.0
torchvision:0.16.0
torchaudio:2.1.2
```
`Tips: the DCU-related tool versions above (DTK driver, python, torch, etc.) must match one another exactly.`
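
Because the versions must match exactly, a quick check inside the environment can save a failed run. The sketch below compares installed Python packages against the versions listed above using the standard `importlib.metadata` module. It assumes DCU wheels may carry a local version suffix after `+` (the suffix in the test string is only an illustration) and compares only the part before it; the DTK driver version is not visible from Python and must be checked separately. The function names are hypothetical.

```python
import sys
from importlib import metadata

# Versions listed in this README for the DTK 24.04.1 stack
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.2"}
EXPECTED_PYTHON = (3, 10)


def base_version(version):
    """Drop a local build suffix, e.g. '2.1.0+something' -> '2.1.0'."""
    return version.split("+", 1)[0]


def python_matches(expected=EXPECTED_PYTHON):
    """True if the running interpreter's major.minor equals the expected one."""
    return tuple(sys.version_info[:2]) == tuple(expected)


def find_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            have = None  # package not installed at all
        if have is None or base_version(have) != want:
            mismatches[pkg] = (want, have)
    return mismatches
```

Running `print(python_matches(), find_mismatches())` in the container should print `True {}` when the Python side of the stack lines up.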

2. Install the remaining, non-DCU-specific libraries from requirements.txt:
```
pip install -r requirements.txt
```

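## Dataset
These steps use the Databaker (标贝) female-voice dataset `BZNSYP`. For other voices, download and prepare the data as described in the files under [`recipes`](./recipes/). A mini [`BZNSYP`](./recipes/raw_datasets/BZNSYP.zip) dataset is included in the project for a trial run; just unzip it. Download the full BZNSYP dataset from the official site:
- https://www.data-baker.com/data/index/TNtts/

The data directory is structured as follows: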
```
recipes/raw_datasets/BZNSYP
    ├── Wave
    │   ├── xxx.wav
    │   └── xxx.wav
    ├── PhoneLabeling
    │   ├── xxx.interval
    │   └── xxx.interval
    └── ProsodyLabeling
        └── 000001-010000.txt
```
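
Before preprocessing, it can help to confirm that the unzipped data matches the tree above. This is a minimal sketch using only `pathlib`; the function name `check_bznsyp` is hypothetical, and the file patterns are inferred from the example file names in the tree.

```python
from pathlib import Path

# Sub-directory -> file pattern, taken from the directory tree above
REQUIRED = {"Wave": "*.wav", "PhoneLabeling": "*.interval", "ProsodyLabeling": "*.txt"}


def check_bznsyp(root="recipes/raw_datasets/BZNSYP"):
    """Return {subdir: file_count}; raise if a sub-directory is missing or empty."""
    root = Path(root)
    counts = {}
    for sub, pattern in REQUIRED.items():
        folder = root / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing directory: {folder}")
        counts[sub] = len(list(folder.glob(pattern)))
        if counts[sub] == 0:
            raise FileNotFoundError(f"no {pattern} files in {folder}")
    return counts
```

For a complete dataset, the `Wave` and `PhoneLabeling` counts would normally be equal, since each utterance has one audio file and one phone-label file.
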
Run the following commands to preprocess the data:
```
cd recipes
python BZNSYP_标贝女声.py
mv filelists/bznsyp.txt ../filelists/filelist.txt
cd ..
python preprocess.py # generates filelists/filelist.json and stableTTS_datasets/mels, which training needs
```
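
If `preprocess.py` fails, a malformed `filelists/filelist.txt` is a likely suspect. The sketch below parses that file assuming each line has the form `audio_path|text`, which we believe is the format produced by the upstream StableTTS recipes; verify it against your own file before relying on it. The function name `read_filelist` is hypothetical.

```python
def read_filelist(path):
    """Parse an 'audio_path|text' filelist (assumed format); skip blank lines.

    Returns a list of (audio_path, text) tuples and raises ValueError on the
    first line that has no '|' separator or an empty transcript.
    """
    entries = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            line = line.rstrip("\n")
            if not line.strip():
                continue
            # Split on the first '|' only, so the transcript itself may contain '|'
            audio, sep, text = line.partition("|")
            if not sep or not text.strip():
                raise ValueError(f"line {lineno}: expected 'audio_path|text', got {line!r}")
            entries.append((audio, text))
    return entries
```

Calling `len(read_filelist("filelists/filelist.txt"))` also gives a quick utterance count to compare with the number of files in `Wave`.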


## Training
### Single Node, Single GPU
```
export HIP_VISIBLE_DEVICES=0
cd StableTTS
python train.py
```
For more details, see the original project's [`README_origin`](./README_origin.md).

## Inference
100h Chinese: `checkpoint-zh_0.pt`
- https://huggingface.co/KdaiP/StableTTS/blob/main/checkpoint-zh_0.pt

2k English + Chinese + Japanese: `vocoder.pt`
- https://huggingface.co/KdaiP/StableTTS/blob/main/vocoder.pt
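
The links above point at Hugging Face `/blob/` pages, which render an HTML view; the raw file is served from the matching `/resolve/` URL. This standard-library sketch converts the URL and downloads each checkpoint straight into `./checkpoints`, the directory the default paths below expect. The helper names are hypothetical; `huggingface_hub.hf_hub_download` is an alternative that adds caching.

```python
import urllib.request
from pathlib import Path


def to_resolve_url(blob_url):
    """Turn a Hugging Face '/blob/' page URL into the '/resolve/' raw-file URL."""
    if "/blob/" not in blob_url:
        raise ValueError(f"not a /blob/ URL: {blob_url}")
    return blob_url.replace("/blob/", "/resolve/", 1)


def download(blob_url, dest_dir="./checkpoints"):
    """Download the file behind blob_url into dest_dir and return its path."""
    dest = Path(dest_dir) / blob_url.rsplit("/", 1)[-1]
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(to_resolve_url(blob_url), dest)
    return dest
```

Calling `download(...)` on both URLs above places `checkpoint-zh_0.pt` and `vocoder.pt` in `./checkpoints`, so the `mv` step below is then unnecessary.
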
```
export HIP_VISIBLE_DEVICES=0
mv vocoder.pt ./checkpoints/vocoder.pt
python inference.py
# Default weights used:
# tts_checkpoint_path = './checkpoints/checkpoint-zh_0.pt'
# vocoder_checkpoint_path = './checkpoints/vocoder.pt'
```
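
At inference time a flow-matching decoder generates a mel spectrogram by integrating the learned velocity field from noise (t = 0) to data (t = 1). The sketch below is a generic fixed-step Euler ODE solver with a caller-supplied `velocity(x, t)` function standing in for the trained decoder; the solver and step count used by `inference.py` may differ, and the name `euler_sample` is hypothetical.

```python
def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps.

    velocity: callable taking (x, t) and returning a list like x.
    x0: starting noise vector (list of floats).
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity(x, t)
        # One explicit Euler step along the predicted velocity
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

For a perfectly straight path the velocity is constant and a single step already lands on the target; more steps trade speed for accuracy when the learned field curves.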

## Result
`Input:`
```
'你好,世界!' # text
'./audio.wav' # reference voice (timbre)
```
`Output:`
```
'generate.wav' # synthesized speech
```

### Accuracy
Max epochs: 1000; inference framework: PyTorch.

|  device   |  Loss  |
|:---------:|:------:|
| DCU Z100L | 1.9369 |
| GPU V100S | 1.9382 |

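## Application Scenarios
### Algorithm Category
`Speech synthesis`
### Key Application Industries
`Finance, e-commerce, education, manufacturing, healthcare, energy`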
## Source Repository and Issue Feedback
- http://developer.sourcefind.cn/codes/modelzoo/stabletts_pytorch.git
## References
- https://github.com/KdaiP/StableTTS.git
- https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf