ATTENTION: don't forget to add `group_by_length` in the training configs.
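
As a rough illustration of that note, here is a minimal sketch of what it could look like, assuming the training configs are JSON files whose fields map onto `transformers` `TrainingArguments`; the other field shown is a placeholder:

```json
{
  "model_name_or_path": "parler-tts-checkpoint",
  "group_by_length": true
}
```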

# Parler-TTS

[[Paper we reproduce]](https://arxiv.org/abs/2402.01912)
[[Models]](https://huggingface.co/parler-tts)
[[Training Code]](training)
[[Interactive Demo]](TODO - linked to spaces)

> [!IMPORTANT]
> We're proud to release Parler-TTS v0.1, our first 300M-parameter Parler-TTS model, trained on 10.5K hours of audio data.

Parler-TTS is a reproduction of the text-to-speech (TTS) model from the paper [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://www.text-description-to-speech.com)
by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively. 

Unlike standard TTS models, Parler-TTS lets you directly describe the speaker's characteristics with a simple text description, in which you can control gender, pitch, speaking style, accent, and more.

## Inference

> [!TIP]
> You can directly try it out in an interactive demo [here](TODO: add link to spaces)!

Using Parler-TTS is as simple as "bonjour". Just run the following inference snippet.

```py
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# TODO: change repo id

model = ParlerTTSForConditionalGeneration.from_pretrained("ylacombe/parler_tts_300M_v0.09")
tokenizer = AutoTokenizer.from_pretrained("ylacombe/parler_tts_300M_v0.09")

# The prompt is the text to be spoken; the description controls the voice and speaking style.
prompt = "Hey, how are you doing today?"
description = "A female speaker with a slightly low-pitched voice delivers her words quite expressively, in a very confined sounding environment with clear audio quality. She speaks very fast."

input_ids = tokenizer(description, return_tensors="pt").input_ids
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate the audio and write it to a WAV file.
generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
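
If you have a GPU available, you can move the model and the tokenized inputs onto it before generating. This is a minimal sketch (not part of the original snippet) that reuses the same checkpoint and swaps in a different description to illustrate how the voice can be changed:

```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

# Pick the GPU if one is available, otherwise fall back to CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("ylacombe/parler_tts_300M_v0.09").to(device)
tokenizer = AutoTokenizer.from_pretrained("ylacombe/parler_tts_300M_v0.09")

prompt = "Hey, how are you doing today?"
description = "A male speaker with a low-pitched voice speaks slowly and calmly in a quiet room with clear audio quality."

# Move the tokenized description and prompt to the same device as the model.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out_gpu.wav", audio_arr, model.config.sampling_rate)
```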


## Installation steps

Parler-TTS has lightweight dependencies and can be installed in one line:
```sh
pip install parler-tts
```
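
If you plan to use the training code under [`training`](training), an editable install from a local clone of this repository may be more convenient; a sketch, assuming the clone contains the same packaging files used to publish `parler-tts`:

```sh
# from the root of a local clone of this repository
pip install -e .
```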

## Gradio demo

You can host your own Parler-TTS demo. First, install [`gradio`](https://www.gradio.app/) with:

```sh
pip install gradio
```

Then, run:

```sh
python helpers/gradio_demo/app.py
```
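
If you are curious what such an app roughly looks like, here is a minimal, illustrative Gradio wrapper around the inference snippet above. It is not the contents of `helpers/gradio_demo/app.py`, and the checkpoint id is the same assumption as in the inference example:

```py
import gradio as gr
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

model = ParlerTTSForConditionalGeneration.from_pretrained("ylacombe/parler_tts_300M_v0.09")
tokenizer = AutoTokenizer.from_pretrained("ylacombe/parler_tts_300M_v0.09")

def tts(prompt, description):
    input_ids = tokenizer(description, return_tensors="pt").input_ids
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()
    # Gradio's Audio component accepts a (sampling_rate, waveform) tuple.
    return model.config.sampling_rate, audio

demo = gr.Interface(
    fn=tts,
    inputs=[gr.Textbox(label="Prompt"), gr.Textbox(label="Description")],
    outputs=gr.Audio(label="Generated speech"),
)
demo.launch()
```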

## Acknowledgements

This library builds on top of a number of open-source giants, to whom we'd like to extend our warmest thanks for providing these tools!

Special thanks to:
- Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively, for publishing such a promising and clear research paper: [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://arxiv.org/abs/2402.01912).
- the many open-source libraries used, namely [datasets](https://huggingface.co/docs/datasets/v2.17.0/en/index), [accelerate](https://huggingface.co/docs/accelerate/en/index), [jiwer](https://github.com/jitsi/jiwer), [wandb](https://wandb.ai/), and [transformers](https://huggingface.co/docs/transformers/index).

## Citation
```
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ylacombe/dataspeech}}
}
```