Commit a87db05c authored by Yoach Lacombe

Add quick index

parent 0e5f2734
# Parler-TTS
[[Paper we reproduce]](https://arxiv.org/abs/2402.01912)
[[Models]](https://huggingface.co/parler-tts)
[[Training Code]](training)
[[Interactive Demo]](https://huggingface.co/spaces/parler-tts/parler_tts_mini)
> [!IMPORTANT]
> We're proud to release Parler-TTS v0.1, our first 300M parameter model, trained on 10.5K hours of audio data.
> In the coming weeks, we'll be working on scaling up to 50k hours of data, in preparation for the v1 model.
@@ -15,6 +10,15 @@ Contrary to other TTS models, Parler-TTS is a **fully open-source** release. A
This repository contains the inference and training code for Parler-TTS. It is designed to accompany the [Data-Speech](https://github.com/huggingface/dataspeech) repository for dataset annotation.
## 📖 Quick Index
* [Installation](#installation)
* [Usage](#usage)
* [Training](#training)
* [Demo](https://huggingface.co/spaces/parler-tts/parler_tts_mini)
* [Model weights and datasets](https://huggingface.co/parler-tts)
## Usage
> [!TIP]
> You can try it out directly in the [interactive demo](https://huggingface.co/spaces/parler-tts/parler_tts_mini)!
@@ -44,7 +48,7 @@
```python
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
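The two lines above are only the tail of the generation example. For context, here is a minimal end-to-end sketch of how `model` and `generation` are obtained; the checkpoint name `parler-tts/parler_tts_mini_v0.1`, the `ParlerTTSForConditionalGeneration` class, and the `prompt_input_ids` keyword follow the v0.1 release and should be read as assumptions rather than a definitive reference:

```python
# Minimal sketch of text-to-speech generation with Parler-TTS v0.1.
# Assumptions: checkpoint name, model class, and the prompt_input_ids keyword
# match the v0.1 release and may need adjusting.
import torch
import soundfile as sf
from transformers import AutoTokenizer
from parler_tts import ParlerTTSForConditionalGeneration

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained(
    "parler-tts/parler_tts_mini_v0.1"
).to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler_tts_mini_v0.1")

# `description` conditions the voice (gender, pace, pitch, recording conditions);
# `prompt` is the text that will actually be spoken.
description = "A female speaker delivers her words expressively, with clear audio quality."
prompt = "Hey, how are you doing today?"

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```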
## Installation
Parler-TTS has lightweight dependencies and can be installed in one line:
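As a rough sketch only, and assuming the package is pip-installable straight from the GitHub repository (the URL below is an assumption), the install can also be driven from Python, for example inside a notebook:

```python
# Sketch: install Parler-TTS from source by shelling out to pip.
# Assumption: the repository lives at https://github.com/huggingface/parler-tts;
# the equivalent shell one-liner would be
#   pip install git+https://github.com/huggingface/parler-tts.git
import subprocess
import sys

subprocess.check_call([
    sys.executable, "-m", "pip", "install",
    "git+https://github.com/huggingface/parler-tts.git",
])
```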
@@ -66,26 +70,6 @@ Special thanks to:
- Descript for the [DAC codec model](https://github.com/descriptinc/descript-audio-codec)
- Hugging Face 🤗 for providing compute resources and time to explore!
## Citation
@@ -112,3 +96,25 @@ If you found this repository useful, please consider citing this work and also t
```
  primaryClass={cs.SD}
}
```
## Contribution
Contributions are welcome, as the project offers many possibilities for improvement and exploration.
In particular, we're looking at ways to improve both quality and speed:
- Datasets:
  - Train on more data.
  - Add more features such as accents.
- Training:
  - Add PEFT compatibility for LoRA fine-tuning.
  - Add the possibility to train without a description column.
  - Add notebook training.
  - Explore multilingual training.
  - Explore mono-speaker fine-tuning.
  - Explore more architectures.
- Optimization:
  - Compilation and static cache.
  - Support for FA2 and SDPA.
- Evaluation:
  - Add more evaluation metrics.
@@ -207,5 +207,5 @@ Thus, the script generalises to any number of training datasets.
> [!IMPORTANT]
> Starting to train a new model from scratch can easily be overwhelming, so here's what training looked like for v0.1: [logs](https://api.wandb.ai/links/ylacombe/ea449l81)