Commit b30e5194 authored by Yoach Lacombe

add contribution section

parent 59f811d0
# TODOs
- Add possibility to train without a description column.
- Add CE per codebook.
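The "add CE per codebook" TODO presumably refers to logging a separate cross-entropy loss for each of the audio codec's codebooks, rather than a single value averaged across all of them. A minimal sketch of that computation (hypothetical helper name and tensor shapes, not the project's actual code):

```python
import torch
import torch.nn.functional as F

def ce_per_codebook(logits: torch.Tensor, labels: torch.Tensor,
                    ignore_index: int = -100) -> torch.Tensor:
    """Return one cross-entropy value per codebook.

    Assumed shapes: logits (batch, num_codebooks, seq_len, vocab_size),
    labels (batch, num_codebooks, seq_len).
    """
    losses = []
    for k in range(logits.shape[1]):
        # Flatten batch and time dims for this codebook only
        losses.append(F.cross_entropy(
            logits[:, k].reshape(-1, logits.shape[-1]),
            labels[:, k].reshape(-1),
            ignore_index=ignore_index,
        ))
    return torch.stack(losses)  # shape: (num_codebooks,)
```

Logging each entry of the returned tensor would show whether later codebooks (which typically carry finer acoustic detail) are harder to predict than the first.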
...@@ -15,7 +15,7 @@ by Dan Lyth and Simon King, from Stability AI and Edinburgh University respectiv
Contrary to standard TTS models, Parler-TTS allows you to directly describe the speaker characteristics with a simple text description, with which you can modulate gender, pitch, speaking style, accent, etc.
## Usage
> [!TIP]
> You can directly try it out in an interactive demo [here](TODO: add link to spaces)!
...@@ -44,7 +44,6 @@ sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
## Installation steps
Parler-TTS has lightweight dependencies and can be installed in one line:
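The install command itself is elided in this diff view; for a repository distributed via GitHub, the one-liner would plausibly be a VCS pip install along these lines (an assumption, not confirmed by the visible hunk):

```
pip install git+https://github.com/huggingface/parler-tts.git
```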
...@@ -75,6 +74,24 @@ Special thanks to:
- Dan Lyth and Simon King, from Stability AI and Edinburgh University respectively, for publishing such a promising and clear research paper: [Natural language guidance of high-fidelity text-to-speech with synthetic annotations](https://arxiv.org/abs/2402.01912).
- and the many libraries used, namely [datasets](https://huggingface.co/docs/datasets/v2.17.0/en/index), [accelerate](https://huggingface.co/docs/accelerate/en/index), [jiwer](https://github.com/jitsi/jiwer), [wandb](https://wandb.ai/), and [transformers](https://huggingface.co/docs/transformers/index).
## Contribution
Contributions are welcome, as the project offers many possibilities for improvement and exploration.
Namely, we're looking at ways to improve both quality and speed:
- Datasets:
    - Train on more data.
    - Add more features, such as accents.
- Training:
    - Add PEFT compatibility for LoRA fine-tuning.
    - Add the possibility to train without a description column.
    - Explore multilingual training.
    - Explore mono-speaker fine-tuning.
    - Explore more architectures.
- Optimization:
    - Compilation and static cache.
    - Support for FA2 (Flash Attention 2) and SDPA.
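For context on the SDPA item above: PyTorch ships a built-in fused attention, `torch.nn.functional.scaled_dot_product_attention`, which dispatches to optimized kernels (including FlashAttention-style ones) when available. A minimal illustration with random tensors (shapes are illustrative only):

```python
import torch
import torch.nn.functional as F

# Random query/key/value tensors: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))

# Fused attention; with is_causal=True each position attends
# only to itself and earlier positions.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

Wiring such a backend into the model would replace a manual softmax(QK^T)V computation without changing the result, which is why it appears under "Optimization" rather than "Training".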
## Citation
```
@misc{lacombe-etal-2024-parler-tts,
...