# CMU ARCTIC Databases

http://www.festvox.org/cmu_arctic/

## tts1

Unlike the other `tts1` recipes, `arctic/tts1` fine-tunes a pretrained model that was trained on `mailabs/tts1`, because the CMU ARCTIC databases are too small to train an E2E-TTS model from scratch.

The feature statistics `cmvn.ark` and the dictionary file `dict` are not created by this recipe; those of the pretrained model are used instead.
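Conceptually, reusing the pretrained statistics means that new features are normalized with the mean and standard deviation computed on the pretraining data rather than on CMU ARCTIC. The following is a minimal sketch of that idea in plain Python; the function name `normalize_with_pretrained_stats` and the toy values are hypothetical, and the sketch does not read the actual `cmvn.ark` format.

```python
def normalize_with_pretrained_stats(frames, mean, std):
    """Apply per-dimension mean/variance normalization using given statistics.

    frames: list of feature vectors (one list of floats per frame)
    mean, std: per-dimension statistics taken from the pretrained model
    """
    return [
        [(value - m) / s for value, m, s in zip(frame, mean, std)]
        for frame in frames
    ]


# Toy statistics standing in for those of the pretrained model (illustrative only).
pretrained_mean = [1.0, 2.0]
pretrained_std = [2.0, 4.0]

# Toy features standing in for CMU ARCTIC frames (illustrative only).
arctic_frames = [[3.0, 6.0], [1.0, -2.0]]

normalized = normalize_with_pretrained_stats(
    arctic_frames, pretrained_mean, pretrained_std
)
print(normalized)
```

Using the same statistics keeps the input scale seen during fine-tuning consistent with what the pretrained model saw during its original training.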

Fine-tuning also requires exactly the same architecture as the pretrained model, so make sure that `train_config` matches the one used for the pretrained model.
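One way to guard against an accidental mismatch is to compare the architecture-related settings of the two configurations before starting a run. The sketch below assumes both configurations have already been loaded into dictionaries; the helper `config_mismatches`, the key names in `ARCH_KEYS`, and all values are hypothetical and only illustrate the check.

```python
def config_mismatches(pretrained, finetune, keys):
    """Return {key: (pretrained_value, finetune_value)} for keys that differ."""
    mismatches = {}
    for key in keys:
        old, new = pretrained.get(key), finetune.get(key)
        if old != new:
            mismatches[key] = (old, new)
    return mismatches


# Hypothetical architecture-related keys; adapt to the actual config file.
ARCH_KEYS = ["model-module", "elayers", "eunits", "dlayers", "dunits"]

# Illustrative values only, not taken from any real recipe.
pretrained_conf = {
    "model-module": "tacotron2",
    "elayers": 1,
    "eunits": 512,
    "dlayers": 2,
    "dunits": 1024,
    "batch-size": 64,
}
finetune_conf = dict(pretrained_conf, **{"eunits": 256, "batch-size": 32})

diff = config_mismatches(pretrained_conf, finetune_conf, ARCH_KEYS)
print(diff)  # {'eunits': (512, 256)}
```

Training-only settings such as the batch size are deliberately left out of the comparison, since they can differ without changing the shape of the model parameters.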

## vc1

We support acoustic-to-acoustic feature conversion based on parallel data with two kinds of models: one based on Tacotron2 and one based on Transformer.

A pretraining technique utilizing TTS can be employed to boost the performance, as described in [1]. We provide the base pretrained model, which was trained on the M-AILABS dataset (the en_US judy speaker was used, approx. 32 hrs of data).

Note that you still need to train the VC model yourself, either from scratch or by fine-tuning the base pretrained model.

## Reference

- [1] Huang, Wen-Chin, et al. "Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining." arXiv preprint arXiv:1912.06813 (2019).