Commit 4130a52d authored by changhl

init model

parent eb6a18fd
# Tacotron2_pytorch
## Paper
- https://arxiv.org/pdf/1712.05884
## Open-Source Code
- https://github.com/NVIDIA/tacotron2
## Model Architecture
Tacotron2 is an end-to-end speech synthesis framework proposed by Google Brain in 2017. The model consists of two main components:
- Spectrogram prediction network: an Encoder-Attention-Decoder network that predicts a sequence of mel-spectrogram frames from the input character sequence
- Vocoder: a modified WaveNet that generates a time-domain waveform from the predicted mel-spectrogram frames
<div align="center">
<img src="./images/architecture.png"/>
</div>
## Algorithm
Compared with the original Tacotron, Tacotron2 replaces the plain RNN with LSTM units. The forget, input, and output gates of the LSTM mitigate the vanishing-gradient problem, improving how well the model retains information during back-propagation and thereby raising the quality of the synthesized speech.
<div align="center">
<img src="./images/algorithm.png"/>
</div>
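The two stages above map directly onto the SpeechBrain inference API used by this repository's `inference.py`; a minimal sketch of the pipeline (the checkpoint paths are placeholders for the pretrained weights described below):
```python
import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN

# Stage 1: spectrogram prediction network -- characters -> mel-spectrogram frames
tacotron2 = Tacotron2.from_hparams(source="/path/to/tacotron2_ljspeech",  # placeholder path
                                   run_opts={"device": "cuda"})
mel_output, mel_length, alignment = tacotron2.encode_text("hi, nice to meet you")

# Stage 2: vocoder -- mel-spectrogram frames -> time-domain waveform
hifi_gan = HIFIGAN.from_hparams(source="/path/to/hifigan_ljspeech",       # placeholder path
                                run_opts={"device": "cuda"})
waveforms = hifi_gan.decode_batch(mel_output)

# The LJSpeech models operate at a 22050 Hz sampling rate
torchaudio.save("example.wav", waveforms.squeeze(1).cpu(), 22050)
```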
## Environment Setup
### Docker (Option 1)
**Note: adjust the path arguments to your environment**
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
docker run -it --network=host --ipc=host --name=your_container_name --shm-size=32G --device=/dev/kfd --device=/dev/mkfd --device=/dev/dri -v /opt/hyhal:/opt/hyhal:ro -v /path/your_code_data/:/path/your_code_data/ --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10 /bin/bash
cd /path/your_code_data/
pip3 install -r requirements.txt
```
### Dockerfile (Option 2)
```
cd ./docker
docker build --no-cache -t tacotron2 .
docker run -it -v /path/your_code_data/:/path/your_code_data/ --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
pip3 install -r requirements.txt
```
### Anaconda (Option 3)
1. The DCU-specific deep learning libraries required by this project can be downloaded from the Guanghe developer community: https://developer.hpccube.com/tool/
```
DTK software stack: dtk24.04.1
python:python3.10
torch:2.1.0
torchvision:0.16.0
torchaudio: 2.1.2
```
Tips: the DTK stack, python, torch, and the other DCU-related tool versions listed above must match each other exactly.
2. Install the remaining (non-DCU-specific) libraries from requirements.txt:
```
pip3 install -r requirements.txt
```
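After installation, a quick sanity check can confirm that the stack versions match the table above and that the DCU is visible to PyTorch (on DCU/ROCm builds the device is still addressed through the `torch.cuda` API, as in this repository's inference script):
```python
import torch
import torchaudio
import torchvision

# Expected per the table above: torch 2.1.0, torchvision 0.16.0, torchaudio 2.1.2
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("torchaudio:", torchaudio.__version__)

# On DCU the HIP backend is exposed through the CUDA-style API
print("device available:", torch.cuda.is_available())
```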
## Dataset
- SCNet fast download link:
  - [LJSpeech dataset download](http://113.200.138.88:18080/aidatasets/lj_speech)
- Official download link:
  - [LJSpeech dataset download](https://keithito.com/LJ-Speech-Dataset/)
```LJSpeech-1.1```: a speech synthesis dataset containing audio and text; the audio is in wav format and the transcripts are stored in a csv file.
```
├── LJSpeech-1.1
│ ├──wav
│ │ ├── LJ001-0001.wav
│ │ ├── LJ001-0002.wav
│ │ ├── LJ001-0003.wav
│ │ ├── ...
│ ├──metadata.csv
│ ├──README
```
- LJSpeech
  - wav: audio data directory
    - LJ001-0001.wav: audio file
    - LJ001-0002.wav: audio file
    - ...
  - metadata.csv: transcript file (see the sketch below)
    - column 1: audio file name
    - column 2: raw transcript
    - column 3: normalized transcript
  - README: documentation
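For reference, LJSpeech's `metadata.csv` is pipe-delimited, with one line per utterance; a minimal sketch of reading the three columns described above (the dataset path is a placeholder):
```python
# Minimal sketch: iterate over the LJSpeech transcript file (path is a placeholder)
metadata_path = "/path/to/LJSpeech-1.1/metadata.csv"

with open(metadata_path, encoding="utf-8") as f:
    for line in f:
        # Each line: <audio file name>|<raw transcript>|<normalized transcript>
        file_id, text, normalized_text = line.rstrip("\n").split("|")
        print(file_id, normalized_text)
        break  # only show the first utterance
```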
## Pretrained Models
**Download the pretrained weights before running inference**
- SCNet download links:
  - [tacotron2 model weights](http://113.200.138.88:18080/aimodels/tacotron2_ljspeech)
  - [hifigan model weights](http://113.200.138.88:18080/aimodels/hifigan_ljspeech)
- Official download links:
  - [tacotron2 model weights](https://hf-mirror.com/speechbrain/tts-tacotron2-ljspeech)
  - [hifigan model weights](https://hf-mirror.com/speechbrain/tts-hifigan-ljspeech)
## Training
**Make sure the current working directory is tacotron2_pytorch and set the visible cards**
### Single card
```
export HIP_VISIBLE_DEVICES=<card_id>   # select the visible DCU card, e.g. 0
bash train_s.sh $dataset_path $save_path
```
- $dataset_path: dataset path
- $save_path: directory for saving training checkpoints
### Multi-card
```
export HIP_VISIBLE_DEVICES=<card_ids>   # select the visible DCU cards, e.g. 0,1,2,3
bash train_m.sh $dataset_path $save_path
```
- $dataset_path: dataset path
- $save_path: directory for saving training checkpoints
## Inference
```
export HIP_VISIBLE_DEVICES=<card_id>   # select the visible DCU card, e.g. 0
python3 inference.py -m modelpath_tacotron2 -v modelpath_hifigan -t "hi, nice to meet you"
```
- -m: path to the tacotron2 model weights
- -v: path to the hifigan model weights
- -t: input text
- -res: path for saving the resulting wav file
## Result
```
Input: "hi, nice to meet you"
Output: ./res/example.wav
```
## Application Scenarios
### Algorithm Category
```
Speech synthesis
```
### Key Application Industries
```
Finance, telecommunications, broadcast media
```
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/tacotron2_pytorch
## References
[GitHub - NVIDIA/tacotron2](https://github.com/NVIDIA/tacotron2)
[HF - speechbrain/tts-tacotron2-ljspeech](https://hf-mirror.com/speechbrain/tts-tacotron2-ljspeech)
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.10
RUN source /opt/dtk/env.sh
import argparse
import os

import torchaudio
from speechbrain.inference.TTS import Tacotron2
from speechbrain.inference.vocoders import HIFIGAN


def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('-m', '--model-path', type=str, default="", help="the tacotron2 model path")
    parser.add_argument('-v', '--vocoder-path', type=str, default="", help="the vocoder model path")
    parser.add_argument('-t', '--text', type=str, default="Autumn, the season of change.", help="input text")
    parser.add_argument('-res', '--result_path', type=str, default="./res", help="the path to save wav file")
    opt = parser.parse_known_args()[0] if known else parser.parse_args()
    return opt


def main(opt):
    # Load the Tacotron2 acoustic model and the HiFi-GAN vocoder
    tacotron2 = Tacotron2.from_hparams(source=opt.model_path, run_opts={"device": "cuda"})
    hifi_gan = HIFIGAN.from_hparams(source=opt.vocoder_path, run_opts={"device": "cuda"})

    # Running the TTS: text -> mel spectrogram
    mel_output, mel_length, alignment = tacotron2.encode_text(opt.text)

    # Running the vocoder: mel spectrogram -> waveform
    waveforms = hifi_gan.decode_batch(mel_output)

    # Save the waveform (the LJSpeech models use a 22050 Hz sampling rate)
    os.makedirs(opt.result_path, exist_ok=True)
    torchaudio.save(os.path.join(opt.result_path, 'example.wav'), waveforms.squeeze(1).cpu(), 22050)


if __name__ == "__main__":
    main(opt=parse_opt())
# Model code
modelCode=917
# Model name
modelName=tacotron2_pytorch
# Model description
modelDescription=Tacotron2 is an end-to-end speech synthesis framework proposed by Google Brain in 2017.
# Application scenarios (separate multiple tags with commas)
appScenario=Training,Inference,Speech synthesis,Finance,Telecommunications,Broadcast media
# Framework type (separate multiple tags with commas)
frameType=PyTorch
soundfile==0.12.1
librosa==0.10.2.post1
speechbrain==1.0.0
hyperpyyaml>=0.0.1
joblib>=0.14.1
pre-commit>=2.3.0
pygtrie>=2.1,<3.0
tgt==1.5
unidecode==1.3.8
# Text-to-Speech (with LJSpeech)
This folder contains the recipes for training TTS systems (including vocoders) with the popular LJSpeech dataset.
# Dataset
The dataset can be downloaded from here:
https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2
# Installing Extra Dependencies
Before proceeding, ensure you have installed the necessary additional dependencies. To do this, simply run the following command in your terminal:
```
pip install -r extra_requirements.txt
```
# Tacotron 2
The subfolder "tacotron2" contains the recipe for training the popular [tacotron2](https://arxiv.org/abs/1712.05884) TTS model.
To run this recipe, go into the "tacotron2" folder and run:
```
python train.py --device=cuda:0 --max_grad_norm=1.0 --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
```
The training logs are available [here](https://www.dropbox.com/sh/1npvo1g1ncafipf/AAC5DR1ErF2Q9V4bd1DHqX43a?dl=0).
You can find the pre-trained model with an easy-inference function on [HuggingFace](https://huggingface.co/speechbrain/tts-tacotron2-ljspeech).
# FastSpeech2
The subfolder "fastspeech2" contains the recipes for training the non-autoregressive transformer based TTS model [FastSpeech2](https://arxiv.org/abs/2006.04558).
### FastSpeech2 with pre-extracted durations from a forced aligner
Training FastSpeech2 requires pre-extracted phoneme alignments (durations). The LJSpeech phoneme alignments from Montreal Forced Aligner are automatically downloaded, decompressed and stored at this location: ```/your_folder/LJSpeech-1.1/TextGrid```.
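As a rough illustration of what these alignments contain, the sketch below reads a single TextGrid with the `tgt` package from `extra_requirements.txt`; the file path and the "phones" tier name are assumptions about the MFA output layout, and the interval attributes follow the standard tgt API:
```python
import tgt  # TextGridTools, listed in extra_requirements.txt

# Hypothetical path to one of the downloaded alignment files
textgrid_path = "/your_folder/LJSpeech-1.1/TextGrid/LJ001-0001.TextGrid"

tg = tgt.io.read_textgrid(textgrid_path)
# MFA alignments usually store phonemes in a tier named "phones" (assumption)
phone_tier = tg.get_tier_by_name("phones")
for interval in phone_tier.intervals:
    # Duration of each phoneme in seconds; training converts these to frame counts
    print(interval.text, round(interval.end_time - interval.start_time, 3))
```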
To run this recipe, please first install the extra dependencies:
```
pip install -r extra_requirements.txt
```
Then go into the "fastspeech2" folder and run:
```
python train.py --data_folder=/your_folder/LJSpeech-1.1 hparams/train.yaml
```
Training takes about 3 minutes/epoch on 1 * V100 32G.
The training logs are available [here](https://www.dropbox.com/scl/fo/vtgbltqdrvw9r0vs7jz67/h?rlkey=cm2mwh5rce5ad9e90qaciypox&dl=0).
You can find the pre-trained model with an easy-inference function on [HuggingFace](https://huggingface.co/speechbrain/tts-fastspeech2-ljspeech).
### FastSpeech2 with internal alignment
This recipe trains FastSpeech2 without a forced aligner, following [One TTS Alignment To Rule Them All](https://arxiv.org/pdf/2108.10447.pdf). The alignment is learnt by an internal alignment network added to FastSpeech2. This recipe aims to simplify training on custom data and to provide better alignments around punctuation.
To run this recipe, please first install the extra-requirements:
```
pip install -r extra_requirements.txt
```
Then go into the "fastspeech2" folder and run:
```
python train_internal_alignment.py hparams/train_internal_alignment.yaml --data_folder=/your_folder/LJSpeech-1.1
```
The data preparation includes a grapheme-to-phoneme process for the entire corpus which may take several hours. Training takes about 5 minutes/epoch on 1 * V100 32G.
The training logs are available [here](https://www.dropbox.com/scl/fo/4ctkc6jjas3uij9dzcwta/h?rlkey=i0k086d77flcsdx40du1ppm2d&dl=0).
You can find the pre-trained model with an easy-inference function on [HuggingFace](https://huggingface.co/speechbrain/tts-fastspeech2-internal-alignment-ljspeech).
# HiFiGAN (Vocoder)
The subfolder "vocoder/hifigan/" contains the [HiFiGAN vocoder](https://arxiv.org/pdf/2010.05646.pdf).
The vocoder is a neural network that converts a spectrogram into a waveform (it can be used on top of Tacotron2/FastSpeech2).
We suggest using `tensorboard_logger` by setting `use_tensorboard: True` in the yaml file; in that case, `tensorboard` must be installed.
To run this recipe, go into the "vocoder/hifigan/" folder and run:
```
python train.py hparams/train.yaml --data_folder /path/to/LJspeech
```
Training takes about 10 minutes/epoch on an NVIDIA RTX 8000.
The training logs are available [here](https://www.dropbox.com/sh/m2xrdssiroipn8g/AAD-TqPYLrSg6eNxUkcImeg4a?dl=0).
You can find the pre-trained model with an easy-inference function on [HuggingFace](https://huggingface.co/speechbrain/tts-hifigan-ljspeech).
# DiffWave (Vocoder)
The subfolder "vocoder/diffwave/" contains the [Diffwave](https://arxiv.org/pdf/2009.09761.pdf) vocoder.
DiffWave is a versatile diffusion model for audio synthesis, which produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation.
Here it serves as a vocoder that generates waveforms given spectrograms as conditions (it can be used on top of Tacotron2/FastSpeech2).
To run this recipe, go into the "vocoder/diffwave/" folder and run:
```
python train.py hparams/train.yaml --data_folder /path/to/LJspeech
```
The script periodically writes synthesized audio to `<output_folder>/samples` during training.
We suggest using `tensorboard_logger` by setting `use_tensorboard: True` in the yaml file; in that case, `tensorboard` must be installed.
Training takes about 6 minutes/epoch on 1 * V100 32G.
The training logs are available [here](https://www.dropbox.com/sh/tbhpn1xirtaix68/AACvYaVDiUGAKURf2o-fvgMoa?dl=0).
For inference, setting `fast_sampling: True` enables fast sampling with a user-defined variance schedule. According to the paper, high-quality audio can be generated with only 6 steps, so this is highly recommended.
You can find the pre-trained model with an easy-inference function on [HuggingFace](https://huggingface.co/speechbrain/tts-diffwave-ljspeech).
# HiFiGAN Unit Vocoder
The subfolder "vocoder/hifigan_discrete/" contains the [HiFiGAN Unit vocoder](https://arxiv.org/abs/2406.10735). This vocoder is a neural network designed to transform discrete self-supervised representations into waveform data.
This is suitable for a wide range of generative tasks such as speech enhancement, separation, text-to-speech, voice cloning, etc. Please read [DASB - Discrete Audio and Speech Benchmark](https://arxiv.org/abs/2406.14294) for more information.
To run this recipe successfully, start by installing the necessary extra dependencies:
```bash
pip install -r extra_requirements.txt
```
Before training the vocoder, you need to choose a speech encoder to extract representations that will be used as discrete audio input. We support k-means models using features from HuBERT, WavLM, or Wav2Vec2. Below are the available self-supervised speech encoders for which we provide pre-trained k-means checkpoints:
| Encoder | HF model |
|----------|-----------------------------------------|
| HuBERT | facebook/hubert-large-ll60k |
| Wav2Vec2 | facebook/wav2vec2-large-960h-lv60-self |
| WavLM | microsoft/wavlm-large |
Checkpoints are available in the HF [SSL_Quantization](https://huggingface.co/speechbrain/SSL_Quantization) repository. Alternatively, you can train your own k-means model by following instructions in the "LJSpeech/quantization" README.
Next, configure the SSL model type, k-means model, and corresponding hub in your YAML configuration file by following these steps (a sketch of the resulting fields is shown after the list):
1. Navigate to the "vocoder/hifigan_discrete/hparams" folder and open "train.yaml" file.
2. Modify the `encoder_type` field to specify one of the SSL models: "HuBERT", "WavLM", or "Wav2Vec2".
3. Update the `encoder_hub` field with the specific name of the SSL Hub associated with your chosen model type.
If you have trained your own k-means model, follow these additional steps:
4. Update the `kmeans_folder` field with the specific name of the SSL Hub containing your trained k-means model. Please follow the same file structure as the official one in [SSL_Quantization](https://huggingface.co/speechbrain/SSL_Quantization).
5. Update the `kmeans_dataset` field with the specific name of the dataset on which the k-means model was trained.
6. Update the `num_clusters` field according to the number of clusters of your k-means model.
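Putting steps 2-6 together, the relevant fields in `hparams/train.yaml` end up looking roughly like the sketch below (the concrete values are illustrative assumptions, not verified defaults):
```yaml
# Illustrative sketch only -- replace the values with your own choices
encoder_type: HuBERT                              # one of: HuBERT, WavLM, Wav2Vec2
encoder_hub: facebook/hubert-large-ll60k          # SSL hub matching the chosen encoder
# Only required when using your own k-means model:
kmeans_folder: your-namespace/SSL_Quantization    # hypothetical hub with your k-means checkpoint
kmeans_dataset: LJSpeech                          # dataset the k-means model was trained on
num_clusters: 1000                                # number of clusters in your k-means model
```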
Finally, navigate back to the "vocoder/hifigan_discrete/" folder and run the following command:
```bash
python train.py hparams/train.yaml --data_folder=/path/to/LJspeech
```
Training typically takes around 4 minutes per epoch when using an NVIDIA A100 40G.
# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/
# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.
```bibtex
@misc{ravanelli2024opensourceconversationalaispeechbrain,
title={Open-Source Conversational AI with SpeechBrain 1.0},
author={Mirco Ravanelli and Titouan Parcollet and Adel Moumen and Sylvain de Langen and Cem Subakan and Peter Plantinga and Yingzhi Wang and Pooneh Mousavi and Luca Della Libera and Artem Ploujnikov and Francesco Paissan and Davide Borra and Salah Zaiem and Zeyu Zhao and Shucong Zhang and Georgios Karakasidis and Sung-Lin Yeh and Pierre Champion and Aku Rouhe and Rudolf Braun and Florian Mai and Juan Zuluaga-Gomez and Seyed Mahed Mousavi and Andreas Nautsch and Xuechen Liu and Sangeet Sagar and Jarod Duret and Salima Mdhaffar and Gaelle Laperriere and Mickael Rouvier and Renato De Mori and Yannick Esteve},
year={2024},
eprint={2407.00463},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.00463},
}
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```
# Needed only for quantization
scikit-learn
# Needed only with use_tensorboard=True
# torchvision is needed to save spectrograms
tensorboard
tgt
torchvision
unidecode
# ############################################################################
# Model: FastSpeech2
# Tokens: Raw characters (English text)
# Training: LJSpeech
# Authors: Sathvik Udupa, Yingzhi Wang, Pradnya Kandarkar
# ############################################################################
###################################
# Experiment Parameters and setup #
###################################
seed: 1234
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref results/fastspeech2/<seed>
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt
epochs: 500
train_spn_predictor_epochs: 8
progress_samples: True
progress_sample_path: !ref <output_folder>/samples
progress_samples_min_run: 10
progress_samples_interval: 10
progress_batch_sample_size: 4
#################################
# Data files and pre-processing #
#################################
data_folder: #!PLACEHOLDER # e.g., /data/Database/LJSpeech-1.1
train_json: !ref <save_folder>/train.json
valid_json: !ref <save_folder>/valid.json
test_json: !ref <save_folder>/test.json
splits: ["train", "valid"]
split_ratio: [90, 10]
skip_prep: False
################################
# Audio Parameters #
################################
sample_rate: 22050
hop_length: 256
win_length: null
n_mel_channels: 80
n_fft: 1024
mel_fmin: 0.0
mel_fmax: 8000.0
power: 1
norm: "slaney"
mel_scale: "slaney"
dynamic_range_compression: True
mel_normalized: False
min_max_energy_norm: True
min_f0: 65 #(torchaudio pyin values)
max_f0: 2093 #(torchaudio pyin values)
################################
# Optimization Hyperparameters #
################################
learning_rate: 0.0001
weight_decay: 0.000001
max_grad_norm: 1.0
batch_size: 32 #minimum 2
num_workers_train: 16
num_workers_valid: 4
betas: [0.9, 0.98]
################################
# Model Parameters and model #
################################
# Input parameters
lexicon:
- AA
- AE
- AH
- AO
- AW
- AY
- B
- CH
- D
- DH
- EH
- ER
- EY
- F
- G
- HH
- IH
- IY
- JH
- K
- L
- M
- N
- NG
- OW
- OY
- P
- R
- S
- SH
- T
- TH
- UH
- UW
- V
- W
- Y
- Z
- ZH
- spn
n_symbols: 42 #fixed depending on symbols in the lexicon (+1 for a dummy symbol used for padding, +1 for unknown)
padding_idx: 0
# Encoder parameters
enc_num_layers: 4
enc_num_head: 2
enc_d_model: 384
enc_ffn_dim: 1024
enc_k_dim: 384
enc_v_dim: 384
enc_dropout: 0.2
# Decoder parameters
dec_num_layers: 4
dec_num_head: 2
dec_d_model: 384
dec_ffn_dim: 1024
dec_k_dim: 384
dec_v_dim: 384
dec_dropout: 0.2
# Postnet parameters
postnet_embedding_dim: 512
postnet_kernel_size: 5
postnet_n_convolutions: 5
postnet_dropout: 0.5
# common
normalize_before: True
ffn_type: 1dcnn #1dcnn or ffn
ffn_cnn_kernel_size_list: [9, 1]
# variance predictor
dur_pred_kernel_size: 3
pitch_pred_kernel_size: 3
energy_pred_kernel_size: 3
variance_predictor_dropout: 0.5
# silent phoneme token predictor
spn_predictor: !new:speechbrain.lobes.models.FastSpeech2.SPNPredictor
enc_num_layers: !ref <enc_num_layers>
enc_num_head: !ref <enc_num_head>
enc_d_model: !ref <enc_d_model>
enc_ffn_dim: !ref <enc_ffn_dim>
enc_k_dim: !ref <enc_k_dim>
enc_v_dim: !ref <enc_v_dim>
enc_dropout: !ref <enc_dropout>
normalize_before: !ref <normalize_before>
ffn_type: !ref <ffn_type>
ffn_cnn_kernel_size_list: !ref <ffn_cnn_kernel_size_list>
n_char: !ref <n_symbols>
padding_idx: !ref <padding_idx>
#model
model: !new:speechbrain.lobes.models.FastSpeech2.FastSpeech2
enc_num_layers: !ref <enc_num_layers>
enc_num_head: !ref <enc_num_head>
enc_d_model: !ref <enc_d_model>
enc_ffn_dim: !ref <enc_ffn_dim>
enc_k_dim: !ref <enc_k_dim>
enc_v_dim: !ref <enc_v_dim>
enc_dropout: !ref <enc_dropout>
dec_num_layers: !ref <dec_num_layers>
dec_num_head: !ref <dec_num_head>
dec_d_model: !ref <dec_d_model>
dec_ffn_dim: !ref <dec_ffn_dim>
dec_k_dim: !ref <dec_k_dim>
dec_v_dim: !ref <dec_v_dim>
dec_dropout: !ref <dec_dropout>
normalize_before: !ref <normalize_before>
ffn_type: !ref <ffn_type>
ffn_cnn_kernel_size_list: !ref <ffn_cnn_kernel_size_list>
n_char: !ref <n_symbols>
n_mels: !ref <n_mel_channels>
postnet_embedding_dim: !ref <postnet_embedding_dim>
postnet_kernel_size: !ref <postnet_kernel_size>
postnet_n_convolutions: !ref <postnet_n_convolutions>
postnet_dropout: !ref <postnet_dropout>
padding_idx: !ref <padding_idx>
dur_pred_kernel_size: !ref <dur_pred_kernel_size>
pitch_pred_kernel_size: !ref <pitch_pred_kernel_size>
energy_pred_kernel_size: !ref <energy_pred_kernel_size>
variance_predictor_dropout: !ref <variance_predictor_dropout>
mel_spectogram: !name:speechbrain.lobes.models.FastSpeech2.mel_spectogram
sample_rate: !ref <sample_rate>
hop_length: !ref <hop_length>
win_length: !ref <win_length>
n_fft: !ref <n_fft>
n_mels: !ref <n_mel_channels>
f_min: !ref <mel_fmin>
f_max: !ref <mel_fmax>
power: !ref <power>
normalized: !ref <mel_normalized>
min_max_energy_norm: !ref <min_max_energy_norm>
norm: !ref <norm>
mel_scale: !ref <mel_scale>
compression: !ref <dynamic_range_compression>
criterion: !new:speechbrain.lobes.models.FastSpeech2.Loss
log_scale_durations: True
duration_loss_weight: 1.0
pitch_loss_weight: 1.0
energy_loss_weight: 1.0
ssim_loss_weight: 1.0
mel_loss_weight: 1.0
postnet_mel_loss_weight: 1.0
spn_loss_weight: 1.0
spn_loss_max_epochs: !ref <train_spn_predictor_epochs>
vocoder: "hifi-gan"
pretrained_vocoder: True
vocoder_source: speechbrain/tts-hifigan-ljspeech
vocoder_download_path: tmpdir_vocoder
modules:
spn_predictor: !ref <spn_predictor>
model: !ref <model>
train_dataloader_opts:
batch_size: !ref <batch_size>
drop_last: False #True #False
num_workers: !ref <num_workers_train>
shuffle: True
collate_fn: !new:speechbrain.lobes.models.FastSpeech2.TextMelCollate
valid_dataloader_opts:
batch_size: !ref <batch_size>
num_workers: !ref <num_workers_valid>
shuffle: False
collate_fn: !new:speechbrain.lobes.models.FastSpeech2.TextMelCollate
#optimizer
opt_class: !name:torch.optim.Adam
lr: !ref <learning_rate>
weight_decay: !ref <weight_decay>
betas: !ref <betas>
noam_annealing: !new:speechbrain.nnet.schedulers.NoamScheduler
lr_initial: !ref <learning_rate>
n_warmup_steps: 4000
#epoch object
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
limit: !ref <epochs>
train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
save_file: !ref <train_log>
#checkpointer
checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
checkpoints_dir: !ref <save_folder>
recoverables:
spn_predictor: !ref <spn_predictor>
model: !ref <model>
lr_annealing: !ref <noam_annealing>
counter: !ref <epoch_counter>
input_encoder: !new:speechbrain.dataio.encoder.TextEncoder
progress_sample_logger: !new:speechbrain.utils.train_logger.ProgressSampleLogger
output_path: !ref <progress_sample_path>
batch_sample_size: !ref <progress_batch_sample_size>
formats:
raw_batch: raw
# ############################################################################
# Model: FastSpeech2 with internal alignment
# Tokens: Phonemes (ARPABET)
# Dataset: LJSpeech
# Authors: Yingzhi Wang 2023
# ############################################################################
###################################
# Experiment Parameters and setup #
###################################
seed: 1234
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref results/fastspeech2_internal_alignment/<seed>
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt
epochs: 500
progress_samples: True
progress_sample_path: !ref <output_folder>/samples
progress_samples_min_run: 10
progress_samples_interval: 10
progress_batch_sample_size: 4
#################################
# Data files and pre-processing #
#################################
data_folder: !PLACEHOLDER # e.g., /data/Database/LJSpeech-1.1
train_json: !ref <save_folder>/train.json
valid_json: !ref <save_folder>/valid.json
test_json: !ref <save_folder>/test.json
splits: ["train", "valid"]
split_ratio: [90, 10]
skip_prep: False
################################
# Audio Parameters #
################################
sample_rate: 22050
hop_length: 256
win_length: null
n_mel_channels: 80
n_fft: 1024
mel_fmin: 0.0
mel_fmax: 8000.0
power: 1
norm: "slaney"
mel_scale: "slaney"
dynamic_range_compression: True
mel_normalized: False
min_max_energy_norm: True
min_f0: 65 #(torchaudio pyin values)
max_f0: 2093 #(torchaudio pyin values)
################################
# Optimization Hyperparameters #
################################
learning_rate: 0.0001
weight_decay: 0.000001
max_grad_norm: 1.0
batch_size: 16 #minimum 2
betas: [0.9, 0.998]
num_workers_train: 16
num_workers_valid: 4
################################
# Model Parameters and model #
################################
# Input parameters
lexicon:
- "AA"
- "AE"
- "AH"
- "AO"
- "AW"
- "AY"
- "B"
- "CH"
- "D"
- "DH"
- "EH"
- "ER"
- "EY"
- "F"
- "G"
- "HH"
- "IH"
- "IY"
- "JH"
- "K"
- "L"
- "M"
- "N"
- "NG"
- "OW"
- "OY"
- "P"
- "R"
- "S"
- "SH"
- "T"
- "TH"
- "UH"
- "UW"
- "V"
- "W"
- "Y"
- "Z"
- "ZH"
- "-"
- "!"
- "'"
- "("
- ")"
- ","
- "."
- ":"
- ";"
- "?"
- " "
n_symbols: 52 #fixed depending on symbols in the lexicon (+1 for a dummy symbol used for padding, +1 for unknown)
padding_idx: 0
hidden_channels: 512
# Encoder parameters
enc_num_layers: 4
enc_num_head: 2
enc_d_model: !ref <hidden_channels>
enc_ffn_dim: 1024
enc_k_dim: !ref <hidden_channels>
enc_v_dim: !ref <hidden_channels>
enc_dropout: 0.2
# Aligner parameters
in_query_channels: 80
in_key_channels: !ref <hidden_channels> # 512 in the paper
attn_channels: 80
temperature: 0.0005
# Decoder parameters
dec_num_layers: 4
dec_num_head: 2
dec_d_model: !ref <hidden_channels>
dec_ffn_dim: 1024
dec_k_dim: !ref <hidden_channels>
dec_v_dim: !ref <hidden_channels>
dec_dropout: 0.2
# Postnet parameters
postnet_embedding_dim: 512
postnet_kernel_size: 5
postnet_n_convolutions: 5
postnet_dropout: 0.2
# common
normalize_before: True
ffn_type: 1dcnn #1dcnn or ffn
ffn_cnn_kernel_size_list: [9, 1]
# variance predictor
dur_pred_kernel_size: 3
pitch_pred_kernel_size: 3
energy_pred_kernel_size: 3
variance_predictor_dropout: 0.5
#model
model: !new:speechbrain.lobes.models.FastSpeech2.FastSpeech2WithAlignment
enc_num_layers: !ref <enc_num_layers>
enc_num_head: !ref <enc_num_head>
enc_d_model: !ref <enc_d_model>
enc_ffn_dim: !ref <enc_ffn_dim>
enc_k_dim: !ref <enc_k_dim>
enc_v_dim: !ref <enc_v_dim>
enc_dropout: !ref <enc_dropout>
in_query_channels: !ref <in_query_channels>
in_key_channels: !ref <in_key_channels>
attn_channels: !ref <attn_channels>
temperature: !ref <temperature>
dec_num_layers: !ref <dec_num_layers>
dec_num_head: !ref <dec_num_head>
dec_d_model: !ref <dec_d_model>
dec_ffn_dim: !ref <dec_ffn_dim>
dec_k_dim: !ref <dec_k_dim>
dec_v_dim: !ref <dec_v_dim>
dec_dropout: !ref <dec_dropout>
normalize_before: !ref <normalize_before>
ffn_type: !ref <ffn_type>
ffn_cnn_kernel_size_list: !ref <ffn_cnn_kernel_size_list>
n_char: !ref <n_symbols>
n_mels: !ref <n_mel_channels>
postnet_embedding_dim: !ref <postnet_embedding_dim>
postnet_kernel_size: !ref <postnet_kernel_size>
postnet_n_convolutions: !ref <postnet_n_convolutions>
postnet_dropout: !ref <postnet_dropout>
padding_idx: !ref <padding_idx>
dur_pred_kernel_size: !ref <dur_pred_kernel_size>
pitch_pred_kernel_size: !ref <pitch_pred_kernel_size>
energy_pred_kernel_size: !ref <energy_pred_kernel_size>
variance_predictor_dropout: !ref <variance_predictor_dropout>
mel_spectogram: !name:speechbrain.lobes.models.FastSpeech2.mel_spectogram
sample_rate: !ref <sample_rate>
hop_length: !ref <hop_length>
win_length: !ref <win_length>
n_fft: !ref <n_fft>
n_mels: !ref <n_mel_channels>
f_min: !ref <mel_fmin>
f_max: !ref <mel_fmax>
power: !ref <power>
normalized: !ref <mel_normalized>
min_max_energy_norm: !ref <min_max_energy_norm>
norm: !ref <norm>
mel_scale: !ref <mel_scale>
compression: !ref <dynamic_range_compression>
criterion: !new:speechbrain.lobes.models.FastSpeech2.LossWithAlignment
log_scale_durations: True
duration_loss_weight: 1.0
pitch_loss_weight: 1.0
energy_loss_weight: 1.0
ssim_loss_weight: 1.0
mel_loss_weight: 1.0
postnet_mel_loss_weight: 1.0
aligner_loss_weight: 1.0
binary_alignment_loss_weight: 0.2
binary_alignment_loss_warmup_epochs: 1
binary_alignment_loss_max_epochs: 80
vocoder: "hifi-gan"
pretrained_vocoder: True
vocoder_source: speechbrain/tts-hifigan-ljspeech
vocoder_download_path: tmpdir_vocoder
modules:
model: !ref <model>
train_dataloader_opts:
batch_size: !ref <batch_size>
drop_last: False #True #False
num_workers: !ref <num_workers_train>
shuffle: True
collate_fn: !new:speechbrain.lobes.models.FastSpeech2.TextMelCollateWithAlignment
valid_dataloader_opts:
batch_size: !ref <batch_size>
num_workers: !ref <num_workers_valid>
shuffle: False
collate_fn: !new:speechbrain.lobes.models.FastSpeech2.TextMelCollateWithAlignment
#optimizer
opt_class: !name:torch.optim.Adam
lr: !ref <learning_rate>
weight_decay: !ref <weight_decay>
betas: !ref <betas>
noam_annealing: !new:speechbrain.nnet.schedulers.NoamScheduler
lr_initial: !ref <learning_rate>
n_warmup_steps: 4000
#epoch object
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
limit: !ref <epochs>
train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
save_file: !ref <train_log>
#checkpointer
checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
checkpoints_dir: !ref <save_folder>
recoverables:
model: !ref <model>
lr_annealing: !ref <noam_annealing>
counter: !ref <epoch_counter>
input_encoder: !new:speechbrain.dataio.encoder.TextEncoder
progress_sample_logger: !new:speechbrain.utils.train_logger.ProgressSampleLogger
output_path: !ref <progress_sample_path>
batch_sample_size: !ref <progress_batch_sample_size>
formats:
raw_batch: raw
../../ljspeech_prepare.py
"""
Recipe for training the FastSpeech2 Text-To-Speech model
Instead of using pre-extracted phoneme durations from MFA,
This recipe trains an internal alignment from scratch, as introduced in:
https://arxiv.org/pdf/2108.10447.pdf (One TTS Alignment To Rule Them All)
To run this recipe, do the following:
# python train_internal_alignment.py hparams/train_internal_alignment.yaml
Authors
* Yingzhi Wang 2023
"""
import logging
import os
import sys
from pathlib import Path
import numpy as np
import torch
import torchaudio
from hyperpyyaml import load_hyperpyyaml
import speechbrain as sb
from speechbrain.inference.vocoders import HIFIGAN
from speechbrain.utils.data_utils import scalarize
os.environ["TOKENIZERS_PARALLELISM"] = "false"
logger = logging.getLogger(__name__)
class FastSpeech2Brain(sb.Brain):
def on_fit_start(self):
"""Gets called at the beginning of ``fit()``, on multiple processes
if ``distributed_count > 0`` and backend is ddp and initializes statistics
"""
self.hparams.progress_sample_logger.reset()
self.last_epoch = 0
self.last_batch = None
self.last_loss_stats = {}
return super().on_fit_start()
def compute_forward(self, batch, stage):
"""Computes the forward pass
Arguments
---------
batch: str
a single batch
stage: speechbrain.Stage
the training stage
Returns
-------
the model output
"""
inputs, _ = self.batch_to_device(batch)
return self.hparams.model(*inputs)
def on_fit_batch_end(self, batch, outputs, loss, should_step):
"""At the end of the optimizer step, apply noam annealing and logging."""
if should_step:
self.hparams.noam_annealing(self.optimizer)
def compute_objectives(self, predictions, batch, stage):
"""Computes the loss given the predicted and targeted outputs.
Arguments
---------
predictions : torch.Tensor
The model generated spectrograms and other metrics from `compute_forward`.
batch : PaddedBatch
This batch object contains all the relevant tensors for computation.
stage : sb.Stage
One of sb.Stage.TRAIN, sb.Stage.VALID, or sb.Stage.TEST.
Returns
-------
loss : torch.Tensor
A one-element tensor used for backpropagating the gradient.
"""
x, y, metadata = self.batch_to_device(batch, return_metadata=True)
self.last_batch = [x[0], y[-1], y[-2], predictions[0], *metadata]
self._remember_sample([x[0], *y, *metadata], predictions)
loss = self.hparams.criterion(
predictions, y, self.hparams.epoch_counter.current
)
self.last_loss_stats[stage] = scalarize(loss)
return loss["total_loss"]
def _remember_sample(self, batch, predictions):
"""Remembers samples of spectrograms and the batch for logging purposes
Arguments
---------
batch: tuple
a training batch
predictions: tuple
predictions (raw output of the FastSpeech2
model)
"""
(
phoneme_padded,
mel_padded,
pitch,
energy,
output_lengths,
input_lengths,
labels,
wavs,
) = batch
(
mel_post,
postnet_mel_out,
predict_durations,
predict_pitch,
average_pitch,
predict_energy,
average_energy,
predict_mel_lens,
alignment_durations,
alignment_soft,
alignment_logprob,
alignment_mas,
) = predictions
self.hparams.progress_sample_logger.remember(
target=self.process_mel(mel_padded, output_lengths),
output=self.process_mel(postnet_mel_out, output_lengths),
raw_batch=self.hparams.progress_sample_logger.get_batch_sample(
{
"tokens": phoneme_padded,
"input_lengths": input_lengths,
"mel_target": mel_padded,
"mel_out": postnet_mel_out,
"mel_lengths": predict_mel_lens,
"durations": alignment_durations,
"predict_durations": predict_durations,
"labels": labels,
"wavs": wavs,
}
),
)
def process_mel(self, mel, len, index=0):
"""Converts a mel spectrogram to one that can be saved as an image
sample = sqrt(exp(mel))
Arguments
---------
mel: torch.Tensor
the mel spectrogram (as used in the model)
len: int
length of the mel spectrogram
index: int
batch index
Returns
-------
mel: torch.Tensor
the spectrogram, for image saving purposes
"""
assert mel.dim() == 3
return torch.sqrt(torch.exp(mel[index][: len[index]]))
def on_stage_end(self, stage, stage_loss, epoch):
"""Gets called at the end of an epoch.
Arguments
---------
stage : sb.Stage
One of sb.Stage.TRAIN, sb.Stage.VALID, sb.Stage.TEST
stage_loss : float
The average loss for all of the data processed in this stage.
epoch : int
The currently-starting epoch. This is passed
`None` during the test stage.
"""
# At the end of validation, we can write
if stage == sb.Stage.VALID:
# Update learning rate
self.last_epoch = epoch
lr = self.hparams.noam_annealing.current_lr
# The train_logger writes a summary to stdout and to the logfile.
self.hparams.train_logger.log_stats( # 1#2#
stats_meta={"Epoch": epoch, "lr": lr},
train_stats=self.last_loss_stats[sb.Stage.TRAIN],
valid_stats=self.last_loss_stats[sb.Stage.VALID],
)
output_progress_sample = (
self.hparams.progress_samples
and epoch % self.hparams.progress_samples_interval == 0
and epoch >= self.hparams.progress_samples_min_run
)
if output_progress_sample:
logger.info("Saving predicted samples")
inference_mel, mel_lens = self.run_inference()
self.hparams.progress_sample_logger.save(epoch)
self.run_vocoder(inference_mel, mel_lens)
# Save the current checkpoint and delete previous checkpoints.
self.checkpointer.save_and_keep_only(
meta=self.last_loss_stats[stage],
min_keys=["total_loss"],
)
# We also write statistics about test data spectogram to stdout and to the logfile.
if stage == sb.Stage.TEST:
self.hparams.train_logger.log_stats(
{"Epoch loaded": self.hparams.epoch_counter.current},
test_stats=self.last_loss_stats[sb.Stage.TEST],
)
def run_inference(self):
"""Produces a sample in inference mode with predicted durations."""
if self.last_batch is None:
return
tokens, *_ = self.last_batch
(
_,
postnet_mel_out,
_,
_,
_,
_,
_,
predict_mel_lens,
_,
_,
_,
_,
) = self.hparams.model(tokens)
self.hparams.progress_sample_logger.remember(
infer_output=self.process_mel(
postnet_mel_out, [len(postnet_mel_out[0])]
)
)
return postnet_mel_out, predict_mel_lens
def run_vocoder(self, inference_mel, mel_lens):
"""Uses a pretrained vocoder to generate audio from predicted mel
spectogram. By default, uses speechbrain hifigan.
Arguments
---------
inference_mel: torch.Tensor
predicted mel from fastspeech2 inference
mel_lens: torch.Tensor
predicted mel lengths from fastspeech2 inference
used to mask the noise from padding
Returns
-------
None
"""
if self.last_batch is None:
return
*_, wavs = self.last_batch
inference_mel = inference_mel[: self.hparams.progress_batch_sample_size]
mel_lens = mel_lens[0 : self.hparams.progress_batch_sample_size]
assert (
self.hparams.vocoder == "hifi-gan"
and self.hparams.pretrained_vocoder is True
), "Specified vocoder not supported yet"
logger.info(
f"Generating audio with pretrained {self.hparams.vocoder_source} vocoder"
)
hifi_gan = HIFIGAN.from_hparams(
source=self.hparams.vocoder_source,
savedir=self.hparams.vocoder_download_path,
)
waveforms = hifi_gan.decode_batch(
inference_mel.transpose(2, 1), mel_lens, self.hparams.hop_length
)
for idx, wav in enumerate(waveforms):
path = os.path.join(
self.hparams.progress_sample_path,
str(self.last_epoch),
f"pred_{Path(wavs[idx]).stem}.wav",
)
torchaudio.save(path, wav, self.hparams.sample_rate)
def batch_to_device(self, batch, return_metadata=False):
"""Transfers the batch to the target device
Arguments
---------
batch: tuple
the batch to use
return_metadata: bool
Whether to additionally return labels and wavs.
Returns
-------
x: tuple
phonemes, spectrogram, pitch, energy
y: tuple
spectrogram, pitch, energy, mel_lengths, input_lengths
metadata: tuple
labels, wavs
"""
(
phoneme_padded,
input_lengths,
mel_padded,
pitch_padded,
energy_padded,
output_lengths,
# len_x,
labels,
wavs,
) = batch
# durations = durations.to(self.device, non_blocking=True).long()
phonemes = phoneme_padded.to(self.device, non_blocking=True).long()
input_lengths = input_lengths.to(self.device, non_blocking=True).long()
spectogram = mel_padded.to(self.device, non_blocking=True).float()
pitch = pitch_padded.to(self.device, non_blocking=True).float()
energy = energy_padded.to(self.device, non_blocking=True).float()
mel_lengths = output_lengths.to(self.device, non_blocking=True).long()
x = (phonemes, spectogram, pitch, energy)
y = (spectogram, pitch, energy, mel_lengths, input_lengths)
metadata = (labels, wavs)
if return_metadata:
return x, y, metadata
return x, y
def dataio_prepare(hparams):
"Creates the datasets and their data processing pipelines."
# Load lexicon
lexicon = hparams["lexicon"]
input_encoder = hparams.get("input_encoder")
# add a dummy symbol for idx 0 - used for padding.
lexicon = ["@@"] + lexicon
input_encoder.update_from_iterable(lexicon, sequence_input=False)
input_encoder.add_unk()
# load audio, text and durations on the fly; encode audio and text.
@sb.utils.data_pipeline.takes("wav", "phonemes", "pitch")
@sb.utils.data_pipeline.provides("mel_text_pair")
def audio_pipeline(wav, phonemes, pitch):
phoneme_seq = input_encoder.encode_sequence_torch(phonemes).int()
audio, fs = torchaudio.load(wav)
audio = audio.squeeze()
mel, energy = hparams["mel_spectogram"](audio=audio)
pitch = np.load(pitch)
pitch = torch.from_numpy(pitch)
pitch = pitch[: mel.shape[-1]]
return phoneme_seq, mel, pitch, energy, len(phoneme_seq), len(mel)
# define splits and load it as sb dataset
datasets = {}
for dataset in hparams["splits"]:
datasets[dataset] = sb.dataio.dataset.DynamicItemDataset.from_json(
json_path=hparams[f"{dataset}_json"],
replacements={"data_root": hparams["data_folder"]},
dynamic_items=[audio_pipeline],
output_keys=["mel_text_pair", "wav", "label", "pitch"],
)
return datasets
def main():
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
hparams = load_hyperpyyaml(fin, overrides)
sb.utils.distributed.ddp_init_group(run_opts)
sb.create_experiment_directory(
experiment_directory=hparams["output_folder"],
hyperparams_to_save=hparams_file,
overrides=overrides,
)
from ljspeech_prepare import prepare_ljspeech
sb.utils.distributed.run_on_main(
prepare_ljspeech,
kwargs={
"data_folder": hparams["data_folder"],
"save_folder": hparams["save_folder"],
"splits": hparams["splits"],
"split_ratio": hparams["split_ratio"],
"model_name": hparams["model"].__class__.__name__,
"seed": hparams["seed"],
"pitch_n_fft": hparams["n_fft"],
"pitch_hop_length": hparams["hop_length"],
"pitch_min_f0": hparams["min_f0"],
"pitch_max_f0": hparams["max_f0"],
"skip_prep": hparams["skip_prep"],
"use_custom_cleaner": True,
"device": "cuda",
},
)
datasets = dataio_prepare(hparams)
# Brain class initialization
fastspeech2_brain = FastSpeech2Brain(
modules=hparams["modules"],
opt_class=hparams["opt_class"],
hparams=hparams,
run_opts=run_opts,
checkpointer=hparams["checkpointer"],
)
# Training
fastspeech2_brain.fit(
fastspeech2_brain.hparams.epoch_counter,
datasets["train"],
datasets["valid"],
train_loader_kwargs=hparams["train_dataloader_opts"],
valid_loader_kwargs=hparams["valid_dataloader_opts"],
)
if __name__ == "__main__":
main()
# ############################################################################
# Model: Tacotron2
# Tokens: Raw characters (English text)
# losses: Transducer
# Training: LJSpeech
# Authors: Georges Abous-Rjeili, Artem Ploujnikov, Yingzhi Wang
# ############################################################################
###################################
# Experiment Parameters and setup #
###################################
seed: 1234
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref ./results/tacotron2/<seed>
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt
epochs: 750
keep_checkpoint_interval: 50
###################################
# Progress Samples #
###################################
# Progress samples are used to monitor the progress
# of an ongoing training session by outputting samples
# of spectrograms, alignments, etc at regular intervals
# Whether to enable progress samples
progress_samples: True
# The path where the samples will be stored
progress_sample_path: !ref <output_folder>/samples
# The interval, in epochs. For instance, if it is set to 5,
# progress samples will be output every 5 epochs
progress_samples_interval: 1
# The sample size for raw batch samples saved in batch.pth
# (useful mostly for model debugging)
progress_batch_sample_size: 3
#################################
# Data files and pre-processing #
#################################
data_folder: !PLACEHOLDER # e.g, /localscratch/ljspeech
train_json: !ref <save_folder>/train.json
valid_json: !ref <save_folder>/valid.json
test_json: !ref <save_folder>/test.json
splits: ["train", "valid"]
split_ratio: [90, 10]
skip_prep: False
# Use the original preprocessing from nvidia
# The cleaners to be used (applicable to nvidia only)
text_cleaners: ['english_cleaners']
################################
# Audio Parameters #
################################
sample_rate: 22050
hop_length: 256
win_length: 1024
n_mel_channels: 80
n_fft: 1024
mel_fmin: 0.0
mel_fmax: 8000.0
mel_normalized: False
power: 1
norm: "slaney"
mel_scale: "slaney"
dynamic_range_compression: True
################################
# Optimization Hyperparameters #
################################
learning_rate: 0.001
weight_decay: 0.000006
batch_size: 64 #minimum 2
num_workers: 8
mask_padding: True
guided_attention_sigma: 0.2
guided_attention_weight: 50.0
guided_attention_weight_half_life: 10.
guided_attention_hard_stop: 50
gate_loss_weight: 1.0
train_dataloader_opts:
batch_size: !ref <batch_size>
drop_last: False #True #False
num_workers: !ref <num_workers>
collate_fn: !new:speechbrain.lobes.models.Tacotron2.TextMelCollate
valid_dataloader_opts:
batch_size: !ref <batch_size>
num_workers: !ref <num_workers>
collate_fn: !new:speechbrain.lobes.models.Tacotron2.TextMelCollate
test_dataloader_opts:
batch_size: !ref <batch_size>
num_workers: !ref <num_workers>
collate_fn: !new:speechbrain.lobes.models.Tacotron2.TextMelCollate
################################
# Model Parameters and model #
################################
n_symbols: 148 #fixed depending on symbols in textToSequence
symbols_embedding_dim: 512
# Encoder parameters
encoder_kernel_size: 5
encoder_n_convolutions: 3
encoder_embedding_dim: 512
# Decoder parameters
# The number of frames in the target per encoder step
n_frames_per_step: 1
decoder_rnn_dim: 1024
prenet_dim: 256
max_decoder_steps: 1000
gate_threshold: 0.5
p_attention_dropout: 0.1
p_decoder_dropout: 0.1
decoder_no_early_stopping: False
# Attention parameters
attention_rnn_dim: 1024
attention_dim: 128
# Location Layer parameters
attention_location_n_filters: 32
attention_location_kernel_size: 31
# Mel-post processing network parameters
postnet_embedding_dim: 512
postnet_kernel_size: 5
postnet_n_convolutions: 5
mel_spectogram: !name:speechbrain.lobes.models.Tacotron2.mel_spectogram
sample_rate: !ref <sample_rate>
hop_length: !ref <hop_length>
win_length: !ref <win_length>
n_fft: !ref <n_fft>
n_mels: !ref <n_mel_channels>
f_min: !ref <mel_fmin>
f_max: !ref <mel_fmax>
power: !ref <power>
normalized: !ref <mel_normalized>
norm: !ref <norm>
mel_scale: !ref <mel_scale>
compression: !ref <dynamic_range_compression>
#model
model: !new:speechbrain.lobes.models.Tacotron2.Tacotron2
mask_padding: !ref <mask_padding>
n_mel_channels: !ref <n_mel_channels>
# symbols
n_symbols: !ref <n_symbols>
symbols_embedding_dim: !ref <symbols_embedding_dim>
# encoder
encoder_kernel_size: !ref <encoder_kernel_size>
encoder_n_convolutions: !ref <encoder_n_convolutions>
encoder_embedding_dim: !ref <encoder_embedding_dim>
# attention
attention_rnn_dim: !ref <attention_rnn_dim>
attention_dim: !ref <attention_dim>
# attention location
attention_location_n_filters: !ref <attention_location_n_filters>
attention_location_kernel_size: !ref <attention_location_kernel_size>
# decoder
n_frames_per_step: !ref <n_frames_per_step>
decoder_rnn_dim: !ref <decoder_rnn_dim>
prenet_dim: !ref <prenet_dim>
max_decoder_steps: !ref <max_decoder_steps>
gate_threshold: !ref <gate_threshold>
p_attention_dropout: !ref <p_attention_dropout>
p_decoder_dropout: !ref <p_decoder_dropout>
# postnet
postnet_embedding_dim: !ref <postnet_embedding_dim>
postnet_kernel_size: !ref <postnet_kernel_size>
postnet_n_convolutions: !ref <postnet_n_convolutions>
decoder_no_early_stopping: !ref <decoder_no_early_stopping>
guided_attention_scheduler: !new:speechbrain.nnet.schedulers.StepScheduler
initial_value: !ref <guided_attention_weight>
half_life: !ref <guided_attention_weight_half_life>
criterion: !new:speechbrain.lobes.models.Tacotron2.Loss
gate_loss_weight: !ref <gate_loss_weight>
guided_attention_weight: !ref <guided_attention_weight>
guided_attention_sigma: !ref <guided_attention_sigma>
guided_attention_scheduler: !ref <guided_attention_scheduler>
guided_attention_hard_stop: !ref <guided_attention_hard_stop>
modules:
model: !ref <model>
#optimizer
opt_class: !name:torch.optim.Adam
lr: !ref <learning_rate>
weight_decay: !ref <weight_decay>
#epoch object
epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
limit: !ref <epochs>
train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
save_file: !ref <train_log>
#annealing_function
lr_annealing: !new:speechbrain.nnet.schedulers.IntervalScheduler
intervals:
- steps: 6000
lr: 0.0005
- steps: 8000
lr: 0.0003
- steps: 10000
lr: 0.0001
#checkpointer
checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
checkpoints_dir: !ref <save_folder>
recoverables:
model: !ref <model>
counter: !ref <epoch_counter>
scheduler: !ref <lr_annealing>
#infer: !name:speechbrain.lobes.models.Tacotron2.infer
progress_sample_logger: !new:speechbrain.utils.train_logger.ProgressSampleLogger
output_path: !ref <progress_sample_path>
batch_sample_size: !ref <progress_batch_sample_size>
formats:
raw_batch: raw
SpeechBrain system description
==============================
Python version:
3.10.12 (main, May 26 2024, 00:14:02) [GCC 9.4.0]
==============================
Installed Python packages:
accelerate==0.31.0
addict==2.4.0
aiosignal==1.3.1
aitemplate @ http://10.6.10.68:8000/release/aitemplate/dtk24.04.1/aitemplate-0.0.1%2Bdas1.1.git5d8aa20.dtk2404.torch2.1.0-py3-none-any.whl#sha256=ad763a7cfd3935857cf10a07a2a97899fd64dda481add2f48de8b8930bd341dd
annotated-types==0.7.0
anyio==4.4.0
apex @ http://10.6.10.68:8000/release/apex/dtk24.04.1/apex-1.1.0%2Bdas1.1.gitf477a3a.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=85eb662d13d6e6c3b61c2d878378c2338c4479bc03a1912c3eabddc2d9d08aa1
attrs==23.2.0
audioread==3.0.1
bitsandbytes @ http://10.6.10.68:8000/release/bitsandbyte/dtk24.04.1/bitsandbytes-0.42.0%2Bdas1.1.gitce85679.abi1.dtk2404.torch2.1.0-py3-none-any.whl#sha256=6324e330c8d12b858d39f4986c0ed0836fcb05f539cee92a7cf558e17954ae0d
certifi==2024.6.2
cffi==1.17.0
cfgv==3.4.0
charset-normalizer==3.3.2
click==8.1.7
coloredlogs==15.0.1
contourpy==1.2.1
cycler==0.12.1
decorator==5.1.1
deepspeed @ http://10.6.10.68:8000/release/deepspeed/dtk24.04.1/deepspeed-0.12.3%2Bgita724046.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=2c158ed2dab21f4f09e7fc29776cb43a1593b13cec33168ce3483f318b852fc9
distlib==0.3.8
dnspython==2.6.1
dropout-layer-norm @ http://10.6.10.68:8000/release/flash_attn/dtk24.04.1/dropout_layer_norm-0.1%2Bdas1.1gitc7a8c18.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=ae10c7cc231a8e38492292e91e76ba710d7679762604c0a7f10964b2385cdbd7
einops==0.8.0
email_validator==2.1.1
exceptiongroup==1.2.1
fastapi==0.111.0
fastapi-cli==0.0.4
fastpt @ http://10.6.10.68:8000/release/fastpt/dtk24.04.1/fastpt-1.0.0%2Bdas1.1.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=ecf30dadcd2482adb1107991edde19b6559b8237379dbb0a3e6eb7306aad3f9a
filelock==3.15.1
fire==0.6.0
flash-attn @ http://10.6.10.68:8000/release/flash_attn/dtk24.04.1/flash_attn-2.0.4%2Bdas1.1gitc7a8c18.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=7ca8e78ee0624b1ff0e91e9fc265e61b9510f02123a010ac71a2f8e5d08a62f7
flatbuffers==24.3.25
fonttools==4.53.0
frozenlist==1.4.1
fsspec==2024.6.0
fused-dense-lib @ http://10.6.10.68:8000/release/flash_attn/dtk24.04.1/fused_dense_lib-0.1%2Bdas1.1gitc7a8c18.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=7202dd258a86bb7a1572e3b44b90dae667b0c948bf0f420b05924a107aaaba03
h11==0.14.0
hjson==3.1.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.23.4
humanfriendly==10.0
HyperPyYAML==1.2.2
hypothesis==5.35.1
identify==2.6.0
idna==3.7
importlib_metadata==7.1.0
Jinja2==3.1.4
joblib==1.4.2
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
layer-check-pt @ http://10.6.10.68:8000/release/layercheck/dtk24.04.1/layer_check_pt-1.2.3.git59a087a.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=807adae2d4d4b74898777f81e1b94f1af4d881afe6a7826c7c910b211accbea7
lazy_loader==0.4
librosa==0.10.2.post1
lightop @ http://10.6.10.68:8000/release/lightop/dtk24.04.1/lightop-0.4%2Bdas1.1git8e60f07.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=2f2c88fd3fe4be179f44c4849e9224cb5b2b259843fc5a2d088e468b7a14c1b1
llvmlite==0.43.0
lmdeploy @ http://10.6.10.68:8000/release/lmdeploy/dtk24.04.1/lmdeploy-0.2.6%2Bdas1.1.git6ba90df.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=92ecee2c8b982f86e5c3219ded24d2ede219f415bf2cd4297f989a03387a203c
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.0
mdurl==0.1.2
mmcv @ http://10.6.10.68:8000/release/mmcv/dtk24.04.1/mmcv-2.0.1%2Bdas1.1.gite58da25.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=7a937ae22f81b44d9100907e11303c31bf9a670cb4c92e361675674a41a8a07f
mmengine==0.10.4
mmengine-lite==0.10.4
mpmath==1.3.0
msgpack==1.0.8
networkx==3.3
ninja==1.11.1.1
nodeenv==1.9.1
numba==0.60.0
numpy==1.24.3
onnxruntime @ http://10.6.10.68:8000/release/onnxruntime/dtk24.04.1/onnxruntime-1.15.0%2Bdas1.1.git739f24d.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=d0d24167188d2c85f1ed4110fc43e62ea40c74280716d9b5fe9540256f17869a
opencv-python==4.10.0.82
orjson==3.10.5
packaging==24.1
pandas==2.2.2
peft==0.9.0
pillow==10.3.0
platformdirs==4.2.2
pooch==1.8.2
pre-commit==3.8.0
prometheus_client==0.20.0
protobuf==5.27.1
psutil==5.9.8
py-cpuinfo==9.0.0
pycparser==2.22
pydantic==2.7.4
pydantic_core==2.18.4
Pygments==2.18.0
pygtrie==2.5.0
pynvml==11.5.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
ray==2.9.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
rotary-emb @ http://10.6.10.68:8000/release/flash_attn/dtk24.04.1/rotary_emb-0.1%2Bdas1.1gitc7a8c18.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=cc15ec6ae73875515243d7f5c96ab214455a33a4a99eb7f1327f773cae1e6721
rpds-py==0.18.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
safetensors==0.4.3
scikit-learn==1.5.1
scipy==1.13.1
sentencepiece==0.2.0
shellingham==1.5.4
shortuuid==1.0.13
six==1.16.0
sniffio==1.3.1
sortedcontainers==2.4.0
soundfile==0.12.1
soxr==0.5.0
speechbrain==1.0.0
starlette==0.37.2
sympy==1.12.1
termcolor==2.4.0
tgt==1.5
threadpoolctl==3.5.0
tiktoken==0.7.0
tokenizers==0.15.0
tomli==2.0.1
torch @ http://10.6.10.68:8000/release/pytorch/dtk24.04.1/torch-2.1.0%2Bdas1.1.git3ac1bdd.abi1.dtk2404-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=5fd3bcef3aa197c0922727913aca53db9ce3f2fd4a9b22bba1973c3d526377f9
torchaudio @ http://10.6.10.68:8000/release/torchaudio/dtk24.04.1/torchaudio-2.1.2%2Bdas1.1.git63d9a68.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=4fcc556a7a2fffe64ddd57f22e5972b1b2b723f6fdfdaa305bd01551036df38b
torchvision @ http://10.6.10.68:8000/release/vision/dtk24.04.1/torchvision-0.16.0%2Bdas1.1.git7d45932.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=e3032e1bcc0857b54391d66744f97e5cff0dc7e7bb508196356ee927fb81ec01
tqdm==4.66.4
transformers==4.38.0
triton @ http://10.6.10.68:8000/release/triton/dtk24.04.1/triton-2.1.0%2Bdas1.1.git4bf1007a.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=4c30d45dab071e65d1704a5cd189b14c4ac20bd59a7061032dfd631b1fc37645
typer==0.12.3
typing_extensions==4.12.2
tzdata==2024.1
ujson==5.10.0
Unidecode==1.3.8
urllib3==2.2.1
uvicorn==0.30.1
uvloop==0.19.0
virtualenv==20.26.3
vllm @ http://10.6.10.68:8000/release/vllm/dtk24.04.1/vllm-0.3.3%2Bdas1.1.gitdf6349c.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=48d265b07efa36f028eca45a3667fa10d3cf30eb1b8f019b62e3b255fb9e49c4
watchfiles==0.22.0
websockets==12.0
xentropy-cuda-lib @ http://10.6.10.68:8000/release/flash_attn/dtk24.04.1/xentropy_cuda_lib-0.1%2Bdas1.1gitc7a8c18.abi1.dtk2404.torch2.1-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=91b058d6a5fd2734a5085d68e08d3a1f948fe9c0119c46885d19f55293e2cce4
xformers @ http://10.6.10.68:8000/release/xformers/dtk24.04.1/xformers-0.0.25%2Bdas1.1.git8ef8bc1.abi1.dtk2404.torch2.1.0-cp310-cp310-manylinux_2_31_x86_64.whl#sha256=ca87fd065753c1be3b9fad552eba02d30cd3f4c673f01e81a763834eb5cbb9cc
yapf==0.40.2
zipp==3.19.2
==============================
Could not get git revision
==============================
ROCm version:
5.7.24213